Track speaker: 刘新宇 (Xinyu Liu)

LinkedIn Staff Software Engineer

Staff Software Engineer on LinkedIn's Data Infrastructure team and an Apache Samza committer.

Experienced in large-scale distributed systems, real-time stream processing, massive-scale messaging platforms, web services, and RESTful middleware.

Holds a Master's degree in Computer Science from the University of Virginia.

Talk: Secret Ingredients of Massive Scale Stream Processing with Apache Samza

Time: July 7, 13:30
Location: Grand Ballroom 1
Track: Big Data Frameworks

Event processing is a race against time: a race in which results delivered within seconds, or even milliseconds, are more relevant and accurate than results delivered hours or days later.

To lead this race, we have reliably run more than 400 Samza applications in production at LinkedIn over the past five years, processing over 1 trillion events each day. So what are the secret ingredients behind this?

In this talk, we will examine some of them:

  a) a fluent API that allows the user to focus on the processing logic without worrying about the execution details;

  b) versatile deployment models that allow us to run Samza applications on YARN clusters as well as on other clusters such as AWS EC2;

  c) durable local state that lets large stateful applications scale with ease;

  d) asynchronous processing that enables remote data I/O to match the throughput of event consumption.
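As a rough illustration of item d), the sketch below uses plain Python `asyncio` rather than Samza's own (Java) asynchronous task API, which it does not attempt to reproduce. `fetch_profile`, its 50 ms latency, and `max_concurrency` are hypothetical stand-ins for a remote store lookup and a concurrency limit. The point is the arithmetic: issuing one remote call at a time caps a processor at roughly 1 / latency events per second, while keeping several calls in flight multiplies that rate without reordering results.

```python
import asyncio
import time


# Hypothetical remote lookup: stands in for a call to a remote store or service.
async def fetch_profile(member_id: int, latency_s: float = 0.05) -> dict:
    await asyncio.sleep(latency_s)  # simulated network round trip
    return {"member_id": member_id, "name": f"member-{member_id}"}


async def process_serially(events):
    # One remote call at a time: throughput is capped at about 1 / latency.
    return [await fetch_profile(e) for e in events]


async def process_async(events, max_concurrency: int = 10):
    # Keep up to max_concurrency lookups in flight at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def enrich(e):
        async with sem:
            return await fetch_profile(e)

    # gather returns results in the same order as the input events.
    return await asyncio.gather(*(enrich(e) for e in events))


async def main():
    events = list(range(20))
    t0 = time.perf_counter()
    serial = await process_serially(events)
    t1 = time.perf_counter()
    concurrent = await process_async(events, max_concurrency=10)
    t2 = time.perf_counter()
    print(f"serial: {t1 - t0:.2f}s, async: {t2 - t1:.2f}s")
    return serial, concurrent, t1 - t0, t2 - t1


if __name__ == "__main__":
    asyncio.run(main())
```

With 20 events and 50 ms per call, the serial loop takes about 1 s and the bounded-concurrency version about 0.1 s; exact timings vary by machine.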

Finally, we will explore patterns that allow us to run the same application both nearline (streaming) and offline (batch).
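One common way to run the same logic both nearline and offline, offered here as an assumption rather than as the specific patterns covered in the talk, is to write the processing logic once against an abstract sequence of events and let the deployment supply either a bounded batch or an unbounded stream. In the Python sketch below, `page_view_pipeline`, `offline_source`, and `nearline_source` are hypothetical names; a real deployment would read from storage and from a message broker instead of in-memory data.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


# Processing logic written once, with no knowledge of where events come from.
def page_view_pipeline(events: Iterable[dict]) -> Iterator[dict]:
    for e in events:
        if e.get("type") != "page_view":
            continue  # filter: keep only page views
        yield {"member_id": e["member_id"], "page": e["page"].lower()}  # map


# Offline: a bounded batch, e.g. a day's worth of events read from storage.
def offline_source() -> list[dict]:
    return [
        {"type": "page_view", "member_id": 1, "page": "Home"},
        {"type": "click", "member_id": 2, "page": "Jobs"},
        {"type": "page_view", "member_id": 3, "page": "Feed"},
    ]


# Nearline: an unbounded stream, simulated here by an endless generator.
def nearline_source() -> Iterator[dict]:
    i = 0
    while True:
        yield {"type": "page_view" if i % 2 == 0 else "click",
               "member_id": i, "page": f"Page{i}"}
        i += 1


batch_result = list(page_view_pipeline(offline_source()))
# Take only the first 3 outputs, since the stream never ends.
stream_result = list(islice(page_view_pipeline(nearline_source()), 3))
print(batch_result)
print(stream_result)
```

Because the pipeline is a lazy generator, it never needs to know whether its input ends, which is what lets one definition serve both modes.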


Organizer: InfoQ (QCon)
