High Availability Streaming Across Multiple Data Centers – The Double Lambda Architecture
Dr. Robert Neumann talks about how streaming applications, and with them the ability to process large amounts of data in real time, are rapidly gaining momentum in the IT landscape. More and more companies are developing their own streaming applications using a variety of tools and frameworks. Simple applications are built quickly, but as complexity rises, many architectural questions come up.
Large-scale streaming applications pose architectural as well as operational challenges when high availability and zero downtime are required. A suitable architecture needs to tolerate the partial or total loss of streaming components within one or more data centers. It needs to allow streaming components to be upgraded and patched seamlessly without affecting system uptime. In short, it needs to be “always-on, always available”.
The largest customer deployment Ultra Tendency has built with Cloudera technology has been operating as a critical production environment across multiple data centers for over 3.5 years with zero downtime. It has even exceeded the overall availability of the customer’s mainframe environment, which no other system had managed to do before.
In this webcast, attendees will learn:
- How a Double Lambda Architecture outperforms, in both availability and robustness, an architecture that stretches components across multiple data centers
- How to build an active-active Double Lambda Architecture based on Cloudera and deploy it to multiple data centers
- How to keep streaming logic consistent (idempotent), so that multiple concurrent streaming job replicas do not create duplicates
- How to deal with message delivery semantics when operating two Kafka clusters using the Kafka Parallel Producer
- How to create consistency at the data consumption level using the HBase Multi Cluster Client (MCC)
- How Atlas tracks lineage across all active data centers, thereby providing a full lineage view of your data across all environments
- How to spot, track, and avoid bottlenecks in the Double Lambda Architecture (SMM)
- How to centrally adjust access controls using a single pane of glass (Ranger)
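
To make the idempotency point in the list above concrete, here is a minimal Python sketch. It is not the implementation presented in the webcast. It only illustrates the general idea that two job replicas can process the same event without creating duplicates when every write is an upsert under a key derived deterministically from the event. The event fields (`source`, `event_id`, `amount_cents`), the `KeyValueSink` class, and the in-memory dictionary standing in for a key-value store are all assumptions made for illustration.

```python
import hashlib


def row_key(event: dict) -> str:
    """Derive a deterministic key from the event's identity fields.

    The fields 'source' and 'event_id' are hypothetical; any stable,
    unique identity of the event would do.
    """
    raw = f"{event['source']}|{event['event_id']}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class KeyValueSink:
    """In-memory stand-in for a key-value store with upsert semantics."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def upsert(self, key: str, value: dict) -> None:
        # Writing the same key twice overwrites instead of appending.
        self.rows[key] = value


def process(event: dict, sink: KeyValueSink) -> None:
    """Transform an event and write it idempotently.

    The output depends only on the event itself (no processing-time
    timestamps, no random IDs), so every replica writes identical data.
    """
    value = {"amount_cents": event["amount_cents"]}
    sink.upsert(row_key(event), value)


events = [
    {"source": "orders", "event_id": "1", "amount_cents": 1250},
    {"source": "orders", "event_id": "2", "amount_cents": 990},
]

sink = KeyValueSink()

# Simulate two concurrent replicas (one per data center) that both
# consume and process the full event stream.
for _replica in ("dc1", "dc2"):
    for event in events:
        process(event, sink)

print(len(sink.rows))
```

Because both simulated replicas compute the same key and the same value for each event, the second write simply overwrites the first, and the store ends up with one row per event rather than one per replica.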