Tuning Kafka for Consistency

Having worked with Kafka for almost a year and a half, I wanted to share my thoughts on Kafka’s consistency guarantees and how to achieve them. I will be skipping over the details of basic configurations and suggestions like having multiple partitions & setting the replication factor to more than one because those are…


Choosing the correct Flink Restart Strategy & avoiding Production Gotchas

Having spent the last year and a half designing & developing real-time data pipelines, I want to share my two cents on the production gotchas I encountered with Flink’s restart strategy, which I wish I had known sooner. Also, if you are new to stream system development, check out my previous article where I wrote about the change…


Data Enrichment – Designing & Optimizing a Real-Time Stream Joining Pipeline

In my previous article, I shared my thoughts on the paradigm shift I underwent while adapting to the idiosyncrasies of real-time systems compared to batch processing. Picking up where I left off, this article focuses on applying those concepts to build a real-time data enrichment pipeline that performs joins & lookups in…


Undergoing the Paradigm Shift – Batch Processing to Real-Time Streaming

In this article, I share my experience of undergoing a paradigm shift from batch-based to real-time stream-based processing while building a real-time data enrichment pipeline that performs joins & lookups in real time. When the volume of data grows to the point that a single DB can't handle it in…
