Tail Recursion in Scala
Ever since I started programming in Scala, I’ve tried hard to adopt the paradigms of functional programming as much as possible. Recently I had to work on a recursive piece of code and my architect suggested that I use…
Tuning Kafka for Consistency
Having worked with Kafka for almost a year and a half, I wanted to share my thoughts on Kafka’s consistency guarantees and how to achieve them. I will be skipping over the details of basic configurations and suggestions like having…
Synchronizing Multi Threaded code based on Object Value.
During my most recent coding escapade, I had to design & implement a web server capable of handling a staggering number of requests concurrently. Now, I could have gone with an abstracted, high-level multi-threading framework like Akka or Vert.x but…
Choosing the correct Flink Restart Strategy & avoiding Production Gotchas
Having spent the last year and a half designing & developing real-time data pipelines, here are my 2 cents on the production gotchas I encountered related to Flink’s restart strategy, which I wish I had known sooner. Also if you are…
Data Enrichment – Designing & Optimizing a Real Time Stream Joining Pipeline
In my previous article, I wrote about the paradigm shift I underwent while adapting to the idiosyncrasies of real-time systems compared to batch processing. Picking up from where I left off, this article focuses on applying all…
Undergoing the Paradigm Shift – Batch Processing to RealTime Streaming
In this article, I share my experiences of undergoing a paradigm shift from batch-based to real-time stream-based processing over the course of building a real-time data enrichment pipeline that performs joins & lookups in real time. When the…
Designing & Optimizing a Content Recommendation System using MapReduce
Last month I had the opportunity to design a content recommendation system, and I decided to go with BigData due to the scope of the problem. After successfully implementing it and running it on a training dataset consisting of around…
MultiThreading in Applications .. featuring JDBC
Nearly all developers who’ve dealt with database operations have at least once faced the scenario where their code is essentially waiting for SQL queries to finish, which results in a bit of a slowdown around these parts. Java Database Connectivity (JDBC)…
Demystifying Yarn – Parallelism in Hadoop2
With the introduction of YARN in Hadoop2, the resource scheduling engine got a massive overhaul. Gone are the days of the older slot-based system of the MapReduce framework, which has been replaced with a container-based system. Intended to…
Small files.. Big problem !! – Using CombineTextInputFormat to optimize MapReduce & handle large number of small input files.
The problem I am going to address in this article is not limited to the Hadoop Distributed File System (the storage system of BigData); it’s prevalent across other platforms as well. When you have some data split up in…
Hadoop on Windows: Install BigData environment and run WordCounter on your PC.
As a Big Data developer, one of the biggest challenges I faced initially was installing and configuring a development system for trying out the Hadoop framework and running my Hadoop code. The whole BigData architecture by nature is distributed across…
Abstracting Exception propagation – Avoid throwing Model Layer SqlExceptions directly to Service layer.
Around 2 years ago when I started my career as a Java developer, one of the first things I learnt was the MVC design approach for developing a Web application. Without going into too much detail about the design paradigm, the basic…
Making Data Clustering 768x times faster !!
From 64 hours to 300 seconds!!! That’s the sort of runtime improvement I was able to achieve by using Arrays as the primary data structure & reducing the time complexity from n3mC1 to n*m in a clustering algorithm I improved. Now this…