Apache Kafka is a cornerstone of many streaming data projects. However, it is only the first step in the potentially long and arduous process of transforming streams into workable, structured data. How should you design the rest of your data architecture to build a scalable, cost-effective solution for working with Kafka data? Let’s look at two approaches, reading directly from Kafka and creating a data lake, and understand when and how you should use each.
The following article is an abridged version of our new guide to Data Lakes and Data Lake Platforms - get the full version for free here.
If you’re working with data in any capacity, you should be familiar with Data Lakes. Even if you don’t need one today, the rapid growth of data and demand for increasingly versatile analytic use cases (such as reporting, machine learning, and predictive analytics) could result in your organization outgrowing its data infrastructure much sooner than you currently foresee.
If you only read the bombastic headlines, you might be forgiven for thinking that Big Data is the name of a real-life superhero: fighting crime, busting traffic jams and even curing diseases. But when you work with data for a living, you quickly find out that underneath the shiny facade, ‘doing big data’ is also a major pain.
Mentioning the words “migration” or “database refactoring” to a typical DBA is unlikely to help you make new friends. Most organizations are extremely averse to changing their data infrastructure, which is often assumed to be a long, arduous and expensive process. It doesn’t have to be this way: with the right tools you can build a scalable big data infrastructure on AWS in just a week or two. Still, change is always scary.
In this post we will provide a step-by-step implementation of an application that auto-completes Twitter hashtags, based on the most popular hashtags seen in the last 7 days. For example, when a user types ‘trum’ into the auto-complete application, it will probably suggest Trump-related hashtags, since that’s a trending topic these days.
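To make the idea concrete, here is a minimal in-memory sketch of the core logic: count the hashtags seen in the last 7 days and return the most frequent ones that start with what the user typed. This is not the implementation from the post itself. The function name `suggest`, the `(hashtag, timestamp)` event shape and the `limit` parameter are all assumptions made for illustration, and a real application would read from a stream rather than a Python list.

```python
from collections import Counter
from datetime import datetime, timedelta

# Only hashtags seen within this window count towards popularity.
WINDOW = timedelta(days=7)


def suggest(events, prefix, now, limit=3):
    """Return up to `limit` hashtags starting with `prefix`, most popular first.

    `events` is an iterable of (hashtag, seen_at) pairs; this shape is a
    hypothetical stand-in for whatever the real tweet stream provides.
    """
    cutoff = now - WINDOW
    prefix = prefix.lower()

    # Count case-insensitively, ignoring anything older than the window.
    counts = Counter(
        tag.lower()
        for tag, seen_at in events
        if seen_at >= cutoff and tag.lower().startswith(prefix)
    )

    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]
```

Calling `suggest(events, "trum", now)` returns the matching hashtags ranked by how often they appeared during the past week. Recounting on every request keeps the sketch short; at real traffic volumes you would more likely maintain pre-aggregated counts per time bucket and query those instead.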