

The Big Data Blog

Eran Levy

Director of Marketing at Upsolver

Recent Posts

11 Alternatives to Alooma on AWS for Streaming and App Data

Sep 18, 2019 3:37:58 PM / by Eran Levy



4 Key Components of a Streaming Data Architecture (with Examples)

Sep 11, 2019 3:42:06 PM / by Eran Levy

Streaming data is becoming a core component of enterprise data architecture due to the explosive growth of data from non-traditional sources such as IoT sensors, security logs and web applications.


Streaming technologies are not new, but they have matured considerably in recent years. The industry is moving away from painstaking integration of open-source Spark/Hadoop frameworks and towards full-stack solutions that provide an end-to-end streaming data architecture built on the scalability of cloud data lakes.


Want to see how leading organizations design their big data infrastructure? Check out these 4 real-life examples of streaming architectures.


Batch, Stream, and Micro-batch Processing: A Cheat Sheet

Sep 5, 2019 4:48:25 PM / by Eran Levy posted in Batch processing, Streaming processing, Batch ETL, Amazon Redshift, Google BigQuery



Data Lake ETL for IoT Data: From Streams to Analytics

Aug 29, 2019 4:48:45 PM / by Eran Levy posted in Database, Event Streams, IoT, SQL, Data Lake ETL



Orchestrating Streaming and Batch ETL for Machine Learning

Aug 20, 2019 4:38:56 PM / by Eran Levy posted in Schemaless, ETL, Event Streams, Data Lake ETL, Data infrastructure, User personalization



Partitioning Data on S3 to Improve Performance in Athena/Presto

Aug 13, 2019 2:04:33 PM / by Eran Levy posted in Data Lake Platform, Amazon Athena, Data Lake ETL, AWS S3, Partitioning



Getting Data Lake ETL Right: 6 Guidelines for Evaluating Tools

Aug 6, 2019 2:52:11 PM / by Eran Levy posted in Database, Data Lake Platform, SQL, Data Lake ETL, Data infrastructure



Real-time Machine Learning: Hype vs Reality

Jul 23, 2019 2:38:27 PM / by Eran Levy posted in Data infrastructure, Machine learning, Data science, Real-time



ETL Pipelines for Kafka Data: Choosing the Right Approach

May 30, 2019 4:44:59 PM / by Eran Levy posted in Apache Kafka, Streaming Data, ETL

If you’re working with streaming data in 2019, odds are you’re using Kafka, either in its open-source distribution or as a managed service via Confluent or AWS. The stream processing platform, originally developed at LinkedIn and available under the Apache license, has become the de facto standard for event-based data, spanning use cases from sensor readings to application logs to clicks on online advertisements.


4 Challenges of Using Databases for Streaming Data (and a Solution)

May 21, 2019 1:59:55 PM / by Eran Levy posted in Big Data, Database, Streaming Data

