
Upstream

The Big Data Blog

Eran Levy

Director of Marketing at Upsolver

Recent Posts

Orchestrating Streaming and Batch ETL for Machine Learning

Aug 20, 2019 4:38:56 PM / by Eran Levy posted in Schemaless, ETL, Event Streams, Data Lake ETL, Data infrastructure, User personalization

Partitioning Data on S3 to Improve Performance in Athena/Presto

Aug 13, 2019 2:04:33 PM / by Eran Levy posted in Data Lake Platform, Amazon Athena, Data Lake ETL, AWS S3, Partitioning

Getting Data Lake ETL Right: 6 Guidelines for Evaluating Tools

Aug 6, 2019 2:52:11 PM / by Eran Levy posted in Database, Data Lake Platform, SQL, Data Lake ETL, Data infrastructure

Real-time Machine Learning: Hype vs Reality

Jul 23, 2019 2:38:27 PM / by Eran Levy posted in Data infrastructure, Machine learning, Data science, Real-time

ETL Pipelines for Kafka Data: Choosing the Right Approach

May 30, 2019 4:44:59 PM / by Eran Levy posted in Apache Kafka, Streaming Data, ETL

If you’re working with streaming data in 2019, odds are you’re using Kafka, either in its open-source distribution or as a managed service via Confluent or AWS. The stream processing platform, originally developed at LinkedIn and available under the Apache license, has become more or less the standard for event-based data, spanning use cases from sensors to application logs to clicks on online advertisements.

4 Challenges of Using Databases for Streaming Data (and a Solution)

May 21, 2019 1:59:55 PM / by Eran Levy posted in Big Data, Database, Streaming Data

Big Data Infrastructure: When to Build, When to Buy

May 14, 2019 4:04:53 PM / by Eran Levy posted in Big Data, Data Architecture, Data Engineering

Every software development team makes build-vs-buy decisions on a regular basis. For most coding problems, someone is offering a packaged or white-label solution. The decision whether to purchase a tool or develop an alternative in-house, to ‘build or buy’, is typically made ad hoc based on cost, existing engineering skill sets and organizational culture.

Kafka vs. RabbitMQ: Architecture, Performance & Use Cases

May 7, 2019 2:42:07 PM / by Eran Levy posted in Data Architecture, Apache Kafka, RabbitMQ

How to Improve AWS Athena Performance: The Complete Guide

Apr 22, 2019 12:59:57 PM / by Eran Levy

7 Popular Stream Processing Frameworks Compared

Mar 21, 2019 7:03:50 PM / by Eran Levy

Stream processing is a critical part of the big data stack in data-intensive organizations. Tools like Apache Storm and Samza have been around for years, and are joined by newcomers like Apache Flink and managed services like Amazon Kinesis Streams.
