

The Big Data Blog

3 Steps To Reduce Your Elasticsearch Costs By 90 - 99%

Feb 27, 2019 4:47:59 PM / by Eran Levy posted in S3, Elasticsearch, Log Analysis

This article covers best practices for reducing the price tag of Elasticsearch using a data lake approach.


Elasticsearch is a fantastic log analysis and search tool, used by everyone from tiny startups to the largest enterprises. It’s a robust solution for many operational use cases as well as for BI and reporting, and it performs well at virtually any scale, which is why many developers grow used to ‘dumping’ all of their log data into Elasticsearch and storing it there indefinitely.


4 Key Components of a Streaming Data Architecture

Jan 30, 2019 8:15:38 PM / by Eran Levy

Streaming data is becoming a core component of enterprise data architecture. Streaming technologies are not new, but they have matured considerably over the past year. The industry is moving away from painstaking integration of technologies like Kafka and Storm and towards full-stack solutions that provide an end-to-end streaming data architecture.


Integrate Upsolver with Git for Carefree Change Management and Review

Jan 14, 2019 5:37:00 PM / by Eran Levy posted in Product Updates, Git

Today we’ve got great news for organizations with multiple users working on Upsolver, and for anyone who likes to experiment with the system by making frequent changes to data sources, output streams, aggregations or other features. With Upsolver’s new built-in Git integration, multiple users can work in parallel, knowing that all of their changes are securely stored and easily recoverable in your Git repository.


Top 7 Trends in Streaming Data for 2019

Dec 20, 2018 6:43:01 PM / by Ori Rafael posted in Industry Trends, Schemaless, Streaming Data

As the end of the year rapidly approaches, it’s time to take a look at what the next one might have in store.


What’s New in Upsolver - December 2018 Edition

Dec 18, 2018 5:40:00 PM / by Eran Levy posted in Product Updates

We’re closing the year strong with some great new features that can help improve the breadth and versatility of your work with Upsolver. Highlights include:


Problems with Small Files on HDFS / S3? Make Them Bigger

Oct 23, 2018 2:08:54 PM / by Eran Levy posted in HDFS, S3, Amazon Athena

More often than not, big data is made up of many small files. Event-based streams from IoT devices, servers or applications typically arrive as KB-scale JSON files, easily adding up to hundreds of thousands of new files ingested into your data lake every day.

Writing small files to object storage (Amazon S3, Azure Blob, etc.) or to HDFS is easy enough; however, querying the data in this state with an SQL engine such as Athena or Presto will absolutely kill both your performance and your budget.
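The remedy the title points to, merging many small files into fewer large ones, can be sketched as a greedy batching step. This is a minimal illustration under stated assumptions, not Upsolver's implementation or any library's API: `plan_compaction`, `compact` and the target size are hypothetical names and parameters, every file is newline-delimited JSON ending in a newline (so plain concatenation stays valid), and all data fits in memory.

```python
from typing import List


def plan_compaction(file_sizes: List[int], target_bytes: int) -> List[List[int]]:
    """Greedily group consecutive files into batches of at most ~target_bytes.

    Hypothetical helper: returns batches as lists of indices into file_sizes.
    A single file larger than target_bytes ends up alone in its own batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(file_sizes):
        # Close the current batch if adding this file would overshoot the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        batches.append(current)
    return batches


def compact(files: List[bytes], target_bytes: int) -> List[bytes]:
    """Merge small NDJSON blobs into larger ones.

    Assumption: each blob ends with a newline, so byte concatenation
    yields valid newline-delimited JSON.
    """
    sizes = [len(f) for f in files]
    return [
        b"".join(files[i] for i in batch)
        for batch in plan_compaction(sizes, target_bytes)
    ]


if __name__ == "__main__":
    # 100 files of 4,000 bytes each, compacted toward 64,000-byte outputs.
    plan = plan_compaction([4_000] * 100, 64_000)
    print(len(plan), [len(b) for b in plan])
```

With these numbers each batch holds 16 files, so 100 inputs become 7 outputs. A production job would also rewrite into a columnar format and partition by time, but the core idea of trading file count for file size is the same.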


Upsolver Data Security: An Overview

Oct 1, 2018 7:00:00 PM / by Eran Levy

As a company founded by data professionals, we treat data security as our utmost concern. Upsolver offers a comprehensive set of protections to provide the highest level of security for all sensitive data processed or managed using the Upsolver platform. Upsolver uses a cloud-native architecture to keep customers’ data safe in their own AWS account, while offering additional layers of security to prevent unauthorized access to data through the Upsolver front-end UI.


4 Examples of Data Lake Architectures on Amazon S3

Sep 27, 2018 8:12:13 PM / by Eran Levy posted in Data Lake, Data Architecture

So you’ve decided it’s time to overhaul your data architecture. What’s next? How do you go about building a data lake that delivers the results you’re expecting?

Well, we’re strong believers in the notion that an example is worth a thousand convoluted explanations. That's why this post is all about real-life examples of companies that have built their data lakes on Amazon S3. Use it for inspiration, reference or as your gateway to learn more about the different components you'll need to become familiar with for your own initiative.


Apache Kafka with and without a Data Lake

Sep 13, 2018 4:09:13 PM / by Eran Levy posted in Data Lake, Data Architecture, Apache Kafka

Apache Kafka is a cornerstone of many streaming data projects. However, it is only the first step in the potentially long and arduous process of transforming streams into workable, structured data. How should you design the rest of your data architecture to build a scalable, cost-effective solution for working with Kafka data? Let’s look at two approaches, reading directly from Kafka versus creating a data lake, and understand when and how you should use each.


Understanding Data Lakes and Data Lake Platforms

Sep 5, 2018 4:22:46 PM / by Eran Levy posted in Data Lake, Data Lake Platform, Data Architecture

The following article is an abridged version of our new guide to Data Lakes and Data Lake Platforms; the full version is available for free.

If you’re working with data in any capacity, you should be familiar with Data Lakes. Even if you don’t need one today, the rapid growth of data and demand for increasingly versatile analytic use cases (such as reporting, machine learning, and predictive analytics) could result in your organization outgrowing its data infrastructure much sooner than you currently foresee.
