
Upstream

The Big Data Blog

Improving Redshift Spectrum's Performance & Costs

May 14, 2020 5:58:23 PM / by Roy Hegdish posted in Database, Data Lake, Data Architecture, Amazon Athena, Streaming Data, ETL, SQL, AWS S3, Apache Parquet, AWS Redshift

 

Read More

Streaming Machine Learning with Upsolver and AWS SageMaker

May 14, 2020 1:38:55 PM / by Roy Hegdish posted in Database, Data Lake, Data Architecture, Amazon Athena, Streaming Data, ETL, SQL, AWS S3, Apache Parquet, AWS Redshift

 

In a previous article, we covered one of the main challenges in machine learning: the need to set up, maintain and orchestrate two separate ETL flows - one for offline processing and creating the training dataset, and one for real-time serving and inference.

 

Read More

What is Apache Parquet and why you should use it

May 6, 2020 2:05:19 PM / by Roy Hegdish posted in Database, Data Lake, Data Architecture, Amazon Athena, Streaming Data, ETL, SQL, AWS S3, Apache Parquet, AWS Redshift

 

 

Read More

Comparing Amazon Athena vs Traditional Databases

May 3, 2020 11:40:05 AM / by Roy Hegdish posted in Database, Data Lake, Data Architecture, Amazon Athena, Streaming Data, ETL, SQL, AWS S3, AWS Redshift

 

 

Read More

How to Work with Streaming Data in AWS Redshift

Apr 16, 2020 11:19:55 AM / by Roy Hegdish posted in Data Lake, Data Architecture, Amazon Athena, Streaming Data, ETL, SQL, AWS S3, AWS Redshift

 

Amazon Redshift remains one of the most popular cloud data warehouses, and it continues to gain new features and capabilities. According to a recent press release, over 10,000 companies worldwide use Redshift as part of their AWS deployments.

 

Read More

Solving the Upserts Challenge in Data Lakes

Apr 1, 2020 5:55:49 PM / by Roy Hegdish posted in Data Lake, Data Architecture, ETL, SQL, AWS S3, Amazon S3

 

Updating or deleting data (upserts) is basic functionality in databases, but is surprisingly difficult to do in data lake storage. In this article, we explain the challenge of data lake upserts and how we built a solution that enables efficient, fast update and delete operations on object storage using Upsolver’s SQL-based data transformation engine.

 

Read More

4 Guiding Principles for Modern Data Lake Architecture

Mar 18, 2020 5:44:06 PM / by Roy Hegdish posted in Data Lake, Data Architecture, ETL, SQL, AWS S3, Amazon S3, Event sourcing

 

Data lakes are the cornerstones of modern big data architecture, but getting them right can be tricky. How do you design a data lake that will serve the business, rather than weigh down your IT department with technical debt and constant data pipeline rejiggering? In this document we cover the four essential principles for effectively architecting your data lake.

Read More

Data Lake as a Service: Is There a GUI-based Data Lake?

Mar 1, 2020 2:11:00 PM / by Eran Levy posted in Big Data, Data Lake, ETL, SQL, AWS S3, Amazon S3, Apache Spark

 

Recent surveys have shown that the data lake market is expected to grow to $20.1 billion by 2024, with a growing number of organizations looking to deploy a data lake in the coming years. However, despite this interest in big data initiatives, many organizations run into a roadblock: building a data lake is a complex, manual process that requires hiring skilled personnel who are in short supply.

Read More

How (and Why) to Analyze CloudWatch Logs In AWS Athena

Feb 27, 2020 2:15:06 PM / by Roy Hegdish posted in Data Lake, ETL, SQL, AWS S3, Amazon S3, CloudWatch

Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications you run on AWS. While CloudWatch enables you to view logs and understand some basic metrics, it’s often necessary to perform additional operations on the data such as aggregations, cleansing and SQL querying, which are not supported by CloudWatch out of the box.

Read More

Protecting PII & Sensitive Data on S3 with Tokenization

Feb 24, 2020 3:48:46 PM / by Roy Hegdish posted in Data Lake, Amazon S3, Data security, PII

 

Read More