
Digital Resource Library

Welcome to our content library - gather insights on big data, machine learning, ETL, streaming data, and so much more!

Blog Post

How to Work with Streaming Data in AWS Redshift

In this article we look at streaming data storage and analytics and explain when and how to use Redshift.  

View Details >

Blog Post

Solving the Upserts Challenge in Data Lakes

Updating or deleting data is a basic functionality in databases, but is difficult to do in data lake storage.

View Details >

Webinar

Frictionless Data Lake ETL for Streaming Data

Discover how Upsolver helped ironSource transform 500K events per second, saving thousands of engineering hours.

View Details >

Blog Post

4 Guiding Principles for Modern Data Lake Architecture

In this article, we cover the four essential principles for effectively architecting your data lake.

View Details >

Blog Post

Data Lake as a Service: GUI-based Data Lake?

Why are data lakes still hard in the age of everything-as-a-service? What’s stopping the data lake from being ‘productized’?

View Details >

Blog Post

How (and Why) to Analyze CloudWatch Logs In AWS Athena

This article presents a reference architecture & principles for storing logs in analytics-ready format on Amazon S3.

View Details >

Blog Post

Protecting PII & Sensitive Data on S3 with Tokenization

This article looks at the challenge of protecting personal data, and how to solve it using S3 partitioning and tokenization.

View Details >

Blog Post

Custom Partitioning for Embedded Analytics - Athena

Check out this solution for building performant embedded analytics on streaming data using Amazon Athena. 

View Details >

Blog Post

Data Architecture for AWS Athena: 6 Examples

This article looks at how you can incorporate Athena in different data architectures & support various use cases.

View Details >

Blog Post

Apache Spark Limitations & the Alternative

Apache Spark has become the de facto standard for big data processing. But is Spark always the best choice?

View Details >

Blog Post

14 of the Best Data Podcasts, Blogs and Websites

We’ve gathered a list of the 14 best data engineering resources - from podcasts to blogs to video libraries.

View Details >

Blog Post

Upsolver Lookup Tables: A Decoupled Alternative

We compare Upsolver's platform to alternative solutions which would require a NoSQL database.

View Details >

Blog Post

In 2020, the Data Lake is Ripe for Self-service

This article dives into trends of big data infrastructure - data lakes, stream processing and ETL.

View Details >

Blog Post

Best Practices for High-performance Data Lakes

In this article, we’ll present best practices you should adhere to when designing and implementing your data lake.

View Details >

Whitepaper

The Modern Data Lake Architecture, Powered by Upsolver

We present the infrastructural challenges of working with big data streams, then present Upsolver's solution.

View Details >

Blog Post

5 Redshift & Athena Announcements from re:Invent 2019

This article reviews five exciting announcements related to Amazon Redshift and Amazon Athena. 

View Details >

Blog Post

Benchmarking AWS Athena & BigQuery: A Comparison

This article compares Athena and BigQuery, diving into their real-world performances. 

View Details >

Resource

Amazon Athena: Resources, Guides and Best Practices

Learn everything you need to get started with Amazon Athena, or discover new best practices to improve performance & costs.

View Details >

Blog Post

Streaming Data on AWS: Tools and Resources to Know

This guide jumps into some of the more popular tools for working with streaming data on Amazon Web Services. 

View Details >

Blog Post

ETL Kinesis Data to AWS Athena in Minutes with UpSQL

Here's a quick guide on how to create an ETL pipeline from Kinesis to Athena using only SQL and a visual interface.

View Details >

Blog Post

7 Guidelines for Ingesting Big Data to Data Lakes

7 best practices for big data ingestion - from strategic principles down to the more tactical (and technical) issues.

View Details >

Blog Post

Joining Streams and Big Tables on S3: NoSQL/UpSQL/Spark

One of the biggest challenges when working in a data lake architecture is that you’re dealing with files sitting in a folder.

View Details >

Blog Post

Upsolver Announces SQL-based ETL for Cloud Data Lakes

The new functionality eliminates friction and complexity in big data initiatives, such as machine learning.

View Details >

DATAVERSITY

Data Streaming: 7 Unexpected Paths It’s Taking Today

With 2019 nearly over, do you know where data streaming is headed? Here are seven directions it's taking.

View Details >

Podcast

How Upsolver Is Building A Data Lake Platform

Data Engineering Podcast - Upsolver CTO Yoni Iny discusses how to build a data lake platform in the cloud. 

View Details >

Blog Post

Joining Impression & Click Streams Easily Using UpSQL

We discuss how to use UpSQL to easily create a dataset for predicting whether a user will click on an ad.

View Details >

Blog Post

6 Tips for Querying Big Data in AWS Athena

6 things you need to keep in mind when building out ETL workflows for effectively consuming data in Athena.

View Details >

Blog Post

A Comparison: Apache Kafka vs Amazon Kinesis

Apache Kafka and Amazon Kinesis are two popular messaging queue systems. Let's compare.

View Details >

Blog Post

Alooma is Ending Support for AWS. Here's What's Next

It seems that Alooma is no longer catering to customers on AWS... so what happens next? 

View Details >

Blog Post

Athena or Redshift? Answer these 4 Questions to Decide

Here are 4 basic questions to ask when deciding whether to use Athena or Redshift for your streaming data.

View Details >

Blog Post

4 Key Components of a Streaming Data Architecture

Streaming data is becoming a core component of enterprise data architecture for a variety of reasons. 

View Details >

Blog Post

Data Lake ETL for IoT Data: From Streams to Analytics

For most enterprises, IoT projects have yet to move past the proof-of-concept stage or show a clear return on investment.

View Details >

Blog Post

Orchestrating Streaming ETL for Machine Learning

Real-time machine learning brings many challenges, with many projects getting stuck and failing to come to fruition.

View Details >

insideBIGDATA

Question: Do You Actually Need a Data Lake?

Here are 5 indicators to help you decide whether to join the data lake bandwagon or stick to traditional data warehousing.

View Details >

Webinar

Online Inference with User Personalization at Scale

Applying machine learning models that rely on both historical user data and real-time data is often challenging. 

View Details >

Whitepaper

AWS Athena Performance: Best Practices & Tips

In our whitepaper, we dive into the best practices and tips for maximizing value from Amazon Athena. 

View Details >

Whitepaper

5 Signs You've Outgrown Your Data Warehouse

As your data grows, it may be time to reevaluate where to store it. Here are the signs to look for.

View Details >

Webinar

ETL for Amazon Athena: 6 Things to Know

Listen to this webinar recording to learn the 6 core tenets of preparing data for Amazon Athena.

View Details >

Blog Post

Partitioning Data on S3 to Improve Athena Performance

In an AWS S3 data lake architecture, partitioning plays a crucial role when querying data in Amazon Athena.

View Details >

Blog Post

A Data Lake Approach to Event Stream Analytics

More recently, there's been increased demand for event log data analysis by data analytics teams & business units.

View Details >

Blog Post

IoT Analytics: Challenges & Innovations

IoT analytics requires a lot of tools, from data lakes to stream processing frameworks and analytics tools.

View Details >

Blog Post

Real-time Machine Learning: Hype vs Reality

Getting machine learning projects off the ground is often easier said than done.  Here are some tips for your next project.

View Details >

Blog Post

Batch & Stream Processing: A Cheat Sheet

One question to ask when planning your data architecture is whether to use batch or stream processing.

View Details >

Blog Post

ETL Pipelines for Kafka Data: Choosing the Right Approach

Learn the basics of ETL for Kafka streams, and get an overview of three approaches to building a successful pipeline.

View Details >

Blog Post

Big Data Infrastructure: When to Build or to Buy

When you need a new big data infrastructure, when should you build? When should you buy? 

View Details >

Blog Post

Cloud Data Lake vs On-Premises Data Lake

Is it time to move your data lake to the cloud? Here's what you should consider... or shouldn't.

View Details >

Blog Post

Kafka vs. RabbitMQ: Architecture & Use Cases

Apache Kafka and RabbitMQ are two open-source, commercially supported pub/sub systems.

View Details >

Blog Post

7 Stream Processing Frameworks Compared

Stream processing is a critical part of the big data stack in data-intensive organizations.

View Details >