
Implement Change Data Capture (CDC) in your AWS Data Lake

Think CDC is just for databases? Think again. Upsolver’s data lake automation platform enables you to implement CDC on S3 so your data is queryable using a SQL engine (Amazon Athena / Redshift Spectrum / Presto / SparkSQL) with minimal time, effort, and compute resources spent on ETLs.

Accurately reflect changes to operational databases in near real-time reports

Dramatically reduce engineering time and resources spent on ETLs and lake data management

Easily address data privacy and GDPR requirements

Data lakes are built around append-only file storage rather than the traditional table model with primary and foreign keys. To maximize performance, data lakes do not use indexes to locate individual records, and they do not support atomic insert/update/delete operations. Performing updates or deletes in a lake based on a change log from an operational database therefore requires both ETLs that rewrite the data and an integration with the SQL engines to keep the data consistent for queries.
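To make the rewrite requirement concrete, here is a minimal sketch of replaying a database change log against an append-only file. The record layout, operation names, and function name are illustrative assumptions, not Upsolver's actual format:

```python
# Hypothetical sketch: replaying a CDC change log against append-only storage.
# Because object stores like S3 have no in-place update, the whole file
# must be rewritten with the merged result.

def apply_change_log(records, change_log, key="id"):
    """Rewrite a snapshot of records by replaying insert/update/delete events."""
    # Index the current snapshot by its upsert key.
    state = {r[key]: r for r in records}
    for event in change_log:
        op, row = event["op"], event["row"]
        if op in ("insert", "update"):
            state[row[key]] = row      # upsert: last write wins
        elif op == "delete":
            state.pop(row[key], None)  # drop the record if present
    # The merged result replaces the original file in its entirety.
    return list(state.values())


snapshot = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
log = [
    {"op": "update", "row": {"id": 1, "name": "a2"}},
    {"op": "delete", "row": {"id": 2}},
    {"op": "insert", "row": {"id": 3, "name": "c"}},
]
merged = apply_change_log(snapshot, log)
```

Even this toy version shows why a platform has to orchestrate the process: every batch of changes forces a full rewrite of the affected files, plus a catalog update so query engines see the new version.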

With Upsolver, data lake CDC is easy and instant. Simply select your upsert key, and Upsolver will automate the rest, updating your data lake to reflect data changes in near real-time, including in the tables you read via Amazon Athena / Redshift Spectrum / Presto / SparkSQL.

Key Features

CDC implemented using a visual UI / SQL, replacing the need to code Hadoop / Spark ETLs.

Glue Data Catalog and Hive Metastore integration to keep table data consistent for queries in near real-time.

Compaction process (re-writes the data) that runs in the background and optimizes the data storage so queries run up to 100X faster.

Time-travel on S3 for re-creating tables

Ability to reprocess historical data on S3
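The compaction feature above can be sketched in miniature: many small event files are merged into one, keeping only the latest version of each record per upsert key so queries scan far less data. Field names (`id`, `ts`) and the function itself are assumptions for illustration, not Upsolver's implementation:

```python
# Hypothetical sketch of background compaction: merge many small files,
# keep the newest version of each record, and emit one sorted output file.
from itertools import chain


def compact(files, key="id", version="ts"):
    """Merge event files, retaining the latest version per upsert key."""
    latest = {}
    for row in chain.from_iterable(files):
        prev = latest.get(row[key])
        # Later timestamps win; ties are resolved in favor of the later file.
        if prev is None or row[version] >= prev[version]:
            latest[row[key]] = row
    # One sorted file replaces many small ones, reducing per-query overhead.
    return sorted(latest.values(), key=lambda r: r[key])


small_files = [
    [{"id": 1, "ts": 1, "v": "old"}],
    [{"id": 1, "ts": 2, "v": "new"}, {"id": 2, "ts": 1, "v": "x"}],
]
compacted = compact(small_files)
```

The speedup comes from the same place in the real system: fewer, larger, deduplicated files mean query engines open fewer objects and read less redundant data.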


Discover how Upsolver can unlock the value of your streaming data.

See it in action