Deliver Data to ClickHouse Cloud 100x Faster with Upsolver

Upsolver is excited to announce our partnership with ClickHouse, advancing our commitment to empowering data-driven solutions. ClickHouse is an open-source OLAP database management system that enables customers to generate reports using SQL queries in real time. Leveraging a column-oriented architecture, ClickHouse is the fastest and most efficient way to query billions of rows of data in milliseconds.

To achieve this level of performance, customers need an ETL solution that reliably inserts data at high scale. Upsolver meets this demand head-on by streaming only changed rows into ClickHouse Cloud, making it 50x to 100x faster and 1000x more cost-effective than existing ETL and CDC solutions.

High Scale ETL with Upsolver

Designed for big data, Upsolver easily moves high-volume, streaming data from your operational systems into your analytics target. Whether your application source is generating events at a rate of thousands per second, or your gaming or ad-tech company is creating millions of events per second, Upsolver ingests them into your data lakehouse or target analytics systems with ease.

Upsolver automatically detects changes in the source, so only changed rows are ingested into ClickHouse Cloud for a superfast and efficient experience. Orchestration and task scheduling are handled internally, eliminating the need for external tools like Airflow or AWS Lambda. Furthermore, Upsolver automates schema evolution, maintains performance, and ensures the strongly-ordered, exactly-once delivery of your data.

Our customers regularly use Upsolver to deliver high-volume, nested streaming data, commonly at a rate of 1-2 million events per second, with peaks of 2x-5x that rate; check out what customers are saying here. Adding a ClickHouse Cloud connector to the Upsolver platform unlocks the potential for customers to analyze billions of rows of high-quality, fresh data in real time.

Upsolver automatically synchronizes sources and targets, handles schema evolution, and ensures data type matching. Jobs automatically recover from connectivity and network hiccups, minimizing the impact from traffic spikes. In addition, Upsolver ensures duplicates are removed and late arriving events are smoothly integrated.

Flexible and Powerful Ingestion Jobs

While Upsolver performs the heavy-duty operations under the hood to deliver high-scale data, we offer a simple and easy approach to building and monitoring your pipelines. Our zero-ETL solution is the easiest route to stream your data into ClickHouse Cloud, guiding you through a few simple steps to build and run your pipelines. With support for the industry’s main data platforms, Upsolver can ingest from database, streaming, and file sources into ClickHouse Cloud. 

Alternatively, customers can leverage the advanced features in Upsolver by writing SQL to create highly customizable jobs and rich transformations. Upsolver includes a vast library of functions and operators for transforming data prior to loading into the target, ensuring the data that lands is formatted to meet the needs of downstream consumers. Choose between ingesting your data directly into ClickHouse Cloud, or landing it in the lakehouse first and then writing a job to transform and load it into ClickHouse Cloud.

With a few lines of code, customers can load data directly into a ClickHouse table, applying in-flight transformations to reshape data and expectations to manage data quality issues.

CREATE SYNC JOB load_kafka_to_clickhouse
    COMMENT = 'Ingest sales orders from Kafka to ClickHouse'
    START_FROM = BEGINNING
    CONTENT_TYPE = AUTO
    EXCLUDE_COLUMNS = ('customer.password') 
    COLUMN_TRANSFORMATIONS = (hashed_email = MD5(customer.email))       
    DEDUPLICATE_WITH = (COLUMNS = (orderid), WINDOW = 4 HOURS)
    COMMIT_INTERVAL = 5 MINUTES
AS COPY FROM KAFKA my_kafka_connection
    TOPIC = 'orders'  
INTO CLICKHOUSE my_clickhouse_connection.sales_db.orders_tbl
    WITH EXPECTATION exp_custid_not_null 
      EXPECT customer.customer_id IS NOT NULL ON VIOLATION DROP;
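
For the lakehouse-first option described above, a transformation job reads from a lake table and writes the results into ClickHouse Cloud. The following is a minimal sketch: the catalog, connection, and table names are hypothetical, and the exact job options should be checked against Upsolver's syntax reference.

CREATE SYNC JOB transform_orders_to_clickhouse
    COMMENT = 'Transform staged orders and load into ClickHouse'
    RUN_INTERVAL = 1 MINUTE
AS INSERT INTO CLICKHOUSE my_clickhouse_connection.sales_db.orders_enriched
    MAP_COLUMNS_BY_NAME
    SELECT orderid,
           customer.customer_id AS customer_id,
           MD5(customer.email) AS hashed_email
    FROM default_glue_catalog.sales_db.orders_raw
    WHERE time_filter();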

Upsolver includes a plethora of job options that can be tailored to suit the individual requirements of each job, and customers can integrate Upsolver into their CI/CD process and automate pipeline changes using the SDK and APIs. 

Built-in Job Monitoring and Data Observability

An essential aspect of ingesting and transforming data into ClickHouse Cloud is the ability to troubleshoot problems and check for data quality and freshness. Upsolver’s real-time job monitoring and data observability tools help users quickly detect and fix job or data-related issues.  

Our monitoring features provide meaningful metadata that enables customers to observe the flow and volume of their data, check for quality issues, and uncover changes in the source schema. Statistics are built in as standard in Upsolver, eliminating the time and expense of building custom reports.

Visualizations deliver real-time statistics, alongside drill-through functionality for observing data at table- and column-level detail. Customers with a specific use case, or who want to extract data, can query Upsolver's system tables to interrogate their metrics and build bespoke reports and dashboards.
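
For example, a customer could query a monitoring system table to track a job's throughput. This is a hypothetical sketch: the table and column names below are assumptions for illustration, so consult Upsolver's reference for the exact schema of its system tables.

SELECT job_name,
       rows_written,
       rows_filtered,
       last_executed
FROM system.monitoring.jobs          -- hypothetical system table
WHERE job_name = 'load_kafka_to_clickhouse';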

Common Use Cases for Ingesting to ClickHouse with Upsolver

Upsolver is tailor-made for reliably moving high-volume, high-throughput data between operational and analytical systems. Upsolver and ClickHouse are often combined in the following use cases.

1. Direct Ingestion from Streaming Sources

Upsolver facilitates streaming directly from Apache Kafka (self-managed), Confluent, Amazon MSK, and Amazon Kinesis into ClickHouse tables. Application events are commonly streamed at a rate of thousands per second, and in the ad-tech and gaming verticals this often exceeds a million events every second. Customers needing to perform rich transformations and filtering on high-scale data prior to insertion will find Upsolver unique in its ability to deliver these features robustly.
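
As a sketch, a Kinesis-sourced ingestion job looks much like the Kafka example above, swapping the source type and its stream parameter. The connection, stream, and table names here are placeholders, and the options shown are assumptions to be verified against Upsolver's documentation.

CREATE SYNC JOB load_kinesis_to_clickhouse
    CONTENT_TYPE = AUTO
AS COPY FROM KINESIS my_kinesis_connection
    STREAM = 'ad_impressions'
INTO CLICKHOUSE my_clickhouse_connection.adtech_db.impressions_tbl;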

2. S3 Object Store to ClickHouse Cloud

Batch-produced data, often derived from legacy systems, partners, third-party applications, or logging and tracing tools, can be continuously loaded into ClickHouse Cloud using an Upsolver transformation job. Data can be prepared before loading by filtering, fixing, or removing poor-quality data, or by enriching rows.

While you can use query federation and the ClickHouse native S3 integration to query data in place, you won't benefit from the performance gains of loading the data directly into ClickHouse, which can return queries in milliseconds. Using Upsolver to ingest and transform your data not only ensures you load events at pace, but also that you maximize performance by taking advantage of ClickHouse's native table engines.
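
For instance, a target table defined with the MergeTree engine sorts data on disk to serve the expected query pattern. The schema below is illustrative and should mirror the shape of your ingested events.

CREATE TABLE sales_db.orders_tbl
(
    orderid      UInt64,
    customer_id  String,
    hashed_email String,
    order_date   DateTime
)
ENGINE = MergeTree
ORDER BY (order_date, orderid);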

3. Ingestion to ClickHouse Cloud and Apache Iceberg

While ClickHouse delivers blazing fast analytics, many organizations still need to perform analytics and train ML models on structured and semi-structured data, using a variety of other tools. For such use cases, raw events must be stored in the lakehouse using Iceberg. 

Organizing and managing vast quantities of structured and semi-structured data in object storage is made easy with Iceberg's open table format. After data is loaded into the lakehouse, businesses can evolve or rearrange large datasets without costly rewrites, and the data can be accessed not only from ClickHouse, but also from Jupyter notebooks with Python and from modern processing engines like Apache Spark and Flink. Upsolver jobs enable you to load the data where it is needed, ensuring consumers have the latest data at their fingertips.
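
For ad hoc access without loading, recent ClickHouse versions can also read Iceberg tables on S3 directly via the iceberg table function. A minimal sketch, in which the bucket path and credentials are placeholders:

SELECT count()
FROM iceberg(
    'https://my-bucket.s3.amazonaws.com/lakehouse/orders/',
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY'
);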

4. Ingest Operational CDC Data

Upsolver enables users to stream source changes (CDC) from operational databases, including PostgreSQL, MySQL, Microsoft SQL Server, and MongoDB, into the Iceberg lakehouse and ClickHouse Cloud. 

When you choose Upsolver as your ingestion solution, your events are transformed and prepared in an appropriate format to exploit the ClickHouse CollapsingMergeTree or VersionedCollapsingMergeTree table engines, which implement the CDC update/delete logic. 

Furthermore, Upsolver includes additional columns, such as IS_DELETED and EVENT_TIME, as well as computed fields you may want to add in order to facilitate an efficient CDC stream into your ClickHouse tables.
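
A sketch of what the ClickHouse side might look like: a VersionedCollapsingMergeTree table uses a sign column to cancel superseded rows and a version column to order them, alongside the CDC metadata columns mentioned above. The schema and column names are illustrative.

CREATE TABLE sales_db.customers_cdc
(
    customer_id UInt64,
    email       String,
    is_deleted  UInt8,
    event_time  DateTime,
    sign        Int8,    -- +1 inserts a row state, -1 cancels a prior state
    version     UInt64   -- increasing per-key version for correct collapsing
)
ENGINE = VersionedCollapsingMergeTree(sign, version)
ORDER BY customer_id;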

Get Started with Upsolver

Feeding an OLAP database that can analyze billions of rows a second requires a robust and reliable solution that guarantees data delivery. ClickHouse Cloud users looking for a simple, highly scalable, and cost-effective way to stream data from multiple sources, while performing complex transformations prior to insertion, will find it in Upsolver.

Discover Upsolver’s powerful ingestion engine and easy-to-use platform by starting your free 14-day trial, or book a demo with one of our Solutions Architects, who will be happy to give you a tour.
