Continuous Integration and Continuous Delivery (CI/CD) in Upsolver

CI/CD is a set of practices that let teams deliver new features to customers frequently. Its three core practices are continuous integration, continuous delivery, and continuous deployment. CI/CD addresses problems that development and operations teams often run into when integrating new code. Applying CI/CD to databases can be challenging, but Upsolver’s event sourcing architecture makes it easy to apply CI/CD principles throughout your development lifecycle.

The Challenge of Traceability in Traditional Database Architecture

When working with traditional databases, applying CI/CD can be challenging since the database state is maintained and managed in the database while the transformation code is maintained and managed separately.

In a typical development lifecycle, transformation code runs continuously and changes often, so your database is updated over time. What happens when you need to restore the entire system to a previous state? The transformation code is probably stored in a version control system such as Git and can be restored easily. The database state, however, has changed since that version of the transformation ran, and it is difficult to restore unless you kept a database backup for every version of your transformation code. Maintaining such backups is risky and incurs high infrastructure and operational costs. It also means running two separate CI/CD processes: one for the database and one for the transformation code. By providing a data lake architecture with full traceability, Upsolver simplifies CI/CD for your application and data pipeline development.


Full Traceability (Event Sourcing) in Upsolver

Upsolver makes it easy to implement CI/CD because full traceability is built into the platform. Upsolver’s architecture follows event sourcing principles and is based on an immutable log of all incoming events. Upsolver ETL then processes these events to create a queryable copy of the data. In a database, the state changes constantly, which makes it hard to reproduce an earlier state unless you configure a change log. In Upsolver, you can always go back in time and retrace your steps to see exactly which transformation was applied to your raw data, down to the individual event. If you find a bug in your ETL, you can fix it and rerun the corrected ETL on the immutable copy of your raw data. As a result, when using Upsolver you only need to implement CI/CD for your ETL code, not for your data.
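The event sourcing idea described above can be illustrated with a small, self-contained Python sketch. It is a conceptual model, not Upsolver code: the names Event, EventLog, totals_v1, totals_v2, and materialize are hypothetical. The raw log is append-only and never modified; the queryable view is always recomputed from the log by a transformation function. Fixing a bug therefore means changing only the transformation and replaying the same raw events, which is why only the ETL code needs versioning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """A raw incoming event; frozen=True makes instances immutable."""
    user_id: str
    amount_cents: int


class EventLog:
    """Hypothetical stand-in for an append-only, immutable event store."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Events are only ever appended, never updated or deleted.
        self._events.append(event)

    def replay(self) -> Tuple[Event, ...]:
        # Return a tuple so callers cannot modify the stored log.
        return tuple(self._events)


# A transformation maps the full raw history to a queryable view.
Transform = Callable[[Tuple[Event, ...]], Dict[str, int]]


def totals_v1(events: Tuple[Event, ...]) -> Dict[str, int]:
    # Buggy version: overwrites each user's total instead of summing.
    out: Dict[str, int] = {}
    for e in events:
        out[e.user_id] = e.amount_cents
    return out


def totals_v2(events: Tuple[Event, ...]) -> Dict[str, int]:
    # Fixed version: sums amounts per user.
    out: Dict[str, int] = {}
    for e in events:
        out[e.user_id] = out.get(e.user_id, 0) + e.amount_cents
    return out


def materialize(log: EventLog, transform: Transform) -> Dict[str, int]:
    # The derived view is a pure function of the raw log, so any
    # version of the transformation can be replayed at any time.
    return transform(log.replay())


log = EventLog()
for uid, amount in [("a", 100), ("b", 50), ("a", 25)]:
    log.append(Event(uid, amount))

print(materialize(log, totals_v1))  # {'a': 25, 'b': 50}
print(materialize(log, totals_v2))  # {'a': 125, 'b': 50}
```

Because the raw events are untouched by the buggy run, switching from totals_v1 to totals_v2 produces the correct result without restoring any database backup.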

Upsolver CI/CD: Frequently Asked Questions

How do you move data pipelines from development to production?

Upsolver provides complete separation between your development and production environments, and this separation applies to all the entities configured in your Upsolver account. When you finish developing and testing your ETL in the development environment, deploying it to production takes just a few clicks: you run the same ETL you developed and tested in dev on top of your production data streams.
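The promotion model described above, one ETL definition bound to different inputs per environment, can be sketched in Python. This does not show how Upsolver performs the deployment; Environment, Pipeline, ENVIRONMENTS, deploy, and every stream and table name below are hypothetical placeholders chosen for illustration. The point is that only the environment bindings differ between dev and prod, while the ETL definition itself is reused verbatim.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Environment:
    name: str
    source_stream: str  # hypothetical input stream identifier
    output_table: str   # hypothetical output table identifier


@dataclass(frozen=True)
class Pipeline:
    definition: str  # the ETL logic, identical in every environment
    version: str


# Hypothetical environment bindings; only these differ between dev and prod.
ENVIRONMENTS: Dict[str, Environment] = {
    "dev": Environment("dev", "clicks_dev", "analytics_dev.clicks"),
    "prod": Environment("prod", "clicks_prod", "analytics.clicks"),
}


def deploy(pipeline: Pipeline, env_name: str) -> Dict[str, str]:
    """Bind the unchanged pipeline definition to one environment's I/O."""
    env = ENVIRONMENTS[env_name]  # raises KeyError for unknown environments
    return {
        "environment": env.name,
        "source": env.source_stream,
        "target": env.output_table,
        "version": pipeline.version,
        "definition": pipeline.definition,
    }


etl = Pipeline(definition="count clicks per user per hour", version="1.4.0")
dev_spec = deploy(etl, "dev")
prod_spec = deploy(etl, "prod")
```

Keeping environment-specific settings out of the ETL definition is what makes "the same ETL you tested in dev" literally the same artifact in production.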

How do you back up ETL code?

You back up your ETL code with Upsolver’s Git integration, which gives you familiar Git capabilities such as source code version management, collaboration, and code ownership.

How do you manage Upsolver's infrastructure as code?

Upsolver provides a REST API for managing all of your Upsolver infrastructure from code. Any operation you can perform in the Upsolver UI can also be performed through the API, if you prefer to work that way.
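A common infrastructure-as-code pattern is to keep the desired configuration in Git and compute a plan of changes against what is currently deployed. The Python sketch below shows only that planning step, under the assumption that pipeline definitions can be represented as a name-to-definition mapping; the pipeline names and version strings are hypothetical. Each resulting action would then be carried out through Upsolver's REST API, whose actual endpoints are not described here and are deliberately not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: pipeline name -> definition (or version) string.
Config = Dict[str, str]


def plan(desired: Config, deployed: Config) -> List[Tuple[str, str]]:
    """Return (action, name) pairs that move `deployed` toward `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in deployed:
            actions.append(("create", name))
        elif deployed[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(deployed):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Desired state as committed to Git vs. what is currently running.
desired = {"clicks_etl": "v2", "orders_etl": "v1"}
deployed = {"clicks_etl": "v1", "legacy_etl": "v1"}

print(plan(desired, deployed))
# [('update', 'clicks_etl'), ('create', 'orders_etl'), ('delete', 'legacy_etl')]
```

Reviewing this plan in a pull request before applying it gives the same change-management workflow for infrastructure that the Git integration provides for ETL code.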


CI/CD

  • Only need CI/CD for ETL code (raw data is stored as an immutable log)
  • Isolation between dev and prod
  • Git integration for ETL code source control
  • REST API for infrastructure as code