The Challenge of Traceability in Traditional Database Architecture
With a traditional database, applying CI/CD is challenging because state lives inside the database, while the transformation code that produces that state is maintained and versioned separately.
In a typical development lifecycle, transformation code is constantly running and changing, so your database is updated over time. What happens when you need to restore the entire system to a previous state? The transformation code, which is probably kept in a version control system such as Git, can be restored easily. The database state, however, has kept changing since that version of the code ran, and it is hard to restore unless you keep a database backup for each version of your transformation code. Maintaining such backups is risky and drives up infrastructure and operational costs. It also means running two separate CI/CD processes: one for the database and one for the transformation code. By enabling a data lake architecture with full traceability, Upsolver allows you to simplify CI/CD processes for your application and data pipeline development.
Full Traceability (Event Sourcing) in Upsolver
Upsolver makes it easy to implement CI/CD because full traceability is built into the platform. Upsolver’s architecture follows event sourcing principles and is based on an immutable log of all incoming events. These events are then processed with Upsolver ETL to create a queryable copy of the data. Unlike a database, where the state constantly changes (which makes it hard to reproduce an earlier state without configuring a change log), in Upsolver you can always ‘go back in time’ and retrace your steps to learn exactly which transformation was applied to your raw data, down to the event level. You can fix a bug in your ETL and then rerun it against the immutable copy of your raw data. As a result, when using Upsolver you only need to implement CI/CD once, for your ETL code, and not for your data.
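To make the event sourcing idea concrete, here is a minimal sketch in plain Python. It is not Upsolver code and does not use any Upsolver API: the `Event` type, the `RAW_LOG` contents, and the `buggy_etl`, `fixed_etl`, and `replay` functions are all hypothetical names chosen for illustration. It shows only the general principle that a queryable view can be rebuilt at any time by replaying an immutable log through a corrected transformation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)  # frozen: individual events cannot be modified
class Event:
    user_id: str
    amount_cents: int


# Hypothetical raw log. A tuple stands in for append-only, immutable storage.
RAW_LOG: Tuple[Event, ...] = (
    Event("alice", 1250),
    Event("bob", 300),
    Event("alice", 750),
)

Totals = Dict[str, int]


def buggy_etl(totals: Totals, event: Event) -> None:
    # Bug: overwrites the running total instead of adding to it.
    totals[event.user_id] = event.amount_cents


def fixed_etl(totals: Totals, event: Event) -> None:
    # Fix: accumulate the amount per user.
    totals[event.user_id] = totals.get(event.user_id, 0) + event.amount_cents


def replay(log: Tuple[Event, ...], etl: Callable[[Totals, Event], None]) -> Totals:
    """Rebuild the queryable view from scratch by replaying every raw event."""
    totals: Totals = {}
    for event in log:
        etl(totals, event)
    return totals


v1 = replay(RAW_LOG, buggy_etl)  # view produced by the first ETL version
v2 = replay(RAW_LOG, fixed_etl)  # view rebuilt after the bug fix, same raw log
print(v1)
print(v2)
```

Because the raw log never changes, only the transformation code needs to be versioned. Any earlier or later view is a function of the log plus one version of that code.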
CI/CD - Frequently asked questions:
How do you move data pipelines from development to production?
Upsolver provides complete separation between your development and production environments, and that separation applies to every entity configured in your Upsolver account. Once you have finished developing and testing your ETL in the development environment, deploying it to production takes just a few clicks. The same ETL you developed and tested in dev then runs on top of your production data streams.
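The principle behind this answer (one transformation, different input streams per environment) can be sketched in plain Python. This is an illustration only, not how Upsolver implements environments. The `clean_orders` function, the `STREAMS` mapping, and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def clean_orders(rows: List[Row]) -> List[Row]:
    """Hypothetical ETL step: drop rows with no order id, upper-case the currency."""
    return [
        {**row, "currency": row["currency"].upper()}
        for row in rows
        if row.get("order_id")
    ]


# Hypothetical input streams, one per environment. Only the data differs.
STREAMS: Dict[str, List[Row]] = {
    "dev": [
        {"order_id": "t-1", "currency": "usd"},
        {"order_id": "", "currency": "eur"},  # invalid test row, filtered out
    ],
    "prod": [
        {"order_id": "o-9", "currency": "gbp"},
    ],
}


def run(environment: str) -> List[Row]:
    # The transformation is identical in every environment.
    return clean_orders(STREAMS[environment])


dev_out = run("dev")
prod_out = run("prod")
print(dev_out)
print(prod_out)
```

Keeping the transformation free of environment-specific logic is what makes promotion a matter of pointing tested code at a different stream.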
How do you back up ETL code?
Your ETL code is backed up through Upsolver’s Git integration feature. This gives you the familiar Git capabilities, such as source code version management, collaboration, and code ownership.
How do you manage Upsolver's infrastructure as code?
Upsolver provides a REST API that enables you to manage all of Upsolver’s infrastructure from your code. Any operation you can perform in Upsolver’s UI can also be performed through the API, if you prefer to work that way.
Still have questions? Let's have a quick chat.
Schedule a free, no-strings-attached demo to discover how Upsolver can radically simplify data lake ETL in your organization.