Upsolver June 2024 Feature Summary

Can you believe we’re halfway through 2024 and it’s June already? Having been with Upsolver just over a year now, I have observed the platform undergo a whole heap of improvements thanks to the dedication of our product and engineering teams. Over the last few weeks, our engineers have worked their usual magic to hunt down bugs and make enhancements that further solidify the Upsolver platform.

The exciting news this month is our support for creating replication jobs that write to Amazon Redshift. Having spent over half of my lifetime working with SQL Server, I’m always thrilled when the team releases capabilities that support CDC. If you haven’t discovered replication jobs yet and want to ingest from a database source (we support MongoDB, MySQL, PostgreSQL, and SQL Server), please check out the documentation to find out for yourself how easy this is to implement.

Replication jobs enable you to ingest your CDC data from your database into multiple target schemas, and each schema can be configured with its own set of options. The SQL syntax for creating a replication job is really straightforward and easy to understand (it’s so cool, I wish I’d designed and coded this feature!!). As well as supporting Amazon Redshift, replication jobs can also write to Snowflake, Tabular, and AWS Glue Data Catalog targets.

This month’s update includes plenty of fixes and enhancements prior to upcoming new features that I am looking forward to sharing with you next month. While I polish my CV and apply for a job with our engineering team, check out the summary of this month’s changes. And if you want a guided tour of Upsolver – whether you have CDC, streaming, or file data to ingest – please book your free demo.

The following releases are included in this update:

Jobs

Issue Fixed with Deleting Materialized View Files

There was an issue where 2-minute merge files were sometimes not fully deleted when deleting materialized views. This problem has now been resolved.

Fixed Temporary Files Not Being Deleted in Index Creation

The bug where temporary files wouldn’t be deleted in the index creation task if the task failed has now been fixed.

Support for Managed Iceberg Tables in Non-Sync Jobs

We have now added support for using non-sync jobs to write to Upsolver managed Iceberg tables. Please check out this article to understand the difference between sync and non-sync jobs.
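As a rough sketch (the table and column names below are hypothetical, and your job options will differ), a non-sync job is created by omitting the SYNC keyword from the job definition, so writing to a managed Iceberg table might look like this:

```sql
-- Illustrative sketch: a non-sync job (no SYNC keyword) writing to an
-- Upsolver managed Iceberg table; names are hypothetical
CREATE JOB load_orders_to_iceberg
    START_FROM = BEGINNING
    ADD_MISSING_COLUMNS = TRUE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_iceberg
  MAP_COLUMNS_BY_NAME
    SELECT orderid,
           nettotal
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    WHERE TIME_FILTER();
```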

Snowflake Ingestion Jobs Now Use Commit Interval

I’m pleased to report that ingestion jobs writing to a Snowflake target now use the given commit interval. This change applies only to jobs created from now on, not to jobs created prior to this fix.
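For illustration (the connection, topic, and table names here are hypothetical), the commit interval is specified as a job option when you create the ingestion job:

```sql
-- Illustrative sketch: a Snowflake ingestion job with an explicit
-- commit interval; connection and table names are hypothetical
CREATE SYNC JOB ingest_orders_to_snowflake
    COMMIT_INTERVAL = 5 MINUTES
AS COPY FROM KAFKA my_kafka_connection
    TOPIC = 'orders'
INTO SNOWFLAKE my_snowflake_connection.DEMO.ORDERS;
```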

Fixed the Bug When Casting to JSON in Iceberg Tables

We have fixed a bug in jobs writing to Iceberg tables, where using CAST to JSON type expressions caused double encoding of fields inside the JSON string.

CDC Jobs

NEW! Replication Jobs Support for Amazon Redshift Target

Replication jobs in Upsolver enable you to use one job that copies data from a single source into multiple target schemas. This super handy feature is also easy to create and maintain. Each target schema is managed by a replication group, which has its own set of options that can be configured independently of other groups.

We are delighted to announce that we now support Amazon Redshift as a target. In the example below, a job is created to ingest data from a CDC-enabled Microsoft SQL Server database into a production schema and a development schema in the target Redshift database:

CREATE REPLICATION JOB mssql_replication_to_redshift
  COMMENT = 'Replicate SQL Server CDC data to Redshift groups'
  COMPUTE_CLUSTER = "Default Compute (Free)"
  INTERMEDIATE_STORAGE_CONNECTION = s3_connection
  INTERMEDIATE_STORAGE_LOCATION = 's3://upsolver-integration-tests/test/' 
FROM my_mssql_connection 
WITH REPLICATION GROUP replicate_to_redshift_prod 
  INCLUDED_TABLES_REGEX = ('orders\..*')
  EXCLUDED_COLUMNS_REGEX = ('.*\.creditcard') -- exclude creditcard columns 
  COMMIT_INTERVAL = 5 MINUTES
  LOGICAL_DELETE_COLUMN = "is_deleted"
  REPLICATION_TARGET = my_redshift_connection
  TARGET_SCHEMA_NAME_EXPRESSION = 'ORDERS'
  TARGET_TABLE_NAME_EXPRESSION = $table_name
  WRITE_MODE = MERGE 
WITH REPLICATION GROUP replicate_to_redshift_dev 
  INCLUDED_TABLES_REGEX = ('orders\..*')
  COMMIT_INTERVAL = 1 HOUR
  REPLICATION_TARGET = my_redshift_connection
  TARGET_TABLE_NAME_EXPRESSION = 'history_' || $table_name
  TARGET_SCHEMA_NAME_EXPRESSION = 'ORDERS_DEV'
  WRITE_MODE = APPEND;

Each group can be defined to work at different intervals, and use different options. Explore the documentation to learn how to use replication jobs – my favorite feature in Upsolver!

Fixed Snapshotting Process Becoming Stuck 

We discovered an issue whereby, during the table snapshotting phase, the process would become stuck. This is now fixed, and the snapshotting process runs smoothly.

Read about performing snapshots in your CDC jobs.

Fixed CDC Client Becoming Stuck

In another case of things getting jammed, the CDC client would sometimes become temporarily stuck after a server replacement, but I’m happy to report that this has now been fixed.

SQL Queries

Made GROUP BY Expressions Case Insensitive in Transformation Jobs

We have fixed the case sensitivity issue within transformation jobs so that GROUP BY expressions are now matched case-insensitively against their corresponding SELECT expressions.

In the example below, we aggregate the orders based on the customer’s city and compute the total revenue per city. With the case sensitivity removed from the GROUP BY clause, we can specify CITY in uppercase, and it will be parsed without error, returning the correct aggregation on the city column.

CREATE JOB group_by_test
    START_FROM = BEGINNING
    ADD_MISSING_COLUMNS = TRUE
    RUN_INTERVAL = 1 MINUTE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_aggregated_data 
  MAP_COLUMNS_BY_NAME
    SELECT customer.address.city AS customer_address_city, 
       SUM(nettotal) AS total_revenue
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    WHERE TIME_FILTER()
    GROUP BY customer.address.CITY;

Please see the documentation for more information on GROUP BY, and this guide, which shows you how to aggregate data in a transformation job.

Enhancements

Upgraded the Amazon Redshift Driver 

The driver for Amazon Redshift has been upgraded to the latest version.

Libraries Upgraded and Patched

Further to the above upgrade, we have also upgraded all libraries to include recent security patches, thereby enhancing system security and stability.

Enhanced Schema Evolution For Amazon Redshift Targets

Upsolver handles schema evolution for sources writing to an Amazon Redshift target, whether you use an ingestion, replication, or transformation job.

We will automatically create and modify tables in your Amazon Redshift database to accommodate changes in the data source. When a change in data type is encountered, Upsolver creates a new column in the target table if the new type is wider than the original. This prevents errors and data loss.

Learn how Upsolver handles schema evolution and resolves type casting when writing to Amazon Redshift.

Support Casting to JSON for Amazon Redshift

We have made enhancements to the Upsolver API to enable casting to JSON in jobs writing to Amazon Redshift. 
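As a hedged sketch of what this enables (the connection, schema, and column names below are hypothetical, and the exact cast expression may differ in your jobs), a nested source field can be cast to JSON on its way into Redshift:

```sql
-- Illustrative sketch: casting a nested record to JSON in a job
-- writing to Amazon Redshift; all names are hypothetical
CREATE SYNC JOB load_orders_to_redshift
AS INSERT INTO my_redshift_connection.orders_schema.orders_json
  MAP_COLUMNS_BY_NAME
    SELECT orderid,
           CAST(customer AS JSON) AS customer_json
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    WHERE TIME_FILTER();
```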

Snowflake Target Schema Created if Not Existing in Target

When writing to a Snowflake database, if the target schema does not exist, it will now be created automatically before the table is created.

Learn how we manage schema evolution in Snowflake.

Improvements Made to the Handling of Apache Iceberg Files

When you choose Upsolver to manage your Iceberg tables, we take care of compacting your data files so you experience consistently performant queries and reduced storage costs. We have made improvements to how we delete dangling files from Iceberg tables, and how we expire old table snapshots.

Flag Added to System Table to Denote Managed Iceberg Tables

We have added the is_external column to the system table system.information_schema.tables to flag tables that are Iceberg tables managed by Upsolver but are not targets of an Upsolver ingestion job.

To query this system table, log in to Upsolver and click Worksheets in the main menu on the left-hand side. In a new worksheet, run the following query:

SELECT * 
FROM system.information_schema.tables;
-- If you have a lot of tables, use LIMIT to reduce the result set 
-- LIMIT 50;

Fixed an Error While Reading Positional Delete Files

During the process of compacting Iceberg data files, an error occurred while reading positional delete files. This has now been corrected.


That’s it for this month! I look forward to bringing you all the latest news in July. In the meantime, please follow us on LinkedIn for all the latest news and announcements, or book a demo if you have not yet given Upsolver a test drive. 

Published in: Blog, Release Notes
Rachel Horder

Rachel spent the first half of her career building websites using .NET and later shifted her focus to SQL Server, honing her expertise in data migration and integration. Driven by her enthusiasm for data and ETL processes, she co-authored the book "What's New in SQL Server 2012" and contributed to writing the official training material for Microsoft SQL Server 2016 and Power BI. In her current role as Senior Technical Writer at Upsolver, Rachel continues to combine her passions for data and writing.
