Explore our expert-made templates & start with the right one for you.
Welcome to the October 2023 update. This month brings major improvements, enhancements, and optimizations to the Upsolver UI, making ingesting your data even easier. Read on to discover what we’ve been working on.
This update covers the following releases:
- CDC Jobs
- SQL Queries
New and Improved Look and Feel
Our mission here at Upsolver is to make big data ingestion really easy. Our engineers have been working super hard this last month to further improve the Upsolver UI, ensuring you move your data from source to target with the least amount of friction. The biggest change you’ll notice when you next login to your account is the new-look Wizard. Let’s dive in and see how this will have your jobs up and running in minutes!
Start by selecting your source – we support the major streaming, file, and database sources – and then select your target. If you don’t see your source or target in the list, please drop us a line, we’re continually adding more data platforms. When you’ve selected your options, click Next.
Depending on the source and target combination you select, the Wizard will either present you with a SQL template to help you create your connections and build your first pipeline, or it will guide you through these steps using the UI.
For example, if you choose Microsoft SQL Server as your source, and Amazon Redshift as your target, the Wizard provides the following template:
Click Next, and the template automatically opens in the Upsolver query editor, ready for you to work on:
Each template is annotated with instructions at each step, with basic code to get you started. Our documentation provides plenty of support to help you create connections and build both basic and advanced jobs.
If you choose Snowflake as your target, the Wizard takes you end-to-end, and you can create your job without writing a single line of code!
First, the Wizard walks you through the process of creating your connection and configuring your source (you will be presented with an adapted version of this page for other sources, such as MongoDB or Amazon S3 for example, as all have slightly different configurations).
In the example below, using PostgreSQL as the source, you create a new, or use an existing, connection. Then you choose the publication name, configure the heartbeat table, and select which schemas to ingest.
Next, the Wizard enables you to create or use an existing connection to Snowflake and select your target schema:
You can then configure the options to name your job, set how often you want Snowflake to be updated, and how you want to handle deletes:
The next screen presents you with the SQL code to create the job based on your choices:
The Wizard provides this basic job template, which is adequate for many circumstances, or ready to adapt. What’s really cool about this is that you can skip this altogether and click Run to create the job without touching any code!
Alternatively, open the script in a worksheet and enhance it with expectations, exclude columns, or perform essential transformations such as masking PII data. Furthermore, if you add the script to version control, you can store a history of all the changes.
All in all, the new Wizard will help you ingest your data with ease!
Treat Names as Globs in Apache Kafka Jobs
A glob is a string of literal and/or wildcard characters that is used to match names. We now treat names as globs when copying data from your Apache Kafka topics. This means that stars will match any number of characters, and a question mark will match one character.
Write Dates as ISO-8601 Strings in Elasticsearch Jobs
Ask any developer what’s the biggest headache in their data world, and the chances are it won’t be anything as complex as you imagine, but something as simple as handling dates and times. If you’re ingesting data to Elasticsearch, then this is one headache we can take away from you.
Currently, any timestamp fields ingested to Elasticsearch without timezone information will be stored in UTC format, which may not be your intended outcome if the value is a local time. Thanks to our recent update, your date and timestamp types will now be written as ISO-8601 strings in jobs that write to Elasticsearch. Geek out over this extensive information on the Wikipedia site to learn more about ISO-8601.
Create Missing Table in Snowflake
We’ve added a new job option for writing data to Snowflake. If you include the
CREATE_TABLE_IF_MISSING option in your job, Upsolver will create it automatically. Otherwise if you exclude this option (or set it to
FALSE), the table must be created in Snowflake prior to running your job.
CREATE SYNC JOB ingest_orders_to_snowflake CREATE_TABLE_IF_MISSING = TRUE START_FROM = NOW CONTENT_TYPE = AUTO COMMIT_INTERVAL = 1 MINUTES EVENT_TIME_COLUMN = UPSOLVER_EVENT_TIME ADD_MISSING_COLUMNS = TRUE AS COPY FROM kinesis_samples STREAM = 'orders' INTO my_snowflake_connection.DEMO.ORDERS;
Check out our documentation for full details of the Snowflake job options.
We fixed a bug in synchronized transformation jobs whereby a job with a smaller write interval than another job would not read the respective data.
Exclude Tables Not in a PostgreSQL Publication
When you create an ingestion job on a change data capture database, Upsolver takes an initial snapshot of the database, using Debezium to handle the snapshot and ongoing stream of events.
When Debezium first connects to your database, it determines which tables to capture. By default, all non-system table schemas are captured. By capturing the full database schema, it makes it easier to add event data for tables not yet enabled for CDC at a later date.
For example, if you have 50 tables in your database, but only 35 are enabled for CDC, the Debezium engine would by default take a schema snapshot of all 50 tables, but only capture event data for the 35 CDC-enabled tables.
We have added a new feature whereby tables that aren’t included in a PostgreSQL publication will not be part of the snapshot. So going back to our example, Upsolver now defaults to capturing only the 35 tables you enabled, leading to better performance during the snapshot process.
For Microsoft SQL Server and MySQL, adding the
TABLE_INCLUDE_LIST source option to your job will perform the same action, though you will need to specify each table for inclusion. For MongoDB jobs, use the
COLLECTION_INCLUDE_LIST source option. You can read more about how Upsolver works with snapshots in our documentation.
Support IF EXISTS in DROP Statements
When it comes to coding, it’s always good practice to catch an error and handle it gracefully, rather than be confronted with a big red cross. That’s why our engineers have added support for adding
IF EXISTS to your drop statements. If you run the following statement:
DROP TABLE IF EXISTS "my_table";
And if the table does not exist, you’ll be presented with a message to say the entity does not exist, rather than an error message. A much nicer coding experience! Supported objects include: clusters, connections, tables, jobs, and materialized views.
Create Query Cluster
A cluster hosts a materialized view in memory for millisecond response times, and we’ve added a new feature to enable you to create a query cluster for querying your materialized views in real time.
You create a query cluster in exactly the same the way as any other compute cluster, and then attach it to your materialized view:
CREATE QUERY CLUSTER my_query_cluster MIN_INSTANCES = 1 MAX_INSTANCES = 8 COMMENT = 'Query cluster for store orders'; CREATE SYNC MATERIALIZED VIEW default_glue_catalog.upsolver_demo.mv_physical_store_orders AS SELECT orderid, LAST(saleinfo.source) as source, LAST(saleinfo.store.location.country) as country, LAST(saleinfo.store.location.name) as name, LAST(saleinfo.store.servicedby.employeeid) as employeeid, LAST(saleinfo.store.servicedby.firstname) as firstname, LAST(saleinfo.store.servicedby.lastname) as lastname FROM default_glue_catalog.upsolver_demo.sales_info_raw_data GROUP BY orderid QUERY_CLUSTER = "my_query_cluster";
Furthermore, if you have an existing materialized view and create a new query cluster, you can use an
ALTER statement to attach it:
ALTER MATERIALIZED VIEW default_glue_catalog.upsolver_demo.mv_physical_store_orders SET QUERY_CLUSTER = "my_query_cluster";
Support for il-central-1 Region
Following the recent update from AWS, Upsolver now supports the AWS Israel (Tel Aviv) il-central-1 region; however, this region is currently only supported with private VPC deployments. This enables you to run Upsolver from the AWS data centers located within Israel.
Read more about the announcement from AWS.
Reduced Amazon S3 API Calls
Finally, and this is great news for your invoices, we have reduced the number of Amazon S3 API calls to lower your S3 costs.
That’s it for this month. As we move forward, we can anticipate more new features and enhancements that will empower us to make better data-driven decisions and truly unlock the value of our data assets. Stay tuned for more exciting developments from Upsolver in November!