Most of our users have data from a variety of sources, including real-time streaming data and historical data. Combining data from these sources is essential for analytics. This guide gives an overview of how to combine multiple data sources into a single output.
Let’s get started.
1. Click on OUTPUTS on the left and then NEW in the upper-right corner.
2. Select Amazon Athena as the data output.
3. Click on Add to add as many data sources as you need. Click on Next to continue.
1. Use the + sign next to data to map your fields to the output.
2. Deselect the columns that you don’t want to output. The plus sign next to data brings in all fields except the ones you deselect. If you would rather add fields individually, click on the plus sign next to each field. For this example, we’re going to output all fields, so leave everything checked. Click on ADD FIELDS.
3. Do the same for the second data source. Click on the + sign next to data and select ADD FIELDS to add all the fields from the second data source as well.
5. Click on the SQL tab in the upper-right corner.
6. By default, when two data sources have a column with the same name, Upsolver appends _<number> to the duplicated column. For example, if both data sets have a column named “id”, the output will have “id” from one data source and “id_1” from the other. Upsolver does not merge the two columns automatically because columns with the same name do not always mean the same thing. If the columns do mean the same thing and should be merged, use a COALESCE expression to combine them into a single column.
7. Click on PREVIEW to review your data, then click on RUN.
2. Select the COMPUTE CLUSTER that you want to use. Choose the time range that you want to load your data from. Keep in mind that for live streaming data, you should leave ENDING AT as Never. Click on DEPLOY.
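To make the COALESCE step above concrete, here is a minimal sketch of how COALESCE merges a duplicated column pair such as “id” and “id_1”. It is not Upsolver's own SQL: it runs standard SQL against an in-memory SQLite database through Python's built-in sqlite3 module, and the table name `combined`, the helper `merge_duplicate_ids`, and the sample values are all hypothetical, chosen only for illustration. The exact syntax in the Upsolver SQL tab may differ.

```python
import sqlite3


def merge_duplicate_ids(rows):
    """Merge the duplicated columns "id" and "id_1" with COALESCE.

    rows is a list of (id, id_1) pairs. Each row of a combined output
    usually comes from only one source, so one of the two values is
    often None (NULL).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the combined output.
    conn.execute("CREATE TABLE combined (id TEXT, id_1 TEXT)")
    conn.executemany("INSERT INTO combined VALUES (?, ?)", rows)
    # COALESCE returns its first non-NULL argument, so the merged
    # column takes "id" when present and falls back to "id_1".
    cursor = conn.execute(
        "SELECT COALESCE(id, id_1) AS id FROM combined ORDER BY rowid"
    )
    merged = [row[0] for row in cursor]
    conn.close()
    return merged


if __name__ == "__main__":
    sample = [("a1", None), (None, "b7"), ("c3", "c3")]
    print(merge_duplicate_ids(sample))
```

COALESCE takes the first non-NULL value from left to right, so the order of the arguments decides which data source wins when both columns hold a value for the same row.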