Alooma is a cloud ETL solution used to integrate and enrich data from various sources into data warehouses such as Amazon Redshift, Snowflake and Google BigQuery, as well as data lakes built on object storage such as Amazon S3. Alooma was founded in 2013 and since then has managed to land some impressive enterprise customers such as Sony, Strava and the New York Times (according to the logos flaunted on the company’s homepage).
Alooma’s data pipeline tool initially offered support for all major cloud services: Amazon Web Services, Microsoft Azure and Google Cloud. However, when the company was acquired by Google Cloud in early 2019, concerns were immediately voiced that the platform would not continue to support competing cloud platforms for long.
Based on the information we’ve received as well as changes to Alooma’s public website, it seems that these concerns were justified: Alooma will soon be limited to Google Cloud, and existing customers will be asked to migrate their ETL infrastructure or look for alternatives.
The Present and Future of Alooma on Amazon Web Services
Alooma was used by AWS users as an ETL tool that extracted data from apps, databases and streaming sources, enabled users to perform transformations and enrichments, and loaded the data into Redshift (support for writing to Amazon S3 was also available).
About six months after the Google acquisition, it seems that Alooma is beginning to sunset its support for Amazon Web Services and Redshift. While we do not know it for a fact, we have seen multiple indications suggesting this is the case, including:
- Several conversations with existing Alooma customers who were informed that Alooma will no longer be available on AWS starting from early 2020
- Changes to Alooma’s website - as can be seen in the cached version of Alooma’s homepage on the Wayback Machine, in March of this year the tagline still read: “Alooma brings all your data sources together into BigQuery, Redshift, Snowflake, Azure, and more.” Today a similar tagline appears, but every destination except BigQuery is conspicuously absent:
Source: Alooma website
So it seems there are pretty strong signs that times are indeed a-changing for Alooma customers who currently use the platform as their AWS data pipeline. What happens next?
Migrating from Alooma: the Alternatives
As we’ve mentioned, Alooma was mostly used for ETL to Amazon Redshift - with the two major types of sources being streaming data from Apache Kafka or Amazon Kinesis, and application data from sources such as Salesforce, Mixpanel or Intercom. These are distinct use cases, and there are viable alternatives for each one, allowing your data infrastructure to continue operating without the uncalled-for interruption of migrating to a new cloud provider.
Alternatives for Streaming Data
Alooma on AWS currently provides native integration with Kafka or Kinesis streams and managed workflow orchestration to ensure events are written to Redshift, and it enables basic data transformation and enrichment using Python.
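To make the transformation step concrete, here is a minimal sketch of the kind of per-event Python function a pipeline like this might run before loading to Redshift. The event fields (`email`, `country`, `ts`) and the `COUNTRY_NAMES` lookup are hypothetical and chosen only for illustration; this is not presented as Alooma's actual API.

```python
from datetime import datetime, timezone

# Hypothetical lookup table used to enrich events; not part of any real API.
COUNTRY_NAMES = {"US": "United States", "DE": "Germany"}

def transform(event):
    """Clean and enrich a single event dict; return None to drop it."""
    if not event.get("email"):
        return None  # discard events that lack a required field
    out = dict(event)  # copy so the caller's dict is left untouched
    out["email"] = out["email"].strip().lower()
    # Enrichment: add a readable country name next to the raw code
    out["country_name"] = COUNTRY_NAMES.get(out.get("country"), "Unknown")
    if "ts" in out:
        # Convert epoch seconds into an ISO 8601 UTC timestamp
        out["ts_iso"] = datetime.fromtimestamp(out["ts"], tz=timezone.utc).isoformat()
    return out
```

Whichever replacement you choose, this cleaning and enrichment logic is what you would need to reproduce, whether in SQL, a visual UI or code.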
You can get the same or more functionality from the following tools:
Obviously we are not unbiased when talking about our own data lake ETL platform, but we will mention that Upsolver offers native integration with Kafka, Kinesis or Amazon Managed Streaming for Apache Kafka (Amazon MSK) and a full suite of self-service ETL capabilities that are performed entirely via a visual UI and SQL. You can then write the clean, enriched and structured data to Redshift in a few clicks.
However, the main advantage of replacing Alooma with Upsolver is that Upsolver enables you to transition from a data warehouse architecture to a data lake - storing all your raw data on Amazon S3, and then using a variety of tools in addition to Redshift or, in some cases, instead of it - including serverless query engines such as Amazon Athena and Redshift Spectrum. If properly implemented, this can result in significant performance improvements and reduced costs.
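One common way to keep raw data on S3 queryable by engines such as Athena is to write it under Hive-style date partitions, so that queries scan only the dates they need. The sketch below builds such an object key. The `raw` prefix, the stream name and the file naming are assumptions made for illustration, not a layout prescribed by Upsolver or AWS.

```python
from datetime import datetime, timezone

def partitioned_key(stream, event_time, file_id, prefix="raw"):
    """Build a Hive-style partitioned object key (year=/month=/day=)."""
    t = event_time.astimezone(timezone.utc)  # partition by UTC date
    return (
        f"{prefix}/{stream}/"
        f"year={t:%Y}/month={t:%m}/day={t:%d}/"
        f"{file_id}.json.gz"
    )

# Example usage with a hypothetical "clicks" stream
key = partitioned_key(
    "clicks", datetime(2019, 9, 3, 12, 0, tzinfo=timezone.utc), "part-0001"
)
```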
Another benefit of the Upsolver approach is the ability to easily store high volumes of historical data on inexpensive object storage (which would have been prohibitively expensive in a data warehouse), with the ability to ‘replay’ a past state of the data as it came in. Upsolver also enables you to easily migrate use cases such as upserts, CDC and streaming analytics from your existing data warehouse to a lake architecture.
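The idea of replaying a past state can be sketched in a few lines: if every raw event is retained, the table as it looked at any point in time can be rebuilt by applying upserts in order up to that moment. This is a simplified, in-memory illustration with hypothetical field names (`id`, `ts`), not a description of how Upsolver implements it.

```python
def replay_state(events, as_of, key="id", ts="ts"):
    """Rebuild the latest record per key using only events up to `as_of`."""
    state = {}
    for e in sorted(events, key=lambda e: e[ts]):
        if e[ts] > as_of:
            break  # ignore anything that arrived after the chosen point in time
        # Upsert: merge into the existing record, newer fields win
        state[e[key]] = {**state.get(e[key], {}), **e}
    return state

# Hypothetical raw events kept in the lake
events = [
    {"id": 1, "ts": 1, "plan": "free"},
    {"id": 2, "ts": 2, "plan": "pro"},
    {"id": 1, "ts": 3, "plan": "pro"},
]
```

Calling `replay_state(events, 2)` reconstructs the state before user 1 upgraded, which is not possible once a warehouse row has been overwritten in place.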
Here's an example of building a streaming data pipeline from Kinesis to Athena using Upsolver, without writing code:
Apache Spark-based Solutions
If you prefer an open-source approach, you can build your ETL pipeline using Apache Spark, which is probably the most popular open-source framework for transforming semi-structured data streams. You can run Spark on Amazon EMR clusters, or use managed services such as Databricks and AWS Glue.
While Spark and Spark-based solutions will allow you to perform ETL on streaming data and write the results to Redshift, the major drawback is that using these services is not as hands-free as the more 'productized' approach. Data transformation in Spark will typically require a larger engineering effort for orchestration, ETL coding, data management and more, and you will not get the same level of self-service and ease of use that you would get from Alooma or Upsolver.
You can read more about running Apache Spark on Amazon on the AWS website, or get a more in-depth view of the pros and cons of Spark as a framework in this Quora post written by Upsolver CTO Yoni Iny.
Alternatives for Application Data
A different use case for Alooma was consolidating data from multiple apps into Redshift, usually with the goal of building a marketing data warehouse. Alooma offered dozens of built-in integrations to many popular apps such as Salesforce, Marketo, Mixpanel and more.
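At its core, consolidating app data means mapping each source's own field names onto one shared set of columns before loading. The sketch below shows that step for two sources. The field mappings and the unified column names are illustrative assumptions, not the schema any particular tool produces.

```python
# Hypothetical per-source field mappings: source field -> unified column.
FIELD_MAPS = {
    "salesforce": {"Email": "email", "CreatedDate": "created_at"},
    "mixpanel": {"$email": "email", "time": "created_at"},
}

def normalize(source, record):
    """Map one source-specific record onto a shared set of columns."""
    mapping = FIELD_MAPS[source]
    # Missing source fields become None so every row has the same columns
    row = {target: record.get(field) for field, target in mapping.items()}
    row["source"] = source  # keep track of where the row came from
    return row
```

Integration tools maintain mappings like these for each supported app, which is most of what you are paying for when you adopt one.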
There are many tools that provide similar functionality for this use case, each with its own strengths, weaknesses and set of capabilities. We'll include a list below, but generally it's best to keep in mind that these tools will usually be used to facilitate a Redshift data warehouse rather than an open data lake architecture. If you're working with very large volumes of data this approach can be costly and cumbersome, as we've covered in our previous article about scaling your data infrastructure.
List of ETL tools that support Redshift as a destination for app data:
Want to talk to one of our solutions architects about migrating from Alooma? Schedule a quick chat right here. If you want to learn more about data and ETL architecture, check out our guide to streaming data architecture or recent webinar on ETL for Amazon Athena.