AppsFlyer: Cutting compute costs 75% by moving from a data warehouse to Upsolver + Amazon Athena

CUSTOMER STORY

INDUSTRY: Mobile ad attribution and marketing analytics
SOLUTION: Real-time and lower-cost analytics; fast time to value; automated and optimized ETL
TECHNICAL USE CASE: Mobile ad attribution and marketing analytics
75% reduction in compute costs
120 billion events processed every day
100s of engineering hours saved

AppsFlyer is a mobile ad attribution and marketing analytics platform that enables advertisers and marketing professionals to analyze which campaigns drive the best results while preserving customer privacy.  AppsFlyer’s data-driven approach gives brands a holistic view of every user journey across platforms, channels, and devices. 

An open tech stack is essential to AppsFlyer’s mission to provide customers with accurate data, privacy, security, and innovation.  But the scale of the business is challenging:

  • 120 billion events processed every day
  • AppsFlyer SDK installed in more than 95% of mobile devices worldwide
  • Almost 90,000 advertiser customers, from startups to the world’s top brands
  • More than 8,000 technology partners

Soaring data costs in a black box environment

AppsFlyer uses a large data lake for support purposes. Dozens of AppsFlyer services generate raw data that’s stored in the lake, amounting to petabytes of data streams per day. A large number of AppsFlyer staff – in support, product, and data science – work with the data in the lake daily, creating ad hoc queries to isolate issues, spot anomalies that could indicate fraud, and identify impactful KPIs. AppsFlyer stores massive logs of requests and responses in BigQuery on Google Cloud Platform for ad hoc use cases.

AppsFlyer’s ETL process, written in-house in Clojure, added further complexity. It was entirely hand-coded, and staff had to manage the whole software life cycle through a CLI: deployment, monitoring, code changes, everything. Tuning became a Sisyphean task. The company was scaling quickly, but its home-grown system was not sustainable in terms of either infrastructure reliability or TCO.

“Maintaining that in-house ETL created a huge engineering overhead,” says Avner Livne, AppsFlyer Real-Time Application (RTA) Groups Lead.  “Data transformation was very hard.  Schema changes were very hard.  While it was functional, everything required a lot of attention and engineering.”  

A single use case cost $3,000 a day on BigQuery, or roughly $1.1 million annually. This led to a growing backlog in delivering the analytics required by the company’s data consumers. AppsFlyer urgently wanted to move off BigQuery and GCP to cut costs, reduce complexity, and speed its time to innovation.

Rapid deployment free of turbulence

To address these issues, AppsFlyer conducted a proof of concept (POC) with Upsolver.  In less than a month, they were able to:

  • integrate the entire solution in AppsFlyer’s environment;
  • create a data lake on Amazon S3; and
  • optimize the data lake by automatically implementing best practices to meet stringent performance requirements.

In just a fraction of the anticipated timeline, Upsolver enabled AppsFlyer to:

  • collect log data at scale
  • manage its S3 data lake
  • preprocess the data
  • optimize it for Athena.

“I told the Upsolver guys that I really don't need them anymore because everything just works. The adoption was really fast.”

AppsFlyer used Upsolver’s visual IDE and SQL to carry out the required transformations on data in S3, then made the data query-ready through integration with AWS Glue Data Catalog.
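One common way to make log data in S3 cheaper to query from Athena is to write it under Hive-style partition prefixes, so that queries filtered by date or hour scan only the matching objects. The sketch below illustrates that general practice only. It is not a description of how Upsolver lays out AppsFlyer's data, and the function name, prefix, and file name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def partitioned_key(prefix: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned object key (dt=YYYY-MM-DD/hour=HH).

    The layout and all names here are illustrative assumptions.
    """
    # Normalize to UTC so events from every time zone share one partition scheme.
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}/dt={utc:%Y-%m-%d}/hour={utc:%H}/{filename}"


# An event already stamped in UTC.
key_utc = partitioned_key(
    "support-logs",
    datetime(2021, 3, 4, 17, 30, tzinfo=timezone.utc),
    "part-0001.parquet",
)

# An event stamped at UTC-5 late in the evening lands in the next UTC day.
key_offset = partitioned_key(
    "support-logs",
    datetime(2021, 3, 4, 23, 30, tzinfo=timezone(timedelta(hours=-5))),
    "part-0002.parquet",
)
```

With a layout like this registered as partitions in the catalog, a query restricted to a single `dt` value reads only the objects under that prefix.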

With some minor guidance from the Upsolver team, AppsFlyer’s data practitioners executed the entire process. “The stability of the Upsolver flow is very impressive,” says Livne.  “I was talking one day to the Upsolver guys, and I told them that I really don’t need them anymore because everything just works.  The adoption was really fast, since everything is inside the UI.”  A project that was estimated to last months took only a couple of weeks to complete.

AppsFlyer sees ROI immediately after liftoff

Livne began seeing value right away.  “The first thing we noticed was a significant cost reduction,” he says.  “The Upsolver engine was far more efficient than our in-house ETL for compute cost.  The second thing we noticed immediately was the reduction in engineer work. We saved hundreds of data engineering hours compared to what it took to support the in-house ETL.”  In particular, Upsolver’s native upsert capability proved valuable.  “To replay the data in the old ETL was really risky and error-prone.  In Upsolver the replayability was easy and reliable, and saved us many hours and a lot of pain.”
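Upserts make replays safer because writing the same events twice leaves the table unchanged. The following is a minimal sketch of that idea, assuming each event carries a unique ID and a version number. The record shape and the `upsert` helper are hypothetical and do not represent Upsolver's implementation.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (event_id, version, payload).
Event = Tuple[str, int, str]


def upsert(table: Dict[str, Event], events: Iterable[Event]) -> Dict[str, Event]:
    """Insert new events and overwrite existing ones only with an equal or newer version."""
    for event in events:
        event_id, version, _payload = event
        current = table.get(event_id)
        if current is None or version >= current[1]:
            table[event_id] = event
    return table


batch = [
    ("a", 1, "install"),
    ("b", 1, "click"),
    ("a", 2, "install-corrected"),
]

# First load of the batch.
table = upsert({}, batch)

# Replaying the same batch on top of the loaded table changes nothing.
replayed = upsert(dict(table), batch)
```

Because older versions never overwrite newer ones, the result does not depend on how many times the batch is replayed.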

Blue skies ahead for AppsFlyer 

Since AppsFlyer went live, Livne notes, Upsolver’s visual IDE has paid unanticipated dividends.  “After the adoption of Upsolver, we started digging into the data, since it was no longer a petabyte-scale black box.  We actually saw places where we can stream log records and manipulate the data to reduce the size of the tables that we create.  The cost savings in Athena queries alone has been huge.  We still have ad hoc queries but you don’t need a developer to optimize for queries.  The support team can make changes or create new queries themselves wherever they’re looking for specific anomalies.  Upsolver has just taken a huge weight off of all our shoulders.”

Today AppsFlyer uses Upsolver to manage the complete flow of support data, from pre-processing to preparation for analytics.  In addition to vast savings in time and cost, Upsolver has accelerated time to value, thanks to its continuous data transformation.  “Everything is automated, managed,” says Livne.  “There’s no need to worry about compaction or duplicates or anything.  Everything works very, very, very smoothly.”

Nor do Livne and the team lose sleep over optimizing the performance of their new support data lake.  “Monitoring overhead is significantly reduced,” he says.  “It’s like we have a guardian angel on our shoulder monitoring the stream and alerting us if anything goes wrong.  It is very reassuring.”

“There’s a significant cost reduction. The Upsolver engine is far more efficient than our in-house ETL for compute cost.”