Mobile user acquisition, real-time bidding
Creating real-time profiles of users for predictive decisioning; building datasets for machine learning
TBs of new data generated daily from multiple streams
S3, Athena, EMR, Kinesis Firehose
Bigabid is an innovative mobile marketing and real-time bidding company that empowers its partners to scale their mobile user acquisition efforts while lowering advertising costs through CPA and AI-based optimization.
In order to maintain a high level of performance in the competitive app advertising market, Bigabid needed to introduce real-time user profiling to its algorithmic decision engine. This case study will demonstrate how the company built a high-performance real-time architecture with minimal data engineering using Upsolver, S3 and Amazon Athena.
The Technical Challenge
From Batch to Real-time Processing
The core value Bigabid delivers to its customers (mobile apps developers) is the ability acquire new users at a low cost.. Bigabid builds a profile of each user, taking hundreds of data points extracted from several streams into account; its proprietary machine learning algorithms can then predict which user is likely to download an app or make in-app purchases, and decide when and how to bid on advertising space accordingly.
These user profiles were being updated 2-3 items a day, in a batch process. However, Bigabid’s Co-founder and CTO Amit Attias wanted to replace this system with real-time stream processing in which the user profiles would be updated based on their most recent actions - as relying on fresh data could significantly improve the results delivered by Bigabid’s predictive algorithm.
- Create a real-time aggregate view of users which would include data from the last 180 days, and use it to deliver more accurate bidding decisions
- Clean, enrich and join data from several streams: bids, impressions, clicks, and several third party data providers;
- Support for advanced features such as Count of Sessions per User, which would be difficult to achieve in a traditional database architecture
- Improve data freshness from 8 hours to <1 minutes.
Why Bigabid Chose Upsolver
After evaluating the project scope, Amit concluded that developing this type of real-time architecture in-house using Spark Streaming and a database such as Aerospike, Redis or DynamoDB, would require 6 months of engineering work, with 4 engineers devoted to the task.
This realization lead Amit to take a closer look at Upsolver's stream processing platform, which promised to deliver the same results at a fraction of the time and cost. With the help of the Upsolver team, Bigabid managed to build a working proof of concept for their real-time pipeline in hours.
Launched and maintained by a single user
Real-time processing instead of daily batches
Improved decisioning with fresh data
Data from multiple streams is ingested via Kinesis Firehose into Bigabid’s S3 data lake. Upsolver is then used for ETL - combining, cleaning and enriching data from multiple streams to achieve a more complete user profile.
For example, one stream might contain campaigns-per-user, while another details apps per campaign. Using Upsolver, Bigabid can easily see all the campaigns that a certain user is included in, and then pull that data in real time.
Bigabid use an Upsolver Materialized View to build an aggregated view of each user, with data collected by the Bigabid platform as well as third party sources. These views were previously updated 2-3 times a day in a batch process, and are now produced in real time. Fresher data translates into more accurate decisions and better performance for Bigabid’s advertising partners.
Data Science / Research:
Upsolver, Amazon S3 and Amazon Athena are used for ad-hoc querying by Bigabid’s Data Science team. Insights are used to improve machine learning models that power the company’s predictive engine.
While Bigabid were already using Amazon Redshift for their business intelligence reporting, Athena has proven to be the better choice for their data science team. It enables them to query massive amounts of data at a granular level (user level) to better understand user behaviour. Storing TBs per day in Redshift would be expensive and require resources to manage a large Redshift cluster. Athena enables to query data on S3, fast, without maintenance and at a low cost.
Upsolver is used to prepare data for Athena. Instead of coding ETLs, Upsolver visual interface enabled to easily flatten the data into tables, perform data validations, create enrichments (geo attributes from IP, device attributes from user agent) and automate the process of creating and editing Athena tables. Upsolver also optimizes Athena’s storage layer (Parquet format, compaction of small files) so queries could run much faster. .
The same data is also used in a separate pipeline that sends data via Amazon EMR to Apache Spark, which is used to perform complex calculations that could not be done in SQL.
“Choosing Upsolver saved me six months of development work by four engineers, which has had a dramatic impact on our bottom line.”
Working with fresher, more accurate data has played a major role in Bigabid’s ability to continuously provide exceptional performance for its customers. The real-time architecture, powered by Upsolver, is now an essential component in Bigabid’s platform.
“Upsolver plays a crucial part in our core data infrastructure, and the team has proven to be a reliable partner that’s been committed to our success from day one.”
Amit Attias, Co-founder and CTO, Bigabid