Browsi provides an AI-powered adtech solution that helps publishers monetize content by offering ad inventory-as-a-service. The company’s platform automatically optimizes ad placements and layout to ensure users are being served relevant ad content without hurting the overall user experience. The company serves high-traffic publishers such as Hearst, Minute Media, Graham Media Group and Ynet.
Matan Ghuy, Big Data Engineer at Browsi, was tasked with maintaining its data infrastructure. The company's data lake ingested data from Amazon Kinesis via a Lambda function that ensured exactly-once processing, while ETL was handled by a batch process, coded in Spark/Hadoop and running on an Amazon EMR cluster once a day.
Amazon Athena was used to query the data. Because of the batch latency, the data in Athena was either up to 24 hours old or, where it had not yet been compacted, expensive and slow to query. The overall solution was also cumbersome and difficult to maintain, and each new ETL pipeline required additional effort from Matan, which prevented him from focusing on other back-end development tasks. When the company began evaluating Upsolver, Matan immediately saw its value as a self-service platform that would replace the manual infrastructure work that was taking up dozens of hours each week.
After a short proof of concept, Upsolver was seamlessly integrated into Browsi’s AWS account and quickly became the company’s main data lake ETL platform.
The company implemented Upsolver to replace both the Lambda function used for ingestion and the Spark/EMR implementation used to process data, transitioning from batch to stream processing and bringing end-to-end latency (Kinesis -> Athena) down to mere minutes.
While the previous implementation was based on manual coding, Upsolver enables Matan to manage all ETL flows from its visual interface, without writing any code.
Scripts on publisher websites generate events, which are streamed via Amazon Kinesis Streams. Upsolver ingests the data from Kinesis and writes it to S3 while enforcing partitioning, exactly-once processing, and other data lake best practices.
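To make the ideas of partitioning and exactly-once processing more concrete, here is a minimal Python sketch. It is not Upsolver's implementation, which is configured through its interface rather than coded by hand. The field names `event_id` and `timestamp`, the `events` prefix, and the `dt=` key layout are all assumptions chosen for illustration, and exactly-once behavior is only approximated here by dropping events whose ID has already been seen.

```python
from datetime import datetime, timezone


def partition_key(event: dict, prefix: str = "events") -> str:
    """Build a date-partitioned object key from the event's Unix timestamp.

    The "dt=YYYY-MM-DD" layout is a common convention that lets query
    engines skip partitions outside the requested date range.
    """
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return f"{prefix}/dt={ts:%Y-%m-%d}/{event['event_id']}.json"


def dedupe(events, seen=None):
    """Yield each event once, keyed on the (hypothetical) event_id field.

    A stream may deliver the same record more than once; skipping IDs
    that were already processed makes the write step idempotent.
    """
    seen = set() if seen is None else seen
    for event in events:
        if event["event_id"] in seen:
            continue
        seen.add(event["event_id"])
        yield event


# Example: the second copy of event "a" is dropped before keys are built.
incoming = [
    {"event_id": "a", "timestamp": 0},
    {"event_id": "a", "timestamp": 0},
    {"event_id": "b", "timestamp": 86400},
]
keys = [partition_key(e) for e in dedupe(incoming)]
```

In a real pipeline the set of seen IDs would need to be persisted and bounded in size; an in-memory set is used here only to keep the sketch short.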
From there, the company built its output ETL flows to Amazon Athena, which is used for data science as well as BI reporting via Domo. For internal reporting, Upsolver creates daily aggregations of the data, which are processed by a homegrown reporting solution.
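As a rough illustration of what a daily aggregation might compute, the following Python sketch counts events per day and per publisher. The `publisher` and `timestamp` field names are hypothetical, and the actual aggregations Browsi defines in Upsolver are not described in this case study.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_counts(events):
    """Count events per (UTC day, publisher) pair."""
    counts = Counter()
    for event in events:
        day = datetime.fromtimestamp(
            event["timestamp"], tz=timezone.utc
        ).date().isoformat()
        counts[(day, event["publisher"])] += 1
    return dict(counts)


# Example input: two events on the first day, one on the next.
sample = [
    {"publisher": "site-a", "timestamp": 0},
    {"publisher": "site-a", "timestamp": 3600},
    {"publisher": "site-b", "timestamp": 86400},
]
report = daily_counts(sample)
```

A downstream reporting tool could then read one small row per day and publisher instead of scanning every raw event.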