Upsolver + Proofpoint

Building an Agile and Scalable
Streaming Data Infrastructure over S3




IT, Network-as-a-service

Use case

Implement a data lake on Amazon S3 for security research, monitoring and customer-facing analytics


11 data streams of network logs, starting at dozens of GB per month and growing rapidly to hundreds of TB per month


Kinesis, Athena, Elasticsearch

Feature Highlights

The Backstory

Meta Networks, acquired by Proofpoint, is a fast-growing startup that’s reinventing the secure enterprise network for the cloud age. Its Network-as-a-Service allows businesses to rapidly connect people, applications, clouds and sites, and secure them with a software-defined perimeter.

The Meta NaaS collects large volumes of data in real time: DNS, traffic, IP addresses, API access and more. A single user visiting a single website can produce hundreds of new data points. As its customer base and operations grew, the team realized it needed a scalable way to manage, store and structure the incoming data, as well as to provide analytics and reporting to its customers.

The Goal

Building a Data Lake without Drowning in Technical Complexity

As a networking solution provider, it was clear to Meta Networks that they would need a solution for big data management and integration. At the outset, they had three major goals in mind:

  • Monitoring: security is a key consideration for Meta’s customers, and the ability to monitor all aspects of their network is crucial to ensuring its integrity, creating a pressing need for comprehensive, real-time network usage information
  • Internal research to detect cyber threats, improve the Meta NaaS and provide a stronger offering to the company’s customer base
  • Customer-facing analytics and reporting on network usage and security stats

To support the company’s rapid growth, the Meta team realized they would need a scalable and cost-effective solution for working with high volumes of streaming data using Amazon S3 and Kinesis.

As a lean startup with limited data engineering resources, Meta preferred to avoid the complexity of an enterprise data warehouse or open-source tools such as Spark and MapReduce.

The Requirements

  • Ability to scale massively and quickly with each new customer Meta onboards
  • Ensure data is stored on S3 without loss or duplication
  • Enable self-service handling of streaming data without recruiting big data engineers
  • Seamless connection to Athena to enable researchers to ask and answer questions using SQL
  • Seamless connection to Elasticsearch for monitoring purposes
  • Flexible API that could be integrated into Meta Networks’ product
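To make the "no loss or duplication" requirement concrete, here is a minimal sketch of one common approach: naming each stored batch after its position in the stream, so that a retried write overwrites the same object instead of adding a second copy. This is an illustration only, not a description of how Upsolver works. The `FakeObjectStore` class, the key layout and the `dns` stream name are all hypothetical, and an in-memory dictionary stands in for S3.

```python
import json


def batch_key(stream, shard, first_seq, last_seq):
    """Build a deterministic object key from the batch's position in the stream."""
    return f"{stream}/shard={shard}/{first_seq:020d}-{last_seq:020d}.json"


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an object store: put() overwrites by key."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body


def write_batch(store, stream, shard, records):
    """Write one batch of (sequence_number, payload) tuples, sorted by sequence number.

    Because the key depends only on the stream, shard and sequence range,
    retrying the same batch after a failure replaces the earlier object
    rather than creating a duplicate.
    """
    key = batch_key(stream, shard, records[0][0], records[-1][0])
    body = "\n".join(json.dumps(payload, sort_keys=True) for _, payload in records)
    store.put(key, body.encode("utf-8"))
    return key


store = FakeObjectStore()
records = [(1, {"query": "a.example"}), (2, {"query": "b.example"})]
first_key = write_batch(store, "dns", "0", records)
retry_key = write_batch(store, "dns", "0", records)  # simulated retry of the same batch
```

After the simulated retry the store still holds a single object, because both writes resolve to the same key.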

The Solution

From Evaluation to Production in Under a Month

1-second average query response time

Terabytes of streaming data processed daily

1 developer maintaining the entire data lake infrastructure

The Meta team first evaluated Upsolver as a way to optimize data ingestion into its S3 data lake. They found that it let them move forward quickly and independently, and it soon became an integral part of the company’s big data architecture, simplifying the entire cycle of transforming raw data into valuable information for Meta Networks and its customers. A single developer took less than three weeks to implement Upsolver and fully integrate it with Meta’s infrastructure.


With Upsolver, we had an operational data lake with demonstrable value to our customers in three weeks. Doing the same ourselves with open-source solutions would have taken us four months.

Alon Horowitz, Co-founder and VP R&D, Meta Networks

Upsolver is the shortest path from streaming to usable data.

The Results

Upsolver is being used to join multiple data streams in-flight – combining raw data from several APIs in order to present enriched information to Meta Networks customers, who now have access to actionable reporting about activity on their network. This is achieved at very low latencies of around 1 second, despite the massive amounts of data being processed.
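As a rough illustration of what joining streams in-flight means, the sketch below enriches each incoming event with the most recent record seen for the same key on a second stream, as long as that record is recent enough. It is a simplified, assumption-laden example rather than Upsolver's implementation. The `StreamEnricher` class, the five-minute default window and the field names (`user`, `bytes`, the IP address used as key) are hypothetical.

```python
class StreamEnricher:
    """Joins events to the latest lookup record that shares the same key."""

    def __init__(self, max_age_seconds=300.0):
        self.max_age_seconds = max_age_seconds
        self._lookup = {}  # key -> (timestamp, record)

    def on_lookup(self, key, timestamp, record):
        """Remember the newest record from the lookup stream for this key."""
        self._lookup[key] = (timestamp, record)

    def on_event(self, key, timestamp, event):
        """Return a copy of the event, enriched if a fresh lookup record exists."""
        enriched = dict(event)
        hit = self._lookup.get(key)
        if hit is not None and 0 <= timestamp - hit[0] <= self.max_age_seconds:
            enriched.update(hit[1])
        return enriched


enricher = StreamEnricher(max_age_seconds=60)
enricher.on_lookup("10.0.0.5", 100.0, {"user": "alice"})

fresh = enricher.on_event("10.0.0.5", 130.0, {"bytes": 512})   # within the window
stale = enricher.on_event("10.0.0.5", 500.0, {"bytes": 1})     # lookup record too old
unknown = enricher.on_event("10.0.0.9", 130.0, {"bytes": 7})   # no lookup record
```

Only the first event is enriched. The other two pass through unchanged, which keeps the pipeline moving when the second stream has nothing current to contribute.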

Today, the same developer continues to manage Meta’s growing data lake through Upsolver, adding new capabilities in a matter of hours. This allows the company to keep developing its NaaS platform and rapidly introduce new analytical features that add more value for its customers.
