Proofpoint: Building an Agile and Scalable Streaming Data Infrastructure over S3

CUSTOMER STORY

INDUSTRY: IT, Network-as-a-Service
USE CASE: Implement a data lake on Amazon S3 for security research, monitoring and customer-facing analytics
DATA: 11 data streams of network logs, growing rapidly from dozens of GBs per month to hundreds of TBs per month
INTEGRATIONS: Kinesis, Athena, Elasticsearch
1 second average query response time
1 developer maintaining the entire data lake infrastructure

Meta Networks, acquired by Proofpoint, is a fast-growing startup that’s reinventing the secure enterprise network for the cloud age. Its Network-as-a-Service allows businesses to rapidly connect people, applications, clouds and sites, and secure them with a software-defined perimeter.

The Meta NaaS collects large volumes of data in real time: DNS, traffic, IP addresses, API access and more. A single user visiting a single website can produce hundreds of new data points. As its customer base and operations grew, the team realized it needed a scalable way to manage, store and structure the incoming data, and to provide analytics and reporting to its customers.

The Goal

As a networking solution provider, Meta Networks knew it would need a solution for big data management and integration. From the outset, the team had three major goals in mind:

  • Monitoring: security is a key consideration for Meta’s customers, and the ability to monitor every aspect of their network is crucial to ensuring its integrity. This created a pressing need to deliver comprehensive, real-time network usage information.
  • Internal research to detect cyber threats, improve the Meta NaaS and provide a stronger offering to the company’s customer base
  • Customer-facing analytics and reporting on network usage and security stats

To support the company’s rapid growth, the Meta team realized it would need a scalable, cost-effective solution for working with high volumes of streaming data using Amazon S3 and Amazon Kinesis.

As a lean startup with limited data engineering resources, Meta preferred to avoid the complexity of an enterprise data warehouse or open-source tools such as Spark and MapReduce.

“With Upsolver, we had an operational data lake with demonstrable value to our customers in three weeks. Doing the same ourselves with open-source solutions would have taken us four months.”

The Solution

The Meta team initially adopted Upsolver to optimize data ingestion into its S3 data lake. They soon found that Upsolver let them move forward quickly and independently, and it became an integral part of the company’s big data architecture, simplifying and improving the entire cycle of transforming raw data into valuable information for Meta Networks and its customers. A single developer implemented Upsolver and fully integrated it into Meta’s infrastructure in less than three weeks.

The Results

Meta uses Upsolver to join multiple data streams in-flight, combining raw data from several APIs to present enriched information to Meta Networks customers, who now have access to actionable reporting about activity on their network. This is achieved at low latencies of around 1 second, despite the massive volume of data being processed.

Today, the same developer continues to manage Meta’s growing data lake through Upsolver, adding new capabilities in a matter of hours. This allows the company to keep developing its NaaS platform and rapidly introduce new analytical features that add more value for its customers.

Read the case study on the AWS Big Data Blog.