Proofpoint: Building an Agile and Scalable Streaming Data Infrastructure over S3

CUSTOMER STORY

INDUSTRY: IT, Network-as-a-service
USE CASE: Implement a data lake using AWS S3, for security research, monitoring, and customer-facing analytics
DATA: 11 data streams of network logs, starting at dozens of GBs per month and growing rapidly to hundreds of TBs per month
INTEGRATIONS: Kinesis, Athena, Elasticsearch
1 second
average query response time
1 developer
maintaining the entire data lake infrastructure

Meta Networks, acquired by Proofpoint, is a fast-growing startup that’s reinventing the secure enterprise network for the cloud age. Its Network-as-a-Service (NaaS) enables businesses to rapidly connect people, applications, clouds, and sites, and secure them with a software-defined perimeter.

The Meta NaaS collects large volumes of data in real time: DNS, traffic, IP addresses, API access, and more. A single user visiting a single website can produce hundreds of new data points. As the company’s customer base and operations grew, the team realized it needed a scalable way to manage, store, and structure the incoming data, and to provide analytics and reporting to customers.

The Goal

As a networking solution provider, it was clear to Meta Networks that they would need a solution for big data management and integration. At the outset, they had three major goals in mind:

  • Monitoring: security is a key consideration for Meta’s customers, and the ability to monitor every aspect of their network is crucial to ensuring its integrity, which created a pressing need to deliver comprehensive, real-time network usage information
  • Internal research to detect cyber threats, improve the Meta NaaS, and provide a stronger offering to the company’s customer base
  • Customer-facing analytics and reporting on network usage and security stats

To support the company’s rapid growth, the Meta team realized they would need a scalable and cost-effective solution for working with high volumes of streaming data using Amazon S3 and Kinesis.

As a lean startup with limited data engineering resources, Meta preferred to avoid the complexity of an enterprise data warehouse or open-source tools such as Spark and MapReduce.

 

“With Upsolver, we had an operational data lake with demonstrable value to our customers in three weeks. Doing the same ourselves with open-source solutions would have taken us four months.”

The Solution

The Meta team started experimenting with Upsolver as a way to optimize data ingestion into its S3 data lake, and found that it let them move forward quickly and independently. Upsolver soon became an integral part of the company’s big data architecture: its declarative pipelines simplified and improved the entire cycle of transforming raw data into valuable information for Meta Networks and its customers. It took a single developer less than three weeks to implement Upsolver and fully integrate it within Meta’s infrastructure.


The Results

Upsolver joins multiple data streams in-flight, combining raw data from several APIs to present enriched information to Meta Networks customers, who now have access to actionable reporting about activity on their network. Despite the massive volume of data being processed, latency stays at around 1 second.

Today, the same developer continues to manage Meta’s growing data lake through Upsolver, adding new capabilities in a matter of hours. This lets the company keep developing its NaaS platform and rapidly introduce new analytical features that add more value for its customers.

Read the case study on the AWS Big Data Blog.
