Use case: implementing a data lake on Amazon S3 for security research, monitoring and customer-facing analytics
Data: 11 streams of network logs, starting at dozens of GBs per month and growing rapidly to hundreds of TBs per month
Tools: Kinesis, Athena, Elasticsearch
Meta Networks is a fast-growing startup that’s reinventing the secure enterprise network for the cloud age. Its Network-as-a-Service allows businesses to rapidly connect people, applications, clouds and sites, and secure them with a software-defined perimeter.
The Meta NaaS collects large volumes of data in real time: DNS, traffic, IP addresses, API access and more. A single user visiting a single website can produce hundreds of new data points. As its customer base and operations grew, the team realized it needed a scalable way to manage, store and structure the incoming data, as well as to provide analytics and reporting to its customers.
Building a Data Lake without Drowning in Technical Complexity
As a networking solution provider, Meta Networks knew it would need a solution for big data management and integration. At the outset, the team had three major goals in mind:
- Monitoring: security is a key consideration for Meta’s customers, and the ability to monitor every aspect of their network is crucial to ensuring its integrity. This created a pressing need to deliver comprehensive, real-time network usage information.
- Internal research to detect cyber threats, improve the Meta NaaS and provide a stronger offering to the company’s customer base
- Customer-facing analytics and reporting on network usage and security stats
To support the company’s rapid growth, the Meta team realized it would need a scalable, cost-effective solution for working with high volumes of streaming data on Amazon S3 and Kinesis.
As a lean startup with limited data engineering resources, Meta preferred to avoid the complexity of an enterprise data warehouse or of open-source tools such as Spark and MapReduce. “I think data warehouses aren’t really relevant anymore,” says Alon. “I would much rather build a data lake, store all the data, and figure out exactly what I’m going to do with it later.”
Meta’s key requirements were:
- The ability to scale massively and quickly with each new customer Meta onboards
- Storing data on S3 without data loss or duplication
- Self-service handling of streaming data, without recruiting big data engineers
- A seamless connection to Athena, so researchers can ask and answer questions using SQL
- A seamless connection to Elasticsearch for monitoring
- A flexible API that could be integrated into Meta Networks’ product
“We needed a platform that could handle seriously large volumes of data in real-time, and that our developers - not all of whom come from a big data background - would enjoy using.”
- Alon Horowitz, Co-founder and VP R&D, Meta Networks
From Evaluation to Production in Under a Month
The Meta team initially began experimenting with Upsolver to optimize data ingestion into its S3 data lake. They soon found that Upsolver let them move forward quickly and independently, and it became an integral part of the company’s big data architecture, simplifying and improving the entire cycle of transforming raw data into valuable information for Meta Networks and its customers. It took a single developer less than three weeks to implement Upsolver and fully integrate it into Meta’s infrastructure.
“With Upsolver, we had an operational data lake with demonstrable value to our customers in three weeks. Doing the same ourselves with open-source solutions would have taken us four months.”
Meta uses Upsolver to join multiple data streams in flight, combining raw data from several APIs to present enriched information to Meta Networks customers, who now have access to actionable reporting on activity in their network. This runs at low latencies of around 1 second, despite the massive volume of data being processed.
Today, the same developer continues to manage Meta’s growing data lake through Upsolver, adding new capabilities in a matter of hours. This lets the company keep developing its NaaS platform and rapidly introduce new analytical features that add value for its customers.
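To make the idea of an in-flight stream join more concrete, here is a minimal Python sketch. It is not Upsolver’s implementation or API. It assumes two hypothetical streams (DNS lookups and traffic records), each already ordered by timestamp, uses the client IP as the join key, and keeps only the latest DNS event per IP as join state. The names `DnsEvent`, `TrafficEvent`, `enrich_stream` and the 60-second window are illustrative assumptions.

```python
from dataclasses import dataclass
from heapq import merge
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class DnsEvent:  # hypothetical DNS-lookup record
    ts: float  # event time in seconds
    client_ip: str
    domain: str


@dataclass(frozen=True)
class TrafficEvent:  # hypothetical traffic record
    ts: float
    client_ip: str
    bytes_sent: int


@dataclass(frozen=True)
class EnrichedTraffic:
    ts: float
    client_ip: str
    bytes_sent: int
    domain: Optional[str]  # None when no DNS lookup matched within the window


def enrich_stream(
    dns: Iterable[DnsEvent],
    traffic: Iterable[TrafficEvent],
    window_s: float = 60.0,  # assumed join window, not a documented value
) -> Iterator[EnrichedTraffic]:
    """Join two time-ordered streams in flight: attach to each traffic event
    the most recent DNS lookup by the same client IP within window_s seconds."""
    latest_dns: dict[str, DnsEvent] = {}  # join state: last DNS event per IP
    # Interleave both streams by timestamp without buffering either one fully.
    for event in merge(dns, traffic, key=lambda e: e.ts):
        if isinstance(event, DnsEvent):
            latest_dns[event.client_ip] = event
            continue
        hit = latest_dns.get(event.client_ip)
        fresh = hit is not None and event.ts - hit.ts <= window_s
        domain = hit.domain if fresh else None
        yield EnrichedTraffic(event.ts, event.client_ip, event.bytes_sent, domain)


if __name__ == "__main__":
    dns = [DnsEvent(0.0, "10.0.0.5", "example.com"),
           DnsEvent(5.0, "10.0.0.7", "example.org")]
    traffic = [TrafficEvent(1.0, "10.0.0.5", 512),
               TrafficEvent(6.0, "10.0.0.7", 2048),
               TrafficEvent(120.0, "10.0.0.5", 64)]
    for row in enrich_stream(dns, traffic):
        print(row)
```

In this sketch the last traffic event gets no domain, because the matching DNS lookup is older than the window. A production stream processor would also have to bound the join state, handle late or out-of-order events, and persist the enriched output (for example, to S3); the sketch leaves all of that out.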
- Upsolver enabled Meta’s S3 data lake to be up and running in three weeks, rather than the months it would have taken using open-source alternatives
- By using a self-service, managed solution, Meta Networks saved tens of thousands of dollars in developer hours for implementation and ongoing maintenance