The Data Lake Platform
Build a scalable data lake on any cloud. No data engineering required.
Upsolver’s Data Lake Platform takes the complexity out of streaming data integration, management and preparation on any cloud data lake - AWS, Azure or Google Cloud.
Effortlessly stream data to and from your S3 data lake
Upsolver streamlines data integration between all major input and output platforms and your existing data lake.
- Get your data streams from Kafka / Kinesis to S3 with no data loss or duplication, at any scale
- Connect to an existing data lake or bucket on S3
- Easily deliver columnar data to SQL engines like Athena, Impala and more
- Data schema is managed automatically (schema-on-read)
- Output connectors to: AWS S3, AWS Redshift, AWS Athena, Impala, Presto, Elasticsearch
- Multiple input and output formats: JSON, CSV/TSV, Avro, Parquet, Protobuf
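To make the schema-on-read idea concrete, here is a minimal sketch in Python. It is an illustration only, not Upsolver's implementation: the function name `infer_schema` and the sample events are hypothetical. It scans raw JSON events and records which types were observed at each field path, so no schema has to be declared before the data is written.

```python
import json
from collections import defaultdict


def infer_schema(lines):
    """Return {field path: sorted list of type names observed at that path}.

    Hypothetical helper for illustration; array elements are reported
    under the path of the array followed by "[]".
    """
    observed = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, f"{path}[]")
        else:
            observed[path].add(type(value).__name__)

    for line in lines:
        walk(json.loads(line), "")
    return {path: sorted(types) for path, types in observed.items()}


# Two events whose shapes differ: the schema is discovered while reading.
events = [
    '{"user": {"id": 1}, "amount": 9.5}',
    '{"user": {"id": "a7"}, "amount": 3, "tags": ["x"]}',
]
schema = infer_schema(events)
for path, types in sorted(schema.items()):
    print(path, types)
```

A field that changes type between events (here `user.id`) simply shows up with more than one observed type, which is the kind of drift a declared-up-front schema would reject.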
Automated file management on AWS S3
Upsolver automatically handles the underlying file management and optimization on AWS S3:
- Improve S3 performance by compacting small files and splitting large ones
- Improve S3 data freshness using a managed hot-cold architecture
- Partition data by actual event time and handle late events
- Seamlessly replay data from S3 to incorporate additional fields
- Manage tables and partitions using AWS Glue Data Catalog or Hive
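Two of the ideas above, partitioning by event time and compacting small files, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Upsolver does it: the function names, the `date=/hour=` path layout and the greedy grouping strategy are all hypothetical choices made for the example.

```python
from datetime import datetime, timezone


def partition_path(event_epoch_seconds, prefix="events"):
    """Build a partition path from the event's own timestamp (UTC).

    Because the path depends only on event time, a late-arriving event
    still lands in the partition for the hour in which it occurred.
    """
    ts = datetime.fromtimestamp(event_epoch_seconds, tz=timezone.utc)
    return f"{prefix}/date={ts:%Y-%m-%d}/hour={ts:%H}"


def plan_compaction(file_sizes, target_bytes):
    """Greedily group files (name -> size) into batches of up to target_bytes.

    Each batch would be rewritten as one larger file. Splitting files that
    are already larger than the target is not handled in this sketch.
    """
    batches, current, current_size = [], [], 0
    for name, size in sorted(file_sizes.items()):
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Event time 1700000000 is 2023-11-14 22:13:20 UTC, whenever it arrives.
path = partition_path(1700000000)
batches = plan_compaction({"a": 40, "b": 50, "c": 30, "d": 200}, target_bytes=100)
print(path)
print(batches)
```

Fewer, larger files generally mean fewer object requests per query, which is why compaction tends to help query engines that read from S3.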
Get your streaming data ready for analysis, in-flight
Upsolver makes it incredibly easy to transform streaming data into analytics using an intuitive drag & drop interface. Finally, you’ll have all the analytical capabilities of a database, at the scale of a data lake.
- Blazingly fast in-memory data processing at any scale
- Pre-aggregate your data in-flight
- Enrich data with hundreds of out-of-the-box functions or Python snippets
- Work with nested JSON without manual wrangling
- Join data streams and enrich them with historical data in-flight
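As a rough picture of what in-flight preparation involves, the sketch below flattens nested JSON events, enriches them from a lookup table, and pre-aggregates them per minute. It is a plain Python illustration with hypothetical names (`flatten`, `aggregate_per_minute`, the sample fields); it does not use or represent Upsolver's interface or functions.

```python
from collections import defaultdict


def flatten(record, parent=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        path = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def aggregate_per_minute(events, country_by_user):
    """Sum order amounts per (minute start, country).

    country_by_user stands in for historical or reference data that the
    stream is joined against; unknown users fall back to "unknown".
    """
    totals = defaultdict(float)
    for event in events:
        row = flatten(event)
        minute_start = row["ts"] // 60 * 60  # one-minute tumbling window
        country = country_by_user.get(row["user.id"], "unknown")
        totals[(minute_start, country)] += row["order.amount"]
    return dict(totals)


events = [
    {"ts": 120, "user": {"id": 1}, "order": {"amount": 10.0}},
    {"ts": 150, "user": {"id": 2}, "order": {"amount": 5.0}},
    {"ts": 185, "user": {"id": 1}, "order": {"amount": 2.5}},
]
totals = aggregate_per_minute(events, country_by_user={1: "DE"})
print(totals)
```

Writing the aggregated rows instead of the raw events is what keeps downstream queries small: three events become three rows here, but a busy stream collapses to one row per window and key.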
Upsolver can be deployed either in our VPC or in yours.
Your data always remains private - it resides ONLY on your S3.