Watch this webinar for a technical deep dive where we show you how to build and operate an Iceberg-based lakehouse.
You’ll start by learning how to create and query Iceberg tables using Apache Spark. Then, you’ll explore how data is organized in S3 and what properties you should tune for best performance.
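To give a flavor of what the session covers, creating and querying an Iceberg table from Spark SQL might look like the sketch below. The catalog, database, and table names (`my_catalog`, `db.events`) are illustrative assumptions; your catalog configuration will differ.

```sql
-- Create an Iceberg table (catalog and table names are illustrative)
CREATE TABLE my_catalog.db.events (
    event_id   BIGINT,
    event_time TIMESTAMP,
    payload    STRING
) USING iceberg
PARTITIONED BY (days(event_time));

-- Query it like any other Spark table
SELECT count(*)
FROM my_catalog.db.events
WHERE event_time >= current_date() - INTERVAL 7 DAYS;
```

The `days(event_time)` clause uses one of Iceberg's hidden partition transforms, so queries can prune partitions without filtering on a separate partition column.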
Lastly, you’ll learn how Upsolver can help you get started even faster and automate the many maintenance and optimization tasks required for a high-performing lakehouse.
Watch to learn about:
- Spark in Lakehouse Architecture: Learn how Apache Iceberg integrates with Apache Spark, with emphasis on Spark's role in ETL and on how the lakehouse reduces the cost of data transformation and storage compared to a traditional data warehouse.
- Simple and Reliable Ingestion with Upsolver: Examine how Upsolver simplifies the ingestion of operational data into Iceberg tables, highlighting its no-code and ZeroETL approaches for efficient data movement.
- Impacts of Data Management on Query Performance: Explore the impacts of small files, fast and continuous updates/deletes, and manifest file churn on query performance. Compare how data is managed and optimized between Spark and Upsolver, including how each handles schema evolution and transactional concurrency.
- Best Practices for Implementing Lakehouse Architectures: Discuss best practices for deploying and managing a lakehouse architecture using Spark and Upsolver, with insights into optimizing storage, improving query speed, and ensuring high-quality, reliable data.
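As one concrete example of the maintenance work discussed above, Iceberg ships Spark stored procedures for compacting small files, expiring old snapshots, and rewriting manifests. A minimal sketch follows; the catalog and table names are assumptions, and the timestamp is a placeholder you would set to your own retention cutoff:

```sql
-- Compact small files into larger ones (mitigates the small-file problem)
CALL my_catalog.system.rewrite_data_files(table => 'db.events');

-- Expire old snapshots to keep table metadata from growing unbounded
CALL my_catalog.system.expire_snapshots(
    table => 'db.events',
    older_than => TIMESTAMP '2024-01-01 00:00:00');

-- Rewrite manifests to reduce manifest file churn and speed up planning
CALL my_catalog.system.rewrite_manifests('db.events');
```

Running these procedures on a schedule is the kind of routine optimization work that the webinar shows Upsolver automating for you.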
Presented by: