Try Iceberg Table Analyzer

Use our free, open-source Iceberg Table Analyzer to analyze your existing lakehouse and identify problematic Iceberg tables.

Find Iceberg Tables for Optimization

Quickly find tables in your lakehouse that need compaction to reduce storage and speed up data scans.

Full Scan Overhead

This measures the time it takes to read the entire table. Improving this metric can lead to faster queries, especially for those that need to process the whole dataset.

Worst Partition Scan Overhead

This is the time it takes to scan the worst-performing partition of the table. Partitions with a high scan overhead can be a bottleneck for queries that target specific segments of data.

Total File Count

This indicates the number of files in the table. A high count can lead to metadata overhead and slower query planning because the system must keep track of more files.

Worst Partition File Count

This shows the number of files in the most bloated partition. A large number of files in a partition can cause slower performance due to increased overhead in managing those files during query execution.

Average File Size

This is the average size of the files in the table. Many small files cause the “small file problem,” where the overhead of opening and closing files can dominate query execution time.

Total Table Size

This represents the total amount of data stored in the table. While this metric doesn’t directly affect performance, it gives a sense of the scale of the data and potential storage costs.

Largest Partition Size

This is the size of the biggest partition in the table. A disproportionately large partition can result in skewed performance, as it may take much longer to process than other partitions.
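
To make the file-based metrics above concrete, here is a minimal Python sketch that derives them from a list of data-file records. It is an illustration only, not the Iceberg Table Analyzer's implementation: the `DataFile` record, the `summarize` function, and the sample partition names and sizes are all hypothetical. The two scan-overhead metrics are left out because they depend on measured read times rather than on file listings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical record: one entry per data file in the table.
    partition: str      # e.g. "day=2024-01-01" (made-up partition key)
    size_bytes: int


def summarize(files):
    """Compute the file-based metrics for a list of DataFile records."""
    count_by_partition = defaultdict(int)
    size_by_partition = defaultdict(int)
    for f in files:
        count_by_partition[f.partition] += 1
        size_by_partition[f.partition] += f.size_bytes

    total_files = len(files)
    total_size = sum(size_by_partition.values())
    return {
        "total_file_count": total_files,
        "worst_partition_file_count": max(count_by_partition.values(), default=0),
        "average_file_size": total_size / total_files if total_files else 0.0,
        "total_table_size": total_size,
        "largest_partition_size": max(size_by_partition.values(), default=0),
    }


# Made-up sample: three small files in one partition, one large file in another.
files = [
    DataFile("day=2024-01-01", 4_000_000),
    DataFile("day=2024-01-01", 2_000_000),
    DataFile("day=2024-01-01", 6_000_000),
    DataFile("day=2024-01-02", 512_000_000),
]
metrics = summarize(files)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

In this sample the first partition holds the most files while the second holds the most bytes, which shows why file count and partition size are reported as separate metrics.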
