Apache Iceberg is an open table format for huge analytic datasets. It manages large-scale data stored in files (like Parquet or ORC) on distributed storage systems (like HDFS, S3, or GCS).

In short, Iceberg turns your data lake into a data warehouse-like system with strong guarantees and high performance.

Key Features:

  • Schema evolution and partition evolution without rewriting data.
  • Hidden partitioning: simplifies queries by abstracting partition logic.
  • ACID transactions for consistency during concurrent reads/writes.
  • Time travel and rollback via metadata snapshots.
  • Optimized for engines like Spark, Trino, Flink, Dremio, and Snowflake.