Iceberg V3 Spec Summary

Iceberg format version

When is the format version upgraded?

https://iceberg.apache.org/spec/#format-versioning

The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.

  • Older readers cannot correctly read tables written with a newer format version
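As a rough illustration of the rule above, a reader can compare a table's format version with the highest version it implements and refuse anything newer. This is a minimal sketch, not how any particular engine does it: `MAX_SUPPORTED_VERSION` and `check_readable` are hypothetical names, and the table metadata is reduced to its `format-version` field.

```python
import json

# Hypothetical: the highest format version this reader implements.
MAX_SUPPORTED_VERSION = 2


def check_readable(metadata_json: str, max_supported: int = MAX_SUPPORTED_VERSION) -> int:
    """Return the table's format version, or raise if this reader is too old."""
    metadata = json.loads(metadata_json)
    version = metadata["format-version"]
    if version > max_supported:
        # A newer version may contain features this reader would misread,
        # so failing early is safer than returning wrong results.
        raise ValueError(
            f"table uses format version {version}, "
            f"but this reader supports up to {max_supported}"
        )
    return version
```

With these assumptions, a V2-only reader accepts metadata declaring `"format-version": 2` and raises `ValueError` for `"format-version": 3`.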

History

Behavioral or functional changes to the spec are decided by vote, and the timing of that vote determines which format version X a change lands in.

V3 Spec references

  • Judging from how V2 went, it will probably take a while before the V3 Spec is finalized
  • According to the documentation:

New data types: nanosecond timestamp(tz), unknown, variant, geometry, geography
Default value support for columns
Multi-argument transforms for partitioning and sorting
Row Lineage tracking
Binary deletion vectors
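Because the new data types exist only in V3, a table that wants to stay readable by older engines has to avoid them. The sketch below checks a schema for V3-only types. The function name `min_format_version`, the flat `column -> type name` mapping, and the `baseline` default are assumptions for illustration; the type names follow the list above, with `timestamp_ns` and `timestamptz_ns` standing for the nanosecond timestamps.

```python
# Type names introduced in V3, following the feature list above.
V3_ONLY_TYPES = {
    "timestamp_ns",
    "timestamptz_ns",
    "unknown",
    "variant",
    "geometry",
    "geography",
}


def min_format_version(column_types: dict, baseline: int = 2) -> int:
    """Return 3 if any column uses a V3-only type, otherwise `baseline`."""
    for type_name in column_types.values():
        # Drop any parameters, e.g. a hypothetical "geometry(...)" form.
        base_name = type_name.split("(", 1)[0].strip().lower()
        if base_name in V3_ONLY_TYPES:
            return 3
    return baseline
```

For example, a schema with only `long` and `string` columns stays at the baseline, while adding a `variant` column makes the function return 3.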

Supported status

Specs: Iceberg V3 Spec Milestone · apache/iceberg

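Category    | Feature                                             | Spec                   | Impl
------------|-----------------------------------------------------|------------------------|------------------------------------
Data types  | Variant types                                       | YES (#10392, Proposal) | YES; Parquet ???, ORC ???, Avro ???
            | Type promotion                                      | YES                    |
            | Geospatial types                                    | YES                    |
            | Unknown types                                       | YES                    |
            | Nanosecond timestamp (tz)                           | YES                    | YES (#???)
Row lineage | Row tracking                                        | YES (#???)             | YES (#???)
Others      | Multi-arg transformations for partitions/sorting    |                        |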

Data types

Improved Deletes

Row Lineage

Misc.

Released features for V3

1.8.0

  • Spec
    • Add Deletion vectors to the table specification (#11240)
    • Add Variant Type (#10831)
    • Add EnableRowLineage metadata update (#12050)
    • Add added-rows field to Snapshot (#11976)
    • Reassign row lineage field IDs (#12100)
  • API
    • Define Variant Data type (#11324)
    • Add UnknownType (#12012)
  • Core:
    • Support for reading Deletion Vectors (#11481)
    • Support for writing Deletion Vectors (#11476)
    • Implement table metadata fields for row lineage and enable operations to populate these fields (#11948)
  • AWS:
    • Support for writing Deletion Vectors (#11476)
  • Spark:
    • Support for reading default values for Parquet (#11803)
    • Support for writing Deletion Vectors for V3 tables (#11561)

1.7.0

  • API
    • Add default value APIs and Avro implementation (#9502)
    • Add compatibility checks for Schemas with default values (#11434)
    • Implement types timestamp_ns and timestamptz_ns (#9008)
    • Add addNonDefaultSpec to UpdatePartitionSpec to not set the new partition spec as default (#10736)

For study