Iceberg V3 Spec Summary
Iceberg format version
When is the format version upgraded?
https://iceberg.apache.org/spec/#format-versioning
The format version number is incremented when new features are added that will break forward-compatibility---that is, when older readers would not read newer table features correctly. Tables may continue to be written with an older version of the spec to ensure compatibility by not using features that are not yet implemented by processing engines.
- Older readers cannot correctly read tables written with a newer format version
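The compatibility rule above can be sketched as a reader-side check. This is a minimal illustration under stated assumptions, not Iceberg's actual implementation: `format-version` is the field name used in the table metadata JSON, while `MAX_SUPPORTED_VERSION` and `check_readable` are hypothetical names introduced here.

```python
import json

# Hypothetical: the highest format version this reader implements.
MAX_SUPPORTED_VERSION = 2


def check_readable(metadata_json: str, max_supported: int = MAX_SUPPORTED_VERSION) -> int:
    """Return the table's format version, or raise if it is too new for this reader."""
    metadata = json.loads(metadata_json)
    version = metadata["format-version"]
    if version > max_supported:
        raise ValueError(
            f"table uses format version {version}, "
            f"but this reader supports up to {max_supported}"
        )
    return version


# A v2 table passes the check for a v2 reader; a v3 table is rejected.
v2_table = json.dumps({"format-version": 2})
v3_table = json.dumps({"format-version": 3})
print(check_readable(v2_table))
try:
    check_readable(v3_table)
except ValueError as err:
    print(err)
```

Rejecting the table outright, rather than reading it on a best-effort basis, matches the reason the version is bumped at all: an older reader could otherwise silently return wrong results.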
History
- 2023 Aug. 31 - V2 default version (since Iceberg 1.4.0)
- 2021 Aug. 3 - [V3 Spec and impl]
- 2021 July 28 - Aug. 3 [V2 spec finalized] The Iceberg format version 2 spec was finalized (closed by vote; ref: https://lists.apache.org/thread/ws2gg52d124p7bx9jgrn3kctrtfgtltp)
- Iceberg 0.12.0 announcement
- 2020 Nov. 10 - Format V2 support started (Iceberg 0.10.0)
- 2020 May 20 - The Iceberg project graduated from the Apache Incubator
- Around 2020 Apr (the exact timing is unclear, but between 0.7.0 and 0.8.0 there were discussions about whether row deletes should become v2)
- Prepare metadata writers for format v2 by rdblue · Pull Request #903 · apache/iceberg
- v2 was added between Iceberg 0.7.0-incubating and Iceberg 0.8.0-incubating
- https://github.com/apache/iceberg/blob/8c05a2f5f1c8b111c049d43cf15cd8a51920dda1/site/docs/spec.md -> 0.8.0 (May 7, 2020)
- https://github.com/apache/iceberg/blob/9c81babac65351f7aa21dd878f01c5c81ae304af/site/docs/spec.md -> 0.7.0 (Oct 26, 2019)
- Related spec: https://github.com/apache/iceberg/pull/912 (the spec text was added in 2020 Apr)
Behavioral or functional changes to the spec are decided by vote, and the timing of that vote determines which format version X a change lands in.
V3 Spec references
- Judging from how V2 went, it will likely take a while before the V3 spec is finalized
- According to the documentation:
New data types: nanosecond timestamp(tz), unknown, variant, geometry, geography
Default value support for columns
Multi-argument transforms for partitioning and sorting
Row Lineage tracking
Binary deletion vectors
- According to the meetup notes:
- Data types: Variant type, Geo types, Timestamp_TZ, unknown
- Improved Deletes: Deletion vectors, Optimized tracking (???), Compact representation (???)
- Row Lineage: Row tracking, Incremental processing (???)
- V3 project overall in GitHub: https://github.com/orgs/apache/projects/377/views/1
- V3 class creation: https://github.com/apache/iceberg/issues/10747
- Milestone Spec: https://github.com/apache/iceberg/milestone/42?closed=1
Supported status
Specs: Iceberg V3 Spec Milestone · apache/iceberg
- Variant Data Type Support · Issue #10392 · apache/iceberg (Opened)
- Geospatial Support · Issue #10260 · apache/iceberg (Opened)
- Improve Position Deletes in V3 · Issue #11122 · apache/iceberg (Opened)
- Spec: Support geo type by szehon-ho · Pull Request #10981 · apache/iceberg
- Restrict generated locations to URI syntax · Issue #10168 · apache/iceberg -> Non stale
- Row Lineage for V3 · Issue #11129 · apache/iceberg
- Spec: Add v3 types and type promotion by rdblue · Pull Request #10955 · apache/iceberg
Category | Feature | Spec | Impl |
---|---|---|---|
Data types | Variant types | YES (#10392, Proposal) | YES Parquet ??? / ORC ??? / Avro ??? |
 | Type promotion | YES | |
 | Geospatial types | YES | |
 | Unknown types | YES | |
 | Nano seconds Timestamp TZ | YES | YES (#???) |
Row lineage | Row tracking | YES (#???) | YES (#???) |
Others | Multi-args transformations for Partitions/Sorting | | |
Data types
- Variant types:
- Geo types:
- Unknown types:
- Nano seconds Timestamp_TZ:
- Multi-args transformations:
- Spec: Multi-arg transform support in Iceberg
- Sub-impls: Spec: Remove `source-ids` for V{1,2} tables by Fokko · Pull Request #12161 · apache/iceberg -> probably a bug fix
Improved Deletes
- Deletion vectors:
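Conceptually, a deletion vector marks which row positions in a single data file are deleted, so that readers can skip them. The sketch below illustrates that idea with a plain Python integer used as a bitmap. It is a toy under stated assumptions, not the on-disk encoding defined by the spec, and `DeletionVector`, `read_live_rows`, and their methods are hypothetical names.

```python
class DeletionVector:
    """Toy bitmap of deleted row positions for one data file (illustration only)."""

    def __init__(self) -> None:
        self._bits = 0  # bit i set => the row at position i is deleted

    def delete(self, pos: int) -> None:
        self._bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return (self._bits >> pos) & 1 == 1

    def cardinality(self) -> int:
        """Number of deleted positions."""
        return bin(self._bits).count("1")


def read_live_rows(rows, dv):
    """Yield only the rows whose position is not marked as deleted."""
    for pos, row in enumerate(rows):
        if not dv.is_deleted(pos):
            yield row


rows = ["a", "b", "c", "d"]
dv = DeletionVector()
dv.delete(1)
dv.delete(3)
live = list(read_live_rows(rows, dv))
print(live)
```

The data file itself is never rewritten here; only the bitmap changes, which is the appeal of position-based deletes.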
Row Lineage
- Spec: Row Lineage Proposal
- Impl:
- Sub-impls:
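As understood here, row tracking gives a row that has no explicitly stored ID an inherited one: the data file's first row ID plus the row's position in the file. The sketch below illustrates only that arithmetic. `DataFile`, `assign_first_row_ids`, and `row_id` are hypothetical names, and the way the table-level counter advances is a simplifying assumption, not the spec's exact commit procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFile:
    path: str
    record_count: int
    first_row_id: Optional[int] = None  # assigned when the file is committed


def assign_first_row_ids(files: List[DataFile], next_row_id: int) -> int:
    """Give each new file a contiguous ID range and return the advanced counter."""
    for f in files:
        f.first_row_id = next_row_id
        next_row_id += f.record_count
    return next_row_id


def row_id(f: DataFile, pos: int) -> int:
    """Inherited row ID: the file's first row ID plus the row position."""
    if f.first_row_id is None:
        raise ValueError("file has no first_row_id yet")
    return f.first_row_id + pos


files = [DataFile("a.parquet", 3), DataFile("b.parquet", 2)]
next_id = assign_first_row_ids(files, next_row_id=100)
print(row_id(files[0], 2), row_id(files[1], 0), next_id)
```

Because each file owns a disjoint range, IDs stay unique across the table without storing one value per row at write time.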
Misc.
- Default value:
- Metadata variant shredding:
- V3 encryption: [Priority 2] Spec v3: Encryption • apache
Released features for V3
1.8.0
- Spec
- API
- Core:
- Support for reading Deletion Vectors (#11481)
- Support for writing Deletion Vectors (#11476)
- Implement table metadata fields for row lineage and enable operations to populate these fields (#11948)
- AWS:
- Support for writing Deletion Vectors (#11476)
- Spark:
- Support for reading default values for Parquet (#11803)
- Support for writing Deletion Vectors for V3 tables (#11561)
1.7.0
- API
For study
- V3 and REST Spec overview: https://www.youtube.com/watch?v=0C8CLOzNVEU
- Variant type: https://www.youtube.com/watch?v=MKqllL_D-fs
- Deletion Vectors: https://www.youtube.com/watch?v=vjgJridq8G0