Iceberg metadata naming convention

The following table summarizes Iceberg’s file naming conventions:

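| File type | Convention | Note |
| --- | --- | --- |
| metadata file | <newVersion (%05d)>-<UUID>.<fileExtension> | The UUID is generated randomly for each metadata file and doesn’t match the commitUUID used in manifest list and manifest file names. The file extension is .metadata.json, or .gz.metadata.json when metadata compression is set to gzip. |
| manifest list | snap-<snapshotId>-<attempt.incrementAndGet()>-<commitUUID>.avro | The commitUUID matches the one in the names of the manifest files written by the same commit. The commitUUID is a random UUID and is not the snapshot ID; the snapshot ID is the separate number after snap-. |
| manifest file | <commitUUID>-m<manifestCount.getAndIncrement()>.avro | n/a |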
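To make the three patterns concrete, here is a minimal Java sketch that classifies a file name from a metadata/ directory. The class IcebergFileNames and its regular expressions are hypothetical, written by hand from the conventions above; they assume lowercase canonical UUIDs and the .metadata.json / .gz.metadata.json extensions, and they deliberately do not cover the Hadoop catalog’s v<N>.metadata.json names.

```java
import java.util.regex.Pattern;

// Hypothetical helper: classifies a file name found in an Iceberg metadata/ directory
// using the (non-Hadoop) naming conventions from the table. Not part of Iceberg.
final class IcebergFileNames {
  // Canonical java.util.UUID text form: 8-4-4-4-12 lowercase hex digits.
  private static final String UUID_RE =
      "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";

  // <newVersion (%05d)>-<UUID>.metadata.json, optionally .gz.metadata.json
  static final Pattern METADATA_FILE =
      Pattern.compile("\\d{5,}-" + UUID_RE + "(\\.gz)?\\.metadata\\.json");

  // snap-<snapshotId>-<attempt>-<commitUUID>.avro
  static final Pattern MANIFEST_LIST =
      Pattern.compile("snap-\\d+-\\d+-" + UUID_RE + "\\.avro");

  // <commitUUID>-m<manifestCount>.avro
  static final Pattern MANIFEST_FILE =
      Pattern.compile(UUID_RE + "-m\\d+\\.avro");

  static String classify(String fileName) {
    if (METADATA_FILE.matcher(fileName).matches()) {
      return "metadata file";
    } else if (MANIFEST_LIST.matcher(fileName).matches()) {
      return "manifest list";
    } else if (MANIFEST_FILE.matcher(fileName).matches()) {
      return "manifest file";
    }
    return "unknown";
  }

  public static void main(String[] args) {
    String[] names = {
      "00000-25005c05-834d-4650-a529-410eabcb12d6.metadata.json",
      "snap-1081867561747206961-1-fc6cefef-bfb2-4c11-a105-205785bcb5ac.avro",
      "fc6cefef-bfb2-4c11-a105-205785bcb5ac-m0.avro"
    };
    for (String name : names) {
      System.out.println(classify(name) + ": " + name);
    }
  }
}
```

Using full-string matches() rather than find() keeps a manifest file name from being mistaken for part of a manifest list name.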

Details

When you operate on an Iceberg table through a compute engine such as Apache Spark, Trino, or Flink, you might see files like the following generated:

/iceberg-warehouse/db/table/metadata
  ├── 00000-25005c05-834d-4650-a529-410eabcb12d6.metadata.json // metadata file
  ├── 00001-65f87f03-6d7e-41be-8dce-c813ffe70937.metadata.json 
  ├── snap-1081867561747206961-1-fc6cefef-bfb2-4c11-a105-205785bcb5ac.avro // manifest list
  ├── fc6cefef-bfb2-4c11-a105-205785bcb5ac-m0.avro // manifest file

This is the state after creating an Iceberg table (which writes only the metadata file starting with 00000-) and then inserting a record into it (which writes the 00001- metadata file, the manifest list, and the manifest file). Looking at each file name:

  • The UUID in each metadata file name doesn’t match the UUID in any manifest list or manifest file name
  • The UUID in the manifest list name (fc6cefef-bfb2-4c11-a105-205785bcb5ac) is the same as the one in the manifest file name, because both files were written by the same commit
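The same relationship can be checked programmatically. The sketch below is hypothetical (the class CommitUuid and its patterns are not part of Iceberg): it extracts the commitUUID from manifest list and manifest file names and returns an empty result for metadata files, whose UUID is unrelated to any commit.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the commitUUID out of manifest list / manifest file names
// so that files written by the same commit can be matched up.
final class CommitUuid {
  private static final String UUID_RE =
      "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
  // snap-<snapshotId>-<attempt>-<commitUUID>.avro
  private static final Pattern MANIFEST_LIST =
      Pattern.compile("snap-\\d+-\\d+-(" + UUID_RE + ")\\.avro");
  // <commitUUID>-m<manifestCount>.avro
  private static final Pattern MANIFEST_FILE =
      Pattern.compile("(" + UUID_RE + ")-m\\d+\\.avro");

  static Optional<String> of(String fileName) {
    for (Pattern p : new Pattern[] {MANIFEST_LIST, MANIFEST_FILE}) {
      Matcher m = p.matcher(fileName);
      if (m.matches()) {
        return Optional.of(m.group(1));
      }
    }
    // Metadata files carry their own random UUID, not a commitUUID.
    return Optional.empty();
  }

  public static void main(String[] args) {
    String list = "snap-1081867561747206961-1-fc6cefef-bfb2-4c11-a105-205785bcb5ac.avro";
    String manifest = "fc6cefef-bfb2-4c11-a105-205785bcb5ac-m0.avro";
    String metadata = "00001-65f87f03-6d7e-41be-8dce-c813ffe70937.metadata.json";
    System.out.println(of(list).equals(of(manifest))); // true
    System.out.println(of(metadata)); // Optional.empty
  }
}
```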

Let’s look at the relevant code in the Iceberg repository.

Code references

Metadata file: <newVersion (%05d)>-<UUID>.<fileExtension>

See https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/BaseMetastoreTableOperations.java#L328

  private String newTableMetadataFilePath(TableMetadata meta, int newVersion) {
    String codecName =
        meta.property(
            TableProperties.METADATA_COMPRESSION, TableProperties.METADATA_COMPRESSION_DEFAULT);
    String fileExtension = TableMetadataParser.getFileExtension(codecName);
    return metadataFileLocation(
        meta,
        // A fresh random UUID per metadata file, unrelated to any snapshot or commit
        String.format(Locale.ROOT, "%05d-%s%s", newVersion, UUID.randomUUID(), fileExtension)); // <= HERE
  }
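A minimal sketch that reproduces only the format string above, to show what the resulting names look like. The class MetadataFileName is hypothetical, and the extension is passed in directly instead of being derived from the table’s metadata compression property; the values .metadata.json and .gz.metadata.json are assumed here.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical sketch mirroring the "%05d-%s%s" format from newTableMetadataFilePath.
// The file extension is supplied by the caller (an assumption for brevity).
final class MetadataFileName {
  static String of(int newVersion, String fileExtension) {
    // %05d zero-pads the version to at least five digits; a new UUID is drawn on every call.
    return String.format(Locale.ROOT, "%05d-%s%s", newVersion, UUID.randomUUID(), fileExtension);
  }

  public static void main(String[] args) {
    System.out.println(of(1, ".metadata.json"));      // 00001-<random UUID>.metadata.json
    System.out.println(of(1, ".gz.metadata.json"));   // 00001-<random UUID>.gz.metadata.json
    System.out.println(of(123456, ".metadata.json")); // 123456-<random UUID>.metadata.json
  }
}
```

Note that %05d is a minimum width, not a maximum: versions beyond 99999 simply get more digits.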

Manifest list: snap-<snapshotId>-<attempt.incrementAndGet()>-<commitUUID>.avro

See https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L511

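import java.util.Locale;
import java.util.UUID;

// ...
// commitUUID is a SnapshotProducer field initialized with UUID.randomUUID().toString();
// attempt is an AtomicInteger starting at 0, so the first manifest list uses attempt 1.

  protected OutputFile manifestListPath() {
    return ops.io()
        .newOutputFile(
            ops.metadataFileLocation(
                FileFormat.AVRO.addExtension(
                    String.format(
                        Locale.ROOT,
                        "snap-%d-%d-%s",  // <= HERE
                        snapshotId(),
                        attempt.incrementAndGet(),
                        commitUUID))));
  }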
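Because attempt is incremented every time manifestListPath() is called (for example, when a commit is retried), each call yields a new manifest list name with the same commitUUID and a higher attempt number. The hypothetical ManifestListNamer below models only those three inputs, with the snapshot ID simply held fixed in this sketch; it is not Iceberg’s SnapshotProducer.

```java
import java.util.Locale;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, stripped-down stand-in for SnapshotProducer that models only the
// fields feeding the manifest list name. Not Iceberg's actual class.
final class ManifestListNamer {
  private final String commitUUID = UUID.randomUUID().toString();
  private final AtomicInteger attempt = new AtomicInteger(0);
  private final long snapshotId;

  ManifestListNamer(long snapshotId) {
    this.snapshotId = snapshotId;
  }

  String commitUUID() {
    return commitUUID;
  }

  // Same format as manifestListPath(), without the metadata directory prefix.
  String nextManifestListName() {
    return String.format(
            Locale.ROOT, "snap-%d-%d-%s", snapshotId, attempt.incrementAndGet(), commitUUID)
        + ".avro";
  }

  public static void main(String[] args) {
    ManifestListNamer namer = new ManifestListNamer(1081867561747206961L);
    System.out.println(namer.nextManifestListName()); // snap-1081867561747206961-1-<commitUUID>.avro
    System.out.println(namer.nextManifestListName()); // snap-1081867561747206961-2-<commitUUID>.avro
  }
}
```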

Manifest file: <commitUUID>-m<manifestCount.getAndIncrement()>.avro

See https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L527

import java.util.UUID;

// ...

  protected EncryptedOutputFile newManifestOutputFile() {
    // Every manifest from this producer shares commitUUID; the counter keeps the names unique.
    String manifestFileLocation =
        ops.metadataFileLocation(
            FileFormat.AVRO.addExtension(commitUUID + "-m" + manifestCount.getAndIncrement())); // <= HERE
    return EncryptingFileIO.combine(ops.io(), ops.encryption())
        .newEncryptingOutputFile(manifestFileLocation);
  }

For the declaration of manifestCount, see https://github.com/apache/iceberg/blob/apache-iceberg-1.7.1/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L111

  private final AtomicInteger manifestCount = new AtomicInteger(0);
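Since manifestCount starts at 0 and getAndIncrement() returns the value before incrementing, the manifests written under one commitUUID are numbered m0, m1, m2, and so on. The hypothetical ManifestFileNamer below models just those two fields to show the sequence; it is a sketch, not Iceberg’s class.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in modelling only commitUUID and manifestCount. Not Iceberg's class.
final class ManifestFileNamer {
  private final String commitUUID = UUID.randomUUID().toString();
  private final AtomicInteger manifestCount = new AtomicInteger(0);

  String commitUUID() {
    return commitUUID;
  }

  // Same concatenation as newManifestOutputFile(), without the directory prefix.
  String nextManifestName() {
    // getAndIncrement() returns the current value first, so numbering starts at m0.
    return commitUUID + "-m" + manifestCount.getAndIncrement() + ".avro";
  }

  public static void main(String[] args) {
    ManifestFileNamer namer = new ManifestFileNamer();
    for (int i = 0; i < 3; i++) {
      System.out.println(namer.nextManifestName()); // <commitUUID>-m0.avro, then -m1, then -m2
    }
  }
}
```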

Appendix

File name convention for Hadoop Catalog

The Hadoop catalog follows a different convention from the other catalogs. As the table metadata is updated, you might see metadata files named v1, v2, … in your storage:

  ├── v1.metadata.json // metadata file
  ├── v2.metadata.json 
  ├── snap-1081867561747206961-1-fc6cefef-bfb2-4c11-a105-205785bcb5ac.avro // manifest list
  ├── fc6cefef-bfb2-4c11-a105-205785bcb5ac-m0.avro // manifest file
  ...

This is because HadoopTableOperations builds the metadata file path from the version number and the codec’s file extension alone, with no UUID, unlike the other catalogs (see https://github.com/apache/iceberg/blob/apache-iceberg-1.8.1/core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java#L259):

  private Path metadataFilePath(int metadataVersion, TableMetadataParser.Codec codec) {
    return metadataPath("v" + metadataVersion + TableMetadataParser.getFileExtension(codec));
  }
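If you need to read metadata versions from storage listings that may come from either kind of catalog, a small helper can handle both forms. The class MetadataVersion and its patterns are hypothetical, derived from the two snippets in this post; they assume the .metadata.json and .gz.metadata.json extensions and lowercase UUIDs.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the metadata version from a metadata file name under
// either the Hadoop convention (v<N>) or the default convention (<%05d>-<UUID>).
final class MetadataVersion {
  private static final String UUID_RE =
      "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}";
  // Hadoop catalog: "v" + version + extension, e.g. v2.metadata.json
  private static final Pattern HADOOP = Pattern.compile("v(\\d+)(\\.gz)?\\.metadata\\.json");
  // Other catalogs: <newVersion (%05d)>-<UUID>.metadata.json
  private static final Pattern DEFAULT =
      Pattern.compile("(\\d{5,})-" + UUID_RE + "(\\.gz)?\\.metadata\\.json");

  static OptionalInt parse(String fileName) {
    for (Pattern p : new Pattern[] {HADOOP, DEFAULT}) {
      Matcher m = p.matcher(fileName);
      if (m.matches()) {
        // Integer.parseInt drops the leading zeros of the %05d form.
        return OptionalInt.of(Integer.parseInt(m.group(1)));
      }
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    System.out.println(parse("v2.metadata.json").getAsInt()); // 2
    System.out.println(
        parse("00001-65f87f03-6d7e-41be-8dce-c813ffe70937.metadata.json").getAsInt()); // 1
  }
}
```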