Skip to content

Commit

Permalink
Merge branch 'main' of https://github.com/StateFarmIns/iceberg into r…
Browse files Browse the repository at this point in the history
…ewrite_table_path_incremental_fix
  • Loading branch information
barronfuentes committed Feb 4, 2025
2 parents 4cc0820 + da20029 commit 31a79e5
Show file tree
Hide file tree
Showing 2 changed files with 19 additions and 29 deletions.
12 changes: 1 addition & 11 deletions core/src/main/java/org/apache/iceberg/TableMetadata.java
Original file line number Diff line number Diff line change
Expand Up @@ -750,7 +750,7 @@ public TableMetadata buildReplacement(

return new Builder(this)
.upgradeFormatVersion(newFormatVersion)
.resetMainBranch()
.removeRef(SnapshotRef.MAIN_BRANCH)
.setCurrentSchema(freshSchema, newLastColumnId.get())
.setDefaultPartitionSpec(freshSpec)
.setDefaultSortOrder(freshSortOrder)
Expand Down Expand Up @@ -1360,16 +1360,6 @@ public Builder removeRef(String name) {
return this;
}

private Builder resetMainBranch() {
this.currentSnapshotId = -1;
SnapshotRef ref = refs.remove(SnapshotRef.MAIN_BRANCH);
if (ref != null) {
changes.add(new MetadataUpdate.RemoveSnapshotRef(SnapshotRef.MAIN_BRANCH));
}

return this;
}

/**
* Set a statistics file for a snapshot.
*
Expand Down
36 changes: 18 additions & 18 deletions site/docs/benchmarks.md
Original file line number Diff line number Diff line change
Expand Up @@ -44,89 +44,89 @@ Below are the existing benchmarks shown with the actual commands on how to run t
### IcebergSourceNestedListParquetDataWriteBenchmark
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt`

### SparkParquetReadersNestedDataBenchmark
A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt`

### SparkParquetWritersFlatDataBenchmark
A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt`

### IcebergSourceFlatORCDataReadBenchmark
A benchmark that evaluates the performance of reading ORC data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt`

### SparkParquetReadersFlatDataBenchmark
A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and Spark Parquet readers. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt`

### VectorizedReadDictionaryEncodedFlatParquetDataBenchmark
A benchmark to compare performance of reading Parquet dictionary encoded data with a flat schema using vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt`

### IcebergSourceNestedListORCDataWriteBenchmark
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt`

### VectorizedReadFlatParquetDataBenchmark
A benchmark to compare performance of reading Parquet data with a flat schema using vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark -PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt`

### IcebergSourceFlatParquetDataWriteBenchmark
A benchmark that evaluates the performance of writing Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt`

### IcebergSourceNestedAvroDataReadBenchmark
A benchmark that evaluates the performance of reading Avro data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt`

### IcebergSourceFlatAvroDataReadBenchmark
A benchmark that evaluates the performance of reading Avro data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt`

### IcebergSourceNestedParquetDataWriteBenchmark
A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt`

### IcebergSourceNestedParquetDataReadBenchmark
* A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

` ./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt`
` ./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt`

### IcebergSourceNestedORCDataReadBenchmark
A benchmark that evaluates the performance of reading ORC data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt`

### IcebergSourceFlatParquetDataReadBenchmark
A benchmark that evaluates the performance of reading Parquet data with a flat schema using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt`

### IcebergSourceFlatParquetDataFilterBenchmark
A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with a flat schema, where the records are clustered according to the
column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:

`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt`

### IcebergSourceNestedParquetDataFilterBenchmark
A benchmark that evaluates the file skipping capabilities in the Spark data source for Iceberg. This class uses a dataset with nested data, where the records are clustered according to the
column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:
`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt`

### SparkParquetWritersNestedDataBenchmark
* A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:
`./gradlew :iceberg-spark:iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt`
`./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt`

0 comments on commit 31a79e5

Please sign in to comment.