forked from OSGeo/gdal
Parquet writer: add SORT_BY_BBOX=YES/NO layer creation option
Defaults to NO

Documentation:
```
-  .. lco:: SORT_BY_BBOX
      :choices: YES, NO
      :default: NO
      :since: 3.9

      Whether features should be sorted based on the bounding box of their
      geometries, before being written in the final file. Sorting them enables
      faster spatial filtering on reading, by grouping together spatially close
      features in the same group of rows.

      Note however that enabling this option involves creating a temporary
      GeoPackage file (in the same directory as the final Parquet file),
      and thus requires temporary storage (possibly up to several times the size
      of the final Parquet file, depending on Parquet compression) and additional
      processing time.

      The efficiency of spatial filtering depends on the ROW_GROUP_SIZE. If it
      is too large, too many features that are not spatially close will be grouped
      together. If it is too small, the file size will increase, and extra
      processing time will be necessary to browse through the row groups.

      Note also that when this option is enabled, the Arrow writing API (which
      is for example triggered when using ogr2ogr to convert from Parquet to Parquet)
      falls back to the generic implementation, which does not support advanced
      Arrow types (lists, maps, etc.).
```

Experiments with the canonical
https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.parquet
dataset:

* Generation of datasets:

// Organize in row groups of 65,536 features, no BBOX, no sorting
```
$ time ogr2ogr out_no_bbox.parquet nz-building-outlines.parquet -progress -lco WRITE_COVERING_BBOX=NO
0...10...20...30...40...50...60...70...80...90...100 - done.

real 0m4,457s
```

// Organize in row groups of 65,536 features, add BBOX columns, no sorting
```
$ time ogr2ogr out_unsorted.parquet nz-building-outlines.parquet -progress
0...10...20...30...40...50...60...70...80...90...100 - done.

real 0m5,408s
```

// Organize in row groups of max 65,536 features, add BBOX columns, sort using RTree
```
$ time ogr2ogr out_sorted.parquet nz-building-outlines.parquet -progress -lco SORT_BY_BBOX=YES
0...10...20...30...40...50...60...70...80...90...100 - done.

real 0m40,311s
```

// Organize in row groups of max 16,384 features, add BBOX columns, sort using RTree
```
$ time ogr2ogr out_sorted_16384.parquet nz-building-outlines.parquet -progress -lco SORT_BY_BBOX=YES -lco ROW_GROUP_SIZE=16384
0...10...20...30...40...50...60...70...80...90...100 - done.

real 0m44,149s
```

* File sizes:

```
out_no_bbox.parquet       436,475,127
out_unsorted.parquet      504,120,728
out_sorted.parquet        489,507,910
out_sorted_16384.parquet  492,760,561
```

* Spatial filter selecting a single feature:

```
$ time ogrinfo out_no_bbox.parquet -spat 1818654 5546189 1818655 5546190 -al -so -json -noextent | jq .layers[0].featureCount
1

real 0m1,302s

$ time ogrinfo out_unsorted.parquet -spat 1818654 5546189 1818655 5546190 -al -so -json -noextent | jq .layers[0].featureCount
1

real 0m0,947s

$ time ogrinfo out_sorted.parquet -spat 1818654 5546189 1818655 5546190 -al -so -json -noextent | jq .layers[0].featureCount
1

real 0m0,278s

$ time ogrinfo out_sorted_16384.parquet -spat 1818654 5546189 1818655 5546190 -al -so -json -noextent | jq .layers[0].featureCount
1

real 0m0,183s
```

* Spatial filter selecting ~ 470,000 features (out of a total of 3.2 million):

```
$ time ogrinfo out_no_bbox.parquet -spat 1750445 5812014 1912866 5906677 -al -so -json -noextent | jq .layers[0].featureCount
471147

real 0m1,957s

$ time ogrinfo out_unsorted.parquet -spat 1750445 5812014 1912866 5906677 -al -so -json -noextent | jq .layers[0].featureCount
471147

real 0m1,718s

$ time ogrinfo out_sorted.parquet -spat 1750445 5812014 1912866 5906677 -al -so -json -noextent | jq .layers[0].featureCount
471147

real 0m1,067s

$ time ogrinfo out_sorted_16384.parquet -spat 1750445 5812014 1912866 5906677 -al -so -json -noextent | jq .layers[0].featureCount
471147

real 0m1,021s
```
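
To illustrate why sorting helps row-group pruning, here is a minimal, self-contained Python sketch. It is not the driver's implementation: it uses random points instead of building polygons, and a Z-order (Morton) key as a simple stand-in for the RTree-based ordering used by the commit. All function names (`make_points`, `morton_key`, `sort_spatially`, `row_group_bboxes`, `groups_to_read`) are hypothetical. It counts how many row groups have a bounding box intersecting a query window, which approximates how many groups a reader cannot skip.

```python
import random


def make_points(n, seed=0):
    """Generate n pseudo-random points in the unit square (illustrative data)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def morton_key(x, y, bits=16):
    """Interleave the bits of quantized x and y (Z-order curve).

    Stand-in for a spatial ordering; the actual commit sorts using an RTree.
    """
    top = (1 << bits) - 1
    ix = min(int(x * (1 << bits)), top)
    iy = min(int(y * (1 << bits)), top)
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)
        key |= ((iy >> i) & 1) << (2 * i + 1)
    return key


def sort_spatially(points):
    """Return the points ordered so that spatial neighbours tend to be adjacent."""
    return sorted(points, key=lambda p: morton_key(p[0], p[1]))


def row_group_bboxes(points, row_group_size):
    """Split points into consecutive row groups and compute each group's bbox."""
    boxes = []
    for start in range(0, len(points), row_group_size):
        chunk = points[start:start + row_group_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def intersects(a, b):
    """True if boxes a and b, given as (minx, miny, maxx, maxy), overlap."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def groups_to_read(points, row_group_size, query):
    """Count row groups whose bbox intersects the query window."""
    boxes = row_group_bboxes(points, row_group_size)
    return sum(1 for box in boxes if intersects(box, query))


if __name__ == "__main__":
    pts = make_points(100_000)
    query = (0.40, 0.40, 0.45, 0.45)
    for size in (4000, 1000):
        unsorted_n = groups_to_read(pts, size, query)
        sorted_n = groups_to_read(sort_spatially(pts), size, query)
        print(f"row group size {size}: unsorted={unsorted_n} sorted={sorted_n}")
```

In the unsorted case nearly every row group spans the whole extent, so almost none can be skipped; after sorting, only the few groups near the query window remain candidates. This is the same effect the timings above show between `out_unsorted.parquet` and `out_sorted.parquet`.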
Showing 11 changed files with 738 additions and 22 deletions.