erase
was renamed todifference
, as most other open source applications/libraries use this terminology.erase
just keeps existing for backwards compatibility for now, but a warning is shown that it might be removed in the (distant) future. (#595)
- Add function
apply_vectorized
to apply a vectorized function on a geometry column and use it internally where possible (#588, #594) - Add support for renaming column with only difference in casing (#549)
- Add support to pass a single string for all
column
type parameters if a single column should be retained (#523) - Enable "CURVE" geometrytype files to be processed in the general file and layer operations (#558)
- Don't convert to multi-part geometries by default in
copy_layer
,... (#570) - Improve performance of
erase
andintersection
for very complex input geometries. This gives similar improvements for such datasets toidentity
,symmetric_difference
andunion
. (#585, #601, #591) - only apply subdivide once on each input layer in symmetric_difference
- Add configuration option to only warn on dissolve errors (#561)
- Add some pre-flight checks when geofileops is imported (#573)
- Add support for renaming layer with only difference in casing (#593)
- Fix
copy_layer
to a gpkg.zip file (#604) - Fix
missing_ok
parameter inremove
being ~ignored (#605)
- For
dissolve
,agg_columns
aggregations sometimes give wrong results (#541)
- Set the default value of
keep_empty_geoms
toFalse
for all standard operations. This changes the default formake_valid
and in some cases forsimplify
. The only exception isselect
, where the default staysTrue
. (#472, #499) - Up to now, geofileops always tried to create spatial indexes on output files. For some formats this has disadvantages thought. Hence, and for consistency, from now on the default behaviour regarding spatial index creation of GDAL will be followed. E.g. default a spatial index for "GPKG", but no index for "ESRI Shapefile". (#511)
- The default filter clause for
export_by_location
is now "intersects is True" while in previous versions it was "intersects is True and touches is False", to be in line withjoin_by_location
, other libraries and use for non-polygon data. (#508) - When
join_by_location
was applied, a column "spatial_relation" with the spatial relation between the geometries was added. This is no longer the case. (#475)
- Add support for self-overlays in overlay operations (#468)
- Add support for a spatial query in
export_by_location
(#508) - Improve
dissolve_within_distance
results (#494) - Improve performance of
join_by_location
for relation "intersects is True" and for complex polygon features in layer to compare with (#502, #519) - Improve handling of queries evaluating to True for disjoint features in
join_by_location
(#509) - Make fiona an optional depencency (#301)
- Add configuration option GFO_REMOVE_TEMP_FILES that can be used to avoid temp files being removed for debugging purposes (#480)
- Add a context manager, TempEnv, to set temporary env variables (#481)
- Don't copy both input files if only one is not spatialite based in two layer operations (#247)
- If an output_dir doesn't exist yet, avoid processing being done before an error is raised. (#518)
- Small changes to support geopandas 1.0 (#529)
- Linting improvements: ruff version to 0.4.4 + apply import sorting (#531)
- Small changes to support gdal 3.9 (#528)
- Several improvements to documentation: new FAQ, improved user guide,... (#465, #469, #474)
- Linting improvements: use mypy + use ruff-format instead of black for formatting (#478, #479)
- Fix crash when using e.g.
erase
withgridsize <> 0.0
specified on input file containing an EMPTY geometry (#470) - Fix
dissolve
possibly having EMPTY geometries as output ifgridsize <> 0.0
(#473) - Fix wrong results from
join_by_location
if ran on result ofjoin_by_location
withcolumn_prefixes=""
(#475) - Fix error in two layer operations if equal column aliases used based on a constant or a function result (#477)
- Fix
erase
(and depending two layer operations likeunion
,...) giving wrong results ifsubdivide_coords
< 1 (#489) - Fix two-layer operations with
gridsize
sometimes outputting NULL geometries (#495)
- Fix error in
erase
iferase_path
countains multiple layers (#451) - Fix error in
dissolve
on polygon input if a pass that is not the last one has 0 onborder polygons in its result (#459, #461) - Fix
groupby
parameter becoming case sensitive indissolve
whenagg_columns
is also specified (#462)
- Add support to read/add/remove embedded layer styles in gpkg (#263)
- Add
gridsize
parameter to most spatial operations (#261, #407, #413) - Add
keep_empty_geoms
andwhere_post
parameters to many single layer spatial operations (#262, #398) - Add
where_post
parameter to many two layer spatial operations (#312) - Add
columns
,sql
andwhere
parameters tocopy_layer
andappend_to
(#311, #432) - Add
dissolve_within_distance
operation (#409) - Add support for lang+ algorithm in
simplify
(#334) - Add support to use
select
andselect_two_layers
on attribute tables (= tables without geometry column) and/or have an attribute table as result (#322, #379) - Add support to process all file types supported by gdal in most general file and layer
operations, e.g.
get_layerinfo
,read_file
,... (#402) - Add support for files with Z/M dimensions in the general file and layer operations (#369)
- Add support for spatialite 5.1 in
join_nearest
(#412) - Improve performance of
makevalid
andisvalid
(#258) - Improve performance of
intersection
, 30% faster for typical data, up to 4x faster for large input geometries (#340, #358) - Improve performance of
clip
: 3x faster for typical data (#358) - Improve performance of
export_by_location
, especially whenarea_inters_column_name
andmin_area_intersect
areNone
: a lot faster + 10x less memory usage (#370) - Improve performance of
erase
,identity
,symmetric difference
andunion
by applying on-the-fly subdividing of complex geometries to speed up processing. The new parametersubdivide_coords
can be used to control the feature. For files with very large input geometries, up to 100x faster + 10x less memory usage. (#329, #330, #331, #357, #396, #427, #438, #446) - Improve performance of spatial operations when only one batch is used (#271)
- Improve performance of single layer operations (#375)
- Improve performance of some geopandas/shapely based operations (#342, #408)
- Add checks that
output_path
must not be equal to the/aninput_path
for geo operations (#246) - Follow geopandas behaviour of using shapely2 and/or pygeos instead of forcing pygeos (#294)
- Improve handling of "SELECT * ..." style queries in
select
andselect_two_layers
(#283) - Improve handling + tests regarding empty input layers/NULL geometries (#320)
- Improve logging: use geo operation being executed as
logger name
(#410) - Many small improvements to logging, documentation, (gdal)error messages,... (#321, #366, #394, #439,...)
- Use smaller footprint conda packages for tests (use
geopandas-base
,nomkl
) (#377) - Use ruff instead of flake8 for linting (#317)
- Fix parameter
index
into_file
being ~ ignored (#285) - Fix
fid
column in output having only null values in e.g.union
(#362) - Fix "json" aggregate column handling in dissolve on line and point input files gives wrong results (#257)
- Fix error in
read_file
whenread_geometry=False
andcolumns
specified (#393) - Fix error in
copy_layer
/convert
withexplodecollections
on some input files (#395) - Fix dissolve forcing output to wrong geometrytype in some cases (#424)
- Drop support for shapely1 (#329, #338)
- Parameter
precision
ofmakevalid
is renamed togridsize
as this is the typical terminology in other libraries (#273) - When
join_nearest
is used with spatialite, >= 5.1, two additional mandatory parameters need to be specified:distance
andexpand
(#412) - Parameter
area_inters_column_name
inexport_by_location
now defaults toNone
instead of "area_inters" (#370) - Deprecate
convert
and rename tocopy_layer
(#310) - Deprecate
split
and rename toidentity
(#397) - Deprecate
is_geofile
andis_geofile_ext
(#402) - Make the
GeofileType
enum private, in preparation of probably removing it later on (#402) - Remove the long-deprecated functions
get_driver_for_ext
,to_multi_type
andto_generaltypeid
(#276) - Remove the long-deprecated
vector_util
,geofileops.geofile
andgeofileops.geofileops
namespaces (#276) - Remove
geometry_util
,geoseries_util
andgrid_util
(#339):- Most functions were moved to
pygeoops
because they are generally reusable. - Remaining functions are moved either to
_geometry_util
or_geoseries_util
to make it clearer they are not public.
- Most functions were moved to
- Use pyogrio for GeoDataFrame io to improve performance for operations involving GeoDataFrames (#64, #217)
- Add possibility to backup the fid in output files when applying operations (#114)
- Add support to
to_file
to write empty dataframe + add basic support forforce_output_geometrytype
(#205) - Add support to
read_file
to execute sql statements (#222) - Add function
get_layer_geometrytypes
to get a list of all geometry types that are actually in a layer (#230) - Add
fid_as_index
parameter toread_file
(#215) - Preserve
fid
values in single layer operations when possible (#) - Add
force_output_geometrytype
parameter toapply
(#233) - Optimize performance of operations when only one batch is used (#19)
- Optimize number batches for single layer sql operations (#214)
- Add check for select operations that if nb_parallel > 1, {batch_filter} is mandatory in sql_stmt (#208)
- Small improvements/code cleanup (#216, #223, #240,...)
- When a geo operation results in an empty result, gfo now always writes an empty output file instead of no output. This is also the behaviour of other high level libraries like in the toolbox of QGIS or ArcGIS. This behaviour needs gdal version >= 3.6.3 to be applied consistently. (#188)
- In
read_file
the columns in the output now reflect the casing used in the parameter rather than the casing in the source file (#229) - Functions
read_file_sql
andread_file_nogeom
are deprecated in favour ofread_file
. Mind: in read_file the sql_dialect default is None, not "SQLITE". (#222, #232, #236) - The (private) util function
view_angles
is moved to pygeoops (#209)
- Support geopandas 12 with shapely 2.0 + pygeos (#191, #193)
- Support improvements in gdal 3.6.2 (#195)
- Improve performance of sql-based operations for very large input files (#201)
- Small improvements to formatting, linting,... (#202)
- Fix: Due to a change in fiona >= 1.9, using read_file on string columns with all None values ended up as a float64 column (#199)
- Because geofileops uses pygeos directly, pin geopandas to < 1.0. More info: geopandas#2691 (#200)
- Make writing to gpkg more robust in locking situations (#179)
- Add create_spatial_index parameter to to_file (#183)
- Ignore pandas futurewarning in dissolve (#184)
- Improve dissolve performance (#185)
- Small general improvements (#180)
- Fix groupby columns in dissolve sometimes becoming all NULL values (#181)
- In to_file, the default behaviour is now also for .shp to create a spatial index, consistent with convert,... (#183)
- Fix regression in to_file to support append to unexisting file (#177)
- Add (private) function
is_valid_reason
for GeoSeries (#164) - Small improvements in logging, formatting, avoid deprecation warnings,... (#163, #166, #171)
- Add CI tests for python 3.10 and 3.11. On python 3.11 the simplification library is not available (#170)
- Fix groupby columns in dissolve not being treated case insensitive (#162)
- Fix to_file doesn't throw an error nor saves data when appending a dataframe with different columns than file (#159)
- Fix ValueError: max_workers must be <= 61 in dissolve (#160)
- Fix sql_dialect parameter is ignored in select operation (#115)
- Add single layer function
gfo.export_by_bounds
(#149) - Add single layer function
gfo.clip_by_geometry
(#150) - Add single layer function
gfo.warp
to warp geometries based on GCP's (#151) - Add (private) function to calculate view angles from a point towards a GeoDataFrame (#140)
- Add (private) function to calculate topologic simplify (#146)
- Small changes to support geopandas 0.11+, newer pandas versions,... (#143, #147, #153)
- Fix typo in hardcoded 31370 custom prj string (#142)
The main improvements in this version are the geo operations gfo.clip
and
gfo.symmetric_difference
that were added.
- Add
gfo.clip
geo operation, more info here (#4) - Add
gfo.symmetric_difference
geo operation, more info here (#85) - Add support for all relevant spatial operations to join_by_location (#79)
- In
gfo.dissolve
, support aggregations on a groupby column and None data in aggregation columns (#130) - Add support to reproject to
gfo.convert
(#89) - Add function
gfo.drop_column
(#92) - Add detailed column info in
gfo.get_layerinfo
(#110) - Add support to specify (any) gdal options in relevant fileops (#83)
- Add support to write an attribute table (=no geometry column) to geopackage (#125)
- Don't list attribute tables in e.g.
gfo.listlayers
by default anymore (#124) - Speed up creation of an index on a geopackage (#87)
- Add
view_angles
function for geometries, geoseries (#136) - Some improvements to the benchmarks
- Use black to comply to pep8 + minor general improvements (#113)
- Fix dissolve bugs (#93)
- When
agg_columns="json"
is used and the dissolve needs multiple passes, the json output is not correct. - when combining tiled output with
explodecollections=False
, the output is still ~exploded.
- When
- For e.g.
gfo.intersection
some attribute data is NULL if output format is .shp (#102) gfo.dissolve
gives error if a linestring input layer contains special characters (#108)
- Property column of
gfo.get_layerinfo
is now a Dict[str, ColumnInfo] instead of a List[str] (#110) - For the simplify operation, use rdp version that preserves topology (#105)
- Removed redundant
verbose
parameter in all functions (#133) - Attribute tables are not listed anymore by default in e.g.
gfo.listlayers
(#124) - Rename some files in util that are rather private (#84)
- Remove long-time deprecated
clip_on_tiles
parameter ingfo.dissolve
(#95) - Deprecate
gfo.intersect
for new namegfo.intersection
to be consistent with most other libraries. Now a warning is given, in the futuregfo.intersect
will be removed (#99).
The main new features in this version are the simplified API, a new geo operation "apply" and dissolve supporting aggregations on columns now.
- Add apply geo operation. Info on how to use it can be found here (#41)
- Add support for different aggregations on columns in dissolve operation. Info on how to use it can be found here (#3)
- Simplify API by removing the seperation in geofile versus geofileops (#52)
- Improve type annotations and documentation
- Increase test coverage, including tests on latlon files which weren't available yet (#32)
- Improve performance of buffer, simplify and complexhull by using the spatialite/sql implementation (#53)
- Improve benchmarks, eg. add graphs,... (#55)
- Improve performance of _harmonize_to_multitype + improve test (#56)
- In the two-layer operations, in some cases columns could get "lost" in the output file (#67)
- Breaking change: in
gfo.dissolve
, the parametersaggfunc
andcolumns
are replaced byagg_columns
. More info on the usage can be found here. - Due to flattening the API, using
from geofileops import geofile
andfrom geofileops import geofileops
is deprecated, and you should use eg.import geofileops as gfo
. A "FutureWarning" is shown now, in a future version this possibility will probably be removed. - The following functions are deprecated. A "FutureWarning" is shown now, in a future
version this function they will be removed:
gfo.get_driver(path)
can be replaced byGeofileType(Path(path)).ogrdriver
.get_driver_for_ext(file_ext: str)
can be replaced byGeofileType(file_ext).ogrdriver
.gfo.to_multi_type(geometrytypename: str)
can be replaced byGeometryType(geometrytypename).to_multitype
.gfo.to_generaltypeid(geometrytypename: str)
can be replaced byGeometryType(geometrytypename).to_primitivetype.value
.
Several improvements and fixes. Especially if you process large files (> 5 GB) some of them will be very useful.
- Add options to choose the buffer style to use (#37).
- Add option in makevalid to specify precision to use.
- Add support for dissolve with groupby_columns=None + explodecollections=True
- add batchsize parameter to all geo operations to be able to limit memory usage if needed (#38).
- Decrease memory usage for most operations.
- Run processing in low CPU priority so the computer stays responding.
- Fix error when processing very large files for buffer, convexhull, dissolve and simplify (IO timeouts).
- Fix error in simplify when points on geometries should be kept.
- Don't give an error when result is empty for sql based operations.
- Explodecollections=True doesn't work on operations on 2 layers.
In this release, the main change is a new operation that has been added: nearest_join.
- New operation nearest_join() (#12).
- Improve performance for files where the rowid isn't consecutive (#14).
- Improve documentation, mainly for the select operations.
- Add check in get_layerinfo on column names containing illegal characters (='"').
- Suppress fiona warning "Sequential read of iterator was interrupted."
- Dissolve: output is sorted by a geohash instead of random, which improves performance when reading the file afterwards.
- Drop support for spatialite < 5 and geopandas < 0.9.
- Bugfix: for the dissolve operation, when rows in the groupby columns contained NULL/None/NaN values, those rows were dropped. This is the default behaviour of (geo)pandas, but as the behaviour doesn't feel very intuitive, it was decided to divert from it.
Improved performance for all operations that involve overlays between 2 layers (intersect, union, identity/split,...).
- Improved performance for all operations that involve overlays between 2 layers (intersect, union, identity/split,...). Especially if the input files are in Geopackage format the improvement should be significant because in this case the input data isn't copied to temp files anymore.
- Added the method geofile.execute_sql() to be able to execute a DML/DDL sql statement to a geofile.
- Smaller code cleanups to improve readability on several places.
- In the identity/split and union operations, in some cases (mainly when input layers had self-intersections) intersections in the output were unioned instead of keeping them as seperate rows.
- When using an input file with multiple layers (eg. a geopackage), this sometimes resulted in data errors.
- Fix an error in the lang simplification algorithm that occured in some edge cases.
This release mainly brings improvements and bugfixes when using the Lang simplification algorithm.
Additionally, this is the first version that gets a conda package.
- Support for all geometry types for Lang simplification
- Some code restructuring + code cleanup in util (backwards compatible for now, but isn't public interface anyway)
- Improve test coverage
- preserve_topology wsn't always respected in Lang simplification
- clip_on_tiles=False in dissolve resulted in some polygons not being dissolved. Because there is no easy solution, parameter is deprecated and always treated as True.
This release is mainly focused on improvements to the dissolve and simplify operations.
It is important to note that for the dissolve operation the default parameters behave slightly different, so the change is not backwards compatible!
- Dissolve
- Different handling of default parameters (not backwards compatible!)
- Better performance
- Several bugfixes
- Simplify: add support for different simplification algorythms: ramer-douglas-peucker, visvalingam-whyatt and lang.
- Many improvements to documentation
- Improvements to type annotations
- Some new undocumented (so use at own risk!) functions were added to
geofileops.vector_util.
- simplify_ext: an "extended" simplify function:
- Supports multiple simplification algorythms: ramer-douglas-peucker, visvalingam-whyatt and lang.
- Option to specify a "points-to-keep geometry": points/nodes of the geometries to be simplified that intersect it won't be removed by the simplify operation.
- count_number_points: count the number of coordinates that any shapely geometry consists of
- simplify_ext: an "extended" simplify function:
This version contains quite some fixes and small improvements. The most important ones are listed here.
- Stop using sqlite WAL output format as default: gives locking issues when using files on a file share
- Improve handling of output geometry type
- Add scripts, dependencies,... to use pytest-cov to report unit test coverage
- Add first version of sphynx documentation
- Improve test coverage
- Improve benchmark tests
- Negative buffer shouldn't output rows with empty geometries
- Dissolve gave an error in some specific cases
- Join_by_location operation gave wrong results
Most typical operations are now supported: buffer, simplify, export_by_location, intersect, union, erase, dissolve,...