Fix WKB writing on big-endian systems
`shapely.to_wkb` (called in `pyogrio.geopandas.write_dataframe` via
`geopandas.array.to_wkb`) defaults to machine byte order, so reading
only the first byte of the 4-byte integer geometry type is broken on
big-endian machines. And since some geometry types could need 2 bytes,
this also seems broken on little-endian machines, though I did not check
whether any 2-byte types could be produced.
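
To make the failure mode concrete, here is a minimal sketch in plain Python. It hand-builds WKB for a point with `struct` instead of calling `shapely.to_wkb`, and it uses the ISO WKB code 1001 (Point Z) purely to illustrate a type code that does not fit in one byte; whether the pyogrio write path ever produces such codes is not established here.

```python
import struct

WKB_POINT = 1        # WKB geometry type code for Point
WKB_POINT_Z = 1001   # ISO WKB code for Point Z (illustrative 2-byte value)

# WKB layout for a point: 1-byte byte-order flag, uint32 type, float64 coordinates.
little = struct.pack("<BIdd", 1, WKB_POINT, 1.0, 2.0)  # flag 1 = little-endian
big = struct.pack(">BIdd", 0, WKB_POINT, 1.0, 2.0)     # flag 0 = big-endian

# Reading only the byte at index 1, as the pre-fix code did:
naive_little = little[1]  # low-order byte of the type code
naive_big = big[1]        # high-order byte of the type code, so the type is lost

# A type code above 255 is truncated even in little-endian WKB.
little_z = struct.pack("<BIddd", 1, WKB_POINT_Z, 1.0, 2.0, 3.0)
naive_little_z = little_z[1]  # only the low 8 bits of 1001 survive

print(naive_little, naive_big, naive_little_z)
```

The single-byte read only happens to work when the WKB is little-endian and the type code is below 256.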

The commented-out code here indicates the correct course of action
(reading the first byte to determine byte order). Instead of using
`int.from_bytes`, I wrote it out as bitshifting operations, which
compile directly to 1 or 2 (if byte swapping) assembly instructions.
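
The byte-order dispatch described above can be sketched in plain Python. The helper name `wkb_type` is hypothetical and this is not the Cython code from the commit; it only mirrors the same idea and cross-checks it against `int.from_bytes`. Note that a 4-byte type code occupies bytes 1 through 4, so the equivalent slice is `wkb[1:5]`.

```python
import struct


def wkb_type(wkb: bytes) -> int:
    """Return the uint32 geometry type code from a WKB header (hypothetical helper)."""
    if len(wkb) < 5:
        raise ValueError("WKB needs at least 5 bytes: byte-order flag plus uint32 type")
    b1, b2, b3, b4 = wkb[1], wkb[2], wkb[3], wkb[4]
    if wkb[0] == 1:
        # Little-endian: least significant byte comes first.
        return b1 | (b2 << 8) | (b3 << 16) | (b4 << 24)
    # Big-endian: most significant byte comes first.
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4


# Hand-built WKB for POINT (1 2) in both byte orders.
little = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
big = struct.pack(">BIdd", 0, 1, 1.0, 2.0)

# Both encodings decode to the same type code, matching int.from_bytes.
assert wkb_type(little) == int.from_bytes(little[1:5], byteorder="little")
assert wkb_type(big) == int.from_bytes(big[1:5], byteorder="big")
```

In Python the shifts operate on arbitrary-precision integers, so no cast is needed; the `<unsigned int>` casts in the Cython version exist because shifting an `unsigned char` by 24 would otherwise go through a signed `int`.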

Also, change `wkb_buffer` to be `const`, as it doesn't need to be
writable (and perhaps this will save a copy).
QuLogic committed Dec 5, 2024
1 parent 8c32f2d commit c8e2b58
Showing 2 changed files with 13 additions and 5 deletions.
4 changes: 4 additions & 0 deletions CHANGES.md
@@ -2,6 +2,10 @@
 
 ## 0.11.0 (TBD)
 
+### Bug fixes
+
+- Fix WKB writing on big-endian systems (#497).
+
 ### Packaging
 
 - the GDAL library included in the wheels is upgraded from 3.9.2 to 3.10.0 (#499).
14 changes: 9 additions & 5 deletions pyogrio/_io.pyx
@@ -2253,7 +2253,8 @@ def ogr_write(
     cdef OGRGeometryH ogr_geometry_multi = NULL
     cdef OGRFeatureDefnH ogr_featuredef = NULL
     cdef OGRFieldDefnH ogr_fielddef = NULL
-    cdef unsigned char *wkb_buffer = NULL
+    cdef const unsigned char *wkb_buffer = NULL
+    cdef unsigned int wkbtype = 0
     cdef int supports_transactions = 0
     cdef int err = 0
     cdef int i = 0
@@ -2373,15 +2374,18 @@ def ogr_write(
             # TODO: geometry must not be null or errors
             wkb = None if geometry is None else geometry[i]
             if wkb is not None:
-                wkbtype = <int>bytearray(wkb)[1]
-                # may need to consider all 4 bytes: int.from_bytes(wkb[0][1:4], byteorder="little")
-                # use "little" if the first byte == 1
+                wkb_buffer = wkb
+                if wkb_buffer[0] == 1:
+                    # Little endian WKB type.
+                    wkbtype = wkb_buffer[1] + (wkb_buffer[2] << 8) + (wkb_buffer[3] << 16) + (<unsigned int>wkb_buffer[4] << 24)
+                else:
+                    # Big endian WKB type.
+                    wkbtype = (<unsigned int>(wkb_buffer[1]) << 24) + (wkb_buffer[2] << 16) + (wkb_buffer[3] << 8) + wkb_buffer[4]
                 ogr_geometry = OGR_G_CreateGeometry(<OGRwkbGeometryType>wkbtype)
                 if ogr_geometry == NULL:
                     raise GeometryError(f"Could not create geometry at index {i} for WKB type {wkbtype}") from None
 
                 # import the WKB
-                wkb_buffer = wkb
                 err = OGR_G_ImportFromWkb(ogr_geometry, wkb_buffer, len(wkb))
                 if err:
                     raise GeometryError(f"Could not create geometry from WKB at index {i}") from None
