- Relax pyspark constraint
- Breaking: Provide the same kwargs used in the protobuf lib on encoding/decoding rather than the `options` dict, except `DescriptorPool`, which is unserializable.
- Breaking: Change param `mc` -> `message_converter` on top level functions.
- Bugfix: Fixed a bug where int64 protobuf types were not being properly converted into spark types.
- Added support for protobuf wrapper types.
- Bugfix: Fixed a bug where `options` was not being passed recursively in `get_spark_schema`.
- Add `to_protobuf` and `from_protobuf` functions to operate on columns without needing a `MessageConverter`.
- Add `df_to_protobuf` and `df_from_protobuf` functions to operate on DataFrames without needing a `MessageConverter`. These functions also optionally handle struct expansion.
- Bugfix: Fix `bytearray` TypeError when using newer versions of protobuf
- Breaking: Return type instances are now passed to custom serializers rather than a type class plus init kwargs
- Bugfix: `get_spark_schema` now returns properly when the passed descriptor has a registered custom serializer
- Breaking: pbspark now encodes the well known type `Timestamp` to spark `TimestampType` by default.
- Bugfix: protobuf bytes now properly convert to spark `BinaryType`
- Bugfix: message decoding now properly works by populating the passed message instance rather than returning a new one
- protobuf objects are now patched only temporarily when being used by pbspark
- Timestamp conversion now references protobuf well known type objects rather than objects copied from protobuf to pbspark
- Modify the encoding function to convert udf-passed `Row` objects to dictionaries before passing them to the parser.
- Documentation fixes and more details on custom encoders.
- Breaking: protobuf bytes fields will now convert directly to spark `ByteType` and vice versa.
- Relax constraint on pyspark
- Bump minimum protobuf version to 3.20.0
- Add `to_protobuf` method to encode pyspark structs to protobuf
- Initial release