[DEV] 2.2.0 - C API Breaking changes #11

Merged
merged 29 commits into from
Sep 29, 2024

@MatrixEditor MatrixEditor commented Sep 8, 2024

Minor Release 2.2.0 breaking changes

This release will introduce breaking changes in the C API of this project, reducing the overhead of CpFieldObject and
splitting the monolithic parsing functions.

Tasks

  • Write tests for all implemented C atoms and builtin atoms
  • Add documentation for all new C atoms
  • Add benchmarks

What's changed

The native module (caterpillar._C) can be installed via pip without the environment variable present:

pip install "caterpillar[all]@git+https://github.com/MatrixEditor/caterpillar/#subdirectory=src/ccaterpillar"

The following new C atoms have been added/implemented:

  • CpBuiltinAtomObject: Base class for all standard atoms (C-layer) (Python Type: builtinatom)
  • CpRepeatedAtomObject: essentially a sequence of atoms (Python Type: repeated)
  • CpConditionAtomObject: adds a condition to the target atom (Python Type: conditional)
  • CpSwitchAtomObject: simulates a switch statement (Python Type: switch)
  • CpOffsetAtomObject: places the atom at a specific offset (Python Type: atoffset)
  • CpPrimitiveAtomObject: same as builtinatom but for Python classes (Python Type: patom)
  • CpBytesAtomObject: equivalent of caterpillar.py.Bytes (Python Type: octetstring)
  • CpPStringAtomObject: Pascal-String implementation (Python Type: pstring)
  • CpConstAtomObject: constant expressions (Python Type: const)
  • CpEnumAtomObject: support for enum.Enum types (Python Type: enumeration)
  • CpVarIntAtomObject: little-endian and big-endian variable-length integer objects (Python Type: VarInt)
  • CpComputedAtomObject: constant expressions (Python Type: computed)
  • CpLazyAtomObject: forward references to atoms (Python Type: lazy)
  • CpCStringAtomObject: C-Strings with variable padding character (Python Type: cstring)
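
For readers unfamiliar with the Pascal-String layout behind pstring, here is a minimal pure-Python sketch of the idea: a length prefix followed by the encoded bytes. It is not the C implementation from this PR. The one-byte length prefix and the helper names pack_pstring and unpack_pstring are assumptions made for illustration only.

```python
import struct


def pack_pstring(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text as <length byte><payload> (one-byte prefix is an assumption)."""
    data = text.encode(encoding)
    if len(data) > 0xFF:
        raise ValueError("string too long for a one-byte length prefix")
    return struct.pack("B", len(data)) + data


def unpack_pstring(buf: bytes, encoding: str = "utf-8") -> tuple[str, int]:
    """Decode a length-prefixed string; returns (text, bytes consumed)."""
    (length,) = struct.unpack_from("B", buf, 0)
    end = 1 + length
    if end > len(buf):
        raise ValueError("buffer shorter than the declared length")
    return buf[1:end].decode(encoding), end


encoded = pack_pstring("abc")
decoded, consumed = unpack_pstring(encoded + b"trailing")
```

Returning the number of consumed bytes lets a caller continue parsing right after the string, which is roughly what a stream-based atom gets for free.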

The following C atoms have been renamed:

  • CpIntAtomObject: Python Type int_t -> Int
  • CpPaddingAtomObject: Python Type padding_t -> Padding
  • CpFloatAtomObject: Python Type float_t -> Float
  • CpCharAtomObject: Python Type char_t -> Char
  • CpBoolAtomObject: Python Type bool_t -> Bool

There are now two Python modules that import all available classes at once. They are accessed via their
language tag (the two implementations are not compatible with each other):

from caterpillar.c import *
from caterpillar.py import *

A new feature in the C module maps native Python types to atom types. TYPE_MAP configures the mapping between a native Python type and its corresponding atom representation:

from caterpillar.c import *

TYPE_MAP[int] = u32

@struct
class Format:
    value: int    # now we can simply use int
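
To make the lookup idea concrete, here is a rough pure-Python sketch of how an annotation could be resolved through such a map. It does not use caterpillar and is not how the C module is implemented: the stand-in TYPE_MAP below holds plain struct format codes instead of atoms, and resolve_fields is a hypothetical helper.

```python
import struct

# Stand-in for the real TYPE_MAP: atoms are modelled as struct format codes.
TYPE_MAP = {}
TYPE_MAP[int] = "I"      # plays the role of u32
TYPE_MAP[float] = "f"    # hypothetical second mapping


def resolve_fields(cls) -> dict:
    """Replace native Python types in the annotations with their mapped atom."""
    resolved = {}
    for name, annotation in cls.__annotations__.items():
        # Unmapped annotations are assumed to be atoms already and kept as-is.
        resolved[name] = TYPE_MAP.get(annotation, annotation)
    return resolved


class Format:
    value: int
    ratio: float


fields = resolve_fields(Format)
layout = "<" + "".join(fields.values())   # little-endian, no alignment padding
packed = struct.pack(layout, 1, 0.5)
```

The point of the sketch is only the lookup step: the class body stays free of library-specific types, and the mapping decides the wire format.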

How the C API headers are generated has changed:

  • All API functions, types, objects and source files are placed into src/capi.dat using the following schema:
    • type:INDEX:STRUCT_NAME:TYPEDEF_NAME:CAPI_TYPE
      Defines a C API type for a C structure. The index is optional, and CAPI_TYPE is inferred as PyTypeObject if none is set.
    • obj:INDEX:NAME:TYPE
      Defines a C API object.
    • func:INDEX:NAME:RETURN_TYPE:REFCOUNT
      Defines a C API function. The function must be present within the source set of this file.
    • src:FILE
      Defines the source file (relative to this file) that contains the function definitions.
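
The schema can be illustrated with a small line parser. This is a hypothetical sketch, not the generator script used by the project. The CapiEntry class, the "#" comment handling, the way an omitted index is written (empty field or "-"), and the sample entries are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapiEntry:
    kind: str               # "type", "obj", "func" or "src"
    index: Optional[int]    # None when the index is omitted
    fields: tuple           # remaining colon-separated fields


def parse_capi_line(line: str) -> Optional[CapiEntry]:
    """Parse one schema line; returns None for blank or comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):    # comment syntax is an assumption
        return None
    kind, *rest = line.split(":")
    if kind == "src":
        # src:FILE has no index; keep the path intact even if it contains ':'.
        return CapiEntry(kind, None, (":".join(rest),))
    if kind not in ("type", "obj", "func"):
        raise ValueError(f"unknown entry kind: {kind!r}")
    if not rest:
        raise ValueError(f"missing fields in line: {line!r}")
    raw_index, *fields = rest
    # Assumption: an omitted index is written as an empty field or "-".
    index = None if raw_index in ("", "-") else int(raw_index)
    if kind == "type":
        # type:INDEX:STRUCT_NAME:TYPEDEF_NAME:CAPI_TYPE, CAPI_TYPE optional.
        fields = (fields + [""] * 3)[:3]
        if not fields[2]:
            fields[2] = "PyTypeObject"
    return CapiEntry(kind, index, tuple(fields))


# Made-up sample entries that follow the schema.
sample = [
    "type:-:_structobj:CpStructObject",
    "func:7:CpStruct_Pack:int:null",
    "src:example.c",
]
entries = [parse_capi_line(line) for line in sample]
```

Keeping the index explicit in the data file is what allows entries to keep a stable slot in the generated API table; the parser above only shows how the fields would be split.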

The following changes have been made to the C API:

- int CpPack_Field(PyObject* op, CpFieldObject* field, CpLayerObject* layer);
- int CpPack_Common(PyObject* op, PyObject* atom, CpLayerObject* layer);
- int CpPack_Struct(PyObject* op, CpStructObject* struct_, CpLayerObject* layer);
- int _Cp_Pack(PyObject* op, PyObject* atom, CpLayerObject* layer);
- PyObject* CpUnpack_Field(CpFieldObject* field, CpLayerObject* layer);
- PyObject* CpUnpack_Common(PyObject* op, CpLayerObject* layer);
- PyObject* CpUnpack_Struct(CpStructObject* struct_, CpLayerObject* layer);
- PyObject* _Cp_Unpack(PyObject* atom, CpLayerObject* layer); 
+ PyObject* CpState_ReadSsize_t(CpStateObject* self, Py_ssize_t size);
- int CpLayer_SetSequence(CpLayerObject* self, PyObject* sequence, Py_ssize_t length, int8_t greedy);
+ int CpStruct_Pack(CpStructObject* self, PyObject* obj, CpLayerObject* layer);
+ PyObject* CpStruct_Unpack(CpStructObject* self, CpLayerObject* layer);
+ PyObject* CpStruct_SizeOf(CpStructObject* self, CpLayerObject* layer);
+ CpSeqLayerObject* CpSeqLayer_New(CpStateObject* state, CpLayerObject* parent);
+ int CpSeqLayer_SetSequence(CpSeqLayerObject* self, PyObject* sequence, Py_ssize_t length, int8_t greedy);
+ CpObjLayerObject* CpObjLayer_New(CpStateObject* state, CpLayerObject* parent);
@@ int CpPaddingAtom_PackMany(CpPaddingAtomObject* self, PyObject* value, CpLayerObject* layer, CpLengthInfoObject* lengthinfo); @@
@@ PyObject* CpPaddingAtom_UnpackMany(CpPaddingAtomObject* self, CpLayerObject* layer, CpLengthInfoObject* lengthinfo); @@
+ int CpConstAtom_Pack(CpConstAtomObject* self, PyObject* value, CpLayerObject* layer);
+ PyObject* CpConstAtom_Unpack(CpConstAtomObject* self, CpLayerObject* layer);
+ int CpRepeatedAtom_Pack(CpRepeatedAtomObject* self, PyObject* op, CpLayerObject* layer);
+ PyObject* CpRepeatedAtom_Unpack(CpRepeatedAtomObject* self, CpLayerObject* layer);
+ PyObject* CpRepeatedAtom_GetLength(CpRepeatedAtomObject* self, PyObject* context);
+ int CpConditionAtom_Pack(CpConditionAtomObject* self, PyObject* op, PyObject* layer);
+ PyObject* CpConditionAtom_Unpack(CpConditionAtomObject* self, CpLayerObject* layer);
+ int CpConditionAtom_IsEnabled(CpConditionAtomObject* self, PyObject* context);
+ PyObject* CpSwitchAtom_GetNext(CpSwitchAtomObject* self, PyObject* op, PyObject* context);
+ int CpSwitchAtom_Pack(CpSwitchAtomObject* self, PyObject* obj, CpLayerObject* layer);
+ PyObject* CpSwitchAtom_Unpack(CpSwitchAtomObject* self, CpLayerObject* layer);
+ int CpOffsetAtom_Pack(CpOffsetAtomObject* self, PyObject* obj, CpLayerObject* layer);
+ PyObject* CpOffsetAtom_Unpack(CpOffsetAtomObject* self, CpLayerObject* layer);
+ PyObject* CpOffsetAtom_GetOffset(CpOffsetAtomObject* self, PyObject* layer);
+ PyObject* CpBytesAtom_GetLength(CpBytesAtomObject* self, CpLayerObject* layer);
+ int CpBytesAtom_Pack(CpBytesAtomObject* self, PyObject* value, CpLayerObject* layer);
+ PyObject* CpBytesAtom_Unpack(CpBytesAtomObject* self, CpLayerObject* layer);
+ int CpPStringAtom_Pack(CpPStringAtomObject* self, PyObject* value, CpLayerObject* layer);
+ PyObject* CpPStringAtom_Unpack(CpPStringAtomObject* self, CpLayerObject* layer);
+ int CpEnumAtom_Pack(CpEnumAtomObject* self, PyObject* value, CpLayerObject* layer);
+ PyObject* CpEnumAtom_Unpack(CpEnumAtomObject* self, CpLayerObject* layer);
+ int CpVarIntAtom_Pack(CpVarIntAtomObject* self, PyObject* value, CpLayerObject* layer);
+ PyObject* CpVarIntAtom_Unpack(CpVarIntAtomObject* self, CpLayerObject* layer);
+ PyObject* CpVarIntAtom_BSwap(PyObject* number, bool little_endian);
+ unsigned long long CpVarIntAtom_BSwapUnsignedLongLong(unsigned long long number, bool little_endian);
+ long long CpVarIntAtom_BSwapLongLong(long long number, bool little_endian);
+ Py_ssize_t CpVarIntAtom_BSwapSsize_t(Py_ssize_t number, bool little_endian);
+ int CpComputedAtom_Pack(CpComputedAtomObject* self, PyObject* obj, CpLayerObject* layer);
+ PyObject* CpComputedAtom_Unpack(CpComputedAtomObject* self, CpLayerObject* layer);
+ int CpLazyAtom_Pack(CpLazyAtomObject* self, PyObject* obj, CpLayerObject* layer);
+ PyObject* CpLazyAtom_Unpack(CpLazyAtomObject* self, CpLayerObject* layer);
+ int CpCStringAtom_Pack(CpCStringAtomObject* self, PyObject* value, CpLayerObject* layer);
+ PyObject* CpCStringAtom_Unpack(CpCStringAtomObject* self, CpLayerObject* layer);

---

+ Implementation for native switch and condition
---

+ fixed issues with CpLayer* classes
+ changed struct to builtinatom
---

+ CpPrimitiveAtom <-> patom
+ Updated _C.pyi
+ updated tests (padding_atom excluded)
+ Added CpStruct_SizeOf
+ Fixed CpState_Seek
+ Builtin-Atoms now use builtinatom as base class
---

+ Removed API functions to pack fields and structs
+ Fixed repeated atom parsing logic to accept __pack_many__ and __unpack_many__
+ New type CpLengthInfoObject
+ Removed deprecated parsing logic
+ Added verification logic to paddingatom
...

+ Python class: "octetstring"
+ Tests included
---

+ Python type: `pstring`
+ New API functions: CpState_ReadSsize_t, CpPStringAtom_Pack and CpPStringAtom_Unpack
+ Tests included
+ Added macro to implement byteorder operator (_CpEndian_ImplSetByteorder)
---

+ Python class: `enumeration`
+ Tests included
+ Updated roadmap accordingly
@MatrixEditor MatrixEditor self-assigned this Sep 8, 2024
@MatrixEditor MatrixEditor changed the title [DEV] 2.2.0 - Breaking changes [DEV] 2.2.0 - C API Breaking changes Sep 8, 2024
---

+ CpVarIntAtom for both little endian and big endian
+ Added C installation candidate to subpackage
+ Tests included
+ New C API functions: CpVarIntAtom_New, CpVarIntAtom_Pack, CpVarIntAtom_Unpack, CpVarIntAtom_BSwap, CpVarIntAtom_BSwapUnsignedLongLong, CpVarIntAtom_BSwapLongLong, CpVarIntAtom_BSwapSsize_t
+ Changed CpEndian and CpArch representation
@MatrixEditor MatrixEditor marked this pull request as draft September 9, 2024 20:53
---

+ Python type: `lazy`
+ Tests included
+ README installation update
---

+ Python Type: `cstring`
+ Tests included
+ Changed encoding parameter to be optional
---

+ int_t -> Int
+ float_t -> Float
+ varint_t -> VarInt
+ const_t -> const
+ padding_t -> Padding
+ bool_t -> Bool
+ char_t -> Char
+ added missing __bits__ method string to internal module state
---

+ CpInvalidDefault and CpDefaultSwitchOption now in separate source file
---

+ c.Int Python type documentation
+ added copybutton to docs
---

+ Fixed issue with STRUCT_OPTIONS being referenced twice
+ cstring now parses with greedy option by default
+ Documentation updates
+ Modified tests to adapt to fixed bugs and naming changes
---

+ Renamed atom documentation files and removed the "atom" suffix
---

+ Removed field related tests
---

+ Fixed refcount issue in _CpPack_EvalLength and _CpUnpack_EvalLength
+ Added Py_TPFLAGS_BASETYPE to all C types
+ Struct model classes can now be used in pack() and unpack() calls
+ Added missing operations to CpBinaryExpr and CpUnaryExpr
@MatrixEditor MatrixEditor marked this pull request as ready for review September 29, 2024 06:53
@MatrixEditor MatrixEditor merged commit 7d129ee into master Sep 29, 2024
6 checks passed