Generic data reader for common ML routines.
Training multiple models through a pipeline, each stage with different input and output data formats, can be very annoying. For example, when we want to investigate the performance of each stage, we need to handle a different data format per stage, which usually requires a lot of hard-coding.

This package aims to simplify this process by providing an additional data format definition file. Through this file, data can be read, processed, and formed into a suitable dataframe for later use in a clean, readable way.
Features:

- Merge data into a single object across files from different locations.
- Data processing API for multi-dimensional data.
- Extensible API for custom data formats.

Planned:

- Enhance the data processing API.
- Support shared data across all events.
- Provide a data post-processing API to create missing columns automatically.
You can clone this project and install it with

```bash
pip3 install -e .
```

and walk through the examples, or install only the package with

```bash
pip3 install git+https://github.com/rlf23240/ExaTrkXDataIO
```
Before you start using this package, it is highly recommended to walk through the examples in the `examples` folder. To run the examples, you need to:

- Install the package with `pip3 install -e .`.
- Get data and place at least 10 events under `examples/data`. The examples use the `particles/event{evt_id}-particles.csv` and `feature_store/{evt_id}` files.
- Read through `examples/configs/reader/default.yaml` and `examples/read.py` to see how the configuration file works.
- Run `examples/read.py`.
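To give a sense of how these pieces fit together, here is a hypothetical sketch of what a script like `examples/read.py` might do. The `DataReader` name, its constructor arguments, and the `read` call are illustrative assumptions, not the package's confirmed API; consult the actual example for the real usage.

```python
# Hypothetical usage sketch; the class name, constructor arguments, and
# read() signature are assumptions, not the package's confirmed API.
from ExaTrkXDataIO import DataReader

# Point the reader at the data format definition file and the data folder.
reader = DataReader(
    'examples/configs/reader/default.yaml',
    'examples/data',
)

# Read a single event; the reader merges the files declared in the
# configuration into one object.
event = reader.read(evt_id=1000)
```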
`EventFileParser` is responsible for loading data from a file and extracting the desired columns from it. To customize file parsing, you can inherit from `EventFileParser` and implement the following two methods (a sketch follows the list):

- `load(self, path: Path) -> Any`: Load your data from the file and return it here.
- `extract(self, data: Any, tag: str) -> np.array`: Extract a column from the data you previously loaded in `load`.
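For example, a parser for plain CSV files might look like the following sketch. The import path of `EventFileParser` is an assumption; adjust it to wherever the package actually exposes the class.

```python
# A minimal sketch of a custom parser. The import path below is an
# assumption about where the package exposes EventFileParser.
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd

from ExaTrkXDataIO import EventFileParser


class CSVParser(EventFileParser):
    def load(self, path: Path) -> Any:
        # Load the whole file once; the returned object is handed
        # back to extract() for each requested column.
        return pd.read_csv(path)

    def extract(self, data: Any, tag: str) -> np.array:
        # Pick the requested column out of the loaded dataframe.
        return data[tag].to_numpy()
```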
Finally, declare your parser in the configuration file and you are ready to go.
`EventDataProcessor` is responsible for processing data into a form that fits into a column of a dataframe. For flexibility, the processing is broken into a series of procedures, and you are free to define your own custom steps. To customize data processing, you can inherit from `EventDataProcessor` and implement the following method (a sketch follows the list):

- `process(self, data: np.array, **kwargs) -> np.array`: Process the data and return your result here. Each step need not return a 1-D array; it is the user's responsibility to guarantee that the processing pipeline as a whole results in a 1-D array.
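For example, a step that flattens multi-dimensional input into one dimension might look like the following sketch. As above, the import path of `EventDataProcessor` is an assumption.

```python
# A minimal sketch of a custom processing step. The import path below
# is an assumption about where the package exposes EventDataProcessor.
import numpy as np

from ExaTrkXDataIO import EventDataProcessor


class Flatten(EventDataProcessor):
    def process(self, data: np.array, **kwargs) -> np.array:
        # Collapse any multi-dimensional input into a 1-D array so the
        # pipeline's final output can serve as a dataframe column.
        return np.asarray(data).reshape(-1)
```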
Finally, declare your processor in the configuration file and you are ready to go.