Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* refigured prepare_docstrings, started adding pages for CLI documentation
* adding logo to readme
* formatting...
* Merge branch 'update-docs' of https://github.com/zbilodea/odapt into update-docs
* issues with merge...
* issues with merge
* chore: update README.md
* small changes to main
* merging
* cli changes should be complete
* formatting...
* still formatting...
* guide done for now

---------

Co-authored-by: Eduardo Rodrigues <[email protected]>
1 parent
ad7fa50
commit 517289b
Showing
37 changed files
with
868 additions
and
167 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
CLI Guide for add_histograms (add) | ||
================================== | ||
|
||
Instructions for function `add_histograms <https://hepconvert.readthedocs.io/en/latest/hepconvert.histogram_adding.add_histograms.html>`__. | ||
|
||
Command: | ||
-------- | ||
|
||
.. code-block:: bash | ||
hepconvert add [options] [OUT_FILE] [IN_FILES] | ||
Examples: | ||
--------- | ||
|
||
.. code-block:: bash | ||
hepconvert add -f --progress-bar --union summed_hists.root hist1.root hist2.root hist3.root | ||
Or, if files are in a directory: | ||
|
||
.. code-block:: bash | ||
hepconvert add --append --same-names summed_hists.root path/directory/ | ||
Options: | ||
-------- | ||
|
||
``--force``, ``-f`` Use flag to overwrite a file if it already exists. | ||
|
||
``--progress-bar`` Shows a basic progress bar indicating how many histograms have been summed and how many files have been read. | ||
|
||
``--append``, ``-a`` Will append histograms to an existing file. | ||
|
||
``--compression``, ``-c`` Compression type. Options are "lzma", "zlib", "lz4", and "zstd". Default is "zlib". | ||
|
||
``--compression-level`` Level of compression set by an integer. Default is 1. | ||
|
||
``--union`` Use flag to add together histograms that have the same name and append all others to the new file. | ||
|
||
``--same-names`` Use flag to only add histograms together if they have the same name. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
.. toctree:: | ||
:caption: Command Line Interface Instructions | ||
:hidden: | ||
|
||
parquet-to-root <parquet_to_root> | ||
root-to-parquet <root_to_parquet> | ||
copy-root <copy_root> | ||
merge-root <merge_root> | ||
add (add_histograms) <add> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
Command Line Interface Guide: copy_root | ||
======================================= | ||
|
||
Instructions for function `hepconvert.copy_root <https://hepconvert.readthedocs.io/en/latest/hepconvert.copy_root.copy_root.html>`__ | ||
|
||
Command: | ||
-------- | ||
|
||
.. code-block:: bash | ||
hepconvert copy-root [options] [OUT_FILE] [IN_FILE] | ||
Examples: | ||
--------- | ||
|
||
.. code-block:: bash | ||
hepconvert copy-root -f --progress-bar --keep-branches 'Jet_*' out_file.root in_file.root | ||
Branch skimming using ``cut``: | ||
|
||
.. code-block:: bash | ||
hepconvert copy-root -f --keep-branches 'Jet_*' --cut 'Jet_Px > 5' out_file.root in_file.root | ||
Options: | ||
-------- | ||
|
||
``--drop-branches``, ``-db`` and ``--keep-branches``, ``-kb`` list, str or dict. Specify branch names to remove from, or keep in, the ROOT file. Either a str, a list of str (for multiple branches), or a dict of the form {'tree': 'branches'} to apply the selection to specific TTrees. Wildcarding accepted. | ||
|
||
``--drop-trees``, ``-dt`` and ``--keep-trees``, ``-kt`` list of str, or str. Specify tree names to remove/keep TTrees in the ROOT files. Wildcarding accepted. | ||
|
||
``--cut`` For branch skimming, passed to `uproot.iterate <https://uproot.readthedocs.io/en/latest/uproot.behaviors.TBranch.iterate.html>`__. A str; if not None, this expression is used to filter all of the ``expressions``. | ||
|
||
``--expressions`` For branch skimming, passed to `uproot.iterate <https://uproot.readthedocs.io/en/latest/uproot.behaviors.TBranch.iterate.html>`__. Names of TBranches or aliases to convert to arrays, or mathematical expressions of them. If None, all TBranches selected by the filters are included. | ||
|
||
``--force``, ``-f`` Use flag to overwrite a file if it already exists. | ||
|
||
``--progress-bar`` Shows a basic progress bar indicating how many TTrees have been written. | ||
|
||
``--append``, ``-a`` Will append new TTree to an existing file. | ||
|
||
``--compression``, ``-c`` Compression type. Options are "lzma", "zlib", "lz4", and "zstd". Default is "zlib". | ||
|
||
``--compression-level`` Level of compression set by an integer. Default is 1. | ||
|
||
``--name`` Give a name to the new TTree. Default is "tree". | ||
|
||
``--title`` Give a title to the new TTree. | ||
|
||
``--initial-basket-capacity`` (int) Number of TBaskets that can be written to the TTree without rewriting the TTree metadata to make room. Default is 10. | ||
|
||
``--resize-factor`` (float) When the TTree metadata needs to be rewritten, this specifies how many more TBasket slots to allocate as a multiplicative factor. Default is 10.0. | ||
|
||
``--step-size`` Size of the batches of data to read and write. If an integer, the maximum number of entries to include in each iteration step; if a string, the maximum memory size to include. The string must be a number followed by a memory unit, such as "100 MB". Default is "100 MB". |
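The quotes around ``'Jet_*'`` and ``'Jet_Px > 5'`` in the examples above matter: unquoted, a POSIX shell may expand ``*`` against file names and would treat ``>`` as output redirection. The following is a minimal sketch, using only Python's standard ``shlex`` module, of how the second example is split into arguments. It does not run ``hepconvert``; it only illustrates the quoting.

```python
import shlex

# The copy-root example from above, as it would be typed in a POSIX shell.
command = (
    "hepconvert copy-root -f --keep-branches 'Jet_*' "
    "--cut 'Jet_Px > 5' out_file.root in_file.root"
)

# shlex.split follows POSIX quoting rules, so the quoted wildcard and the
# quoted cut expression each remain a single argument.
argv = shlex.split(command)
print(argv)
```

The same list could be handed to ``subprocess.run`` if you prefer to drive the CLI from a script, which avoids shell quoting altogether.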
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,229 @@ | ||
General Guide and Examples: | ||
=========================== | ||
Is something missing from this guide? Please post your questions on the `discussions page <https://github.com/scikit-hep/hepconvert/discussions>`__! | ||
|
||
Features of all (or most) functions: | ||
---------------------------------------- | ||
|
||
**Automatic handling of Uproot duplicate counter issue:** | ||
If you are using a hepconvert function that goes ROOT -> ROOT (both the input and output files are ROOT) | ||
and working with jagged-array data, hepconvert automatically groups branches that share the same "fLeafCount", | ||
so that Uproot does not create a separate `counter branch for each branch <https://github.com/scikit-hep/uproot5/discussions/903>`__. | ||
|
||
**Quick Modifications of ROOT files and TTrees:** | ||
|
||
Functions ``copy_root``, ``merge_root``, and ``root_to_parquet`` have a few options for applying quick | ||
modifications to ROOT files and TTree data. | ||
|
||
**Branch slimming:** | ||
Parameters ``keep_branches`` or ``drop_branches`` (str, list or dict) control branch slimming. | ||
Examples: | ||
|
||
.. code:: python | ||
>>> hepconvert.copy_root("out_file.root", "in_file.root", keep_branches="x*", progress_bar=True, force=True) | ||
# Before: | ||
# name | typename | interpretation | ||
# ---------------------+--------------------------+------------------------------- | ||
# x1 | int64_t | AsDtype('>i8') | ||
# x2 | int64_t | AsDtype('>i8') | ||
# y1 | int64_t | AsDtype('>i8') | ||
# y2 | int64_t | AsDtype('>i8') | ||
# After: | ||
# name | typename | interpretation | ||
# ---------------------+--------------------------+------------------------------- | ||
# x1 | int64_t | AsDtype('>i8') | ||
# x2 | int64_t | AsDtype('>i8') | ||
.. code:: python | ||
>>> hepconvert.copy_root("out_file.root", "in_file.root", keep_branches={"tree1": ["branch2", "branch3"], "tree2": ["branch2"]}, progress_bar=True, force=True) | ||
# Before: | ||
# Tree1: | ||
# name | typename | interpretation | ||
# ---------------------+--------------------------+------------------------------- | ||
# branch1 | int64_t | AsDtype('>i8') | ||
# branch2 | int64_t | AsDtype('>i8') | ||
# branch3 | int64_t | AsDtype('>i8') | ||
# Tree2: | ||
# name | typename | interpretation | ||
# ---------------------+--------------------------+------------------------------- | ||
# branch1 | int64_t | AsDtype('>i8') | ||
# branch2 | int64_t | AsDtype('>i8') | ||
# branch3 | int64_t | AsDtype('>i8') | ||
# After: | ||
# Tree1: | ||
# name | typename | interpretation | ||
# ---------------------+--------------------------+------------------------------- | ||
# branch2 | int64_t | AsDtype('>i8') | ||
# branch3 | int64_t | AsDtype('>i8') | ||
# Tree2: | ||
# name | typename | interpretation | ||
# ---------------------+--------------------------+------------------------------- | ||
# branch2 | int64_t | AsDtype('>i8') | ||
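To illustrate what wildcard slimming does to a list of branch names, here is a minimal sketch using Python's standard ``fnmatch`` module. The helper ``select_branches`` is hypothetical and not part of hepconvert; hepconvert's own matching may differ in its details.

```python
from fnmatch import fnmatchcase


def select_branches(branches, keep=None, drop=None):
    """Hypothetical helper: filter branch names by wildcard patterns."""
    # Accept a single pattern or a list of patterns.
    keep = [keep] if isinstance(keep, str) else keep
    drop = [drop] if isinstance(drop, str) else drop
    selected = []
    for name in branches:
        if keep is not None and not any(fnmatchcase(name, p) for p in keep):
            continue  # not matched by any keep pattern
        if drop is not None and any(fnmatchcase(name, p) for p in drop):
            continue  # matched by a drop pattern
        selected.append(name)
    return selected


branches = ["x1", "x2", "y1", "y2"]
kept = select_branches(branches, keep="x*")
dropped = select_branches(branches, drop=["y*"])
print(kept, dropped)
```

Both calls leave only ``x1`` and ``x2``, matching the "After" listing of the first example.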
**Branch skimming:** | ||
Parameters ``cut`` and ``expressions`` control branch skimming. Both of these parameters go to Uproot's `iterate | ||
<https://uproot.readthedocs.io/en/latest/uproot.behaviors.TBranch.iterate.html>`__ | ||
function. See Uproot's documentation for more details. | ||
|
||
Basic example: | ||
|
||
.. code:: python | ||
hepconvert.copy_root("skimmed_HZZ.root", "HZZ.root", keep_branches="Jet_*", | ||
                     force=True, expressions="Jet_Px", cut="Jet_Px >= 10") | ||
**Remove TTrees:** | ||
Use parameters ``keep_trees`` or ``drop_trees`` to keep or remove TTrees. | ||
|
||
.. code:: python | ||
import numpy as np | ||
import uproot | ||
import hepconvert | ||
# Creating example data: | ||
with uproot.recreate("two_trees.root") as file: | ||
    file["tree"] = {"x": np.array([1, 2, 3])} | ||
    file["tree1"] = {"x": np.array([1, 2, 3])} | ||
# Keep only the TTree named "tree": | ||
hepconvert.copy_root("one_tree.root", "two_trees.root", keep_trees="tree", force=True) | ||
**How hepconvert works with ROOT** | ||
|
||
hepconvert uses Uproot for reading and writing ROOT files, so it shares Uproot's limitations. | ||
It currently only works with flat TTrees (nanoAOD-like data), and cannot yet read or write RNTuples. | ||
|
||
As described in Uproot's documentation: | ||
|
||
.. note:: | ||
|
||
A small but growing list of data types can be written to files: | ||
|
||
* strings: TObjString | ||
* histograms: TH1*, TH2*, TH3* | ||
* profile plots: TProfile, TProfile2D, TProfile3D | ||
* NumPy histograms created with `np.histogram <https://numpy.org/doc/stable/reference/generated/numpy.histogram.html>`__, `np.histogram2d <https://numpy.org/doc/stable/reference/generated/numpy.histogram2d.html>`__, and `np.histogramdd <https://numpy.org/doc/stable/reference/generated/numpy.histogramdd.html>`__ with 3 dimensions or fewer | ||
* histograms that satisfy the `Universal Histogram Interface <https://uhi.readthedocs.io/>`__ (UHI) with 3 dimensions or fewer; this includes `boost-histogram <https://boost-histogram.readthedocs.io/>`__ and `hist <https://hist.readthedocs.io/>`__ | ||
* PyROOT objects | ||
|
||
**Memory Management** | ||
|
||
Each hepconvert function has automatic and customizable memory management for working with large files. | ||
|
||
Functions that read **ROOT** files read in batches whose size is controlled by the parameter ``step_size``. | ||
Set ``step_size`` to an `int` to limit each batch to a number of entries, or to a `string` of | ||
the form "100 MB" to limit each batch by memory size. | ||
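The two accepted forms of ``step_size`` can be sketched as follows. The helper ``describe_step_size`` is hypothetical, and the unit multipliers (binary multiples) are an assumption made for illustration; the exact conversion Uproot applies may differ.

```python
import re

# Assumed unit multipliers, for illustration only (binary multiples).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def describe_step_size(step_size):
    """Hypothetical helper: interpret step_size as entries or as bytes."""
    if isinstance(step_size, int):
        # An integer is a maximum number of entries per batch.
        return ("entries", step_size)
    # A string must be a number followed by a memory unit, e.g. "100 MB".
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kKmMgG]?[bB])\s*", step_size)
    if match is None:
        raise ValueError(f"cannot parse step_size: {step_size!r}")
    number, unit = match.groups()
    return ("bytes", int(float(number) * _UNITS[unit.upper()]))


print(describe_step_size(50_000))
print(describe_step_size("100 MB"))
```

An entry-based ``step_size`` gives predictable batch counts, while a memory-based one adapts to how wide each entry is.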
|
||
|
||
**Progress Bars** | ||
hepconvert uses the tqdm package for progress bars. If tqdm is not installed, an error message provides installation instructions. | ||
Progress bars are controlled with the ``progress_bar`` argument. | ||
For example, to use the default progress bar with ``copy_root``, set ``progress_bar`` to True: | ||
|
||
.. code:: python | ||
hepconvert.copy_root("out_file.root", "in_file.root", progress_bar=True) | ||
Some functions can handle a customized tqdm progress bar. | ||
To use a customized tqdm progress bar, create a progress bar object and pass it to the hepconvert function: | ||
|
||
.. code:: python | ||
>>> import tqdm | ||
>>> bar_obj = tqdm.tqdm(colour="GREEN", desc="Description") | ||
>>> hepconvert.add_histograms("out_file.root", "path/in_files/", progress_bar=bar_obj) | ||
.. image:: https://raw.githubusercontent.com/scikit-hep/hepconvert/main/docs/docs-img/progress_bar.png | ||
:width: 450px | ||
:alt: hepconvert | ||
:target: https://github.com/scikit-hep/hepconvert | ||
|
||
|
||
Some types of tqdm progress bar objects may not work in this way. | ||
|
||
|
||
**Command Line Interface** | ||
|
||
All functions can be run from the command line. See the "Command Line Interface Instructions" tab on the left for CLI | ||
instructions on individual functions. | ||
|
||
Adding Histograms | ||
----------------- | ||
``hepconvert.add_histograms`` adds the values of many histograms | ||
and writes the summed histograms to an output file (like ROOT's hadd, but limited | ||
to histograms). | ||
|
||
|
||
**Parameters of note:** | ||
|
||
``union`` If True, adds the histograms that have the same name and appends all others | ||
to the new file. | ||
|
||
``append`` If True, appends histograms to an existing file. Force and append | ||
cannot both be True. | ||
|
||
``same_names`` If True, only adds together histograms which have the same name (key). If False, | ||
histograms are added together based on TTree structure (bins must be equal). | ||
|
||
Memory: | ||
``add_histograms`` has no memory customization available currently. To maintain | ||
performance it stores the summed histograms in memory until all files have | ||
been read, then the summed histograms are written to the output file. Only | ||
one input ROOT file is read and kept in memory at a time. | ||
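The ``union`` behaviour and the memory pattern described above can be sketched in plain Python. This is an illustrative model only: histograms are represented as lists of bin counts, and ``add_union`` is a hypothetical function, not hepconvert's implementation.

```python
def add_union(files):
    """Hypothetical sketch of union-style histogram adding.

    `files` is a list of dicts mapping histogram name -> list of bin counts.
    """
    summed = {}  # running sums are held in memory until all inputs are read
    for histograms in files:  # only one input "file" is visited at a time
        for name, counts in histograms.items():
            if name not in summed:
                summed[name] = list(counts)  # first occurrence: copy as-is
                continue
            if len(counts) != len(summed[name]):
                raise ValueError(f"bin mismatch for histogram {name!r}")
            summed[name] = [a + b for a, b in zip(summed[name], counts)]
    return summed


file1 = {"h_pt": [1, 2, 3], "h_eta": [4, 0]}
file2 = {"h_pt": [10, 20, 30], "h_phi": [7, 7]}
result = add_union([file1, file2])
print(result)
```

Here ``h_pt`` appears in both inputs and is summed bin by bin, while ``h_eta`` and ``h_phi`` appear once each and are carried over unchanged.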
|
||
|
||
Merging TTrees | ||
-------------- | ||
``hepconvert.merge_root`` merges TTrees in multiple ROOT files together. The end result is a single file containing data from all input files (again like ROOT's hadd, but can handle flat TTrees and histograms). | ||
|
||
.. warning:: | ||
At the moment, hepconvert.merge_root can only merge TTrees that have the same | ||
number of branches, with the same names and datatypes. | ||
We are working on adding backfill capabilities for mismatched TTrees. | ||
|
||
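Because of this restriction, it can be useful to compare the branch layouts of two TTrees before merging. The sketch below does so on plain dicts mapping branch name to type name; ``schema_mismatches`` is a hypothetical helper and not part of hepconvert.

```python
def schema_mismatches(schema_a, schema_b):
    """Hypothetical helper: list differences between two branch layouts.

    Each schema is a dict mapping branch name -> type name.
    """
    problems = []
    for name in sorted(set(schema_a) | set(schema_b)):
        if name not in schema_a or name not in schema_b:
            problems.append(f"{name}: missing from one tree")
        elif schema_a[name] != schema_b[name]:
            problems.append(f"{name}: {schema_a[name]} != {schema_b[name]}")
    return problems


tree_a = {"x": "int64_t", "y": "double"}
tree_b = {"x": "int64_t", "y": "float", "z": "double"}
problems = schema_mismatches(tree_a, tree_b)
print(problems)
```

An empty list means the two layouts agree. With Uproot, a mapping of this shape can likely be obtained from a TTree's ``typenames()`` method, though that step is left out here to keep the sketch free of dependencies.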
**Features:** | ||
merge_root has parameters ``cut``, ``expressions``, ``drop_branches``, ``keep_branches``, ``drop_trees`` and ``keep_trees``. | ||
|
||
|
||
Copying TTrees | ||
-------------- | ||
``hepconvert.copy_root`` copies the TTrees of a ROOT file into a new ROOT file, optionally slimming or skimming them. | ||
|
||
|
||
**Features:** | ||
copy_root has parameters ``cut``, ``expressions``, ``drop_branches``, ``keep_branches``, ``drop_trees`` and ``keep_trees``. | ||
|
||
|
||
Parquet to ROOT | ||
--------------- | ||
|
||
Writes the data from a single Parquet file to one TTree in a ROOT file. | ||
This function creates a new TTree (name the new tree with parameter ``tree``). | ||
|
||
|
||
ROOT to Parquet | ||
--------------- | ||
|
||
Writes the data from one TTree in a ROOT file to a single Parquet file. | ||
If there are multiple TTrees in the file, specify one TTree to write to the Parquet file using the ``tree`` parameter. | ||
|
||
**Features:** | ||
root_to_parquet has parameters ``cut``, ``expressions``, ``drop_branches``, ``keep_branches``. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
.. toctree:: | ||
:caption: Guide with Examples | ||
:hidden: | ||
|
||
general_guide |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
hepconvert.add_histograms | ||
========================= | ||
|
||
Defined in `hepconvert.histogram_adding <https://github.com/zbilodea/hepconvert/blob/52e6cbfbbf81c669ca31b8a538d8f3e8984b35a5/src/hepconvert/histogram_adding.py>`__ on `line 345 <https://github.com/zbilodea/hepconvert/blob/52e6cbfbbf81c669ca31b8a538d8f3e8984b35a5/src/hepconvert/histogram_adding.py#L345>`__. | ||
|
||
.. autofunction:: hepconvert.add_histograms |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
hepconvert.copy_root | ||
==================== | ||
|
||
Defined in `hepconvert.copy_root <https://github.com/zbilodea/hepconvert/blob/52e6cbfbbf81c669ca31b8a538d8f3e8984b35a5/src/hepconvert/copy_root.py>`__ on `line 15 <https://github.com/zbilodea/hepconvert/blob/52e6cbfbbf81c669ca31b8a538d8f3e8984b35a5/src/hepconvert/copy_root.py#L15>`__. | ||
|
||
.. autofunction:: hepconvert.copy_root |