Example Jupyter Notebook for SuperCDMS catalog access #11

Open
zonca opened this issue Oct 26, 2021 · 6 comments

Comments

@zonca
Member

zonca commented Oct 26, 2021

Related to #2

@pibion it would be useful if one of your students could create a Jupyter Notebook that shows what querying functionality the DID Finder API should have.

Structure of the notebook

  • A cell at the top that defines all the input parameters for the query, for example experiment run, site, date.
  • Then a call to CDMSDataCatalog.findData to gather the requested dataset, possibly with some logic to automatically set other parameters in the query.
  • Finally access the actual data array and make a simple plot.
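
A minimal sketch of the first two steps, assuming the `findData` keyword names that appear later in this thread. `FakeCatalog` and `find_datasets` are hypothetical names introduced only so the sketch runs without the real package; the real `findData` returns dataset objects rather than plain file names.

```python
# --- Cell 1: input parameters for the query ---
QUERY = {
    "Facility": "SLAC",
    "nFridgeRun": 85,
    "ProdTag": "ProdG133",
    "Series": "09210915_110633",
    "nMergeLevel": 1,
}


def find_datasets(catalog, query, fetch=False):
    """Forward the query to catalog.findData and return its result.

    `catalog` is any object exposing findData(**kwargs), for example a
    CDMSDataCatalog instance.
    """
    return catalog.findData(dofetch=fetch, **query)


class FakeCatalog:
    """Hypothetical stand-in for the real catalog (assumption, not its API)."""

    def findData(self, **kwargs):
        # Remember the query and return a made-up file name built from it.
        self.last_query = kwargs
        return ["{ProdTag}_{Series}.root".format(**kwargs)]


# --- Cell 2: run the query ---
catalog = FakeCatalog()  # in a real notebook: CDMSDataCatalog()
files = find_datasets(catalog, QUERY)
print(files)
```

Keeping all parameters in one dictionary at the top means the query cell does not change when a different run or series is requested.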
@zonca
Member Author

zonca commented Nov 16, 2021

ok, example notebook is at https://gist.github.com/zonca/62784b06fabd72b254c2a7f7252eeafc

Basically the input parameters are:

Facility='SLAC', nFridgeRun=85, ProdTag='ProdG133', Series='09210915_110633', nMergeLevel=1
detnum=2

As output we get a path to a ROOT file on CVMFS that we can load with uproot.
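
A rough sketch of the loading step with uproot. The tree and branch names inside the file are not given in this thread, so they are left as arguments; `read_branches` and its `opener` parameter are hypothetical, and `opener` exists only so the function can be exercised without uproot or a real file.

```python
def read_branches(root_path, tree_name, branch_names, opener=None):
    """Return {branch name: NumPy array} for the requested branches.

    tree_name and branch_names must match what is actually stored in
    the ROOT file; they are not known from this thread.
    """
    if opener is None:
        import uproot  # third-party; imported lazily so the module loads without it

        opener = uproot.open
    # uproot.open returns a directory-like object usable as a context manager.
    with opener(root_path) as root_file:
        tree = root_file[tree_name]
        # library="np" asks uproot for a dict of NumPy arrays.
        return tree.arrays(branch_names, library="np")
```

With a real file, calling `uproot.open(path).keys()` first lists the objects in the file, which is one way to find the tree name to pass in.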

@zonca
Member Author

zonca commented Nov 30, 2021

This avoids the use of RQAnalysis, whose API is changing in the current development version. Anyway, the fewer packages we depend on, the better.

@zonca
Member Author

zonca commented Nov 30, 2021

I'm trying to figure out whether I can run CDMSDataCatalog without cdms, so I can easily use it inside a Docker container.
It's not clear from the docs at:

https://confluence.slac.stanford.edu/pages/viewpage.action?spaceKey=CDMS&title=SuperCDMS+Data+Catalog

@pibion

pibion commented Nov 30, 2021

@bloer @zonca I think CDMSDataCatalog is completely independent of cdms - the only package it depends on that I know of is the SLAC data catalog.

Does CDMSDataCatalog not have a setup.py?

NVM, I see that it does, at https://gitlab.com/supercdms/DataHandling/DataCat/-/blob/master/setup.py. The good news is that the slaclab datacat is public!

@zonca
Member Author

zonca commented Nov 30, 2021

Thanks @pibion, I was looking on Gitblit instead of GitLab.

@zonca
Member Author

zonca commented Nov 30, 2021

It works! It was confusing that the example notebook at https://gist.github.com/zonca/62784b06fabd72b254c2a7f7252eeafc names the class DataCatClient,
but it works the same after pip installing the package:

from CDMSDataCatalog import CDMSDataCatalog
dc = CDMSDataCatalog()
files = dc.findData(Facility='SLAC', nFridgeRun=85, ProdTag='ProdG133', Series='09210915_110633', nMergeLevel=1, dofetch=True)
Need to download 161.1MiB ( 1 files ) from catalog
Do you want to proceed? (y/n): y
Total progress: 100%|███████████████████████████████████████████████████████████████████████████████| 161M/161M [00:28<00:00, 5.86MB/s]
Download finished

In [4]: files
Out[4]: [<CDMSDataset Class, Name: ProdG133_09210915_110633.root>]
