Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RDataFrame to the primer #724

Merged
merged 1 commit into from
Sep 2, 2022
Merged

Conversation

vepadulano
Copy link
Member

@vepadulano vepadulano commented Jan 18, 2022

This commit adds RDataFrame content to the ROOT primer. It also removes obsolete content about TNTuple, TSelector and PROOF Lite to discuss data analysis in terms of RDataFrame. Parts of the old content that were referring to the underlying data format have been kept and reworded under a new section about the ROOT dataset

Co-authored-by: Ivan Kabadzhov [email protected]
Co-authored-by: Enrico Guiraud [email protected]

@vepadulano vepadulano self-assigned this Jan 18, 2022
Copy link
Member

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build/724/primer/index.html:3417:<a href="https://root.cern.ch/doc/master/classROOT_1_1RDataFrame.html#parallel-execution">in the RDataFrame documentation</a>.</p>
build/724/primer/index.html:3430:<a href="https://root.cern.ch/doc/master/classROOT_1_1RDataFrame.html#python">in the RDataFrame documentation</a>.</p>

Found 2 links to root.cern.ch. Please change them to link to root.cern (no '.ch') instead.

@couet couet requested a review from eguiraud January 19, 2022 08:52
@couet
Copy link
Member

couet commented Jan 19, 2022

I see the NTuple paragraph showing how to use TNtuple has been completely removed. I understand we want to remove the PROOF part (as PROOF is now old software and has been declared as such). But TNtuple has not been marked as legacy code and many tutorials are using it.

Removing it from the primer is a "political" (I do not have better word) choice, which is perfectly understandable. I think it is fair this choice if done by the new young generation of the ROOT developers and not by an old developper like me. So I approve this PR.

@couet
Copy link
Member

couet commented Jan 19, 2022

Also, the original markdown version of the primer is still in the ROOT repository. It allows to generate printable versions of the primer (pdf, epub ...). With this PR this markdown primer will not be in sync with the html one in the ROOT web site.

My question is: Should we completely remove this markdown primer for the ROOT repo ?

This PR removes it: root-project/root#9619
This PR removes the Primer build: root-project/rootspi#108

@couet
Copy link
Member

couet commented Jan 19, 2022

One more comment:
At the top of the html Primer we have:

Original Authors: D. Piparo, G. Quast, M. Zeise

Given the major changes this PR provides, I guess it is worth mentioning the new authors (Ivan Enrico Vincenzo)

@couet
Copy link
Member

couet commented Jan 19, 2022

There is also the version of the Primer (an "interactive one") which is also tout of sync. Should it be removed also ?
https://github.com/root-project/rootspi/tree/master/ROOT-Primer

@eguiraud
Copy link
Member

eguiraud commented Jan 19, 2022

My question is: Should we completely remove this markdown primer for the ROOT repo ?

There is also the version of the Primer (an "interactive one") which is also tout of sync. Should it be removed also ?

You tell us 😬

@couet
Copy link
Member

couet commented Jan 19, 2022

You tell us

I made two PRs to remove the Markdown version

And, yes, I will be in favour to remove the "interactive one" also (need to check first from where it is accessible)

@vepadulano vepadulano force-pushed the rdf-in-primer branch 2 times, most recently from 4f2b280 to 576a767 Compare January 19, 2022 19:20
Copy link
Contributor

@jalopezg-git jalopezg-git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; just some minor comments.

primer/index.md Outdated Show resolved Hide resolved
primer/index.md Outdated Show resolved Hide resolved
primer/index.md Outdated Show resolved Hide resolved
primer/index.md Outdated
RDataFrame reads collections as the special type
[ROOT::RVec](https://root.cern/doc/master/classROOT_1_1VecOps_1_1RVec.html):
for example, a column where each element is an array of floating point numbers
can be read as a ROOT::RVecF. C-style arrays (with variable or static size),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we talked about the RVec type (and not RVecF), I would prefer to stick to that, i.e.

Suggested change
can be read as a ROOT::RVecF. C-style arrays (with variable or static size),
can be read as a `ROOT::VecOps::RVec<float>`. C-style arrays (with variable or static size),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should advertise the short aliases though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can write a short sentence describing the existence of the aliases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a small sentence there. If it looks ok this can be merged

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to figure out what to do with the other versions of the primer before merging

primer/index.md Outdated Show resolved Hide resolved
primer/index.md Outdated Show resolved Hide resolved
primer/index.md Show resolved Hide resolved
@Axel-Naumann
Copy link
Member

Axel-Naumann commented Jan 20, 2022

Do we have a chance to keep a notebook version of the primer, maybe even sort of automatic? Or do yous think that it's not worth it?

@couet
Copy link
Member

couet commented Jan 21, 2022

I think it will be much simpler to maintain only one version of the Primer. We have plenty of tutorials in Notebook format. The Primer is really a manual. I am not sure it makes sense to have it as a Notebook. Moreover, this PR highlight the fact the Primer is something important to have, it evolves, it is not frozen. Several versions will be a nightmare to maintain. I would go toward simplification.

@Axel-Naumann
Copy link
Member

Axel-Naumann commented Jan 21, 2022

The Primer is really a manual.

How so - it seems to have lot of code?

My question isn't whether we can re-introduce duplication but whether we should really drop the notebook. I.e. can we either generate the notebook from the single source, or should the notebook be the single source?

@couet
Copy link
Member

couet commented Jan 21, 2022

How so - it seems to have lot of code?

Yes but in the "Manual" and in the "reference Guide"there is a lot of code too.

I must admit I do not really know how the Notebook is generated. I need to check. I do not even know where it can seen. I try to find it online yesterday without success.

This commit adds RDataFrame content to the ROOT primer. It also removes obsolete content about TNTuple, TSelector and PROOF Lite to discuss data analysis in terms of RDataFrame. Parts of the old content that were referring to the underlying data format have been kept and reworded under a new section about the ROOT dataset

Co-authored-by: Ivan Kabadzhov <[email protected]>
Co-authored-by: Enrico Guiraud <[email protected]>
@couet
Copy link
Member

couet commented Mar 7, 2022

Should this go in: https://github.com/root-project/NotebookPrimer ?

@couet couet merged commit 0ae075b into root-project:main Sep 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants