Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataCalibrated -- The worst name possible, and how we can make it better #208

Closed
relleums opened this issue Jan 22, 2017 · 3 comments
Closed
Assignees
Milestone

Comments

@relleums
Copy link
Member

Again I fell into the pit (#197) of assuming that the 'calibrated' in DataCalibrated is about the calibrations I am thinking of. In this particular case it turns out that the single pulse gain calibration, which mainly corrects for the individual pixel pre-amp magnification factors, is not yet applied to DataCalibrated. Both Data and Calibrated are imho very ambiguous names and must be replaced for future students. I propose to rename the variables and change the processing flow a bit:

We introduce a single pulse (pre-amp) gain calibration service as I started it in PR #207 .
We remove the single pulse (pre-amp) gain correction from BasicExtraction into a processor of its own maybe called fact.datacorrection.SinglePulseGainCorrection.

current name proposed name desc.
Data RawPixelTimeLines The raw ADC output from the FACT DRS boards
DataCalibrated PixelElectricPotentialTimeLines Here all DRS calibratin and artifact removal was applied
PixelPhotonChargelTimeLines Here the single pulse (pre-amp) gain correction was applied

Personally, I prefer TimeLines over TimeSerieses because of the plural 's' confusion.

Btw. Why is <fact.filter.DrsTimeCalibration outputKey="timeCalibConst" /> not a calibration service OR the other way round: Why are not all calibration services processors?

This topic is now hot again because there are now to processors which need the single pulse (pre-amp) gain correction (BasicExtraction and SinglePulseExtraction).
We have to unify the loading or in-code providing of calibration constants. Since there are currently different solutions to this in use I ask you: Which one do you prefer? How can we tidy this up?

@maxnoe
Copy link
Member

maxnoe commented Nov 7, 2017

TimeLine is semantically wrong, see

I would propose

  • adc_time_series
  • voltage_time_series

@maxnoe maxnoe added this to the v1.0.0 milestone Nov 7, 2017
@dneise
Copy link
Member

dneise commented Nov 8, 2017

The group of operations labeled as "calibration" varies in size and scope. E.g. the process of extracting the "charge" from a "time_series" is sometimes called calibration, c.f. MARS/Callisto.

The problem, as I understand @relleums, was not that he was confusing a time_series with a single charge value. So in a sense, the "time_series" was clear in the context and thus could be considered noise.

I think the proposal by @maxnoe is not sufficient, since it would not have solved the confusion described by @relleums. He new it was "voltages", he did not know if these voltages were already corrected for a certain effect or not.

The problem with naming these time_series as I see it is this. We have to perform a certain number of "calibration"-steps on the "raw" adc_time_series as they are written into these files ending in ".fits.fz", these include:

  • DRS amplitude calibration
  • DRS time calibration
  • spike removal
  • jump removal
  • single pulse gain calibration

and possibly more.

one could store the time series before and after each of these steps in this "DataItemMapDictThing" and try to find apropriate names to convey to a passerby which step produces a certain time_series, e.g.

  • time_series_after_drs_amplitude_calibration
  • time_series_after_jump_removal

But "after_jump_removal" does not tell me if the "drs_time_calibration" has been performed already or not yet, since one might not know the order of these steps.

writing the entire history of steps into the name of a time_series is utterly out of question.

So I wonder ... is there a solution at all? Or do we have to face the fact, that any name must omit some part of information? And that @relleums premise to assume he had a chance to understand what calibration steps were done or not done, by looking at the name of the thing, was simply wrong. Our system is so complicated that one can only understand what operations have been done on a certain time_series in the DataItem by carefully reading the code and the xml, since people might bypass certain processors by funny usage of keys for the processors in their xml.


As a solution I would propose this.

We have some experience after 5 years now and we kind of know what steps we want to perform on a time series before using it for anything. So this means there are two kinds of people, those who want to operate on a "ready to use" time_series. They should simply see the key "time_series" ... no need to show them any of the intermediate results.

And then there are people who are interested in optimizing one of these steps ... be it jump finding or drs-time-calibration ... these people I would consider experts ... they might need access to each and every intermediate step of the calibration-series in order to find out where a certain artifact was introduced. For them "adc_time_series" and "voltage_time_series" does not even start to solve any problem.


Fazit:

Let's remove all time_series from the DataItemThingy but the single one, that is useful. Let's call that one: "time_series"

@maxnoe
Copy link
Member

maxnoe commented Mar 1, 2018

I don't think this is worth the effort. It's only internal and never written out.

Please open again, if you have a strong different opinion.

@maxnoe maxnoe closed this as completed Mar 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants