DataCalibrated -- The worst name possible, and how we can make it better #208

relleums · 2017-01-22T16:19:15Z

Again I fell into the pit (#197) of assuming that the 'calibrated' in DataCalibrated is about the calibrations I am thinking of. In this particular case it turns out that the single pulse gain calibration, which mainly corrects for the individual pixel pre-amp magnification factors, is not yet applied to DataCalibrated. Both Data and Calibrated are imho very ambiguous names and must be replaced for future students. I propose to rename the variables and change the processing flow a bit:

We introduce a single pulse (pre-amp) gain calibration service as I started it in PR #207 .
We remove the single pulse (pre-amp) gain correction from BasicExtraction into a processor of its own maybe called fact.datacorrection.SinglePulseGainCorrection.

current name	proposed name	desc.
Data	RawPixelTimeLines	The raw ADC output from the FACT DRS boards
DataCalibrated	PixelElectricPotentialTimeLines	Here all DRS calibratin and artifact removal was applied
	PixelPhotonChargelTimeLines	Here the single pulse (pre-amp) gain correction was applied

Personally, I prefer TimeLines over TimeSerieses because of the plural 's' confusion.

Btw. Why is <fact.filter.DrsTimeCalibration outputKey="timeCalibConst" /> not a calibration service OR the other way round: Why are not all calibration services processors?

This topic is now hot again because there are now to processors which need the single pulse (pre-amp) gain correction (BasicExtraction and SinglePulseExtraction).
We have to unify the loading or in-code providing of calibration constants. Since there are currently different solutions to this in use I ask you: Which one do you prefer? How can we tidy this up?

The text was updated successfully, but these errors were encountered:

maxnoe · 2017-11-07T17:35:42Z

TimeLine is semantically wrong, see

I would propose

adc_time_series
voltage_time_series

dneise · 2017-11-08T08:48:32Z

The group of operations labeled as "calibration" varies in size and scope. E.g. the process of extracting the "charge" from a "time_series" is sometimes called calibration, c.f. MARS/Callisto.

The problem, as I understand @relleums, was not that he was confusing a time_series with a single charge value. So in a sense, the "time_series" was clear in the context and thus could be considered noise.

I think the proposal by @maxnoe is not sufficient, since it would not have solved the confusion described by @relleums. He new it was "voltages", he did not know if these voltages were already corrected for a certain effect or not.

The problem with naming these time_series as I see it is this. We have to perform a certain number of "calibration"-steps on the "raw" adc_time_series as they are written into these files ending in ".fits.fz", these include:

DRS amplitude calibration
DRS time calibration
spike removal
jump removal
single pulse gain calibration

and possibly more.

one could store the time series before and after each of these steps in this "DataItemMapDictThing" and try to find apropriate names to convey to a passerby which step produces a certain time_series, e.g.

time_series_after_drs_amplitude_calibration
time_series_after_jump_removal

But "after_jump_removal" does not tell me if the "drs_time_calibration" has been performed already or not yet, since one might not know the order of these steps.

writing the entire history of steps into the name of a time_series is utterly out of question.

So I wonder ... is there a solution at all? Or do we have to face the fact, that any name must omit some part of information? And that @relleums premise to assume he had a chance to understand what calibration steps were done or not done, by looking at the name of the thing, was simply wrong. Our system is so complicated that one can only understand what operations have been done on a certain time_series in the DataItem by carefully reading the code and the xml, since people might bypass certain processors by funny usage of keys for the processors in their xml.

As a solution I would propose this.

We have some experience after 5 years now and we kind of know what steps we want to perform on a time series before using it for anything. So this means there are two kinds of people, those who want to operate on a "ready to use" time_series. They should simply see the key "time_series" ... no need to show them any of the intermediate results.

And then there are people who are interested in optimizing one of these steps ... be it jump finding or drs-time-calibration ... these people I would consider experts ... they might need access to each and every intermediate step of the calibration-series in order to find out where a certain artifact was introduced. For them "adc_time_series" and "voltage_time_series" does not even start to solve any problem.

Fazit:

Let's remove all time_series from the DataItemThingy but the single one, that is useful. Let's call that one: "time_series"

maxnoe · 2018-03-01T13:05:00Z

I don't think this is worth the effort. It's only internal and never written out.

Please open again, if you have a strong different opinion.

relleums added enhancement help wanted labels Jan 22, 2017

maxnoe self-assigned this Nov 7, 2017

maxnoe added this to the v1.0.0 milestone Nov 7, 2017

maxnoe closed this as completed Mar 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataCalibrated -- The worst name possible, and how we can make it better #208

DataCalibrated -- The worst name possible, and how we can make it better #208

relleums commented Jan 22, 2017

maxnoe commented Nov 7, 2017

dneise commented Nov 8, 2017 •

edited

Loading

maxnoe commented Mar 1, 2018

DataCalibrated -- The worst name possible, and how we can make it better #208

DataCalibrated -- The worst name possible, and how we can make it better #208

Comments

relleums commented Jan 22, 2017

maxnoe commented Nov 7, 2017

dneise commented Nov 8, 2017 • edited Loading

maxnoe commented Mar 1, 2018

dneise commented Nov 8, 2017 •

edited

Loading