Support for measure normalization #141

akariv · 2016-02-14T20:49:56Z

I've come to realize that measure normalization is a must if we want to enable any kind of data-set comparison in the future.

The reason is that datasets come in many different structures. To give a few examples, here are datasets I've worked with in the past couple of weeks:

The Moldovan Budget, with measures for approved, adjusted and executed budgets
The Israeli budget: started with measures for expenditure, commitment_limit, personnel_cap and others, each repeated per phase (approved, adjusted and, for some of them, executed) - that is, the data contains columns such as commitment_limit_adjusted. At some point the format changed, and now it's just expenditure, commitment_limit, personnel_caps, with separate lines for approved, adjusted and executed values.
The US budget: They have different data-sets for different 'phases' (authority & outlays), each dataset has 40+ measures - one per year since 1976 until 2021.

It is clear that each of these datasets can be described by the current FDP spec. But the spec does not provide any means to describe the 'implicit dimension' that's created by the multiple measures - e.g. for the Moldovan budget it's the phase dimension, while for the US budget it's the year.

To be able to properly compare these kinds of datasets, the spec needs to provide a systematic way to describe how the measures differ from one another.

To elaborate with an example, when describing the old-style Israeli budget, I'd like to be able to say that it contains two implicit dimensions - 'phase' and 'type(?)'. For each column containing a measure, the FDP would describe its meaning by providing values for these dimensions (e.g. the 'commitment_limit_adjusted' column has 'commitment_limit' for 'type' and 'adjusted' for 'phase').

Once we have this kind of description, we'll be able to normalize these datasets, which has many benefits for comparing datasets, building generic tools, analytics, etc.
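
To make the two-dimension idea concrete, here is a minimal Python sketch. It is an illustration only, not part of the spec: the `column_meaning` mapping, the `normalize_row` helper, the `expenditure_approved` column, the `code` field and all the values are assumptions made up for the example; only `commitment_limit_adjusted` comes from the dataset described above.

```python
# Hypothetical description of each measure column in terms of the two
# implicit dimensions ('type' and 'phase'). Not an actual FDP structure.
column_meaning = {
    "expenditure_approved": {"type": "expenditure", "phase": "approved"},
    "commitment_limit_adjusted": {"type": "commitment_limit", "phase": "adjusted"},
}


def normalize_row(row, column_meaning, value_field="amount"):
    """Turn one wide row into one row per measure column it contains."""
    # Everything that is not a described measure column is carried over as-is.
    shared = {k: v for k, v in row.items() if k not in column_meaning}
    return [
        {**shared, **dims, value_field: row[column]}
        for column, dims in column_meaning.items()
        if column in row
    ]


# Made-up sample row, for illustration only.
wide_row = {"code": "0020", "expenditure_approved": 500, "commitment_limit_adjusted": 120}
long_rows = normalize_row(wide_row, column_meaning)
for r in long_rows:
    print(r)
```

Each output row carries a single measure plus explicit 'type' and 'phase' values, which is the shape that makes datasets with different column layouts comparable.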

pwalsh · 2016-02-15T06:04:43Z

+1

Not the same, but intrinsically related to #91

danfowler · 2016-03-22T10:06:13Z

+1 @akariv

Looking forward to seeing the type system you've developed.

rufuspollock · 2016-03-23T16:49:40Z

@akariv yes - this is important and something long thought about so very excited that we will be addressing it. In general we need some kind of "numeraire" concept that we use either explicitly in the spec or implicitly in the OS processing system.

BTW see this older issue on OS main issue tracker: openspending/openspending#284

Comments:

The US budget: They have different data-sets for different 'phases' (authority & outlays), each dataset has 40+ measures - one per year since 1976 until 2021.

Do you mean a measure per year? We should not allow that, I think ... (there's an issue on this but I can't find it right now)

akariv · 2016-06-05T06:40:51Z

Alright, so here's a suggestion - let me know what you think so we can proceed.

I propose we add a new kind of dimension attribute, which could be used to normalise the fiscal data into a 'single measure' dataset.

This new attribute should be very similar to a constant attribute, with the only difference being that instead of a single value it would define one value per measure defined in the model.

"dimensions": {
  "phase": {
    "attributes": {
      "phase": {
        "byMeasure": {  (This comes instead of `source` or `constant`)
          "approved": "Approved",
          "adjusted": "Adjusted",
          "executed": "Executed"
        }
...

When loading the data for the datapackage, implementers might use this dimension's definition in order to convert data looking like this:

cofog,approved,executed
r&d,1000000,800000

into this:

cofog,phase,amount
r&d,Approved,1000000
r&d,Executed,800000
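
The conversion above could be sketched in Python roughly as follows. This is only one possible reading of the proposal, not a reference implementation: the `normalize_by_measure` function and its parameters are hypothetical names, and the choice to skip measures missing from the data (here `adjusted`) is an assumption.

```python
import csv
import io


def normalize_by_measure(rows, by_measure, dimension, value_field="amount"):
    """Yield one output row per (input row, measure column) pair."""
    for row in rows:
        # Columns that are not measures are carried over unchanged.
        shared = {k: v for k, v in row.items() if k not in by_measure}
        for measure, label in by_measure.items():
            # Assumption: measures absent from the data are simply skipped.
            if measure in row:
                yield {**shared, dimension: label, value_field: row[measure]}


# The `byMeasure` mapping from the descriptor fragment above.
by_measure = {"approved": "Approved", "adjusted": "Adjusted", "executed": "Executed"}

source = "cofog,approved,executed\nr&d,1000000,800000\n"
reader = csv.DictReader(io.StringIO(source))
rows = list(normalize_by_measure(reader, by_measure, dimension="phase"))

# Write the normalized rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["cofog", "phase", "amount"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Amounts stay as strings here because `csv.DictReader` does no type casting; a real loader would presumably cast them using the measure's declared type.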

pwalsh · 2016-06-05T08:17:26Z

+1

pwalsh · 2017-09-07T16:19:17Z

Moving to frictionlessdata/datapackage-fiscal#3

danfowler added the Discussion label Mar 21, 2016

akariv mentioned this issue Jun 5, 2016

Worked up example needed of transforming datasets with per-line "direction" #91

Closed

pwalsh mentioned this issue Jan 5, 2024

"Concepts" enhancements for Fiscal Data Package frictionlessdata/datapackage-fiscal#3

Open

pwalsh closed this as completed Sep 7, 2017
