`normalise()` should have consistent behaviour across DataSources: It should give mean=0 and std=1. #83

JackKelly · 2021-11-24T17:21:43Z

Describe the bug
In satellite and nwp data sources, normalise() does the right thing: it ensures that, on average, the mean is zero and the std is 1.

In gsp and pv data sources, normalise() rescales the values to be in the range [0, 1], which isn't exactly the same thing!
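To make the difference concrete, here is a minimal sketch of the two kinds of rescaling. It is not the repository's implementation: the function names `standardise` and `rescale_to_unit_range` are hypothetical, the input values are made up, and the statistics are computed from the sample itself for simplicity (the real data sources may well use precomputed constants).

```python
import statistics


def standardise(values):
    """Rescale so the sample has mean 0 and (population) std 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def rescale_to_unit_range(values):
    """Rescale so the sample minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up example values, purely for illustration.
values = [0.0, 50.0, 100.0, 150.0, 200.0]

z = standardise(values)            # centred on zero, includes negative values
u = rescale_to_unit_range(values)  # all values lie in [0, 1], mean is 0.5

print([round(v, 3) for v in z])
print(u)
```

The standardised output is centred on zero and contains negative values, whereas the min-max output never leaves [0, 1] and has a positive mean.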

Expected behavior
For any data source that's used as an input to the model, we probably want means to be zero and std to be 1.

For the target, we may sometimes want to re-scale to [0, 1] (if, for example, we're using a sigmoid output layer). But we should probably ignore that for now 🙂

peterdudfield · 2021-11-24T17:40:36Z

I know each definition of normalise is different, but it seems right that we normalise some data to be ~N(0,1) and some to be in the range [0,1].

JackKelly · 2021-11-24T17:46:26Z

yeah, sorry, I think you're right that it's probably fine!

I think what's harmful is if some inputs are, like, orders of magnitude larger than other inputs. Then the model might struggle to learn which inputs are most informative (because the ones which are numerically larger will be "shouting the loudest" even if they're not actually very informative). Sure, you're right, having some inputs be ~N(0, 1) and some be in the range [0, 1] is probably fine!
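A rough sketch of the "shouting the loudest" effect, assuming a single linear unit with the same small initial weight on two inputs. The values and the helper `contribution_share` are hypothetical and only meant to illustrate the point, not to describe the actual model.

```python
import statistics


def contribution_share(a, b, weight=0.01):
    """Fraction of |w*a| + |w*b| that comes from the first input."""
    return abs(weight * a) / (abs(weight * a) + abs(weight * b))


raw_large = [200.0, 450.0, 800.0]  # hypothetical unnormalised input
unit_range = [0.9, 0.5, 0.1]       # hypothetical input already in [0, 1]

# With the raw values, the large input accounts for nearly all of the signal.
raw_shares = [contribution_share(a, b) for a, b in zip(raw_large, unit_range)]

# Standardise the large input so its values are of order 1.
mean = statistics.fmean(raw_large)
std = statistics.pstdev(raw_large)
standardised = [(v - mean) / std for v in raw_large]

# Now neither input dominates every sample.
scaled_shares = [contribution_share(a, b) for a, b in zip(standardised, unit_range)]

print([round(s, 3) for s in raw_shares])
print([round(s, 3) for s in scaled_shares])
```

Once both inputs are of order 1, mixing ~N(0, 1) with [0, 1] leaves them on a comparable footing, which is the sense in which it is "probably fine".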

You know far more maths than me, so more than happy to do whatever you think is best!

peterdudfield · 2021-12-16T15:48:36Z

Might be good to put a pydantic validation check on the ~N(0,1) ones, say |x| < 10. Would have to work out the probability of that, but I reckon it could be like 1/(number of particles in the universe), so then we will truly catch only the real errors.
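A quick sketch of that probability and of what such a check could look like, under the assumption that the values really are drawn from N(0, 1) (real data will only be approximately so). The helper names `two_sided_tail_probability` and `check_standardised` are hypothetical, and the check is written as a plain function that a pydantic validator could call, rather than against any particular pydantic API.

```python
import math


def two_sided_tail_probability(threshold):
    """P(|x| > threshold) for x ~ N(0, 1), via the complementary error function."""
    return math.erfc(threshold / math.sqrt(2))


def check_standardised(values, limit=10.0):
    """Hypothetical sanity check: raise ValueError if any |x| >= limit.

    Returns the values unchanged when they pass, so it can be used
    inside a validator that is expected to return the validated value.
    """
    worst = max((abs(v) for v in values), default=0.0)
    if worst >= limit:
        raise ValueError(
            f"Found |x| = {worst}, expected |x| < {limit} for ~N(0, 1) data; "
            "was normalise() applied?"
        )
    return values


p = two_sided_tail_probability(10.0)
print(f"P(|x| > 10) = {p:.2e}")
```

For an exact standard normal this comes out on the order of 1e-23 per value. That is far larger than 1/(number of particles in the universe), but still small enough that even a trillion values would be very unlikely to trip the check by chance, so a failure almost certainly means the data was not normalised.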

JackKelly added the `bug` (Something isn't working) label Nov 24, 2021

JackKelly added this to Nowcasting Nov 24, 2021

JackKelly moved this to Todo in Nowcasting Nov 24, 2021

JackKelly removed the `bug` (Something isn't working) label Nov 24, 2021
