Update backtest scripts to use ocf-data-sampler, for site #313

Open
zaryab-ali opened this issue Feb 1, 2025 · 10 comments

@zaryab-ali

@peterdudfield you mentioned this in phase 2 in openclimatefix/ocf-data-sampler#98.
If my understanding is correct, we are trying to switch from ocf_datapipes to ocf-data-sampler, and the changes need to be made in this file: https://github.com/openclimatefix/PVNet/blob/main/scripts/backtest_sites.py

Any guidance or reference for getting started would be really helpful.

@peterdudfield
Contributor

I think @AUdaltsova has been working on this, so I'll let @AUdaltsova update you.

@AUdaltsova
Contributor

Hi @zaryab-ali, thanks for jumping in! You are right, it refers to the script you've mentioned and this UK GSP script, which is what I've been working on. We still very much need the site version sorted as well, so the help would be very welcome if it's something you want to do!

The general idea behind these scripts is to create samples for a given time window, run them through a pretrained model, and save inference results. In general, I expect you'll be able to get rid of some functions that build the datapipe from scratch and use the site dataset as is. Things like ModelPipe I'd expect will stay, but might need updating to work with ocf-data-sampler instead of ocf_datapipes.
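For anyone new to these scripts, the loop described above can be sketched roughly as follows. This is only an illustration, not the PVNet implementation: `iter_init_times`, `make_sample`, `dummy_model` and `run_backtest` are hypothetical names, the sample contains timestamps only, and the 120/480/15 minute values are illustrative. In the real script the sample would come from the ocf-data-sampler site dataset and the model would be the pretrained one.

```python
from datetime import datetime, timedelta


def iter_init_times(start, end, step_minutes):
    """Yield forecast initialisation times from start to end (inclusive)."""
    t0 = start
    while t0 <= end:
        yield t0
        t0 += timedelta(minutes=step_minutes)


def make_sample(t0, history_minutes, forecast_minutes, resolution_minutes):
    """Hypothetical stand-in for a dataset lookup: returns timestamps only."""
    step = timedelta(minutes=resolution_minutes)
    n_hist = history_minutes // resolution_minutes
    n_fcst = forecast_minutes // resolution_minutes
    return {
        "t0": t0,
        # History runs from t0 - history_minutes up to and including t0.
        "history_times": [t0 - i * step for i in range(n_hist, -1, -1)],
        # Forecast steps start one resolution step after t0.
        "forecast_times": [t0 + i * step for i in range(1, n_fcst + 1)],
    }


def dummy_model(sample):
    """Placeholder for the pretrained model: predicts zero for every step."""
    return [0.0 for _ in sample["forecast_times"]]


def run_backtest(start, end, step_minutes, model):
    """Create one sample per init time, run the model, collect the results."""
    results = []
    for t0 in iter_init_times(start, end, step_minutes):
        sample = make_sample(
            t0, history_minutes=120, forecast_minutes=480, resolution_minutes=15
        )
        predictions = model(sample)
        for target_time, value in zip(sample["forecast_times"], predictions):
            results.append(
                {"init_time": t0, "target_time": target_time, "prediction": value}
            )
    return results


results = run_backtest(
    datetime(2025, 1, 1, 0, 0),
    datetime(2025, 1, 1, 1, 0),
    step_minutes=30,
    model=dummy_model,
)
```

Keeping sample creation and the model call behind separate functions means the data side can be swapped over to ocf-data-sampler without touching the code that saves results.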

I appreciate that it's a lot to get through those scripts and what they're trying to do! Certainly took me a while :) So don't hesitate to ping me if you need help.

Also, I expect I'll be adding the uk gsp script soon, so if you want you can wait for that to have something for reference.

@AUdaltsova
Contributor

@zaryab-ali hi again! Just wanted to let you know that I've put the version of the UK GSP backtest I've been working on here, so you can take a look if you want! Note that this script uses an older version of ocf-data-sampler (0.19 I think), so it still relies on ocf_datapipes a little bit; your version shouldn't have that! You also don't need to include the ensemble_member parameter, as it's specific to my work. Also, it is definitely not perfect, so feel free to take creative liberties if you think you can do something better/cleaner (e.g. I was thinking that populating the data config pulled off HF with paths can be done a lot better. Speaking of which, you might want to create some fake data you can point to for debugging!)

Hope that's helpful, let me know if you have any questions!
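One possible way to tidy up the "populate the data config with paths" step is to treat the downloaded config as a nested dictionary and apply a small set of dotted-key overrides. The sketch below is an assumption about how this could look, not what the reference script does: `set_nested` and `populate_paths` are hypothetical helpers, the key names follow the site config quoted later in this thread, and the `/tmp/...` paths are made-up locations for fake debugging data.

```python
import copy


def set_nested(config, dotted_key, value):
    """Set config['a']['b']['c'] given the dotted key 'a.b.c'."""
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def populate_paths(config, path_overrides):
    """Return a copy of config with each dotted key set to a local path."""
    populated = copy.deepcopy(config)
    for dotted_key, path in path_overrides.items():
        set_nested(populated, dotted_key, path)
    return populated


# Stand-in for a data config already loaded into a dict (e.g. by a YAML parser).
downloaded = {
    "input_data": {
        "site": {
            "file_path": "placeholder",
            "metadata_file_path": "placeholder",
            "time_resolution_minutes": 15,
        }
    }
}

# Hypothetical local files, e.g. small fake datasets created for debugging.
overrides = {
    "input_data.site.file_path": "/tmp/fake_site_data.nc",
    "input_data.site.metadata_file_path": "/tmp/fake_site_metadata.csv",
}

local_config = populate_paths(downloaded, overrides)
```

Working on a deep copy leaves the downloaded config untouched, so the same base config can be reused with different sets of paths.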

@zaryab-ali
Author

@AUdaltsova thank you for the reference code.
I will start working on this ASAP.
Also, let me know if there is a Slack channel or something similar that I can join for asking questions, or whether I should post any future queries right here.

@AUdaltsova
Contributor

@zaryab-ali no problem, thanks for volunteering! As far as I know we don't operate a community Slack or anything similar at the moment, so feel free to just ping me here. We have a discussions page if you have other questions unrelated to this issue.

@AUdaltsova
Contributor

Hi @zaryab-ali, I think you've deleted your comment about data download, just wanted to check if you need any help with that?

AUdaltsova assigned zaryab-ali and unassigned AUdaltsova on Feb 6, 2025

@zaryab-ali
Author

@AUdaltsova thanks for reaching out. I was having an issue with the Hugging Face data, but that's resolved. I was hoping you could help me with another issue: I followed the README instructions to the letter, but I am getting this error when running save_batches.py:
In 'config.yaml': Could not find 'datamodule/ocf_datapipes.yaml'


@AUdaltsova
Contributor

Hi! Glad to hear it's resolved! Re: the config error, you need to change where datamodule points in config.yaml, as it defaults to a nonexistent one (maybe we should remove that...). If you're trying to create batches, I'd point it to streamed_batches.yaml (there is one in the config/datamodule folder in PVNet), but be sure to check the data config it's pointing to, especially the data paths. You can look at the data configs we have on HF for reference.

@zaryab-ali
Author

zaryab-ali commented Feb 13, 2025

@AUdaltsova hi again, I've made the necessary changes to backtest_sites.py (they might be a little rough and may need review and further changes). Which branch of PVNet should I open the pull request against?

@zaryab-ali
Author

zaryab-ali commented Feb 14, 2025

Also, I think example_configuration.yaml needs to be updated to something like this:
`input_data:
  # default_history_minutes: 120 (removed)
  # default_forecast_minutes: 480 (removed)

  site:
    file_path: placeholder
    metadata_file_path: placeholder
    time_resolution_minutes: 15
    interval_start_minutes: -120 # changed from 0 to -120 to get 120 minutes of history
    interval_end_minutes: 480 # unchanged (matches the old forecast minutes)
    dropout_timedeltas_minutes: null
    dropout_fraction: 0
`
The column names in the metadata.csv given in here also need to be updated to work with ocf-data-sampler.

Let me know if I am mistaken about any of this.
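To make the proposed mapping concrete, here is a small sketch of how the old history and forecast settings translate into the interval fields, and which time offsets that window would cover. `to_interval` and `interval_offsets` are hypothetical helpers, and the assumption that both interval endpoints are included is mine rather than something confirmed in this thread.

```python
def to_interval(history_minutes, forecast_minutes):
    """Map old-style history/forecast lengths to interval start/end fields."""
    return {
        "interval_start_minutes": -history_minutes,
        "interval_end_minutes": forecast_minutes,
    }


def interval_offsets(interval_start_minutes, interval_end_minutes,
                     time_resolution_minutes):
    """List offsets from t0 in minutes, assuming both endpoints are included."""
    span = interval_end_minutes - interval_start_minutes
    if span % time_resolution_minutes != 0:
        raise ValueError("interval length must be a multiple of the resolution")
    return list(
        range(
            interval_start_minutes,
            interval_end_minutes + time_resolution_minutes,
            time_resolution_minutes,
        )
    )


interval = to_interval(history_minutes=120, forecast_minutes=480)
offsets = interval_offsets(time_resolution_minutes=15, **interval)
```

Under that assumption, 120 minutes of history and 480 minutes of forecast at 15-minute resolution give 41 timesteps, with offset 0 being the forecast initialisation time.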
