add standard_name to catalog #19

Merged
merged 16 commits into from
Jul 29, 2024

Conversation

aradhakrishnanGFDL
Collaborator

No description provided.

@aradhakrishnanGFDL
Collaborator Author

aradhakrishnanGFDL commented Jul 29, 2024

Associated with issue #2.

This addresses the fast option.
The slow option is also coded in, but it still needs verification and testing.

How to test the fast option so it can be incorporated:


From a GFDL workstation:

conda activate catalogbuilder 
(or conda activate /nbhome/Aparna.Radhakrishnan/conda/envs/catalogbuilder)

Go to the scripts directory of your cloned repo, e.g.

cd /home/a1r/github/noaa-gfdl/CatalogBuilder/catalogbuilder/scripts

I have a test config in tests/config-cfname.yaml.
Adjust output_path in it before running.

Run 
gen_intake_gfdl.py --config ../tests/config-cfname.yaml

(Run it from fre-cli to test as needed.)
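
Once the run completes, one way to confirm that the fast option actually populated the new column is to count the non-empty standard_name cells in the generated CSV. This is only a sketch: the helper name standard_name_coverage is hypothetical and not part of CatalogBuilder, and it assumes missing values are written as empty cells (adjust the check if the builder writes a placeholder string instead).

```python
import csv


def standard_name_coverage(csv_path):
    """Return (rows_with_standard_name, total_rows) for a catalog CSV.

    Hypothetical helper for illustration; not part of CatalogBuilder.
    Assumes the CSV has a header row and that a missing standard_name
    is stored as an empty cell.
    """
    filled = 0
    total = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            total += 1
            # row.get() returns None if the column is absent entirely.
            if (row.get("standard_name") or "").strip():
                filled += 1
    return filled, total
```

Call it with the CSV path reported at the end of the run. A result where the first number is zero would suggest the lookup did not match any variable_id.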

Expected output:

The module intakebuilder is not installed. Do you have intakebuilder in your sys.path or have you activated the conda environment with the intakebuilder package in it? 
Attempting again with adjusted sys.path 
/home/a1r/github/noaa-gfdl/CatalogBuilder/catalogbuilder/intakebuilder/gfdlcrawler.py
No paths given, using yaml configuration
input_path : /archive/am5/am5/am5f7b10r0/c96L65_am5f7b10r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp/
output_path : /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip
headerlist : ['activity_id', 'institution_id', 'source_id', 'experiment_id', 'frequency', 'realm', 'table_id', 'member_id', 'grid_label', 'variable_id', 'time_range', 'chunk_freq', 'platform', 'dimensions', 'cell_methods', 'standard_name', 'path']
output_path_template : ['NA', 'NA', 'source_id', 'NA', 'experiment_id', 'platform', 'custom_pp', 'realm', 'cell_methods', 'frequency', 'chunk_freq']
output_file_template : ['realm', 'time_range', 'variable_id']
Missing cols from metadata sources: ['activity_id', 'institution_id', 'table_id', 'member_id', 'grid_label', 'dimensions', 'standard_name']
Found existing file! Overwrite? (y/n)y
writing..
/home/a1r/github/noaa-gfdl/CatalogBuilder/catalogbuilder/scripts/gen_intake_gfdl.py:117: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['standard_name'].loc[(df['variable_id'] == k)] = v
JSON generated at: /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip.json
CSV generated at: /home/a1r/github/noaa-gfdl/catalogs/c96L65_am5f7b10r0_amip.csv
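
The SettingWithCopyWarning in the output above comes from chained indexing (df['standard_name'].loc[...] = v). A minimal sketch of a warning-free form follows, selecting rows and column in a single .loc call. The toy DataFrame and the cf_names lookup are assumptions made for illustration; the real script builds both from the crawl and its own variable-to-standard_name source.

```python
import pandas as pd

# Toy catalog rows; in the real script the DataFrame comes from the crawler.
df = pd.DataFrame({
    "variable_id": ["tas", "pr", "tas", "zzz"],
    "standard_name": ["", "", "", ""],
})

# Hypothetical variable_id -> standard_name lookup, for illustration only.
cf_names = {"tas": "air_temperature", "pr": "precipitation_flux"}

# Row mask and column label go into one .loc call, so pandas assigns into
# df itself instead of into an intermediate object.
for k, v in cf_names.items():
    df.loc[df["variable_id"] == k, "standard_name"] = v

# Loop-free alternative: map each variable_id through the dict and keep
# the existing value wherever the dict has no entry.
mapped = df["variable_id"].map(cf_names).fillna(df["standard_name"])
```

Both forms leave rows with an unmatched variable_id (here "zzz") unchanged.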

Contributor

@ceblanton ceblanton left a comment

Looks functional but a little rough as you note. We can improve it later, and maybe rename the "slow" option to something more palatable :)

@aradhakrishnanGFDL aradhakrishnanGFDL merged commit 7598edf into main Jul 29, 2024
3 checks passed
@aradhakrishnanGFDL aradhakrishnanGFDL deleted the 2-stdname branch July 30, 2024 19:43