yaml-tools - easily examine, compare, manipulate, and breakdown/assemble YAML files #187
Comments
This |
@guoqing-noaa These yaml tools will be useful when looking through some of the giant combined yamls. We are having some ongoing discussions about how to best save these files in the repo since looking at the giant yamls can make it very hard to review or make edits. The best strategy for now is to probably focus on reviewing the individual yaml files in |
@guoqing-noaa, @SamuelDegelia-NOAA and @hu5970 These yaml tools are good. Let's discuss how to implement these tools in RDASApp. |
@ShunLiu-NOAA @delippi @SamuelDegelia-NOAA @hu5970 To clarify, today I only presented YAML tools for developers/users to easily navigate and manipulate YAML files. How to manage YAML files in RDASApp and rrfs-workflow is another topic; we can have a small group meeting to discuss this further. A few thoughts I have now:
|
Thank you for the presentation today @guoqing-noaa. I think this will probably be a point of discussion for us for a while going forward. I agree that we should make sure the yaml files in RDASApp can be read by the yaml tools (issue #186). Those tools are useful for parsing through very large files like we want to use for the ctests. I will update and test the yaml files in #184 for this. Regarding your point 3, this would mean that the yaml files in I think our current way of doing things using |
@SamuelDegelia-NOAA That's a good point! Previously, in the GSI world, EnKF got the OMB from GSI, so we did not need to run the observer a second time with the same obs configuration. Do we know why we cannot do something similar in JEDI? Sorry, I am not familiar with this part. |
For the GETKF, the observer needs to be run on the modulated and real ensemble members. This means we cannot use the hofx files produced by the EnVar step and instead need to run the observer separately for GETKF (the hofx files will be much larger). JEDI also expects different settings for the GETKF yaml file (i.e., the |
@SamuelDegelia-NOAA Thanks again for your good point about reusing the I support making every effort so that each YAML file can be used out-of-the-box without much extra effort, and avoiding manually adding spaces to align the ======== Here is a demo with the
we can see they are the same.
We can see that
Also to clarify, these tools are expected to help developers, aiming to streamline things and reduce tedious manual editing, without increasing burdens. There is no intention to mandate anyone to use them. If one can use other tools/methods to achieve similar capabilities, that is great. |
Thank you @guoqing-noaa for the helpful example! We will give some thoughts to this. |
A heads up, the merge of different VAR YAML files into one is expected to be much easier than the |
I added more documentation:
For the merge capability, here are some examples (taken from the above tutorial):
|
@ShunLiu-NOAA @hu5970 @SamuelDegelia-NOAA @delippi |
FYI, here is a simple bash script which checks all
|
OK, I understand that some people are not comfortable specifying the "query string". The But for JEDI-specific YAML files, we can make it much easier for users by wrapping some details in bash scripts. I have developed 5 bash scripts based on
For details, please visit: |
@guoqing-noaa, I feel like this is more complicated than needed. The tools might be good to use for some instances, but really people should be just looking at the obs space yamls individually. They are short enough to handle. Don't even look at the large generated yaml--there's no reason to. You've made some bash scripts with "almost no learning curves", well that is exactly what the gen_yaml.sh has done but simpler--I promise you that it is. You just simply comment out the files you don't want to cat to your super.yaml. The obs space yamls are written to be modular and so that you can easily compare side by side and reuse for new obtypes since most things could be carried over. We don't need to have yamls that work "out of the box" because it is simple enough to just cat them. Furthermore, this is all just temporary until we move to JCB to build our yamls. |
In NCO operations, almost everything is fixed and some aspects get much simpler. But research and development will have far more need to handle the final giant YAML files efficiently than you may anticipate. I see many such needs. It would take time to cover all of them, but here I can give a few examples beyond our normal data analysis tasks:
Anyway, the final giant YAML file is the “gold standard” when we talk about DA configurations among different experiments, different research partners, different applications (verification, DA monitoring, etc), and more. And the |
yaml-tools
YAML is one of the core components of the JEDI system. Efficiently handling YAML files is crucial for utilizing JEDI in both research and operational development. We need simple, intuitive, and user-friendly YAML tools to help scientists easily examine, compare, manipulate, and break down/assemble YAML files.
Python offers the PyYAML module, which gives developers detailed control over YAML files. However, it comes with a learning curve and requires coding/debugging.
On the other hand, yq is a lightweight and portable command-line YAML, JSON and XML processor. While useful, it lacks a few key features that are essential for JEDI YAML file manipulation:
- JEDI YAML files use key names that contain spaces, such as cost function, but currently, as far as I know, yq does not support handling spaces in key names.
- yq does not provide a quick way to view top-level keys at the current nesting level.
- yq does not support traversing a YAML file to output a tree structure of its keys.
A PyYAML-based yaml-tools repository was developed to address the above limitations. This repo includes the following utilities:
1. ycheck
This script just loads a YAML file and then dumps the data to stdout. If a YAML file contains non-standard elements, it will halt and provide detailed error information.
ycheck sample.yaml
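For readers curious what a load-then-dump check looks like, here is a minimal sketch with PyYAML. It is not the actual ycheck source: the function name, the use of safe_load/safe_dump, and the two sample strings are assumptions made for illustration only.

```python
import sys

import yaml  # PyYAML, the module the tools are built on


def ycheck(text):
    """Load YAML text and return it re-dumped.

    Raises yaml.YAMLError (with line/column details) if the text
    cannot be parsed. Hypothetical helper, not the real ycheck.
    """
    data = yaml.safe_load(text)
    # sort_keys=False keeps the original key order in the output
    return yaml.safe_dump(data, default_flow_style=False, sort_keys=False)


# A well-formed snippet; note the key names containing spaces.
good = "cost function:\n  cost type: 3D-Var\n"
print(ycheck(good), end="")

# Two mapping values on one line is not valid YAML.
bad = "cost function: cost type: 3D-Var\n"
try:
    ycheck(bad)
except yaml.YAMLError as err:
    print(f"invalid YAML: {err}", file=sys.stderr)
```

The error text comes straight from PyYAML, so it includes the line and column where parsing stopped.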
2. yquery
This script queries a given element using a query string.
shallow is the default behavior, which outputs the top-level keys at the current nesting level.
3. ybreakdown
This script breaks down a YAML file into individual elements, from top to bottom, and generates a corresponding directory tree. Each intermediate sub-YAML file is dumped into its respective directory, making it easy to examine a given YAML file structure step by step.
ybreakdown sample.yaml
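The query-string lookup that yquery performs (item 2 above) can be sketched on already-loaded data as follows. The "/"-separated path syntax, the function names, and the sample configuration are all assumptions for illustration; the real yquery query-string format may differ. The sketch works on plain dicts and lists, which is what yaml.safe_load would return.

```python
def query(data, path, sep="/"):
    """Walk nested dicts/lists along `path` and return the element found.

    Hypothetical path syntax: components separated by `sep`;
    list items are addressed by their integer index.
    """
    node = data
    for part in filter(None, path.split(sep)):
        if isinstance(node, list):
            node = node[int(part)]
        else:
            node = node[part]  # keys may contain spaces
    return node


def shallow(node):
    """Summarize only the current nesting level."""
    if isinstance(node, dict):
        return list(node.keys())
    if isinstance(node, list):
        return [f"[{i}]" for i in range(len(node))]
    return node  # scalars are returned as-is


# Made-up sample data shaped like a small JEDI configuration.
config = {
    "cost function": {
        "cost type": "3D-Var",
        "observations": {
            "observers": [
                {"obs space": {"name": "sondes"}},
                {"obs space": {"name": "aircraft"}},
            ]
        },
    },
    "output": {"filename": "an.nc"},
}

print(shallow(query(config, "")))               # ['cost function', 'output']
print(shallow(query(config, "cost function")))  # ['cost type', 'observations']
print(query(config, "cost function/observations/observers/1/obs space/name"))  # aircraft
```

Because the path is split only on the separator, key names with spaces such as cost function need no special quoting.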
Mini Tutorial
This repository assumes the current Python environment has the PyYAML module installed. On NOAA RDHPCS, PyYAML can be found in the RDASApp EVA Python environment.
1. ycheck
You will get the following error message:
You can diff this file with samples/mpasjedi_en3dvar.yaml to see what changes can fix this error.
2. yquery
3. ybreakdown
Under the observers subdirectory, you can see 16 observers and you can compare configurations from different observers.
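To make the breakdown idea concrete, here is a rough sketch of turning nested data into a directory tree with one file per level. It is not the real ybreakdown: the directory layout, the file naming, and the two-digit list indices are assumptions. To stay dependency-free it writes each element with json.dumps (JSON-style text that YAML parsers generally accept); passing PyYAML's yaml.safe_dump as the dump argument would give conventional YAML instead.

```python
import json
import tempfile
from pathlib import Path


def breakdown(node, outdir, name="root", dump=lambda d: json.dumps(d, indent=2)):
    """Write `node` to outdir/<name>.yaml, then recurse into nested children.

    Hypothetical layout: every dict or list child gets its own
    subdirectory containing a dump of just that child.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    (outdir / f"{name}.yaml").write_text(dump(node) + "\n")

    if isinstance(node, dict):
        children = list(node.items())
    elif isinstance(node, list):
        children = [(f"{i:02d}", item) for i, item in enumerate(node)]
    else:
        return  # scalars are leaves; they already appear in the parent dump

    for key, child in children:
        if isinstance(child, (dict, list)):
            # keep directory names filesystem-friendly
            safe = str(key).replace("/", "_").replace(" ", "_")
            breakdown(child, outdir / safe, safe, dump)


# Made-up sample data with two observers.
config = {
    "cost function": {
        "observations": {
            "observers": [
                {"obs space": {"name": "sondes"}},
                {"obs space": {"name": "aircraft"}},
            ]
        }
    }
}

with tempfile.TemporaryDirectory() as tmp:
    breakdown(config, tmp)
    for path in sorted(Path(tmp).rglob("*.yaml")):
        print(path.relative_to(tmp))
```

With a layout like this, each observer lands in its own numbered subdirectory, so two observers can be compared with an ordinary diff of their files.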