Mapping between cube-sphere grid and latitude and longitude #2694

Open
beiduqiu opened this issue Jan 19, 2025 · 10 comments
Labels: category: Question, topic: GCHP, topic: Regridding

@beiduqiu

Your name

Zifan WANG

Your affiliation

Washington University in Saint Louis

Please provide a clear and concise description of your question or discussion topic.

Hi, I am trying to find the mapping between the cube-sphere grid in the simulation such as (6,24,24) for C24 resolution and the latitude and longitude in the simulation data. Could you tell me how this mapping can be done in GCHP or point out the location in the source code that does this mapping?
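
To make the (6,24,24) shape concrete: a C24 grid has 6 faces of 24 x 24 columns, so 3456 columns in total. Below is a minimal sketch of converting between a (face, j, i) triple and a flat column index. The helper names and the row-major (face, j, i) ordering are assumptions chosen for illustration and are not necessarily how GCHP orders columns internally.

```python
N_FACES = 6  # a cubed-sphere grid always has six faces


def n_columns(cs_res):
    """Total number of columns for a grid with cs_res cells per face edge."""
    return N_FACES * cs_res * cs_res


def to_flat(face, j, i, cs_res):
    """Flatten a zero-based (face, j, i) index, assuming row-major order."""
    return (face * cs_res + j) * cs_res + i


def from_flat(flat, cs_res):
    """Inverse of to_flat: recover the zero-based (face, j, i) triple."""
    face, rest = divmod(flat, cs_res * cs_res)
    j, i = divmod(rest, cs_res)
    return face, j, i


print(n_columns(24))          # 3456
print(to_flat(1, 0, 0, 24))   # 576
print(from_flat(3455, 24))    # (5, 23, 23)
```

Any lookup table keyed on this flat index only stays valid as long as the same ordering is used when the per-column workload is recorded.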

@beiduqiu beiduqiu added the category: Question Further information is requested label Jan 19, 2025
@lizziel
Contributor

lizziel commented Jan 21, 2025

Hi @beiduqiu, do you mean regridding the inputs to the run resolution or the outputs to lat-lon? Is there a specific thing you are trying to do in the simulation, such as print lat and lon coordinates for an (I,J,L) cell?

@lizziel lizziel self-assigned this Jan 21, 2025
@lizziel lizziel added topic: GCHP Related to GCHP only topic: Regridding Related to regridding files before or after run labels Jan 21, 2025
@msulprizio
Contributor

Please also see this feature in gcpy: geoschem/gcpy#277 added by @yantosca

@beiduqiu
Author

beiduqiu commented Jan 21, 2025

> Hi @beiduqiu, do you mean regridding the inputs to the run resolution or the outputs to lat-lon? Is there a specific thing you are trying to do in the simulation, such as print lat and lon coordinates for an (I,J,L) cell?

Hi, I’m currently working on workload prediction and load balancing for GCHP. My goal is to estimate the workload for each column using source data, such as meteorology or HEMCO data. However, I’ve noticed that the source data is defined on lat-lon grids, whereas the simulation uses cubed-sphere grids.

To address this, I’m looking to build a mapping between the source data and the KPP steps for each column. At the moment, I only have a mapping between (columns, KPP steps for each column). To achieve my goal, I need to establish a mapping between lat-lon grids and columns.

Would you have any advice on how to construct this mapping efficiently, or know of tools that could help? I believe being able to print the lat-lon coordinates for an (I,J,L) cell would be very helpful.

@lizziel
Contributor

lizziel commented Jan 22, 2025

Hmm, I'm not quite following. The input data, e.g. meteorology, is regridded upon MAPL read to the cubed-sphere grid. You can output this data on cubed-sphere rather than look at lat-lon. Or perhaps I am misunderstanding?

If you would like to print lat-lon you can see it from State_Grid parameters:

!=========================================================================
! Derived type for Grid State
!=========================================================================
TYPE, PUBLIC :: GrdState
   !----------------------------------------
   ! User-defined grid fields
   !----------------------------------------
#if defined( MODEL_WRF ) || defined( MODEL_CESM )
   ! Grid numbers for WRF and CESM, for each CPU to run multiple instances
   ! of GEOS-Chem. These numbers are unique-per-core (local).
   ! A pair of (Input_Opt%thisCPU, State_Grid%CPU_Subdomain_ID) is needed
   ! to uniquely identify a geographical region.
   INTEGER            :: CPU_Subdomain_ID      ! Grid identifier number (local) (WRF: domain number, CESM: chunk number/lchnk)
   INTEGER            :: CPU_Subdomain_FirstID ! First grid identifier number (local) in this CPU
#endif
   CHARACTER(LEN=255) :: GridRes     ! Grid resolution
   REAL(fp)           :: DX          ! Delta X [degrees longitude]
   REAL(fp)           :: DY          ! Delta Y [degrees latitude]
   REAL(fp)           :: XMin        ! Minimum X value [degrees longitude]
   REAL(fp)           :: XMax        ! Maximum X value [degrees longitude]
   REAL(fp)           :: YMin        ! Minimum Y value [degrees latitude]
   REAL(fp)           :: YMax        ! Maximum Y value [degrees latitude]
   INTEGER            :: NX          ! # of grid boxes in X-direction
   INTEGER            :: NY          ! # of grid boxes in Y-direction
   INTEGER            :: NZ          ! # of grid boxes in Z-direction
   LOGICAL            :: HalfPolar   ! Use half-sized polar boxes?
   LOGICAL            :: Center180   ! Is the Int'l Date Line a model midpoint (T) or edge (F)?
   LOGICAL            :: NestedGrid  ! Is it a nested grid sim?
   INTEGER            :: NorthBuffer ! # buffer grid boxes on North edge
   INTEGER            :: SouthBuffer ! # buffer grid boxes on South edge
   INTEGER            :: EastBuffer  ! # buffer grid boxes on East edge
   INTEGER            :: WestBuffer  ! # buffer grid boxes on West edge
   !----------------------------------------
   ! Grid fields computed in gc_grid_mod.F90
   !----------------------------------------
   INTEGER            :: GlobalNX    ! NX on the global grid
   INTEGER            :: GlobalNY    ! NY on the global grid
   INTEGER            :: NativeNZ    ! NZ on the native-resolution grid
   INTEGER            :: MaxChemLev  ! Max # levels in chemistry grid
   INTEGER            :: MaxStratLev ! Max # levels below strat
   INTEGER            :: MaxTropLev  ! Max # levels below trop
   INTEGER            :: XMinOffset  ! X offset from global grid
   INTEGER            :: XMaxOffset  ! X offset from global grid
   INTEGER            :: YMinOffset  ! Y offset from global grid
   INTEGER            :: YMaxOffset  ! Y offset from global grid
   ! Arrays
   REAL(fp), POINTER  :: GlobalXMid (:,:) ! Lon centers on global grid [deg]
   REAL(fp), POINTER  :: GlobalYMid (:,:) ! Lat centers on global grid [deg]
   REAL(fp), POINTER  :: GlobalXEdge(:,:) ! Lon edges on global grid [deg]
   REAL(fp), POINTER  :: GlobalYEdge(:,:) ! Lat edges on global grid [deg]
   REAL(fp), POINTER  :: XMid       (:,:) ! Lon centers [degrees]
   REAL(fp), POINTER  :: XEdge      (:,:) ! Lon edges [degrees]
   REAL(fp), POINTER  :: YMid       (:,:) ! Lat centers [degrees]
   REAL(fp), POINTER  :: YEdge      (:,:) ! Lat edges [degrees]
   REAL(fp), POINTER  :: YMid_R     (:,:) ! Lat centers [radians]
   REAL(fp), POINTER  :: YEdge_R    (:,:) ! Lat edges [radians]
   REAL(fp), POINTER  :: YSIN       (:,:) ! SIN( lat edges )
   REAL(fp), POINTER  :: Area_M2    (:,:) ! Grid box area [m2]
#if defined( MODEL_GEOS )
   ! Are we in the predictor step?
   LOGICAL            :: PredictorIsActive
#endif
END TYPE GrdState

The mapping itself is done via ESMF. You can get the regridding weights and mapping using GCPy (see earlier comment from @msulprizio)
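
As a rough illustration of what the regridding weights represent: an ESMF weight file stores a sparse matrix as three 1-D arrays, `row` (destination index), `col` (source index) and `S` (weight), with 1-based indices. The sketch below applies such weights with NumPy. The tiny arrays are made up for illustration; in practice they would be read from a weight file generated offline, and the helper names are hypothetical.

```python
import numpy as np


def apply_weights(src_flat, row, col, S, n_dst):
    """Apply sparse regridding weights: dst[row-1] += S * src[col-1]."""
    dst = np.zeros(n_dst, dtype=float)
    # np.add.at accumulates correctly when a destination index repeats
    np.add.at(dst, row - 1, S * src_flat[col - 1])
    return dst


def source_cells(dst_index, row, col):
    """Zero-based source cells contributing to a zero-based destination cell."""
    return np.sort(col[row == dst_index + 1] - 1)


# Made-up example: 4 source (lat-lon) cells mapped onto 2 destination columns.
src = np.array([10.0, 20.0, 30.0, 40.0])
row = np.array([1, 1, 2, 2])            # destination index, 1-based
col = np.array([1, 2, 3, 4])            # source index, 1-based
S = np.array([0.5, 0.5, 0.25, 0.75])    # weights

dst = apply_weights(src, row, col, S, n_dst=2)
print(dst)                        # [15.  37.5]
print(source_cells(1, row, col))  # [2 3]
```

The nonzero entries of each row identify which lat-lon cells contribute to a given cubed-sphere column, which is the lat-lon-to-column mapping asked about above.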

@lizziel
Contributor

lizziel commented Jan 22, 2025

Tagging @sdeastham

@sdeastham
Contributor

I think your answer is what's needed @lizziel - happy to jump in if I can help though!

@beiduqiu
Author

beiduqiu commented Jan 27, 2025

Thanks for your reply, @lizziel. However, I still have some questions about the GCHP model. I would like to confirm my understanding of the data sources used in GCHP simulations and clarify a few points regarding data handling and grid mapping:

Understanding of Simulation Data:
From my understanding, the meteorology and emission data are the primary inputs driving GCHP simulations. Chemistry data, on the other hand, serve as a set of rules for calculations and remain constant during the simulation. The data in the NC4 files, which are stored in the simulation folders, are formatted as (time, latitude, and longitude) and with hourly granularity. Is this interpretation correct?

Handling Missing Emission Data:
While the meteorology data are consistently available every year, day, and hour, I’ve noticed that the emission data are incomplete, with certain years missing. How does GCHP handle simulations when emission data for specific time periods are unavailable? Does it fill gaps with default values?

Grid_State Values in fullchem_mod:
I attempted to print the Grid_State information before the chemistry computation starts in fullchem_mod (under geos-chem/GEOS_core), but I found that most of the values were zero, except for NX, NY, and NZ. Additionally, arrays that should contain latitude and longitude information were all null. Could you help me identify why this might be happening?

Mapping Between Lat-Lon Grid and Cube-Sphere Grid:
Could you provide guidance on how to map latitude-longitude grids to the cube-sphere grid? Is the gcpy module the only tool available for this task, or are there other alternatives within GCHP or the source code?

Initial Settings for Emission Data:
My understanding is that the emission data represent the changes or fluxes in emissions over time. However, I am unclear about the initial state of the emission data. Does GCHP require an explicit initial state for emissions at the beginning of a simulation? If so, could you provide more details about how this is handled?

@lizziel
Contributor

lizziel commented Jan 27, 2025

> Understanding of Simulation Data:
> From my understanding, the meteorology and emission data are the primary inputs driving GCHP simulations. Chemistry data, on the other hand, serve as a set of rules for calculations and remain constant during the simulation. The data in the NC4 files, which are stored in the simulation folders, are formatted as (time, latitude, and longitude) and with hourly granularity. Is this interpretation correct?

I don't think the input data can be generalized in this way. Our meteorology is a combination of hourly and 3-hourly, time-averaged, constant and instantaneous, 2D and 3D. By "Chemistry data" do you mean data in the Chem_Inputs folder? These are not all constant. For example, LAI is dynamic.

> Handling Missing Emission Data:
> While the meteorology data are consistently available every year, day, and hour, I’ve noticed that the emission data are incomplete, with certain years missing. How does GCHP handle simulations when emission data for specific time periods are unavailable? Does it fill gaps with default values?

GCHP uses the nearest available year for missing emissions. This is controlled by the setting Ext_AllowExtrap: .true. at the top of ExtData.rc. Some emissions have a climatology option in HEMCO_Config.rc.
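
The nearest-year behaviour can be pictured with a small sketch. This only illustrates the idea and is not MAPL's actual implementation; the function name and the tie-breaking rule (prefer the earlier year) are assumptions.

```python
def nearest_available_year(requested, available):
    """Return the available year closest to the requested one.

    Ties are broken in favour of the earlier year (an assumption made here).
    """
    if not available:
        raise ValueError("no years available")
    # min() keeps the first minimum, so sorting ascending prefers earlier years
    return min(sorted(available), key=lambda year: abs(year - requested))


years = [2010, 2011, 2012, 2014]
print(nearest_available_year(2019, years))  # 2014
print(nearest_available_year(2005, years))  # 2010
print(nearest_available_year(2013, years))  # 2012
```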

> Grid_State Values in fullchem_mod:
> I attempted to print the Grid_State information before the chemistry computation starts in fullchem_mod (under geos-chem/GEOS_core), but I found that most of the values were zero, except for NX, NY, and NZ. Additionally, arrays that should contain latitude and longitude information were all null. Could you help me identify why this might be happening?

I assumed the State_Grid variables were set for GCHP, at least the global ones. Did you try GlobalXMid and GlobalYMid?

> Mapping Between Lat-Lon Grid and Cube-Sphere Grid:
> Could you provide guidance on how to map latitude-longitude grids to the cube-sphere grid? Is the gcpy module the only tool available for this task, or are there other alternatives within GCHP or the source code?

The mechanics of the mapping within GCHP are handled by MAPL/ESMF. If there is a GCHP array you want on both the cubed-sphere and lat-lon grids, then I think the most efficient approach is to create a diagnostic for it and output it on both grids. If you want input data on cubed-sphere instead of the native lat-lon, you could output that from the model as well, on either grid. In general, you can output diagnostics as time-averaged or instantaneous and specify the grid. All of that is done within the configuration file HISTORY.rc. That being said, the best approach really depends on exactly what you are trying to do and what problem you are trying to solve.

@beiduqiu
Author

Thanks for the quick response, @lizziel. I’m working on building a mapping between the input data and the workload for each column. My goal is to leverage reinforcement learning to predict the workload from the source data before the simulation starts.

The challenge is that the source data is provided in a lat-lon format, while in GCHP I only have access to the workload for each column, which is on the cubed-sphere grid. I need to identify which parts of the source data correspond to specific columns in the chemistry computation.

Would you have any suggestions on how to approach this mapping? Any relevant resources or guidance would be greatly appreciated.

> I assumed the state_grid variables were set for GCHP, at least the global ones. Did you try GlobalXmid and GlobalYmid?

Thanks for pointing that out. Now I can see the latitude and longitude information for each core.

Image
I have got the following information. There are 13 x edges and 13 y edges, so there are 12 x 12 = 144 cells, which is consistent with the 144 columns for each processor (C24 resolution, 24 cores: 6 x 24 x 24 / 24 = 144). I will try to figure out how to map the areas bounded by these lat-lon edges to the cubed-sphere columns.
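
The count above can be checked with a short sketch. It assumes the columns are split evenly across cores and that each core's edge arrays are 2-D with one more point than cells in each direction. The synthetic edge values are placeholders standing in for XEdge and YEdge, not GCHP output, and the function names are hypothetical.

```python
import numpy as np


def columns_per_core(cs_res, n_cores):
    """Columns handled by each core, assuming an even split of 6 * N * N."""
    total = 6 * cs_res * cs_res
    if total % n_cores != 0:
        raise ValueError("columns do not divide evenly across cores")
    return total // n_cores


def cell_corners(x_edge, y_edge, i, j):
    """Four (lon, lat) corners of local cell (i, j), zero-based."""
    return [
        (x_edge[i, j], y_edge[i, j]),
        (x_edge[i + 1, j], y_edge[i + 1, j]),
        (x_edge[i + 1, j + 1], y_edge[i + 1, j + 1]),
        (x_edge[i, j + 1], y_edge[i, j + 1]),
    ]


print(columns_per_core(24, 24))  # 144

# Placeholder 13 x 13 edge arrays for one core (values are arbitrary).
x_edge, y_edge = np.meshgrid(
    np.linspace(0.0, 12.0, 13), np.linspace(-6.0, 6.0, 13), indexing="ij"
)
n_cells = (x_edge.shape[0] - 1) * (x_edge.shape[1] - 1)
print(n_cells)  # 144
print(cell_corners(x_edge, y_edge, 0, 0))
```

On a real cubed-sphere grid the four corners of a cell do not form a lat-lon rectangle, so an overlap test against lat-lon source cells has to treat each column as a general quadrilateral on the sphere.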

@lizziel
Contributor

lizziel commented Jan 28, 2025

If you need a map prior to running GCHP, then I recommend using the ESMF tools to either regrid the file or simply get the mapping weights. Note that the problem is not just different grids (cubed-sphere vs lat-lon) but also different resolutions. You will need to be able to create the mapping on the fly, and ESMF can do this. You can develop your own tools in Python to regrid the data directly, or use GCPy, which would likely need some modification since we focus only on regridding restart files, which are in a different cubed-sphere file format.

A few resources for offline regridding and mapping weight generation:

  1. Offline regridding section of GCPy ReadTheDocs
  2. gridspec python package written by Liam Bindle (formerly at WashU)
  3. sparselt package also developed by Liam
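
If an approximate one-to-one lookup is enough, a simpler alternative to conservative regridding weights is a nearest-neighbour search on cell centers. The sketch below is not what ESMF, GCPy, gridspec or sparselt do. It assumes 1-D arrays of column-center longitudes and latitudes in degrees (for example, flattened from GlobalXMid and GlobalYMid) and finds, for each lat-lon point, the closest column by great-circle distance. All function names are hypothetical.

```python
import numpy as np


def to_unit_vectors(lon_deg, lat_deg):
    """Convert lon/lat in degrees to 3-D unit vectors on the sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )


def nearest_column(src_lon, src_lat, col_lon, col_lat):
    """Index of the nearest column center for each source point."""
    src = to_unit_vectors(src_lon, src_lat)   # shape (n_src, 3)
    col = to_unit_vectors(col_lon, col_lat)   # shape (n_col, 3)
    # The largest dot product corresponds to the smallest great-circle angle
    return np.argmax(src @ col.T, axis=1)


# Made-up column centers on the equator and three lat-lon source points.
col_lon = np.array([0.0, 90.0, 180.0, 270.0])
col_lat = np.array([0.0, 0.0, 0.0, 0.0])
idx = nearest_column([10.0, 100.0, 350.0], [5.0, -5.0, 0.0], col_lon, col_lat)
print(idx)  # [0 1 0]
```

Working with unit vectors avoids special handling of the longitude wrap at 0/360 degrees. The dense dot product needs n_src x n_col values in memory, so for fine lat-lon grids the source points would have to be processed in chunks.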

Are you using machine learning? I would think any deep learning algorithm would be able to incorporate the grid differences into its model without the need for manual mapping.
