Based on the dynamic-GRIB-chunking method, the use of Kerchunk's scan_grib together with GRIB index (.idx) files significantly reduces the need to scan all GRIB files. Building references for an FMRC normally requires scanning every file in the collection: for example, 240 hourly files per GFS run, 2400 GRIB files across 30 members for GEFS, or 86 files of about 4 GB each for the 50-member ECMWF ensemble. This approach instead scans only two files to generate the sample metadata and derives the rest from the index files.
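To make the two-scan idea concrete, here is a minimal sketch assuming a public GFS file on the NOAA open-data S3 bucket (the URL, date, and .idx column names are illustrative and not taken from the project code): scan_grib reads one sample GRIB file in full, while the small text .idx sidecar of any other file can be parsed without touching the GRIB data itself.

```python
import fsspec
import pandas as pd
from kerchunk.grib2 import scan_grib

# Illustrative GFS file on the NOAA open-data bucket (anonymous access).
sample_grib = "s3://noaa-gfs-bdp-pds/gfs.20231001/00/atmos/gfs.t00z.pgrb2.0p25.f006"

# 1) One full binary scan of a single sample file: scan_grib returns one
#    reference set (byte offset, length, codec info) per GRIB message.
sample_refs = scan_grib(sample_grib, storage_options={"anon": True})
print(f"{len(sample_refs)} GRIB messages scanned in the sample file")

# 2) Every other file in the run is described by its tiny text .idx sidecar,
#    so its byte layout can be recovered without reading the GRIB itself.
with fsspec.open(sample_grib + ".idx", "r", anon=True) as f:
    rows = [line.rstrip(":\n").split(":", maxsplit=5) for line in f]
idx = pd.DataFrame(rows, columns=["message", "offset", "date", "variable", "level", "forecast"])
print(idx.head())
```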
Kerchunk produces references that can be opened as a virtual Zarr dataset, enabling Analysis-Ready, Cloud-Optimized (ARCO) data and Cloud-Native Operations (CNO). Backed by a Dask cluster, this virtual Zarr dataset can be streamed and processed in real time through scalable parallel processing, letting users access the data, select variables, and subset regions and timesteps, none of which is feasible when a GRIB file must first be downloaded. Although GRIB supports binary subsetting, it remains a download-oriented format: an entire file is transferred from a server to a local device before it can be accessed or used. A close analogy is downloading an MP4 file versus streaming it in HTML5, as the comparison below shows (a code sketch of this streaming access follows the table).
Aspect | Video Streaming (HTML5) | Weather Data (Kerchunk) |
---|---|---|
Download Workflow | Full video download for playback. | Full GRIB file download for analysis. |
Streaming Workflow | Stream segments on demand using adaptive bitrate. | Stream slices on demand using Kerchunk metadata. |
Metadata Handling | Indexed file for frames, timecodes, and bitrates. | Indexed metadata for variables, timestamps, region (lat/lon), ensemble member, etc. |
Efficiency | Lower bandwidth; no full downloads needed. | Lower bandwidth and storage usage. |
Scalability | Scales easily across devices and networks. | Scales horizontally using a Dask cluster DAG. |
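The snippet below sketches the streaming side of this comparison, assuming a Kerchunk reference store has already been written (the reference path, variable name, and coordinate slices are placeholders): the dataset opens lazily, and only the byte ranges behind the requested variable, region, and timesteps are fetched.

```python
import xarray as xr
from dask.distributed import Client

client = Client()  # local Dask cluster; point this at a remote cluster to scale out

ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": "gfs_20231001_00z.parquet",   # placeholder Kerchunk reference store
            "remote_protocol": "s3",            # where the original GRIB bytes live
            "remote_options": {"anon": True},
        },
    },
    chunks={},  # lazy, Dask-backed arrays
)

# Select one variable, a lat/lon box, and a few timesteps; only the GRIB
# messages behind this selection are read, via ranged GET requests.
subset = (
    ds["t2m"]                                   # placeholder variable name
    .sel(latitude=slice(50, 30), longitude=slice(260, 280))
    .isel(time=slice(0, 8))
    .compute()
)
```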
Documentation modified from
Step 1: Make a virtual dataset for a day in Parquet format, using the file run_day_gfs_gik.py
1. Use kerchunk scan_grib to create metadata for the GFS GRIB files
2. Use the metadata mapping to build an index table of every GRIB message from the .idx files
3. Combine the index data with the metadata to build any FMRC slice (Horizon, RunTime, ValidTime, BestAvailable); a simplified sketch follows this list
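The project script presumably relies on Kerchunk's index-mapping utilities for steps 2 and 3; the hand-rolled pandas sketch below only illustrates the shape of the operation: parse each file's .idx into a table of byte offsets and lengths, join it with the metadata from the single scanned sample, and write the combined references for the day to Parquet. The paths, join keys, and metadata columns are assumptions.

```python
import fsspec
import pandas as pd

def read_idx(grib_url: str) -> pd.DataFrame:
    """Parse a GRIB .idx sidecar into one row per message with byte ranges."""
    with fsspec.open(grib_url + ".idx", "r", anon=True) as f:
        rows = [line.rstrip(":\n").split(":", maxsplit=5) for line in f]
    df = pd.DataFrame(rows, columns=["message", "offset", "date", "variable", "level", "forecast"])
    df["offset"] = df["offset"].astype("int64")
    df["length"] = df["offset"].shift(-1) - df["offset"]  # last message runs to end of file
    df["uri"] = grib_url
    return df

# Placeholder subset of one GFS run (every 6th forecast hour of the 00z cycle).
base = "s3://noaa-gfs-bdp-pds/gfs.20231001/00/atmos/gfs.t00z.pgrb2.0p25"
horizons = [f"{base}.f{h:03d}" for h in range(0, 25, 6)]

# One row per (variable, level) produced from the single scan_grib pass over the
# sample file: zarr array name, dtype, grid shape, codec, etc. (placeholder path).
metadata_map = pd.read_parquet("gfs_sample_metadata.parquet")

# Join the cheap .idx byte ranges onto the sample metadata to get chunk
# references for every message of every horizon, then persist the day's slice.
idx_all = pd.concat([read_idx(u) for u in horizons], ignore_index=True)
refs = idx_all.merge(metadata_map, on=["variable", "level"], how="inner")
refs.to_parquet("gfs_20231001_00z.parquet")
```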
Step 2: Read the Parquet file and stream it into Zarr, using the file run_day_stream_gfs_gik_to_zarr.py
1. Convert the Parquet file with references for 15 variables into a Zarr store and write it to GCS (a sketch follows below)
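A minimal sketch of this step, assuming the Parquet reference store produced in Step 1 and write access to a GCS bucket; the bucket name and variable list are placeholders, and run_day_stream_gfs_gik_to_zarr.py may organize the work differently.

```python
import xarray as xr
from dask.distributed import Client

client = Client()  # parallelizes both the ranged GRIB reads and the Zarr writes

# Open the day's references as a virtual Zarr dataset (same pattern as above).
ds = xr.open_dataset(
    "reference://",
    engine="zarr",
    backend_kwargs={
        "consolidated": False,
        "storage_options": {
            "fo": "gfs_20231001_00z.parquet",   # reference store from Step 1 (placeholder)
            "remote_protocol": "s3",
            "remote_options": {"anon": True},
        },
    },
    chunks={},
)

# Placeholder subset standing in for the 15 selected variables.
variables = ["t2m", "u10", "v10"]

# Stream the selected variables into a real Zarr store on GCS; gcsfs picks up
# default credentials, or pass storage_options for explicit auth.
ds[variables].to_zarr(
    "gcs://my-bucket/gfs/20231001_00z.zarr",    # placeholder bucket/path
    mode="w",
    consolidated=True,
)
```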