Streaming data from COGs: #19
Async GeoTIFF support would be awesome. Due to its special terrain tile format, I am unsure how applicable COGs are for bevy_terrain, but we could always implement a converter layer.
My crate tiff2 is for experimentation on making image-rs/image-tiff async, to be merged back into image-tiff eventually. I think COGs have the perfect structure for this, except for the padding around the tiles (or I misunderstand the quadtree concept). A COG already contains overviews, which would translate directly to the layers (z-index) of the quadtree. I don't exactly see where the loading code would end up in your codebase; I made #18 as an exploration of where I think the changes would go to allow for a pluggable loading backend. This is the preliminary tiff2 API:

```rust
#[tokio::test]
async fn test_concurrency_recover() {
    let mut decoder = CogDecoder::from_url("https://enourmous-cog.com")
        .await
        .expect("decoder should build");
    decoder
        .read_overviews(vec![0])
        .await
        .expect("decoder should read IFDs");
    // Get a chunk from the highest-resolution image.
    let chunk_1 = decoder.get_chunk(42, 0).unwrap();
    // Get a chunk from a lower-resolution image that has not been loaded yet.
    if let OverviewNotLoadedError(chunk_err) = decoder.get_chunk(42, 5).unwrap_err() {
        // read_overviews moves the decoder into the LoadingIfds state;
        // this scope also makes sure we drop mutable access to the decoder.
        decoder.read_overviews(vec![chunk_err.level]).await;
    }
    // Not so nice to make the user handle the compressed data, but it does
    // allow separate handling of io vs cpu "tasks".
    let chunk_2 = decoder.get_chunk(42, 5);
    let compressed_data = (chunk_1.await, chunk_2.await);
    // This part of the API I haven't thought out yet, mainly the ChunkDecoder
    // API vs the decoder API: you need an Arc<ChunkOpts> to get a ChunkDecoder,
    // which depends on the overview.
    // let images = compressed_data.iter_mut(|compressed| decoder.decode_chunk(
}
```

Do you have any ideas how that could work with bevy_terrain? (the initialization)
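To make the "pluggable loading backend" idea from #18 concrete, here is a minimal sketch of what such a seam could look like: the terrain side asks a backend for a tile by coordinate, and a COG-backed implementation would translate that into an overview chunk fetch. All names here (`TileCoord`, `TileSource`, `MockSource`) are made up for illustration; this is not bevy_terrain's or tiff2's actual API.

```rust
// Hypothetical seam between a terrain quadtree and a tile loading backend.
// A real COG backend would wrap something like tiff2's CogDecoder.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TileCoord {
    x: u32,
    y: u32,
    lod: u32, // 0 = highest resolution, matching COG overview ordering
}

trait TileSource {
    /// Fetch the raw (still compressed) bytes for one tile, or None if
    /// the tile lies outside the dataset.
    fn fetch(&self, coord: TileCoord) -> Option<Vec<u8>>;
}

/// A trivial in-memory backend standing in for a COG-backed one.
struct MockSource;

impl TileSource for MockSource {
    fn fetch(&self, coord: TileCoord) -> Option<Vec<u8>> {
        // Pretend every tile exists and encode its coordinate as the payload.
        Some(vec![coord.x as u8, coord.y as u8, coord.lod as u8])
    }
}

fn main() {
    let source = MockSource;
    let tile = source.fetch(TileCoord { x: 3, y: 7, lod: 2 }).unwrap();
    println!("{:?}", tile); // prints [3, 7, 2]
}
```

The point of the trait boundary is that the quadtree code never needs to know whether tiles come from a preprocessed on-disk pyramid or from HTTP range requests against a remote COG.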
Unfortunately, I am quite busy with work/uni right now; I am focusing on getting my master's thesis finished, so I do not have much time to help you. The way bevy_terrain currently works (including the version designed for my thesis) is that all terrain data has to be preprocessed into tiles of the same CRS (similar to S2, with a custom mapping function) in an offline preprocessing step. In the future we would like to support loading arbitrary chunked/tiled datasets (be it COGs or similar) with CRSs different from bevy_terrain's. To facilitate this, we will have to build a fast and efficient real-time preprocessing pipeline that can generate the tiles requested by bevy_terrain's quadtrees (including border and mipmap information for proper sampling) from the arbitrary source datasets. This will probably have to be built as a GPU-accelerated architecture, since the current CPU processing using GDAL is quite slow. This real-time projection and retiling pipeline will be quite complex and difficult to build. I plan to build something like this in the future, but currently my focus is still more on the rendering side of things. Having async geotiff/COG reading by then would be greatly appreciated. :)
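For intuition about the "border information for proper sampling" part, here is a small sketch of the pixel-window arithmetic such a pipeline has to do: each tile the quadtree requests covers its own extent plus a few border pixels on every side, so that texture sampling at tile edges has valid neighbors. The names (`PixelWindow`, `tile_window`) and the parameterization are illustrative, not bevy_terrain's actual types.

```rust
// Sketch of the source-pixel window for one requested tile, including border.

/// Pixel window in the source dataset covering one output tile plus border.
#[derive(Debug, PartialEq)]
struct PixelWindow {
    x0: i64,
    y0: i64,
    width: u32,
    height: u32,
}

/// Window for tile (tx, ty) of `tile_size` pixels with `border` extra pixels
/// on every side. The window may extend past the dataset edge; a real
/// pipeline would clamp and pad there.
fn tile_window(tx: u32, ty: u32, tile_size: u32, border: u32) -> PixelWindow {
    PixelWindow {
        x0: tx as i64 * tile_size as i64 - border as i64,
        y0: ty as i64 * tile_size as i64 - border as i64,
        width: tile_size + 2 * border,
        height: tile_size + 2 * border,
    }
}

fn main() {
    // Tile (2, 1) of 256 px with a 2 px border starts 2 px before the tile edge.
    println!("{:?}", tile_window(2, 1, 256, 2));
    // PixelWindow { x0: 510, y0: 254, width: 260, height: 260 }
}
```

Because these windows overlap between neighboring tiles, a COG backend ends up reading some source chunks more than once unless it caches decoded chunks, which is one argument for separating the io and decode stages as in the tiff2 API above.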
For planar terrains, this might be a bit simpler. Assuming the source images share the same CRS, you might be able to get some basic tiling working (albeit without border information) by loading chunks of the source file or its overviews. But for arbitrary geotiff support, we would have to build a reprojection stage as well.
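As a sketch of why the same-CRS planar case is simple: if the COG's overviews are power-of-two downsamples of a tiled image, a quadtree node maps straight onto one COG chunk. The functions below assume overview 0 is full resolution and each overview halves both axes; the names are illustrative, and the linear index is the kind of argument an API like tiff2's `get_chunk(index, level)` might take.

```rust
// Mapping quadtree coordinates onto COG overview chunks, assuming
// power-of-two overviews and a shared CRS.

/// Number of tiles along the x axis of overview `level`, given the
/// full-resolution image width in pixels and the COG tile size.
fn tiles_across(image_width: u64, tile_size: u64, level: u32) -> u64 {
    let width = image_width >> level; // each overview halves the width
    (width + tile_size - 1) / tile_size // ceiling division: partial tiles count
}

/// Row-major chunk index of quadtree node (x, y) at `level`.
fn chunk_index(x: u64, y: u64, image_width: u64, tile_size: u64, level: u32) -> u64 {
    y * tiles_across(image_width, tile_size, level) + x
}

fn main() {
    // A 4096 px wide image with 512 px tiles: 8 tiles across at full
    // resolution, 4 tiles across at overview 1.
    println!("{}", tiles_across(4096, 512, 0)); // 8
    println!("{}", chunk_index(2, 3, 4096, 512, 1)); // 3 * 4 + 2 = 14
}
```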
On the Rust side of things, that would mean using geodesy/geo. Let me see if there's interest there in having GPU acceleration. Anyway, that'd be a piece of code that finds uses elsewhere, so it belongs in a separate lib crate?
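To show the kind of per-pixel math a reprojection stage runs (whether on the CPU via geodesy/geo or as a GPU port of the same formula), here is the forward spherical Web Mercator projection, one of the simplest CRS transforms. The constant and function name are illustrative; a real pipeline would use a proper library for the inverse transforms and datum handling.

```rust
// Forward spherical Web Mercator (EPSG:3857) from WGS84 lon/lat degrees.
// This is the standard textbook formula, shown as a sketch of the work a
// reprojection stage does once per output pixel.

const EARTH_RADIUS_M: f64 = 6_378_137.0; // WGS84 semi-major axis

/// Project (lon, lat) in degrees to Web Mercator metres.
fn wgs84_to_web_mercator(lon_deg: f64, lat_deg: f64) -> (f64, f64) {
    let lon = lon_deg.to_radians();
    let lat = lat_deg.to_radians();
    let x = EARTH_RADIUS_M * lon;
    let y = EARTH_RADIUS_M * (std::f64::consts::FRAC_PI_4 + lat / 2.0).tan().ln();
    (x, y)
}

fn main() {
    // The origin maps to (0, 0); the antimeridian maps to x = pi * R.
    let (x, y) = wgs84_to_web_mercator(0.0, 0.0);
    println!("{:.1} {:.1}", x, y);
    let (x180, _) = wgs84_to_web_mercator(180.0, 0.0);
    println!("{:.1}", x180);
}
```

Since this math is branch-free and identical for every pixel, it is exactly the kind of kernel that ports well to a GPU compute pass, which is why a shared lib crate for the transforms makes sense.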
This was exactly my idea. Especially outputs of machine learning predictions and covariates always match the source grid, so they would directly be useful here.
Thought I'd open an issue here for good measure.
Basically, what Sebastian Lague says here: Viewing the entire earth at high resolution would require streaming in data dynamically based on the location on earth.
I've been tackling async tiff reading, which should become geotiff reading soon enough. Now, lots of world data is in Cloud-Optimized GeoTIFF format, so this is a tracking issue for progress on that. I've also created an issue over at frewsxcv/rgis#124, because that is about the same thing. I thought that targeting two bevy projects simultaneously would make for a solid API for some bevy-cog implementation.
I'll be rambling here a bit on possible implementations if that's ok :)