-
Notifications
You must be signed in to change notification settings - Fork 408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Samplers: Add PointGeoSampler #800
base: main
Are you sure you want to change the base?
Conversation
@ak3ra please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
@ak3ra Thanks for the contribution! Here's a link to the contributing docs which can give some guidance for fixing the failing github actions. You'll need to run |
@ak3ra I might find some time to work on this this week. Do you mind if I push commits directly to your PR branch? |
Thanks @isaaccorley I will follow the guidelines ! @adamjstewart, I don't mind at all. Looking forward to having this feature :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ak3ra and I met over zoom, discussed some of these changes.
I may have changed my mind on the best way to design this halfway through reviewing, get through the end before making too many changes lol. Let me know if any of this isn't clear. I'll let you keep hacking on this for a while and I'm happy to step in and try implementing things myself if you want.
Once we get this working properly, we can worry about testing and code style.
@@ -4,6 +4,7 @@ | |||
"""TorchGeo samplers.""" | |||
|
|||
import abc | |||
from email import generator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can remove this import, it isn't used
"""Samples from an intersection of points and image GeoDatasets. | ||
|
||
This sampler should take in a pair of coordinates and region of interests and | ||
returns image patches containing the coordinates of interest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have:
.. versionadded:: 0.4
at the end to document that it only exists in TorchGeo 0.4+
image_dataset: GeoDataset, ## for exampler raster dataset of sentinel 2 | ||
points: GeoDataset, ## e.g GBIF dataset containing coordinate points |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if image_dataset
is the best name, since it could also be used for vector labels. Maybe tile_dataset
or scene_dataset
. But maybe those aren't clear either. Not sure.
Let's call points
point_dataset
instead.
self, | ||
image_dataset: GeoDataset, ## for exampler raster dataset of sentinel 2 | ||
points: GeoDataset, ## e.g GBIF dataset containing coordinate points | ||
roi: Optional[BoundingBox] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to add a size
parameter to control how large the returned bounding boxes are
|
||
self.hits = [] | ||
|
||
for hit in self.image_dataset.index.intersection(tuple(self.points.index.bounds),objects=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for hit in self.image_dataset.index.intersection(tuple(self.points.index.bounds),objects=True): | |
for hit in self.index.intersection(tuple(self.points.index.bounds),objects=True): |
This will use the self.index
created via GeoDataset.__index__
which will take into account the roi
parameter.
Also, I don't think you want to take the intersection of image_dataset
and the points bounding box. Basically, we want a list of points that intersect with an image. Take a look at torchgeo.datasets.geo.IntersectionDataset._merge_dataset_indices
. I think we want something like that, although we only need a single for loop (loop over all points). For each point, if it intersects with the image dataset, then add it to the list of hits. So maybe something like this:
for point in point_dataset.index.intersection(point_dataset.index.bounds, objects=True):
if list(self.index.intersection(point.bounds)):
self.hits.append(point)
generator = torch.randperm | ||
|
||
for idx in generator(len(self)): | ||
yield BoundingBox(*self.hits[idx].bounds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of returning the entire bounding box of the hit, we'll want to return a smaller patch of size self.size
. You can use get_random_bounding_box
(used above in RandomGeoSampler
) to do this. First, start with the point (hit) you want to sample. Then, create a bounding box that is +/- self.size
in both directions.
If you want to get really technical, we should also be careful of points that are on the edge of an image. In that case, we should technically only sample from the part of the bounding box that is within an image. I'm not sure what the best way to handle that would be. An idea would be to ignore what I said above about self.hits
only being a list of points and instead:
- Convert
point_dataset
to an rtree index where each bounding box is now (point +/-self.size
) in every direction - Compute the intersection dataset of both:
dataset = image_dataset & new_point_dataset
- Sample like we normally do in
RandomGeoSampler
What do you think about that approach? I think it's actually simpler and more robust.
@adamjstewart it looks like the next week and a half is going to be a little too busy for me to focus on this fully, |
I got buried in prelim and conference prep, so I also haven't gotten a chance to work on this. |
@ak3ra is this PR still a work in progress? Not sure if it's still relevant to your research or not. |
This PR attempts to implement a PointGeo Sampler.
The
PointGeoSampler
takes in an Image dataset and a dataset containing (lat/lon) points (eg GBIF, INaturalist), finds the intersection of the two datasets and returns a sample bounding box of the image dataset which contains the lat/lon coordinates.See #723 for design discussion