Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Place for function that returns features in 'x' whose centroid is within 'y' implementing binary predicate #2061

Closed
Robinlovelace opened this issue Dec 15, 2022 · 5 comments

Comments

@Robinlovelace
Copy link
Contributor

I'm planning to put a function that implements a binary predicate that returns features in 'x' whose centroid is in 'y'. I suspect that {sf} is not the best place for this functionality because this could be seen as mission creep and there's not GEOS equivalent as far as I know. The common practical issue that this function solves is when you have a subsetting object that intersects with the boundaries of 'x' but you only want features of 'x' that are 'mostly inside' 'y'. Illustration, you have a boundary represented by the union of the green area below but st_intersects() and returns the rarely wanted features in grey. No existing binary predicate that I know of returns the more frequently needed green features. This function could also be useful when boundary inaccuracies are at play.

image

See reprex here which is the default place to implement this idea but happy to consider this for an {sfextras} type package or elsewhere: ropensci/stplanr#511

So more of a question than an issue, thoughts v. welcome!

@edzer
Copy link
Member

edzer commented Dec 15, 2022

Would st_join(x, y, largest = TRUE) help? See https://r-spatial.org/book/07-Introsf.html#spatial-joins

Example:

par(mfrow = c(2, 1), mar = rep(0,4))                                                  
library(sf)                                                                           
demo(nc, echo = FALSE)                                                                
nc <- st_transform(nc, 32119) # NC state plane, m                                     
nc |> st_geometry() |> st_union() |> st_centroid() -> pt                              
j = st_intersects(pt, nc)[[1]]                                                        
i = which(lengths(st_touches(nc, nc[pt,])) > 0)                                       
region = nc[c(i, j),]                                                                 
nc |> st_geometry() |> plot()                                                         
region |>                                                                             
  st_geometry() |>                                                                    
  st_union() |>                                                                       
  st_buffer(units::set_units(-10, km)) |>                                             
  plot(col = 'red', add = TRUE, border = 'grey')                                      
j = st_join(region, nc, largest = TRUE)                                               
st_geometry(j) |> plot(add = TRUE, col = '#77777777')                                 
                                                                                      
# larger region:                                                                      
nc |> st_geometry() |> plot()                                                         
region |>                                                                             
  st_geometry() |>                                                                    
  st_union() |>                                                                       
  st_buffer(units::set_units(10, km)) |>                                              
  plot(col = 'red', add = TRUE, border = 'grey')                                      
j = st_join(region, nc, largest = TRUE)                                               
st_geometry(j) |> plot(add = TRUE, col = '#77777777')  

x

@Robinlovelace
Copy link
Contributor Author

Hi Edzer, yes that helps a lot! Would be interested to see benchmarks for these (just out of interest, not really relevant for this use case) and suspect that this could be an opportunity to expand {sf} documentation.

Happy to have a bash at both and get back. I still think that there is merit in a new function for ease-of-use and to reduce n. lines of coded needed to get to the output and plan to forge ahead with that in {stplanr} unless someone comes up with a better place for such a function to live. Plan to use largest = TRUE in the function if that works out quicker or in other ways better (although there are sometimes cases when the centroid based approach is useful).

@edzer
Copy link
Member

edzer commented Dec 15, 2022

(although there are sometimes cases when the centroid based approach is useful).

Of course. There are also sometimes cases where the polygon centroid lies outside the polygon.

@Robinlovelace
Copy link
Contributor Author

Of course. There are also sometimes cases where the polygon centroid lies outside the polygon.

Indeed for area weighted centroid, meaning that 'point on surface' is in many times a better option: I have fallen foul of this while subsetting administrative zones (MSOAs) in regions based on geographic centroids. centroid_type = "point_on_surface" could be a default to mitigate that.

@edzer
Copy link
Member

edzer commented Dec 15, 2022

Polygon centroid has a clearly defined meaning, I would avoid conflating it. st_centroid has an option to compute it for the largest polygon only in case of mulipolygons.

@edzer edzer closed this as completed Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants