Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

census areas import #425

Merged
merged 18 commits into from
Dec 15, 2020
Merged

Conversation

michaelkirk
Copy link
Collaborator

Currently broken - I know at least one reason why (comment inline), but wanted to check in on direction.

This assumes you have a file like this downloaded and in your data/system directory for any city using the census travel model:
population_areas.geojson.zip

{
    "type": "FeatureCollection",
    "name": "nyc_block_populations",
    "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
        "features": [
            { 
                "type": "Feature", 
                "properties": { 
                    "GISJOIN": "G36000500516005011", 
                    "population": 123 
                }, 
                "geometry": { 
                    "type": "MultiPolygon", 
                    "coordinates": [ [ [ [ -73.769348000110327, 40.857214999638309 ], [ -73.769408999338665, 40.857162000206102 ], ...] ] ] 
                } 
            },
            ...
        ]
 }

@@ -31,3 +31,6 @@ members = [
# update dependencies often).
[profile.dev.package."*"]
opt-level = 3

[patch.crates-io]
geojson = { git = "https://github.com/georust/geojson", branch = "mkirk/try_from" }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general I think it's a little nicer to patch things here at the top level rather than updating across all the individual crates. It's maybe a little more convenient, but more importantly, it works for transitive deps too, which you can't conveniently edit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I always forget about this trick

p.into_inner()
.0
.into_points()
impl From<geo::Polygon<f64>> for Polygon {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you ok with adding the idiomatic conversation traits? It's a little easier to discover this way since I know where to look rather than custom named methods.

I could add more for other geo types, but only added the minimum of what we needed for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these're better! I probably wrote the old thing before I understood From / Into


let interiors: Vec<geo::LineString<f64>> = poly
.rings
.map(|rings| rings.into_iter().map(geo::LineString::from).collect())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of assumed that rings were interior only and that geom::Polygon.points was the exterior - but after looking at the docs more, I'm not clear.

Maybe if rings is set then rings.0 is the exterior? Is that right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polygon only has half support for holes... preserving the original rings (both exterior and interior) is nice for generating outlines later. Places calling buggy_new or precomputed (notably PolyLine::make_polygons) don't fill out the rings.

If rings is set, then [0] is the exterior, correct. points contain every point in both the exterior and interior, with indices defining the triangles.

I wonder if it would be simpler and maybe more performant to just make Polygon store an exterior Ring and some interior Rings, and delay triangulation entirely until it's needed for rendering. contains_pt also uses the triangles now, but there might be other algorithms for figuring that out without triangulating. Maybe geom::Polygon could actually just be a zero-cost wrapper around geo::Polygon! Disclaimer, I haven't thought through these ideas.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your thoughts.

I think it'd be nice to be uniform about where the exterior lives. either always polygon.rings[0] or as a new polygon.exterior.

Maybe geom::Polygon could actually just be a zero-cost wrapper around geo::Polygon
I am in theory on board with this!
I wonder if it would be simpler and maybe more performant to just make Polygon store an exterior Ring and some interior Rings, and delay triangulation entirely until it's needed for rendering

I haven't spent time surveying the existing usage of geom::polygon - but maybe it makes sense to have separate entities for "polygon" vs. "polygon that I'm going to draw"

I'm not intending to change anything drastically in this PR, but I'll try to keep it in mind as I work with more of the existing geometry code in ABStreet.

use geo::algorithm::map_coords::MapCoordsInplace;
let mut geo_map_area: geo::Polygon<_> = map_area.clone().into();
geo_map_area.map_coords_inplace(|c| {
let projected = geom::Pt2D::new(c.0, c.1).to_gps(bounds);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to get the map poly in wgs84?

Though maybe it's irrelevant, because the other question I need to answer is: is there a blessed way to convert a poly from wgs84 to map coords?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPSBounds::convert? Or if you want to enforce all points are in bounds, try_convert?

for feature in collection.features.into_iter() {
let population = feature.property("population");
let total_population = match population {
Some(serde_json::Value::Number(n)) => n
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geojson prop parsing is depressingly verbose. I hope this gets better soon! georust/geojson#157

let bytes = abstutil::slurp_file(&path)?;
debug!("parsing geojson at path: {}", &path);

// TODO - can we change to Result<_,Box<dyn std::error::Error>> and avoid all these map_err?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instincts would be to change it all to Box<dyn Error>, but maybe there's a reason you prefer String errors. If so, do you have any advice on making this less verbose?

I'd love to be able to write:

let str = String::from_utf8(bytes)?;
timer.start("parsing geojson");
let geojson = str.parse::<GeoJson>()?;

Have you worked with anyhow at all? It might be a nice middle ground in all these places where we return String errors, which aren't really meant to be "handled" other than just logged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should say, I don't have much experience with anyhow... just some reading and conversations with other people who are happy with it for "non-graceful" error handling (and it's sister crate this_error for graceful library error handling)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, missed this thread. https://crates.io/crates/anyhow looks nice, pretty much what we need! I've avoided the Rust error ecosystem because there's so much churn, but this one seems simple and popular. I'd be fine moving everything over to this.

Copy link
Collaborator

@dabreegster dabreegster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine so far. What's the one big TODO you mentioned -- requiring extra QGIS work first?

Also all of the more careful conversions to/from geo types raises a question: Should lots of geom types just wrap geo types internally and expose a thin API on top? The only reason I rolled my own was because I couldn't find crates to do things like thickening a polyline, and at the time there were a bunch of Vec2-type libraries, none of which had most of what I needed starting out. But now, it's more clear that the geo ecosystem has much more careful handling of things like polygons with holes.

@@ -31,3 +31,6 @@ members = [
# update dependencies often).
[profile.dev.package."*"]
opt-level = 3

[patch.crates-io]
geojson = { git = "https://github.com/georust/geojson", branch = "mkirk/try_from" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. I always forget about this trick

}
}

impl From<Polygon> for geo::Polygon<f64> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woo, thanks for fixing this! When I originally wrote this, I didn't know about earcutr for triangulating holes, and I think I confused multipolygons / polygons with inner portions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated these From implementations now that I fully understand what ABStreet geom::Polygon's expect.

let bytes = abstutil::slurp_file(&path)?;
debug!("parsing geojson at path: {}", &path);

// TODO - can we change to Result<_,Box<dyn std::error::Error>> and avoid all these map_err?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I've bounced back and forth how to deal with error types. I don't think there's anywhere that we actually need to look for different error cases, but many places where we need to stringify an error, so I've tried making most APIs use String. But this is internal to popdat, so popdat::generate_scenario can be the one to map_err instead.

p.into_inner()
.0
.into_points()
impl From<geo::Polygon<f64>> for Polygon {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these're better! I probably wrote the old thing before I understood From / Into


let interiors: Vec<geo::LineString<f64>> = poly
.rings
.map(|rings| rings.into_iter().map(geo::LineString::from).collect())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polygon only has half support for holes... preserving the original rings (both exterior and interior) is nice for generating outlines later. Places calling buggy_new or precomputed (notably PolyLine::make_polygons) don't fill out the rings.

If rings is set, then [0] is the exterior, correct. points contain every point in both the exterior and interior, with indices defining the triangles.

I wonder if it would be simpler and maybe more performant to just make Polygon store an exterior Ring and some interior Rings, and delay triangulation entirely until it's needed for rendering. contains_pt also uses the triangles now, but there might be other algorithms for figuring that out without triangulating. Maybe geom::Polygon could actually just be a zero-cost wrapper around geo::Polygon! Disclaimer, I haven't thought through these ideas.

use geo::algorithm::map_coords::MapCoordsInplace;
let mut geo_map_area: geo::Polygon<_> = map_area.clone().into();
geo_map_area.map_coords_inplace(|c| {
let projected = geom::Pt2D::new(c.0, c.1).to_gps(bounds);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPSBounds::convert? Or if you want to enforce all points are in bounds, try_convert?

@michaelkirk
Copy link
Collaborator Author

Seems fine so far. What's the one big TODO you mentioned -- requiring extra QGIS work first?

The only thing blocking me was getting points back into map-space. Now that I know about GPSBounds::convert I should be good!

Copy link
Collaborator Author

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR is ready for review.

Since the scenario isn't currently wired in, it's a little hard to evaluate if it actually works.

I'm working on integrating the scenario and doing some low-hanging perf stuff since the current situation is prohibitively slow. If you'd prefer I can include that stuff in this PR, otherwise I'll plan to open a follow up one.

@michaelkirk
Copy link
Collaborator Author

I should say, with debug logging enabled, there is a bunch of output like this:

[2020-12-15T00:54:12Z DEBUG popdat::distribute_people] Distributing 5 residents to 9 buildings. 100% of this area overlapped with the map, scaled residents accordingly.
[2020-12-15T00:54:12Z DEBUG popdat::distribute_people] Distributing 14 residents to 18 buildings. 99% of this area overlapped with the map, scaled residents accordingly.
[2020-12-15T00:54:12Z DEBUG popdat::distribute_people] Distributing 0 residents to 2 buildings. 99% of this area overlapped with the map, scaled residents accordingly.
[2020-12-15T00:54:12Z DEBUG popdat::distribute_people] Distributing 0 residents to 1 buildings. 100% of this area overlapped with the map, scaled residents accordingly.
[2020-12-15T00:54:12Z DEBUG popdat::distribute_people] Distributing 1,573 residents to 10 buildings. 100% of this area overlapped with the map, scaled residents accordingly.

So that's at least somewhat promising.

@michaelkirk michaelkirk marked this pull request as ready for review December 15, 2020 01:05
@michaelkirk
Copy link
Collaborator Author

Note: I moved the "integration" code and the perf stuff into #426. If you'd prefer to look at them all together, take a look at #426.

I'd recommend squashing the commits in this branch... there was quite a bit of churn.

@dabreegster
Copy link
Collaborator

LGTM except for a few small style bits!

If you'd prefer I can include that stuff in this PR, otherwise I'll plan to open a follow up one.

Followup is fine. This PR is mostly getting the geo<->geom interop up to par.

impl From<geo::Polygon<f64>> for Polygon {
fn from(poly: geo::Polygon<f64>) -> Self {
let (exterior, interiors) = poly.into_inner();
Polygon::with_holes(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why I was using buggy_new.. a quick audit of convex_hull() and intersection() doesn't turn up anything that should be used in map importing, so I think this should work fine. Just never updated this call-site.

rings.into_iter().map(geo::LineString::from).collect();
Self::new(exterior.into(), interiors)
} else {
let exterior_coords = poly
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this will break for a polygon produced by polyline.make_polygons, for instance, because points won't form a ring. It looks like on the geo side, it'll pts.push(pts[0]) on input if needed. That'll still turn out to be nonsense for this, since the points will go back and forth.

I think this is fine for now. Ideally every geom::Polygon will properly have an exterior ring. aka, we have to whittle down the callers of Polygon::buggy_new and Polygon::precomputed. Some of those will be hard, maybe impossible (like triangulated polygons coming from lyon).

geom = { path = "../geom" }
log = "0.4.11"
map_model = { path = "../map_model" }
rand = "0.7.0"
rand_xorshift = "0.2.0"
sim = { path = "../sim" }
serde_json = "1.0.60"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort. I really wish cargo fmt would get support for this :(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this is so hard for me. Thanks for your diligence!

use crate::CensusArea;
use abstutil::Timer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: For consistency, we should still split the imports based on category... std, external crates, other crates in this workspace, the current crate. I guess we don't have any contributors using an IDE that forces this order, so not a huge deal either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy to abide

todo!()
pub fn find_data_for_map(map: &Map, timer: &mut Timer) -> anyhow::Result<Vec<CensusArea>> {
// TODO eventually it'd be nice to lazily download the info needed. For now we expect a
// prepared geojson file to exist in data/system/<city>/population_areas.geojson
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

system makes sense to me, because we'll need this at runtime. (Just thinking aloud)

"system/{}/population_areas.geojson",
map.get_name().city
));
let bytes = abstutil::slurp_file(&path).map_err(|s| anyhow!(s))?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess String doesn't implement std::error::Error, so we need the conversion here. I guess a good refactor would change all of the result types in all abst code to use anyhow::Result.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep exactly.

@@ -20,8 +20,12 @@
#[macro_use]
extern crate log;

#[macro_use]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm nitpicking style, maybe we should sort this above the log too. Although honestly if rustfmt doesn't do it, stuff like this is just busywork. :\

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that you'd prefer to keep the alphabetical within the sections? That sounds fine - just making sure I understand.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's the general idea. With macro_use I guess it's not a hard-and-fast rule, because I think the order of these can have a real effect if there are interdependencies.

Copy link
Collaborator Author

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I think this is all fixed up.

@dabreegster
Copy link
Collaborator

LGTM, thanks! Merging, then I'll look at the next PR

@dabreegster dabreegster merged commit 8b9b3ce into a-b-street:master Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants