Should pyosmium-up-to-date respect an .osm.pbf's bounds #256
Comments
I don't know of a good solution for this problem. Your workaround is what some people have tried; I remember seeing scripts to that effect floating around. The problem is that the OSM data model basically makes this impossible to do cleanly, so you end up implementing some heuristic. Protomaps can do minutely updates of extracts, but I think you need a complete database for that as well. And download.openstreetmap.fr offers minutely updated extracts, so apparently they have solved this somehow, but I don't know how they do it.
Got it, but does this mean that running pyosmium-up-to-date and then osmium extract should work and do the trick? The only downside is some wasted downloaded data, but in the end I'll get an .osm.pbf that adheres to the bounds and is up to date, yes?
I do use protomaps' .pmtiles format in my pipeline generated by tilemaker. Meaning my pipeline looks something like
The great thing about these small .pmtiles files is that I can host them, e.g., on a GitHub page, and if the update pipeline above is fast enough (a few minutes) I could even have a GitHub Action generate a new .pmtiles file and check it in.
Just one word of warning here: due to the way the OSM data is structured, you should always use a bounding box that is some 50-100 km larger than what you need. OSM objects around the fringes of your extract may move in and out of the bounding box, and that is not always captured correctly during the updates.
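As a rough illustration of the buffer advice above, here is a minimal Python sketch that grows a bounding box by a given number of kilometres. The helper names `expand_bbox` and `bbox_arg` are hypothetical and not part of osmium or pyosmium, and the conversion uses the common approximation of about 111.32 km per degree of latitude, so treat the result as an estimate rather than an exact geodesic buffer.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def expand_bbox(left, bottom, right, top, buffer_km):
    """Return (left, bottom, right, top) grown by roughly buffer_km on every side."""
    dlat = buffer_km / KM_PER_DEGREE_LAT
    # A degree of longitude gets shorter towards the poles. Use the latitude
    # farthest from the equator (capped at 89 degrees) so that the east/west
    # buffer is never smaller than requested.
    widest_lat = min(max(abs(bottom), abs(top)) + dlat, 89.0)
    dlon = buffer_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(widest_lat)))
    return (
        max(left - dlon, -180.0),
        max(bottom - dlat, -90.0),
        min(right + dlon, 180.0),
        min(top + dlat, 90.0),
    )


def bbox_arg(bbox):
    """Format a bbox as LEFT,BOTTOM,RIGHT,TOP, the order `osmium extract --bbox` expects."""
    return ",".join(f"{value:.5f}" for value in bbox)


# Made-up coordinates for a small area of interest, buffered by 5 km.
buffered = expand_bbox(14.90, 51.10, 15.00, 51.20, buffer_km=5)
print(bbox_arg(buffered))
```

The formatted string could then be passed to `osmium extract --bbox` when cutting the extract that is kept up to date, while the unbuffered box is only used for the final product.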
I'll take into consideration adding bbox cutting to pyosmium-up-to-date. It might have to wait for the next rewrite of the tool, though.
No. That's the problem: there is no way to make sure this will always work except by having a complete OSM database. It will usually work, but as lonvia said, if you have objects near the boundary moving in and out, it can break. Unusual relations can break it too.
Ow, okay, thank you folks, I didn't know about the need for a 50-100 km buffer. That will make things quite a bit heavier for the very small extracts I'm working with. Do I still need the 50-100 km buffer even if I only care about objects that are always within the bounds and never cross them? For example, let's say I have a <10 MB extract of a very small area (e.g. a small remote town) where I only care about buildings. Do I still need the 50-100 km buffer? Where can I learn more about this? Does it boil down to understanding changesets and how they're generated? What would be a good way then to update these small <10 MB extracts? Simply re-downloading an .osm.pbf, e.g. from Geofabrik, and using osmium extract, or using a 50-100 km buffer in the first place?
There is no simple answer here. It all depends on what you are doing with the data and what kinds of glitches in the data you are prepared to work with or ignore. The buffer is simply a way to reduce the number of glitches you might get when something happens near the border of your extract; it is not foolproof. You have to understand the OSM data model and what data is in changes and what isn't. All that being said, if you don't do anything fancy with relations, this is not going to be a big problem in practice. Use a buffer big enough that all objects you care about are well inside the extract. And do a clean re-import every half year or so, so that if something was messed up you start from a clean setup every once in a while.
Maybe an example helps to illustrate what kind of glitches we are talking about. Say you want to make an extract of Görlitz. Given its location right at the Polish border, you cut the extract along the river Neisse. That works well when you create the extract: you now have all buildings on the western side of the river. You happily apply diffs to the extract until one day a mapper realizes that one of the buildings was in fact put on the wrong side of the river. They move the building from the eastern bank to the western bank. That means the building should now appear in your extract.

However, there is a small problem. Because of the topological nature of the OSM data model, you move a building by changing the coordinates of the nodes that make up the building. You do not touch the OSM way with the actual building information. So when you get the diff with the change, it contains the new positions of the nodes, but no information about the OSM way describing the building. The way was not changed, so it is not in the diff. And because you are working with an extract, you don't have the information about the way either, because when you cut the extract it was outside the area of interest. The moved building will not appear on your map.

So the 50-100 km is a very conservative estimate of how far mappers move things around on the map in a way that creates these kinds of glitches. If you worked only with node data, you wouldn't need any buffer at all. If you are interested only in buildings, a 2-5 km buffer is probably sufficient already.
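The Görlitz scenario can be replayed with a toy model in plain Python. This is only a sketch under simplifying assumptions: nodes are a dict of id to (lon, lat), ways are a dict of id to node references, a "diff" contains just the objects that changed, and the coordinates and ids are made up. It does not reflect how osmium or pyosmium store or process data.

```python
def inside(pos, bbox):
    lon, lat = pos
    left, bottom, right, top = bbox
    return left <= lon <= right and bottom <= lat <= top


def cut_extract(nodes, ways, bbox):
    """Keep ways with at least one node inside bbox, plus every node those ways use."""
    kept_ways = {
        way_id: refs
        for way_id, refs in ways.items()
        if any(inside(nodes[ref], bbox) for ref in refs)
    }
    used = {ref for refs in kept_ways.values() for ref in refs}
    kept_nodes = {
        node_id: pos
        for node_id, pos in nodes.items()
        if node_id in used or inside(pos, bbox)
    }
    return kept_nodes, kept_ways


def apply_diff(nodes, ways, changed_nodes, changed_ways):
    """Overwrite or add only the objects that the diff actually contains."""
    return {**nodes, **changed_nodes}, {**ways, **changed_ways}


west_bank = (14.90, 51.10, 14.99, 51.20)  # made-up area of interest
planet_nodes = {
    1: (15.001, 51.150), 2: (15.002, 51.150), 3: (15.002, 51.151),  # misplaced building (east)
    4: (14.950, 51.150), 5: (14.951, 51.150), 6: (14.951, 51.151),  # building in the west
}
planet_ways = {100: [1, 2, 3, 1], 200: [4, 5, 6, 4]}

# Initial extract: way 100 lies entirely outside the box and is dropped.
extract_nodes, extract_ways = cut_extract(planet_nodes, planet_ways, west_bank)

# The mapper moves the building west: only its nodes change, the way does not.
diff_nodes = {1: (14.981, 51.150), 2: (14.982, 51.150), 3: (14.982, 51.151)}
diff_ways = {}

# Applying the diff to the extract yields the moved nodes but no way for them.
updated_nodes, updated_ways = apply_diff(extract_nodes, extract_ways, diff_nodes, diff_ways)

# Applying the same diff to the full data and cutting again recovers the building.
fresh_nodes, fresh_ways = cut_extract(
    *apply_diff(planet_nodes, planet_ways, diff_nodes, diff_ways), west_bank
)

print("ways after updating the extract:", sorted(updated_ways))
print("ways after a clean re-cut:      ", sorted(fresh_ways))
```

In this model the updated extract ends up with three orphaned nodes and no building way, while the clean re-cut contains both buildings, which is why a buffer plus an occasional fresh import is suggested.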
Aah! Thank you so much folks, now I understand the constraints a bit better - I didn't know about this! 🙌 I will add a buffer then and make sure to re-import from scratch every now and then 👍
Hi there! Suppose I have used osmium extract to generate a small (< 10 MB) .osm.pbf file of an area from a snapshot, and I have used the --set-bounds option so that the bounds get written into the file header.
I want to keep this small file up to date, e.g. on a daily basis, by running pyosmium-up-to-date, but when I do so it looks like
Here is the osmium fileinfo output on the .osm.pbf pyosmium-up-to-date generates:
I wanted to flag this behavior because it was unexpected to me and I'm not sure if this is by design.
My workaround for now is the following:
a. run pyosmium-up-to-date (~ 100 MB)
b. re-run osmium extract as in step 2 to re-cut for the specific bounds (< 10 MB)
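The two workaround steps could be scripted roughly as follows. This is a sketch, not a tested pipeline: the file names and the helper functions `build_commands` and `update_and_recut` are hypothetical, and it assumes that `pyosmium-up-to-date -o` and `osmium extract --bbox/--set-bounds/--overwrite/-o` behave as described in their respective help output. As the replies point out, the bounding box used here should include a buffer around the area actually needed.

```python
import subprocess


def build_commands(extract_path, bbox, updated_path="updated.osm.pbf"):
    """Return the two command lines of the workaround as argument lists."""
    left, bottom, right, top = bbox
    # Step a: apply replication diffs; the result also contains data outside the bounds.
    update_cmd = ["pyosmium-up-to-date", "-o", updated_path, extract_path]
    # Step b: re-cut to the bounds and write them into the header again.
    recut_cmd = [
        "osmium", "extract",
        "--bbox", f"{left},{bottom},{right},{top}",
        "--set-bounds",
        "--overwrite",
        "-o", extract_path,
        updated_path,
    ]
    return [update_cmd, recut_cmd]


def update_and_recut(extract_path, bbox, runner=subprocess.run):
    """Run both steps in order; with the default runner, a non-zero exit status raises."""
    for cmd in build_commands(extract_path, bbox):
        runner(cmd, check=True)
```

A call such as `update_and_recut("town.osm.pbf", (14.9, 51.1, 15.0, 51.2))` would then run both tools in sequence. The `runner` parameter exists only so the command lines can be inspected without executing anything.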
Thank you! Also happy for any pointers on how other folks keep their small extracts up to date!