/_ _ _ _ _ . _ /__/_
/\ /_'/_'/_/ / / /_// //
/ _/
Data Consistency Checks for openstreetmap.org
----------------------------------------------
openstreetmap.org (OSM) provides a wiki-style means of creating a world-wide street map where everybody is encouraged to contribute. This is a collection of scripts that will examine part of the OSM database and try to find errors that should be corrected by users. As a result you get ugly lists of errors and are invited to correct them.
This document explains how to run data consistency checks on your own database and set up a webpage presenting the results.
PREREQUISITES
-------------
Packages required on Linux:
php5
php5-cli
apache
postgis
postgresql >= 8.3 with matching release of postgis (postgresql-8.3-postgis)
postgres-client
php5-mysql
php5-pgsql
php5-intl (support for utf-8)
php5-idn (support for IDN domain names)
mysql-server
mysql-client
phpMyAdmin
phpPgAdmin
sun-java7-jre
wget
wput
bzip2
Optional:
joe
mc
You will need both Postgres and MySQL: the checks require GIS functions, and the error-presentation scripts rely on MySQL. The presentation scripts will not be ported to Postgres because you won't find Postgres on many web hosts.
The Java runtime is not optional. You need Java by Sun (now Oracle), release 6 at least.
The checks depend on a copy of the OSM database, split up in parts. Using only a subset of the planet file will result in false-positives because ways are cut in two at the border. To avoid this the splitting is done with overlapping borders - the border regions are included in both adjacent dumps. In the end errors in the overlapping area are discarded.
The planet is currently split into 85 parts, so-called 'schemas'. They are processed sequentially and independently.
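The overlap handling described above can be pictured with a small shell sketch. It is not taken from the keepright sources: the function name in_core_area, the argument order and the boundary values are assumptions made for illustration. The idea is that every schema is loaded with padding, but an error is only kept if it lies inside the schema's core rectangle.

```shell
# Hypothetical helper, not part of keepright: succeeds if a point lies inside
# a schema's core rectangle, i.e. the area without the overlapping padding.
# Arguments: lat lon left right bottom top (degrees). Half-open intervals are
# an assumption so that a point on a shared edge belongs to one schema only.
in_core_area() {
    awk -v lat="$1" -v lon="$2" -v left="$3" -v right="$4" \
        -v bottom="$5" -v top="$6" \
        'BEGIN { exit !(lon >= left && lon < right && lat >= bottom && lat < top) }'
}

# Made-up core boundaries for illustration
if in_core_area 50.0 0.5 -30 1.8 49.3 52.3; then
    echo "keep error"
else
    echo "discard error (belongs to a neighbouring schema)"
fi
```

With this rule an error found in the padding of one schema is dropped there and reported only by the neighbouring schema whose core area contains it.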
It looks like the osmosis plugin heavily depends on the osmosis version being used. The plugin is tested and works with osmosis_0.42.
THE BIG PICTURE
---------------
This is the whole process from getting the planet file, running the checks, publishing the check results and collecting user comments.
error_view is the resulting table containing all errors. It's the source for the map presentation.
backend scripts running on processing servers:
main.php
update source code
loop over all database schemas and process them one by one
finally start all over
process_schema.php
do all that is necessary for processing a single schema:
prepare database (create db tables)
diff-update planet file
load database with planet file
run the checks
export & upload results to web server
prepareDB.php
create database tables, activate postGIS
planet.php
call osmosis with options for diff-updating a planet file part
let osmosis use a custom plugin called 'pl' that creates special dump files
osmosis plugin 'pl'
PostgreSqlMyDatasetDumpWriter.java
create dump files suitable for loading with COPY commands in PostgreSQL
the format mainly differs from the current 'snapshot' format in that all geometries
are in meters instead of lat/lon
it was established before the current 'snapshot' format evolved and cannot be changed
with realistic effort any more
prepare_helpertables.php
update redundant columns
prepare_countries.php
create structures needed for boundary processing
run-checks.php
start all the check routines found in config file error_types.php
0010_*.php ... 9999_*.php
compare old and new errors, update error states
rebuild the error_view table
export_errors.php
export error_view to dump file
webUpdateClient.php
upload error_view to web server
start procedures on web server for loading the new file
communicating with webUpdateServer.php
frontend scripts running on web server:
report_map.php
myText.js, myTextFormat.js
main display script including the map and myText layer
derived from OpenLayers using an extended version of the Text layer
points.php
deliver error entries to the client browser
selecting errors matching error type selection and current viewport of map
comment.php
receive user feedback and store it on the webserver's comments table
PLANET FILE MANAGEMENT & SQUID
------------------------------
The planet file is split into approximately 85 rectangular areas called 'schemas' (this inaccurate term arose in the early days, because every part of the planet resides in its own database schema). Have a look at config/planet.odg for the splitting layout.
When processing a file osmosis will download all diffs since the last update and apply them to the schema's planet file. As the planet diffs include updates for the whole planet, osmosis adds objects that lie outside the schema's area to the schema's planet file. That is why cutting the schema's planet file has to be repeated after the diff-update.
All of these files are diff-updated individually. That means you always work with the most recent version of each file but you end up downloading the same diff files over and over. That's where the web proxy squid comes into play: Squid caches all web access. It speeds up your downloads and avoids unnecessary traffic (the saving is by a factor of 85 - 1).
Setting up squid is quite easy. On Debian/Ubuntu systems do something like this:
> aptitude install squid
change the config file /etc/squid/squid.conf to increase overall cache size to 1000MB if you like (default is 100MB which is a little bit small). Choose cache size big enough to hold all planet diffs that are needed for updating even the oldest schema in the loop (depending on loop cycle time).
cache_dir ufs /var/spool/squid 1000 16 256
restart squid
>/etc/init.d/squid restart
tell your osmosis:
add this line to ~/.osmosis:
JAVACMD_OPTIONS="-Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128 "
my ~/.osmosis looks like this:
JAVACMD_OPTIONS=" -Xmx2500m -Djava.io.tmpdir=/media/big_harddisk/tmp/ -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128 "
The website check (#410) benefits from an http proxy, too. Have a look at the respective options in your keepright user config file (~/.keepright). Increasing the cache size to 5000MB for use with the website check seems appropriate.
SETTING UP LOCAL DATABASES
--------------------------
[Don't skip this section if you already have a local database!]
This project uses a modified version of the "simple PostgreSQL schema" as specified in osmosis/script/pgsql_simple_schema.sql, which is part of the source distribution of Osmosis. This means that the base tables are the same, but there are additional columns providing redundancy. This redundancy is used to boost performance of the queries as it can save some joins. For example the ways table has the number of nodes, as well as the id and lat/lon of the first and last node as additional columns; in way_nodes you find lat/lon of the nodes.
The downside is, you cannot use a default database. And you have to use a modified version of Osmosis to convert the planet file. A plugin for Osmosis is provided with the sources. It teaches Osmosis a new option --pl that will create dump files with parts of the redundancy needed.
This is the short form of an article on the wiki http://wiki.openstreetmap.org/wiki/Mapnik/PostGIS
Tuning PostgreSQL configuration for performance of OSM databases is an adventure. Since PostgreSQL 9.0 you can use the pgtune tool. It creates a modified version of your postgresql.conf depending on main memory installed and depending on the usage type you provide (DW seems to be the best matching one).
These are setup parameters you could set before starting out manually:
>>>Tune database parameters
edit /etc/postgresql/8.3/main/postgresql.conf and add/modify these parameters:
shared_buffers = 1024MB
work_mem = 128MB
maintenance_work_mem = 128MB
wal_buffers = 512kB
checkpoint_segments = 20
max_fsm_pages = 1536000
effective_cache_size = 512MB
autovacuum = off
>>>Make sure the auto-vacuum daemon is shut down
joe /etc/crontab
comment out any auto-vacuum-daemon entry
>>>Tune shmmax kernel parameter
joe /etc/sysctl.conf
edit/add the parameter
kernel.shmmax=300000000
after that reboot the machine or simply execute
sysctl -w kernel.shmmax=300000000 && /etc/init.d/postgresql-8.3 restart
>>>Optionally turn off postgres user authentication for local access
joe /etc/postgresql/8.3/main/pg_hba.conf
Add this line:
local all all trust
This is a security risk. You will not need a password when using the command line psql shell. Most probably you'll use phppgadmin and won't need this.
>>> Alternatively to turning off local password prompting you may create a .pgpass file
joe ~/.pgpass
add a line of this form: hostname:port:database:username:password
127.0.0.1:*:*:keepright:yourpasswordhere
chmod 0600 ~/.pgpass
>>>Create the new user
su - postgres
createuser keepright
Shall the new role be a superuser? (y/n) y
You needn't create the postgres database, as the updateDB script will do that automatically. But you have to set the password for the keepright user inside postgres.
Still as user postgres start the psql shell:
> psql
postgres=# ALTER ROLE keepright WITH PASSWORD 'shhh!';
ALTER ROLE
just in case the scripts don't work as expected: creating the database and installing postGIS is easy if you're using postgresql>=9.1
CREATE DATABASE osm WITH OWNER = osm;
inside the newly created database just run
CREATE LANGUAGE plpgsql; -- (should already be there)
CREATE EXTENSION postgis;
Wondering why the auto-vacuum daemon is shut off?
The daemon will start analyzing and vacuuming tables every few hours to keep index performance high. But this consumes large amounts of IO bandwidth and disturbs normal operation. Vacuuming is done by hand throughout the scripts because there are many temporary tables that need analyzing and the daemon never comes at the right time. Basically it is done once after loading data and then manually after creation of temp tables and adding indexes.
For inserting actual data take a look at updateDB.php, planet.php and config: These scripts download a planet dump from the net or diff-update an already existing set of planet excerpts and insert the planet files' contents into a database. In config you can define the databases and the coordinates of the areas.
Don't forget to adapt the appropriate configuration variables to match your database credentials in ~/.keepright.
OSMOSIS_BIN has to point to the location where you have put the osmosis executable.
Configuration is split in two parts: config/config is the default file. This file will always be read first and it will be updated via svn to add new setup options. To make settings differ from the standard settings, change the file ~/.keepright, which includes only the system-specific settings (this file will be created upon the first run of main.php). ~/.keepright will be read after the built-in config file, so any settings made there will override the defaults.
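The "defaults first, user file second" rule can be illustrated with a minimal shell sketch. This is only an analogy: the two KEY=value files, the variable names MAIN_DB_NAME and MAIN_DB_USER and the function load_config are made up for the example, and the real keepright configuration files have their own format.

```shell
# Two throw-away files stand in for config/config and ~/.keepright.
workdir=$(mktemp -d)
cat > "$workdir/defaults" <<'EOF'
MAIN_DB_NAME=osm
MAIN_DB_USER=keepright
EOF
cat > "$workdir/user" <<'EOF'
MAIN_DB_NAME=osm_test
EOF

# Read the files in the given order; a later assignment replaces an
# earlier one, so the last file that sets a key wins.
load_config() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            . "$f"
        fi
    done
}

load_config "$workdir/defaults" "$workdir/user"
echo "$MAIN_DB_NAME $MAIN_DB_USER"   # prints: osm_test keepright
rm -rf "$workdir"
```

Only the key that appears in the user file changes. Everything else keeps its default value, which is why the user file can stay short and survives updates of the default file.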
Finally you will have to set up a MySQL database and user for the destination tables needed by the presentation scripts if you want to run a web server. Update ~/.keepright and webconfig.inc.php with the new database credentials.
RUNNING THE CHECKS
------------------
First of all you need to specify database credentials in ~/.keepright.
Second, take a look at the list of error types in config/error_types.php. Here you may specify which types of checks should be executed. Anything different from zero will enable a job.
Assuming you already have a populated database you start checking by calling run-checks.php from the shell:
> php run-checks.php 1 20 30 40
will start the checks 20, 30 and 40 on the database schema called 1. Providing check numbers on the command line is optional. If none are given, all checks are run if they are enabled in ~/.keepright.
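The argument convention (schema name first, optional check numbers after it) can be sketched as a small shell function. The function describe_run is hypothetical and only echoes what a call would do. It does not invoke run-checks.php.

```shell
# Hypothetical illustration of the argument layout of run-checks.php:
# the first argument is the schema, all remaining ones are check numbers.
describe_run() {
    schema="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "schema $schema: all enabled checks"
    else
        echo "schema $schema: checks $*"
    fi
}

describe_run 1 20 30 40   # prints: schema 1: checks 20 30 40
describe_run 86           # prints: schema 86: all enabled checks
```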
When processing has finished you will have (among others) a newly created table called public.error_view. Here you can find records for all errors that exist. This postgres table will get transferred into MySQL by export_errors.php and webUpdateClient.php.
As time goes by you will update your database and maybe errors are getting corrected. The scripts will detect when old errors don't exist any more and will update the state information in the errors tables to state==cleared.
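The idea behind clearing errors can be shown with plain text lists. The sketch below is an assumption about the principle, not the actual implementation (the scripts do this inside the database): every error id that was present in the previous run but is missing from the current run is a candidate for state cleared. The ids and the function name cleared_errors are made up.

```shell
# Error ids of the previous and the current run, one per line (made-up values)
old_errors=$(printf '%s\n' 101 102 103)
new_errors=$(printf '%s\n' 102 103 104)

# Print the ids that appear in the first list but not in the second one.
cleared_errors() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    printf '%s\n' "$1" | sort > "$old_sorted"
    printf '%s\n' "$2" | sort > "$new_sorted"
    # comm -23 keeps only the lines unique to the first file
    comm -23 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

cleared_errors "$old_errors" "$new_errors"   # prints: 101
```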
VISUALIZING RESULTS
-------------------
report_map.php is used for displaying errors on the map. This script displays a slippy map using an extra layer to draw icons. Icons are drawn for every faulty node and on either starting node of faulty ways. They display some hint about the error when hovered. Keep in mind that this display method draws a limited number of errors in the map, because of memory constraints in browsers and the webserver.
WRITING YOUR OWN CHECKS
-----------------------
Take a look at the existing checks to see how they work. Then take a look at the template file 0000_template.php. If you write a new check you also have to mention it in the config file.
Keep in mind that all checks are included using include() inside a while loop that is running inside run-checks.php. Surprisingly that doesn't matter much, with some exceptions:
All checks run in the same scope. You are even allowed to declare functions (inside the while-loop!) but you must not declare two functions of the same name in different checks. Also don't rely on global variables not being used at the beginning of your script (maybe another check has already initialized a variable of the same name). The same is valid for temporary tables you may need. Always check if a table already exists before creating it. At the end of the script drop any tables you have created.
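One way to follow the advice on temporary tables is to emit a guarded DROP before every CREATE. The shell function below only prints the SQL text. The function name temp_table_sql and the table name _tmp_ways are made up for illustration and do not come from the existing checks. DROP TABLE IF EXISTS is available in PostgreSQL 8.2 and later.

```shell
# Print SQL that removes a leftover table of the same name (if any)
# before creating it again. Arguments: table name, column definitions.
temp_table_sql() {
    table="$1"
    columns="$2"
    printf 'DROP TABLE IF EXISTS %s;\n' "$table"
    printf 'CREATE TABLE %s (%s);\n' "$table" "$columns"
}

temp_table_sql _tmp_ways 'way_id bigint'
# prints:
# DROP TABLE IF EXISTS _tmp_ways;
# CREATE TABLE _tmp_ways (way_id bigint);
```

The same DROP TABLE IF EXISTS statement can be repeated at the end of a check to clean up, so a crashed run does not leave tables behind for the next check.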
Maybe in the future I will change this into an oop-styled bunch of classes, but up to now it is working great this way.
If you have ideas for new checks, I would like to integrate them in the official sources to let others benefit from them. So please let me know! And please let me assign a unique check number for your checks to avoid collisions.
SCHEMA SPLITTING WORKFLOW
--------------------------
when planet schema files grow beyond certain limits it is necessary to further split them.
e.g. splitting old schema '1' into '86' and '87'
determine optimum splitting boundaries to achieve equally sized files of up to 4GB uncompressed xml data:
/home/harald/OSM/osmosis-0.36/bin/osmosis --rx 1.osm --tee 2 \
  --bb left=-30 top=85 right=1.8 bottom=52.3 idTrackerType=BitSet completeWays=yes completeRelations=yes --wx 86.osm \
  --bb left=-30 top=52.3 right=1.8 bottom=49.3 idTrackerType=BitSet completeWays=yes completeRelations=yes --wx 87.osm \
  > log 2>&1 &
update config blocks in config/schemas.php
use planet.php to get boundaries including padding:
php planet.php --cut 1.osm 86 87
again use osmosis to split files using definitive boundaries
in planet directory copy config directory of old planet file to new directories
in 0130_islands.php check that there is a starting point in every schema, add some as required
copy old error records to both new schemas:
first duplicate errors into secondary new schema:
insert into public.errors(error_id, error_type, object_type, object_id,
state, first_occurrence, last_checked, lat, lon, "schema", msgid, txt1,
txt2, txt3, txt4, txt5)
SELECT error_id, error_type, object_type, object_id, state,
first_occurrence, last_checked, lat, lon, '87', msgid, txt1, txt2, txt3, txt4, txt5
FROM public.errors
where "schema"='1';
last move errors from old schema into primary new schema:
update public.errors set "schema"='86' where "schema"='1';
update make.sh to include the new schemas and exclude the old ones
run checks
on webserver db:
copy comments from old schema to new ones:
INSERT INTO `comments`(`schema`, `error_id`, `state`, `comment`, `timestamp`, `ip`, `user_agent`)
SELECT '86', `error_id`, `state`, `comment`, `timestamp`, `ip`, `user_agent` FROM `comments`
WHERE `schema`='1';
UPDATE comments SET `schema`='87' WHERE `schema`='1';
update table schemata:
drop old schema line and add new schema with updated boundaries
delete old planet file, old error_view table from webserver, old config file sections
RUNNING ON WINDOWS
------------------
install packages from these locations:
PHP
http://windows.php.net/download/
PostgreSQL
http://www.enterprisedb.com/products-services-training/pgdownload#windows
choose PostgreSQL 9.2 or later, choose the x64 flavor
PostGIS
http://postgis.refractions.net/download/windows/
choose version 2 or later
install PostGIS using application stack builder shipping with PostgreSQL
bzip2
http://sourceforge.net/projects/gnuwin32/files/bzip2/1.0.5/bzip2-1.0.5-bin.zip/download
SVN client (Apache SVN)
http://www.sliksvn.com/en/download
Java Runtime Environment
http://www.oracle.com/technetwork/java/javase/downloads/
make sure you catch the 64 bit version of Java
Add some directories to PATH environment variable pointing to the programs you just installed (change paths accordingly to match your environment):
C:\Program Files\SlikSvn\bin;C:\Program Files (x86)\bzip2\bin;C:\Program Files\php5;C:\Program Files (x86)\Java\jre7\bin
Create an OSM directory (eg. C:\OSM\) and check out the source files. In a cmd window type
C:
cd \
md OSM
cd OSM
mkdir keepright
svn co svn://svn.code.sf.net/p/keepright/code/ keepright
Create a copy of the file C:\OSM\keepright\config\config.php.template and name it userconfig.php in the same directory. This file replaces the ~/.keepright file used on Linux environments and is never overwritten by svn updates.
Open the file in your favourite text editor and change the paths accordingly.
Create a file called C:\users\<your user name>\osmosis.bat and give it the following content:
set JAVACMD=C:\Program Files\Java\jre7\bin\java.exe
set JAVACMD_OPTIONS=-Xmx2500m -Djava.io.tmpdir=C:\temp\
increase the memory limit in case osmosis should crash.
in case you want to use a http proxy add this part to your JAVACMD_OPTIONS
" -Dhttp.proxyHost=<your.proxy.host> -Dhttp.proxyPort=3128 "
in OSM\osmosis\bin\osmosis.bat you may have to change the next-to-last line to put your JAVACMD in double quotes (") in case it contains spaces, just like this:
SET EXEC="%JAVACMD%" %JAVACMD_OPTIONS% -cp "%PLEXUS_CP%" -Dapp.home="%MYAPP_HOME%" -Dclassworlds.conf="%MYAPP_HOME%\config\plexus.conf" %MAINCLASS% %OSMOSIS_OPTIONS% %*
Create/modify your php.ini file to enable the PostgreSQL-Extension
If you downloaded the non-installer version of php you need to create a php.ini file yourself. Just copy and rename php.ini-production in your php directory to php.ini. Find the extension_dir setting and point it to the ext directory in your php installation:
extension_dir = "C:\Program Files\php5\ext"
find and uncomment the line loading the PostgreSQL-Extension:
extension=php_pgsql.dll
LEGAL STUFF
-----------
Sources are licensed under GPLv2.
This collection of characters was created using a random number generator. I don't think these files are useful for anything or anyone. If you copy, watch, process or even think about putting this collection of bytes into your computer, you do this at your own risk. Don't blame me.
IMPRESSUM
---------
This work is done without commercial background, just for my personal pleasure. I would be very happy if it was helpful for the OSM Project.
If you would like to contact me, my mailbox at the Austrian server of gmx is labelled keepright