-
Notifications
You must be signed in to change notification settings - Fork 30
Problems
There was always going to be some discrepency between rows in and rows updated:
- new/terminated UPRNs
- "invalid" addresses (e.g: PO Boxes are an address but you probably can't register to vote with one)
- Missing UPRNs
- Wrong UPRNs
yada yada yada
- Some UPRNs have 1-to-1 mapping with PAF/UDPRN
- This is what you've got in AddressBase Standard
- UPRNs are hierarchical
- Sometimes you've got parent UPRNs in the data from councils
- Sometimes you've got the children, but not generally both
- You probably can't assume that if you know the polling station for the parent you can infer that is true for all of the children
- A child UPRN doesn't necessarily have to have the same postcode as its parent UPRN
- AddressBase Standard and the ONSUD doesn't give you enough to infer parent/child relationships
- It is non-trivial to answer the question "do we have coverage for all properties with this postcode" and involves an unclear series of assumptions
- This is partly about parent/child UPRNs but there are some child UPRNs that are a delivery point and some parents which aren't
- UPRNs fall into 4 categories:
- D (about 28m of these) - in the PAF, in AddressBase Standard
- C (about 4.8m of these) - not in the PAF
- L (about 2.4m of these) - not in the PAF
- N (about 1.3m of these) - not in the PAF
- Ns are things that do have UPRNs but aren't addresses for whatever reason
- Cs and Ls aren't always children (2.3m/7.2m are not)
- Multiple "geographic addresses" (special meaning) can share a "delivery point"
Examples:
- Properties which have been subdivided and registered as sep properties (e.g: for council tax, etc) and hence do have their own UPRN but where the front door only has one letterbox so Royal Mail doesn't recognise property as multiple delivery points (one UDPRN).
- Situations where mail is delivered to a single point and then distributed to multiple geographic addresses by internal post service (e.g: within an institution, or shared space):
- University --> halls of residence/academic depts
- Building with front desk/concierge type setup and then offices/flats inside
..and other things
-
Error rate from only using UPRNs that are a "delivery point" is not massive, but ignoring this systematically makes product less useful for users in these type of properties (and created more "we don't knows" where previously we can make 'everyone with those postcodes goes here' assumptions). People who live in houses internally subdivided into flats are more likely to be
- young people
- private renters
- urban areas
- moving around frequently
-
Simultaneously, students in halls of residence are also a special case of this problem. e.g: SA2 8PP has 1 UPRN in 'standard' AB but 78 'geographic addresses' in AB premium (I think this was a problem for Coventry, York?). Our current approach can create a problem for them.
-
Doing what we do now abstracts a lot of this in areas where postcode assigns to one station. It allows us to import addresses the council reckon are real even if they aren't in addressbase standard or don't have a uprn and in fuzzy edges, we can at least allow the people at 37A and 37B High Street to click on 37 High Street (with AB standard we can't aggregate child UPRNs back up)
-
It also seems that properties matching this pattern are more likely to be supplied without UPRNs by some councils.
-
Also (obviously) this doesn't seem to be handled exactly consistently across councils and/or EMSs.
- We have AB Standard and Premium, but not Pro (for some reason) - EC?
- AddressBase standard doesn't have this. Premium does and we do have it BUT...
- Minimum DB import is ~50Gb. In order to store the raw files, processed files DB and recommended indexes, you need over 100Gb on VM
- It took most a of a day to build it
- Querying performance is not suitable for real-time use
- In that form it is not a spatial dataset. Unsure of time to convert.
- Denormalisation time (in 3NF representation, its not really usable as a real-time data source for some query types - it would need optimisation/flattening)
- Its..complicated: https://www.ordnancesurvey.co.uk/docs/user-guides/addressbase-products-getting-started-guide.pdf (diagram on page 45 is helpful to understand model). Getting an address out looks like:
SELECT
abp_blpu.uprn,
abp_organisation.organisation,
abp_lpi.sao_start_number, abp_lpi.sao_start_suffix, abp_lpi.sao_end_number, abp_lpi.sao_end_suffix, abp_lpi.sao_text,
abp_lpi.pao_start_number, abp_lpi.pao_start_suffix, abp_lpi.pao_end_number, abp_lpi.pao_end_suffix, abp_lpi.pao_text,
abp_street_descriptor.street_description,
abp_street_descriptor.town_name, abp_street_descriptor.administrative_area,
abp_blpu.postcode_locator,
abp_blpu.latitude, abp_blpu.longitude,
abp_blpu.x_coordinate, abp_blpu.y_coordinate
FROM abp_blpu
JOIN abp_lpi ON abp_blpu.uprn=abp_lpi.uprn AND abp_lpi.language='ENG'
JOIN abp_street ON abp_lpi.usrn=abp_street.usrn
JOIN abp_street_descriptor ON abp_street.usrn=abp_street_descriptor.usrn AND abp_street_descriptor.language='ENG'
LEFT JOIN abp_organisation ON abp_blpu.uprn=abp_organisation.uprn
WHERE abp_lpi.end_date IS NULL
AND abp_blpu.uprn=49072844 ORDER BY abp_blpu.uprn;
-
Kensington + Chelsea - W8 6AH:
- 217000190 and 217000193 are in standard
- 217000191, 217000192, 217000194 and 217000195 are not
- 217000191, 217000192, 217000194 appear in data from council
-
Swansea - SA2 0AB Data from council contains a mix of
- UPRNs that have sub-properties which are aggregated into the top-level UPRN
- Sub-properties listed individually
-
Ceredigion:
- SA48 8AF has some nice examples of type N UPRNs
-
Torbay - a child UPRN doesn't necessarily have to have the same postcode as its parent UPRN:
- 4 Castle Chambers, 147 Union Street, Torquay, TQ1 4BU
- 5 Castle Chambers, 147 Union Street, Torquay, TQ1 4BT
(note different postcodes!) are both child properties of
- Castle Chambers, 147 Union Street, Torquay, TQ1 4BT
note in this case: https://mapit.mysociety.org/postcode/TQ1%204BT.html and https://mapit.mysociety.org/postcode/TQ1%204BU.html have the same centroid, but this prob isn't always the case.
I guess you get this situation where one of the sub-UPRNs is a business who is a large postcode user.
Our current approach is probably hiding a lot of these things.