
Project Meeting 2020.10.20

Ben Stabler edited this page Oct 20, 2020 · 21 revisions

Technical call

  • Need to fix https://github.com/ActivitySim/activitysim/pull/349 ASAP
    • SEMCOG got stuck running the example according to the website instructions
    • We'll pull and release a new version today
    • It's probably better not to lock down dependency versions, but to let the package take advantage of dependency updates and fix issues as they arise
  • Pre-computing / caching to support the TVPB: Jeff is making progress
    • He's working on understanding the tradeoffs of pre-computing versus on-demand
    • He implemented skipping re-calculation of duplicate tap-to-tap utilities, which sped up the Marin example by 4x
    • He implemented caching for tap-to-tap utilities, but this was less advantageous
    • He thinks we need to cache the n-best-paths list for each (omaz, dmaz, tod, demographic_segment) combination
    • So maybe we cache it using a fast, multiprocess-friendly format such as Arrow/Feather
    • And then we either pre-compute or possibly update on-demand
    • For a full sample, pre-compute might be better, but for a 100 HH sample, maybe on-demand is better
    • It depends on how sparse the data is
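The duplicate-utility speedup mentioned above can be sketched as follows. This is a hypothetical illustration, not ActivitySim's actual implementation: many (omaz, dmaz) requests resolve to the same (otap, dtap, tod) combination, so each unique combination is evaluated once and the results are joined back. The function and the toy utility expression are assumptions for illustration.

```python
# Illustrative sketch (not ActivitySim's API): de-duplicate tap-to-tap
# utility calculations by evaluating each unique (otap, dtap, tod)
# combination once and merging the results back onto the full request table.
import pandas as pd

def compute_tap_tap_utilities(requests: pd.DataFrame) -> pd.DataFrame:
    # find the unique tap-pair/time-of-day combinations actually needed
    unique = requests[["otap", "dtap", "tod"]].drop_duplicates().copy()
    # stand-in for the real (expensive) utility expression evaluation
    unique["utility"] = -0.1 * (unique["otap"] + unique["dtap"]) - 0.5 * unique["tod"]
    # join the computed utilities back to every requesting row
    return requests.merge(unique, on=["otap", "dtap", "tod"], how="left")

requests = pd.DataFrame({
    "otap": [1, 1, 2, 1],
    "dtap": [5, 5, 6, 5],
    "tod":  [1, 1, 2, 1],
})
result = compute_tap_tap_utilities(requests)
# 4 requested rows, but only 2 unique combinations are evaluated
```

The win grows with the duplication rate, which is presumably high in Marin given how many MAZ pairs share the same nearby TAP pairs.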
  • Discussion
    • I tried my best to explain things, but I think Doyle needs to help explain next time
    • What's the dimensionality of the problem?
      • Marin TM2: 6000 mazs, 6200 taps. The average number of TAPs per MAZ is 114 for walk access and 7 for drive access. Note that this does not include trimming MAZ-to-TAP pairs where a farther-away TAP doesn't serve new lines. If we crop to the 1.2 miles used by the tap_serves_new_lines function (aka tapLines), we get 63 taps per maz
    • We think it makes sense to pre-compute the path components, but we're not sure about the N-best tap pairs
    • Pre-computing seems like a reasonable, understandable, simple solution: just compute the components, save them, and look them up later. It may not be completely optimal, but it is likely easier to maintain and for developers to use than something fancier
    • So the question becomes: does pre-computing create too large a file, and is the data set sparse enough that the tradeoff isn't worth it?
    • This depends a bit on the settings we spec'd that are consistent with TM2:
      • max_paths_across_tap_sets: 3, which is the number of N-best tap pairs in total to keep for each omaz,dmaz,tod,demographic_segment
      • max_paths_per_tap_set: 1, which is the number of N-best tap pairs to keep within each skim set (premium, local, etc.)
    • 6000 mazs * 63 taps * 63 taps * 6000 mazs * 5 time periods * 3 demographic segments = 2,143,260,000,000 (2 trillion)
    • This is a big number, so we want a solution that weighs pre-computing against caching, and that may use an efficient disk-based storage format and/or on-demand computation
    • Jeff to give us an update next week
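The dimensionality estimate above can be checked with a quick back-of-envelope calculation (variable names are illustrative):

```python
# Back-of-envelope check of the Marin TM2 dimensionality estimate above.
mazs = 6000            # origin and destination MAZs
taps_per_maz = 63      # walk-access TAPs per MAZ after the 1.2 mile crop
time_periods = 5
demographic_segments = 3

# every (omaz, otap) x (dtap, dmaz) x tod x segment combination
combinations = (mazs * taps_per_maz) * (taps_per_maz * mazs) \
    * time_periods * demographic_segments
print(f"{combinations:,}")  # 2,143,260,000,000
```

This confirms the roughly 2 trillion combinations quoted above, which is why sparsity matters so much for the pre-compute-versus-on-demand decision.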
  • Profiling of memory usage:
    • MTC skims: 6.7 GB in memory (826 skims across 5 time periods, 1475 zones)
    • SEMCOG skims: 47 GB in memory (1480 skims across 5 time periods, 2900 zones)
    • @Stefan to add PSRC numbers
    • Next time discuss stats on pipeline table sizes
    • Multiprocessing creates lots of tables when using chunksize, so this uses lots of memory as well
  • Discuss estimation feature completion progress and #354 next time