
Project Meeting 2021.01.12


Technical Call

  • Discuss Doyle completing estimation mode for trip mode choice
    • Trip mode choice is now done; it was quite complex and required updating examples 2 and 3 as well
    • Results are not exactly the same since we merged in some other changes that affected them
    • I can now review the estimation branch
    • Doyle to now work on performance tuning (see below)
  • Discuss Jeff Newman's non-mandatory tour frequency (nmtf) estimation notebook
    • Will rebase his cdap and nmtf notebooks off of the estimation branch
    • And send me a note to review when ready
    • Jeff will also test the other estimation notebooks and update them to be more automatable/testable since they're sort of software and sort of training materials right now
    • Then we'll pause the task and discuss how to make fewer, smarter / more generic notebooks rather than a bunch of separate notebooks, one for each submodel
  • Discuss Joe's copyright and licensing history
    • The basic plan is to:
    • Update the license so it says Copyright AMPO Research Foundation
    • Make sure work for hire is in the contracts
    • Add a Contributor License Agreement like this one and have GitHub automatically manage it with Pull Requests
    • Refactor out the orca code under a bench contract work order
    • The license will stay BSD-3
    • Let's discuss the partner MOU/payment agreement on Thursday
  • Discuss PSRC progress and need for location sampling improvements
    • 40k MAZ model with 100k HHs is up and running from start to finish
    • Ran into issues with location sampling since it's considering all 40k alternatives every time for every chooser
    • Would be good to do something like DaySim's two-stage sampling approach - basically sample at the TAZ level and then pick an MAZ within each sampled TAZ based on the MAZ's share of the TAZ's size term. DaySim also pre-calculates the TAZ-to-TAZ probability matrix and then just draws random numbers and picks a zone (see the sampling sketch after these notes)
    • We could also filter on size term > 0 before solving expressions, which would help with sparse alternative sets like school/university
    • These issues are similar to the performance improvements tasks below so let's add this to that discussion for consideration
    • Using importance sampling like DaySim is a good approach and we'll need to implement something like it for the SANDAG cross border model
    • Could build the probability matrix at the start of each process since it's fast, and then we don't have to persist / share it across processes
  • Discuss Doyle's list of potential performance improvements
    • expression file optimization
      • good to make these templates as smart as possible since new users rely on them
      • could speed up the model maybe 50%?
    • finish adaptive chunking
      • automatically determine chunksize based on available memory
      • explore adaptive chunking based on actual memory usage rather than ‘registered’ data objects (see the chunk-size sketch after these notes)
    • deduplicate alternatives as discussed for ARC
    • cache logsums, which requires categories/market segments to be defined
      • would help a lot; would require a lot of plumbing updates (see the caching sketch after these notes)
    • data format and size optimization
      • (e.g. rightsize numbers and convert strings to factors; see the rightsizing sketch after these notes)
    • pipeline optimization
      • alternative pipeline file format (e.g. feather; see the Feather sketch after these notes)
      • improve control over checkpointing (pipeline footprint and read/write time)
      • Alex likes this one, see his email
    • two-stage location sampling for PSRC like DaySim
    • run ARC and optimize where appropriate
    • run PSRC and optimize where appropriate
    • We will run the ARC and PSRC versions of the model and review submodel and component runtimes and memory usage to help inform our discussion of what to work on
    • Review logs and snakeviz profiler output (see the profiling sketch after these notes)
    • Stefan and Guy/Clint to share setups
    • It would be good to understand level of effort since maybe we can do a few smaller, easier ones and then a big one or two?
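
Sketches referenced above

The two-stage sampling idea can be roughed out in a few lines. This is a minimal illustration, not ActivitySim or DaySim code: `taz_utils` (a TAZ-to-TAZ utility matrix), `mazs` (a table of MAZs with their parent TAZ and size term), and `sample_destinations` are all hypothetical names, and it assumes every sampled TAZ has at least one MAZ with a positive size term.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def sample_destinations(origin_taz, n_samples, taz_utils, mazs):
    # Stage 1: TAZ-level importance sampling from a pre-computed
    # probability row (exponentiated utilities, normalized).
    exp_u = np.exp(taz_utils[origin_taz])
    sampled_tazs = rng.choice(len(exp_u), size=n_samples, p=exp_u / exp_u.sum())

    # Stage 2: within each sampled TAZ, pick an MAZ in proportion to its
    # share of that TAZ's total size term. Filtering size_term > 0 first
    # also handles sparse alternative sets (e.g. school/university).
    picks = []
    for taz in sampled_tazs:
        cands = mazs[(mazs["taz"] == taz) & (mazs["size_term"] > 0)]
        shares = cands["size_term"] / cands["size_term"].sum()
        picks.append(rng.choice(cands["maz"].to_numpy(), p=shares.to_numpy()))
    return picks

# Toy inputs: 3 TAZs, 6 MAZs.
taz_utils = np.zeros((3, 3))  # flat utilities -> uniform TAZ probabilities
mazs = pd.DataFrame({
    "maz": [0, 1, 2, 3, 4, 5],
    "taz": [0, 0, 1, 1, 2, 2],
    "size_term": [10.0, 5.0, 0.0, 8.0, 3.0, 3.0],
})
print(sample_destinations(origin_taz=0, n_samples=5,
                          taz_utils=taz_utils, mazs=mazs))
```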
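For the adaptive chunking item, a rough sketch of deriving a chunk size from memory that is actually available right now, rather than from registered data objects. The function name and the per-chooser byte estimate are hypothetical; ActivitySim's real chunking logic is more involved.

```python
import psutil

def pick_chunk_size(n_choosers, bytes_per_chooser, headroom=0.8):
    # Leave some headroom so the chunk doesn't consume all free memory.
    available = psutil.virtual_memory().available * headroom
    rows_that_fit = int(available // bytes_per_chooser)
    return max(1, min(n_choosers, rows_that_fit))

print(pick_chunk_size(n_choosers=1_000_000, bytes_per_chooser=50_000))
```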
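For logsum caching, a bare-bones illustration of the idea: compute each market segment's logsums once and reuse them. `compute_logsums` is a hypothetical stand-in; as noted above, the real plumbing would need the segment categories defined first.

```python
_logsum_cache = {}

def get_logsums(segment, compute_logsums):
    # Compute each segment's logsum table once; later callers reuse it.
    if segment not in _logsum_cache:
        _logsum_cache[segment] = compute_logsums(segment)
    return _logsum_cache[segment]

# Toy usage: the "computation" just records that it ran.
calls = []
fake = lambda seg: calls.append(seg) or f"logsums for {seg}"
get_logsums("low_income", fake)
get_logsums("low_income", fake)   # served from cache, no second call
assert calls == ["low_income"]
```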
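For data format and size optimization, a sketch of what rightsizing could look like with pandas: downcast numeric columns and convert string columns to categoricals (the pandas analogue of factors).

```python
import pandas as pd

def rightsize(df):
    # Shrink memory: downcast numbers, convert strings to categoricals.
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_object_dtype(out[col]):
            out[col] = out[col].astype("category")
    return out

df = pd.DataFrame({"age": [31, 45, 12], "mode": ["walk", "drive", "walk"]})
print(df.memory_usage(deep=True).sum(), "->",
      rightsize(df).memory_usage(deep=True).sum(), "bytes")
```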
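For the alternative pipeline file format item, round-tripping a table through Feather with pandas is straightforward (requires pyarrow); the table contents here are made up.

```python
import pandas as pd

persons = pd.DataFrame({"person_id": range(1000), "home_zone": 7})
persons.to_feather("persons.feather")          # columnar, fast to read/write
restored = pd.read_feather("persons.feather")
assert restored.equals(persons)
```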
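For profiling, one common recipe for producing snakeviz-readable output: dump cProfile stats to a file, then open it with `snakeviz run.prof` from a shell. The profiled function here is just a placeholder workload, not a model run.

```python
import cProfile

def run_model():
    # Placeholder workload standing in for a model run.
    return sum(i * i for i in range(10**6))

cProfile.run("run_model()", "run.prof")
# Then, from a shell: snakeviz run.prof
```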