Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RODARS pipeline #1103

Merged
merged 46 commits into from
Feb 14, 2025
Merged

RODARS pipeline #1103

merged 46 commits into from
Feb 14, 2025

Conversation

gabrielwol
Copy link
Collaborator

@gabrielwol gabrielwol commented Nov 29, 2024

What this pull request accomplishes:

  • Construction event data (RODARs)!
  • rodars_pull.py daily DAG which runs on Morbius.
    • Pulls RODARs data from ITSC in two parts:
      • Issues: only the latest update (timestamputc) for each issueid.
      • Issue locations: corresponding location details for latest issues.
    • Data is processed using functions in rodars_functions.py, including: unnesting json data, converting binary coordinates to postgres readable format, data type conversion.
    • Finally, data is inserted into congestion_events.rodars_issues and congestion_events.rodars_issue_locations.
      • Notably older issue_locations are deleted using this on insert trigger. I used this method because I thought the number of locations per issue could theoretically change, resulting in orphaned rows, unaffected by an on conflict do update.
  • 5 new lookup tables taken from the documentation in itsc_factors schema: direction, lanesaffectedpattern, locationblocklevel, roadclosuretype_new, roadclosuretype_old. Used to convert codes to human readable format in issue_locations view.
    • Only the itsc_factors.lanesaffectedpattern lookup could use review: I devised the lane_open, lane_closed columns to give a numeric interpretation of how many lanes are open/closed. I used 0.5 for partial closures/slowdowns.
    • Also checkout the function itsc_factors.get_lanesaffected_sums which translates these codes into numeric columns for ease of use in the main view. A lanesaffectedpattern could be something like 'LOLCLCWO' (lane open, lane closed, lane closed, sidwalk open) (Try: SELECT lane_open_auto, lane_closed_auto, lane_open_bike, lane_closed_bike, lane_open_ped, lane_closed_ped, lane_open_bus, lane_closed_bus FROM itsc_factors.get_lanesaffected_sums('LOLCLCWO'))

Issue(s) this solves:

What, in particular, needs to reviewed:

  • Does this live in the right schema (congestion_events)? View/table names?
  • Decision to only keep latest issue definition (every update is logged in ITS Central).
  • Review the main view and the readme.

What needs to be done by a sysadmin after this PR is merged

  • Refresh data_scripts on Morbius.

@gabrielwol gabrielwol added the New Data for creating pipelines for new datasets label Nov 29, 2024
@gabrielwol gabrielwol self-assigned this Nov 29, 2024
@gabrielwol gabrielwol linked an issue Nov 29, 2024 that may be closed by this pull request
@gabrielwol gabrielwol marked this pull request as ready for review January 10, 2025 21:54
@gabrielwol
Copy link
Collaborator Author

Ready to review! Will add some more usage examples to readme at some point.

@gabrielwol gabrielwol requested review from radumas and chmnata February 7, 2025 20:36
Copy link
Member

@radumas radumas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting on the documentation.

Pleased that the docs are extensive, you have thought out the limitations of the data, done some exploratory analysis and the use of the callouts for extra punchy information.

I have some thoughts on re-ordering the README, but maybe we merge this and solicit feedback from users.

I have added some small editorial comments on language.

I think the big thing missing in terms of documentation is that parent folders should have a brief description of the contents of this folder so that people can know where to find it.

Also should this folder be more accurately called something like lane_obstructions? Or road_permits? Since the data are for events and construction (right?)

events/construction/readme.md Outdated Show resolved Hide resolved
events/construction/readme.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@chmnata chmnata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some v smol clean up request, :gabe-approved:

CASE iil.centreline_id WHEN 0 THEN NULL ELSE iil.centreline_id END AS centreline_id,
COALESCE(
centreline_latest.geom,
--find geoms for centreline that do not appear in centreline_latest
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohhhh nice

events/construction/sql/insert-rodars_issue_locations.sql Outdated Show resolved Hide resolved
@gabrielwol gabrielwol force-pushed the 1088-explore-new-rodars-data branch from 7f407b0 to 0e11ccf Compare February 11, 2025 17:21
@gabrielwol gabrielwol force-pushed the 1088-explore-new-rodars-data branch from 0e11ccf to d58eab4 Compare February 11, 2025 17:26
@gabrielwol
Copy link
Collaborator Author

@radumas readme changes ready for re-review:

  • renamed folder to road_permits, moved existing contents of events into events/special_events and added events level readme.
  • updated reference in top level data-sources readme + removed old road_closure folder

Copy link
Member

@radumas radumas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you!

@radumas
Copy link
Member

radumas commented Feb 13, 2025

Is there a bug in the linting action that it isn't annotating?

@gabrielwol
Copy link
Collaborator Author

Is there a bug in the linting action that it isn't annotating?

I also noticed previously on this PR there were some lines being erroneously annotated with line length sqlfluff error. Perhaps time to look into upgrading sqlfluff action.

@gabrielwol gabrielwol merged commit 6995363 into master Feb 14, 2025
5 of 6 checks passed
@gabrielwol gabrielwol deleted the 1088-explore-new-rodars-data branch February 14, 2025 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
New Data for creating pipelines for new datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explore New RODARS data
3 participants