RODARS pipeline #1103

Merged on Feb 14, 2025 (46 commits)
fc69200
#1088 explore new rodars data
gabrielwol Oct 28, 2024
5d22d5a
#1088 initial rodars schema
gabrielwol Oct 29, 2024
9b23cca
#1088 add new divisions, fix null geometry issue, rename
gabrielwol Nov 20, 2024
3392d6e
#1088 RODARs DAG
gabrielwol Nov 29, 2024
09886de
#1088 bug fix: update each locationindex geom separately
gabrielwol Nov 29, 2024
5bbfd3a
#1088 readme updates
gabrielwol Nov 29, 2024
444a208
#1088 switch to congestion_events schema, separate issues & locations
gabrielwol Dec 6, 2024
0ad9ef3
#1088 update dag to 2 tasks
gabrielwol Dec 6, 2024
d7c3f90
#1088 lookup tables
gabrielwol Dec 6, 2024
93edebe
#1088 fix issue insert
gabrielwol Dec 6, 2024
477712b
#1088 lanes affected fixes
gabrielwol Dec 9, 2024
33d2318
#1088 issues_locations -> issue_locations
gabrielwol Dec 10, 2024
e82d5ea
#1088 convert timestamps to eastern, data type fixes, misc
gabrielwol Dec 10, 2024
7cc012d
#1088 vds_bot -> events_bot
gabrielwol Dec 12, 2024
1a364bb
#1088 add on insert delete trigger to locations
gabrielwol Dec 12, 2024
15a84c0
#1088 update permissions
gabrielwol Dec 12, 2024
f19268e
#1088 add back date filters
gabrielwol Dec 12, 2024
0ff339f
#1088 remove TM3 issues
gabrielwol Dec 20, 2024
243f127
#1088 itsc_locations updates
gabrielwol Dec 20, 2024
aa03126
#1088 rename itsc->rodars
gabrielwol Dec 20, 2024
8388929
#1088 rename itsc->rodars (files)
gabrielwol Dec 20, 2024
13d8329
#1088 rename itsc->rodars (py)
gabrielwol Dec 20, 2024
6ecc84a
#1088 rename itsc->rodars (py)
gabrielwol Dec 20, 2024
212ef9a
#1088 rename itsc->rodars (harder)
gabrielwol Dec 20, 2024
be45817
#1088 smol readme update
gabrielwol Dec 20, 2024
5fc4f5f
#1088 sqlfluff
gabrielwol Jan 10, 2025
4bda1db
#1088 remove some dated exploration work
gabrielwol Jan 10, 2025
b91b205
#1088 fix rodars_factors -> itsc_factors inconsistency
gabrielwol Jan 10, 2025
15bd9b4
#1088 rm outdated update sql
gabrielwol Jan 10, 2025
9c8cf7d
#1088 readme, comment updates
gabrielwol Jan 10, 2025
04e15ab
#1088 format callouts properly?
gabrielwol Jan 10, 2025
d6361b3
#1088 format callouts properly? try 2
gabrielwol Jan 10, 2025
a1af45c
#1088 add lanesaffectedpattern codes+description
gabrielwol Jan 13, 2025
e851353
#1088 get centreline_geom even when not in _latest
gabrielwol Jan 13, 2025
9920cb4
#1088 readme updates; add example queries, data dictionary
gabrielwol Jan 13, 2025
6d5d318
#1088 remove extra >s
gabrielwol Jan 14, 2025
cc9cd5b
#1134 add data-sources tag to DAGs
gabrielwol Jan 15, 2025
f9c3e6f
#1088 fluff
gabrielwol Feb 5, 2025
ee74b51
#1088 Update rodars_pull.py
gabrielwol Feb 7, 2025
471a236
#1088 remove bdit_humans permissions from raw tables
gabrielwol Feb 7, 2025
9608666
#1088 minor sql+readme review updates
gabrielwol Feb 11, 2025
9120a68
#1088 restructure events folder and add top level readme
gabrielwol Feb 11, 2025
4f3718d
#1088 road_closures -> road_permits (not all are closed)
gabrielwol Feb 11, 2025
f3a26ba
update higher level readmes, remove oudated road_closure folder #1088
gabrielwol Feb 11, 2025
d58eab4
#1088 minor permissions updates, table comments
gabrielwol Feb 11, 2025
57ed95c
#1088 fix paths in py
gabrielwol Feb 11, 2025
18 changes: 7 additions & 11 deletions README.md
@@ -15,7 +15,7 @@ For those curious about what data we manage is released on OpenData, see the [Op
- [Bluetooth Detectors](#bluetooth-detectors)
- [Collisions](#collisions)
- [Cycling App (inactive)](#cycling-app-inactive)
- [Events (inactive)](#events-inactive)
- [Events](#events)
- [GIS - Geographic Data](#gis---geographic-data)
- [Assets](#assets)
- [Red Light Cameras](#red-light-cameras)
@@ -26,13 +26,12 @@ For those curious about what data we manage is released on OpenData, see the [Op
- [Incidents (inactive)](#incidents-inactive)
- [INRIX (inactive)](#inrix-inactive)
- [Parking (inactive)](#parking-inactive)
- [Road Closure (inactive)](#road-closure-inactive)
- [TTC (inactive)](#ttc-inactive)
- [Volume Data](#volume-data)
- [Miovision - Multi-modal Permanent Video Counters](#miovision---multi-modal-permanent-video-counters)
- [RESCU - Loop Detectors (inactive)](#rescu---loop-detectors-inactive)
- [Short-term Counting Program](#short-term-counting-program)
- [VDS](#vehicle-detector-station-vds)
- [Short-term Counting Program](#short-term-counting-program)
- [Vehicle Detector Station (VDS)](#vehicle-detector-station-vds)
- [Watch Your Speed signs](#watch-your-speed-signs)
- [Weather](#weather)
- [Open Data Releases](#open-data-releases)
@@ -60,11 +59,13 @@ The collisions dataset consists of data on individuals involved in traffic colli

The Cycling App collected OD and trip data until 2016.

## Events (inactive)
## Events

[`events/`](events/)

How do special events impact traffic in the city? Data sources include the City's Open Data and TicketMaster.
How do construction and special events impact traffic in the city?
- City road permitting data (RoDARS)
- (outdated) Special events from City's Open Data and TicketMaster

## GIS - Geographic Data

@@ -117,11 +118,6 @@ Data collected from a variety of traffic probes from 2007 to 2016 for major stre

This contains R and SQL files for pulling parking lots and parking tickets from Open Data. They might be useful but haven't been documented or automated.

## Road Closure (inactive)
[`road_closure/`](road_closure/)

This directory contains a Python file to pull and parse the XML feed of road closures. This process hasn't been automated (and more recent versions of the API use JSON).

## TTC (inactive)
[`ttc/`](ttc/)

24 changes: 17 additions & 7 deletions dags/readme.md
@@ -1,6 +1,7 @@
- [Airflow](#airflow)
- [DAG Naming](#dag-naming)
- [Contents of DAGs Folder](#contents-of-dags-folder)
- [Bluetooth](#bluetooth)
- [Weather](#weather)
- [Misc](#misc)
- [Here](#here)
@@ -12,6 +13,7 @@
- [**common\_tasks.py**](#common_taskspy)
- [**dag\_functions.py**](#dag_functionspy)
- [Miovision](#miovision)
- [RoDARS](#rodars)
- [VDS (Formerly RESCU)](#vds-formerly-rescu)
- [WYS](#wys)
- [Common Tasks](#common-tasks)
@@ -33,6 +35,9 @@ You may notice that many older DAGs have not been renamed to this standard: it i
## Contents of DAGs Folder
**Only put DAGs for data intake in this folder,** DAGs for data processing related to projects should be in their respective project repositories.

### Bluetooth
- [bluetooth_check_readers_temp.py](bluetooth_check_readers_temp.py): undocumented.

### Weather
- **[weather_pull.py](weather_pull.py)**: [readme](../weather/README.md#data-pipeline---weather_pull-dag).
- Deprecated: [pull_weather.py](pull_weather.py)
@@ -44,7 +49,8 @@ You may notice that many older DAGs have not been renamed to this standard: it i
### Here
- **[pull_here.py](pull_here.py)**: [readme](../here/traffic/README.md#probe_path).
- **[pull_here_path.py](pull_here_path.py)**: [readme](../here/traffic/README.md#path).
- HERE Aggregations: [citywide_tti_aggregate.py](citywide_tti_aggregate.py).
- HERE Aggregations: [tti_aggregate.py](tti_aggregate.py).
- Deprecated: [citywide_tti_aggregate.py](citywide_tti_aggregate.py)

### Replication
- [**replicators.py**](replicators.py): creates collisions and counts replicator DAGs as part of the MOVE -> bigdata replication process.
@@ -57,7 +63,7 @@ You may notice that many older DAGs have not been renamed to this standard: it i

### GIS
- [**assets_pull.py**](assets_pull.py): [readme](../gis/assets/README.md#assets).
- [**gcc_layers_pull.py**](gcc_layers_pull.py): [readme](../gis/gccview/README.md#gccview-pipeline).
- [**gcc_layers_pull.py**](gcc_layers_pull.py): [readme](../gis/gccview/README.md#gccview-pipeline). Runs on Morbius, Bancroft.
- [**vz_google_sheets.py**](vz_google_sheets.py): [readme](../gis/school_safety_zones/README.md#2-the-automated-data-pipeline).
- Deprecated: [**pull_interventions_dag.py**](pull_interventions_dag.py).

@@ -90,16 +96,20 @@ Contains helper functions to be used in multiple DAGs.
- **[miovision_check.py](miovision_check.py)**: [readme](../volumes/miovision/api/readme.md#miovision_check).
- Deprecated: [pull_miovision.py](pull_miovision.py), [check_miovision.py](check_miovision.py).

### RoDARS
- **[rodars_pull.py](rodars_pull.py)**: [readme](../events/road_permits/readme.md#rodars-dag). Runs on Morbius.

### VDS (Formerly RESCU)
- **[vds_check.py](vds_check.py)**: [readme](../volumes/vds/readme.md#vds_check-dag).
- **[vds_pull_vdsdata.py](vds_pull_vdsdata.py)**: [readme](../volumes/vds/readme.md#vds_pull_vdsdata-dag).
- **[vds_pull_vdsvehicledata.py](vds_pull_vdsvehicledata.py)**: [readme](../volumes/vds/readme.md#vds_pull_vdsvehicledata-dag).
- **[vds_check.py](vds_check.py)**: [readme](../volumes/vds/readme.md#vds_check-dag). Runs on Morbius.
- **[vds_pull_vdsdata.py](vds_pull_vdsdata.py)**: [readme](../volumes/vds/readme.md#vds_pull_vdsdata-dag). Runs on Morbius.
- **[vds_pull_vdsvehicledata.py](vds_pull_vdsvehicledata.py)**: [readme](../volumes/vds/readme.md#vds_pull_vdsvehicledata-dag). Runs on Morbius.
- Deprecated: [check_rescu.py](check_rescu.py).

### WYS
- [**refresh_wys_monthly.py**](refresh_wys_monthly.py): [readme](../wys/api/readme.md#wys_monthly_summary).
- [**pull_wys.py**](pull_wys.py): [readme](../wys/api/readme.md#pull_wys).
- [**wys_pull.py**](wys_pull.py): [readme](../wys/api/README.md#dag).
- [**refresh_wys_monthly.py**](refresh_wys_monthly.py): [readme](../wys/api/README.md#wys_monthly_summary).
- [**wys_check.py**](wys_check.py): Contains additional data quality checks for `pull_wys`.
- Deprecated [**pull_wys.py**](pull_wys.py).

## Common Tasks

68 changes: 68 additions & 0 deletions dags/rodars_pull.py
@@ -0,0 +1,68 @@
import os
import sys
from functools import partial
from datetime import datetime, timedelta

from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.models import Variable

DAG_NAME = 'rodars_pull'
DAG_OWNERS = Variable.get('dag_owners', deserialize_json=True).get(DAG_NAME, ['Unknown'])

repo_path = os.path.abspath(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
sys.path.insert(0, repo_path)

from events.road_permits.rodars_functions import (
    fetch_and_insert_issue_data, fetch_and_insert_location_data
)
from dags.dag_functions import task_fail_slack_alert, get_readme_docmd

README_PATH = os.path.join(repo_path, 'events/road_permits/readme.md')
DOC_MD = get_readme_docmd(README_PATH, DAG_NAME)

default_args = {
    'owner': ','.join(DAG_OWNERS),
    'depends_on_past': False,
    'start_date': datetime(2024, 11, 27),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'retry_exponential_backoff': True, #Allow for progressive longer waits between retries
    'on_failure_callback': partial(task_fail_slack_alert, use_proxy = True),
    'catchup': True,
}

@dag(
    dag_id=DAG_NAME,
    default_args=default_args,
    max_active_runs=1,
    template_searchpath=[
        os.path.join(repo_path,'events/road_permits/sql')
    ],
    doc_md=DOC_MD,
    tags=['bdit_data-sources', 'rodars', 'pull', 'itsc_central'],
    schedule='0 4 * * *' #daily at 4am
)

def rodars_dag():
    @task
    def pull_rodars_issues(ds = None):
        "Get RODARS data from ITSC and insert into bigdata `congestion_events.itsc_issues`"
        itsc_bot = PostgresHook('itsc_postgres')
        events_bot = PostgresHook('events_bot')
        fetch_and_insert_issue_data(select_conn=itsc_bot, insert_conn=events_bot, start_date=ds)

    @task
    def pull_rodar_locations(ds = None):
        "Get RODARS data from ITSC and insert into bigdata `congestion_events.itsc_issue_locations`"
        itsc_bot = PostgresHook('itsc_postgres')
        events_bot = PostgresHook('events_bot')
        fetch_and_insert_location_data(select_conn=itsc_bot, insert_conn=events_bot, start_date=ds)
    #add a delete task to remove outdated revisions?

    #these tasks are not dependent, but this helps so only one fails at a time
    pull_rodars_issues() >> pull_rodar_locations()

rodars_dag()
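
For readers unfamiliar with how a daily schedule, a fixed `start_date`, and catchup combine, the sketch below lists the `ds` values that would be handed to the two tasks. It is a plain-Python approximation, not Airflow code: it assumes Airflow 2.x data-interval semantics (a run fires once its 24-hour interval has ended, and `ds` is the start date of that interval), it ignores time zones, and `catchup_run_dates` is a hypothetical helper that does not exist in this PR.

```python
from datetime import datetime, timedelta


def catchup_run_dates(start_date, now, hour=4):
    """Approximate the `ds` values of daily runs scheduled at `hour`:00.

    Assumption: each run covers a 24-hour interval beginning at `hour`:00
    and is only created once that interval has fully elapsed.
    """
    # First scheduled tick at or after start_date.
    first = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first < start_date:
        first += timedelta(days=1)

    dates = []
    interval_start = first
    # A run exists for every interval whose end is not in the future.
    while interval_start + timedelta(days=1) <= now:
        dates.append(interval_start.strftime("%Y-%m-%d"))
        interval_start += timedelta(days=1)
    return dates


runs = catchup_run_dates(datetime(2024, 11, 27), datetime(2024, 11, 30, 12, 0))
print(runs)
```

Because the DAG sets `max_active_runs=1`, any backlog of runs like this would be worked through one run at a time rather than in parallel.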
10 changes: 10 additions & 0 deletions events/readme.md
@@ -0,0 +1,10 @@
# Introduction

This folder contains information on data sources related to road closures and special events.

## [road_permits](./road_permits/)
This folder contains [Road Disruption Activity Reporting System (RoDARS)](https://www.toronto.ca/services-payments/streets-parking-transportation/road-restrictions-closures/road-disruption-activity-reporting-system-rodars/) data, which is pulled daily from ITS Central by the [`rodars_pull` DAG](../dags/rodars_pull.py).
This describes permitted road construction and event-related road closures, with data dating back to ~2012 and more accurate data starting in mid-2024 (RoDARS "New").

## [special_events](./special_events/)
This folder contains an outdated (2017) attempt at archiving special events happening around the City of Toronto for traffic impact analysis from two data sources: City of Toronto Open Data and Ticketmaster.
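
The `rodars_pull` tasks above each call a helper with a `select_conn`, an `insert_conn`, and a `start_date`. The implementation of those helpers is not shown in this excerpt, so the following is only a minimal sketch of that general pattern: read rows changed on or after a date from one database and write them to another so that reruns do not create duplicates. It uses in-memory `sqlite3` connections as stand-ins for the two Postgres connections, and the function `fetch_and_insert`, the tables `source_issues` and `issues`, and all column names are hypothetical.

```python
import sqlite3


def fetch_and_insert(select_conn, insert_conn, start_date):
    """Copy rows updated on/after start_date from source to target.

    Hypothetical stand-in for the PR's fetch_and_insert_* helpers.
    """
    rows = select_conn.execute(
        "SELECT issue_id, description, updated_at "
        "FROM source_issues WHERE updated_at >= ?",
        (start_date,),
    ).fetchall()
    # INSERT OR REPLACE keyed on the primary key makes reruns idempotent.
    insert_conn.executemany(
        "INSERT OR REPLACE INTO issues (issue_id, description, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    insert_conn.commit()
    return len(rows)


# Two separate in-memory databases: "source" plays ITS Central, "target" plays bigdata.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
schema = "(issue_id INTEGER PRIMARY KEY, description TEXT, updated_at TEXT)"
source.execute("CREATE TABLE source_issues " + schema)
target.execute("CREATE TABLE issues " + schema)
source.executemany(
    "INSERT INTO source_issues VALUES (?, ?, ?)",
    [
        (1, "Lane closure", "2024-11-26"),
        (2, "Road closure", "2024-11-27"),
        (3, "Sidewalk closure", "2024-11-28"),
    ],
)
source.commit()

first_pull = fetch_and_insert(source, target, "2024-11-27")
second_pull = fetch_and_insert(source, target, "2024-11-27")  # simulated retry
total = target.execute("SELECT COUNT(*) FROM issues").fetchone()[0]
print(first_pull, second_pull, total)
```

Keying the write on the primary key matters here because the DAG is configured with a retry: a task that fails partway and runs again should leave the target table in the same state as a single clean run. In Postgres the equivalent would typically be an `INSERT ... ON CONFLICT` statement.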