Scrape the AI legislation PDFs #1

Open
5 tasks
mkalish opened this issue Dec 31, 2024 · 0 comments

Description

Implement a scraping tool that iterates over the AI legislation table and downloads the most recent version of each bill.

AirTable: https://airtable.com/appJHkVlGCmNLPLXL/shrFvhCRSOE9dbbSz/tbl2tL08FmzPrXgT2

Acceptance Criteria

  • A reusable tool can be run to scrape the AI legislation
  • If a bill has been enacted, scrape the enacted version; otherwise scrape the most recent version
  • Each scraped bill should be saved using the format {STATE}_{BILL-TITLE}_{BILL-STATUS}_{BILL-DATE}.html
    • Example: ALABAMA_HB168_ENACTED_2024_04_18.html
  • Each scraped bill should also have a matching metadata file, e.g. ALABAMA_HB168_ENACTED_2024_04_18.md, that includes
    • Bill title
    • State
    • Bill status
    • Legislative session
  • All scraped bills should go in a directory named with that day's date. Expected structure:
    • data/raw/20241231/ALABAMA_HB168_ENACTED_2024_04_18.html

Developer notes

  • Might require creating an AirTable account
  • The bills are exposed as HTML and should be saved as HTML so we keep the raw data; later tickets will parse it
  • Most of the metadata can be found in the AirTable table
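
Saving the raw HTML next to its metadata file might look something like the sketch below. `save_bill` is a hypothetical helper, and the Markdown layout of the metadata file (one `- key: value` line per field) is an assumption, since the ticket only lists which fields to include. The HTML is written as bytes so the raw response is kept unmodified.

```python
from pathlib import Path


def save_bill(html: bytes, html_path: Path, metadata: dict[str, str]) -> Path:
    """Write the raw HTML and a sibling .md metadata file; return the .md path."""
    html_path.parent.mkdir(parents=True, exist_ok=True)
    # Bytes in, bytes out: no decoding or re-encoding of the scraped page.
    html_path.write_bytes(html)

    # Same file name with the extension swapped, e.g. ..._2024_04_18.md
    md_path = html_path.with_suffix(".md")
    lines = [f"- {key}: {value}" for key, value in metadata.items()]
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return md_path
```

A call would pass the four fields named in the acceptance criteria, for example `{"Bill title": "HB168", "State": "Alabama", "Bill status": "Enacted", "Legislative session": ...}`, with the values read from the AirTable row.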