-
-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
0672 spider chi northwest home equity #1007
base: main
Are you sure you want to change the base?
0672 spider chi northwest home equity #1007
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! This is looking great so far, and I just have some comments on parsing each detail page rather than loading from the summary since there's some extra information on the detail pages. Let me know if you have any questions
continue | ||
meeting = Meeting( | ||
title=self._parse_title(item), | ||
description="Past Meetings", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be an empty string
timezone = "America/Chicago" | ||
start_urls = ["https://nwheap.com/category/meet-minutes-and-agendas/"] | ||
|
||
def parse(self, response): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this overall, I think it would be best if we could scrape each meeting URL separately. It seems like they don't generally have much information on the detail page, but at the very least we'll be able to get a unique URL for each meeting's source
, and in some cases they've included dial-in information and links like this page https://nwheap.com/2020/11/30/december-10-2020-meeting-minutes-and-agenda-annual-budget-meeting/
chi_ssa_25
might be a good example to go off of for this. parse
should yield requests to each visible detail page, and a separate method like _parse_detail
can yield a single Meeting
for each detail page
|
||
def _parse_links(self, item): | ||
"""Parse or generate links.""" | ||
link = item.xpath("a/@href").get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link should go in the source, since links is more for things like agendas and minutes, but this will likely change with parsing each detail page
"Main Office" if addr.startswith("3234 N. Central Ave") else "Unknown place" | ||
) | ||
return { | ||
"address": addr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's a static address for "Main Office" we can include that here if it matches, but this will also change when parsing the detail page
Sorry for going AWOL, has this tracker been finished, or is it still in the state where I left it? Either way is fine -- i shouldn't have gone missing for the better part of a year -- but if it is still unfinished i would be glad to finish things out. Just unclear about the second pull request, whether its a new start or an edit of what i started. |
I am unclear as to the status of this tracker, so I am going to try to start fresh on a different board meeting in need of tracking. If this one is still unfinished though, and could use my help finishing out, let me know because I don't want to interfere if someone else has taken over the issue. I will periodically check back here though. Again, sorry for going AWOL after submitting the initial pull request. That's definitely on me. |
Summary
Issue: #966
Replace "ISSUE_NUMBER" with the number of your issue so that GitHub will link this pull request with the issue and make review easier.
Checklist
All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.
Questions
Include any questions you have about what you're working on.