Add scraped page archive #1
4066bf8 to 737f9ac
@ondenman the rubocop tidying only needs to be for the (NB: you don't necessarily need to actually include Rubocop + config in these sorts of changes, but there's no harm in doing so, so now that it's already here it's OK to keep it)
737f9ac to cd62b98
Source was pointing to :git, not the :github source as defined above
The only two relevant pages I could find from the outgoing legislature were the Commissions and the list of female deputies.
The PR description here is a bit misleading: this isn't really archiving any pages, only a PDF file which presumably is never going to change. As such, this change probably isn't that useful (other than a single run to archive that PDF once). However, I suspect we should also archive (though not process) http://www.assemblee-nationale.ga/34-deputes/168-bureaux-des-commissions/ and http://www.assemblee-nationale.ga/34-deputes/153-les-femmes-deputes/. I've added an extra commit to pick up those two pages.
I've configured morph, though I haven't set it to run every day, as I doubt it's actually going to change again between now and the election.
What does this do?
Uses scraped-page-archive to archive all pages scraped.
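For readers unfamiliar with the idea, here is a minimal stdlib-only sketch of what "archiving a scraped page" means: keep a copy of each response body, keyed by its URL and fetch time, so it can be re-parsed later. This is not the scraped-page-archive API or its storage format; `archive_page`, its parameters, and the file naming scheme are all hypothetical and exist only for illustration.

```ruby
require 'digest/sha1'
require 'fileutils'

# Hypothetical helper (not part of scraped-page-archive): write one response
# body to `dir`, named after the SHA1 of its URL plus a UTC timestamp, and
# return the path of the file that was written.
def archive_page(url, body, dir:, fetched_at: Time.now.utc)
  FileUtils.mkdir_p(dir)
  stamp = fetched_at.strftime('%Y%m%dT%H%M%SZ')
  path = File.join(dir, "#{Digest::SHA1.hexdigest(url)}-#{stamp}")
  # Binary write, so non-HTML responses (a PDF, say) are stored unchanged.
  File.binwrite(path, body)
  path
end
```

A scraper would call this once per fetched page, for example `archive_page(url, response_body, dir: 'archive')`, before parsing the body. Because each snapshot carries a timestamp, later runs add new copies rather than overwriting old ones.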
Why is this needed?
There's an election coming up (01/12/2016), and it's likely that the data on the official site will disappear, meaning any data we're not already picking up will be lost. Archiving it now gives us the chance to go back and re-scrape later even if it disappears.
Relevant Issue(s):
everypolitician/everypolitician-data#20544
Checklists:
Scraper change:
Adding Archiving:
Gemfile change:
github: protocol, not simply git:
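As a rough illustration of this checklist item, a Gemfile along the following lines would satisfy it. This is a sketch under assumptions: the gem name and repository path are guesses for illustration, and the `git_source(:github)` block is inferred from the "as defined above" wording in the commit message rather than taken from the actual Gemfile.

```ruby
# Gemfile (sketch; gem name and repository path are assumptions)
source 'https://rubygems.org'

# Defines what the `github:` shorthand expands to.
git_source(:github) { |repo_name| "https://github.com/#{repo_name}.git" }

# Preferred: goes through the :github source defined above.
gem 'scraped_page_archive', github: 'everypolitician/scraped_page_archive'

# Avoid: a raw `git:` URL bypasses that source definition.
# gem 'scraped_page_archive', git: 'git://github.com/everypolitician/scraped_page_archive.git'
```

With the `github:` form, the fetch URL is decided in one place, so every gem pulled from GitHub uses the same transport.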