Commit

Merge pull request #1 from everypolitician-scrapers/add-scraped-page-archive

Add scraped page archive
tmtmtmtm authored Nov 23, 2016
2 parents 0838585 + c06ca08 commit 18fed99
Showing 5 changed files with 74 additions and 12 deletions.
12 changes: 12 additions & 0 deletions .rubocop.yml
@@ -0,0 +1,12 @@
+AllCops:
+  Exclude:
+    - 'Vagrantfile'
+    - 'vendor/**/*'
+  TargetRubyVersion: 2.3
+
+inherit_from:
+  - https://raw.githubusercontent.com/everypolitician/everypolitician-data/master/.rubocop_base.yml
+  - .rubocop_todo.yml
+
+Style/AndOr:
+  Enabled: false
Empty file added .rubocop_todo.yml
Empty file.
19 changes: 12 additions & 7 deletions Gemfile
@@ -1,14 +1,19 @@
+# frozen_string_literal: true
 # It's easy to add more libraries or choose different versions. Any libraries
 # specified here will be installed and made available to your morph.io scraper.
 # Find out more: https://morph.io/documentation/ruby
 
-source "https://rubygems.org"
+source 'https://rubygems.org'
+git_source(:github) { |repo_name| "https://github.com/#{repo_name}" }
 
-ruby "2.0.0"
+ruby '2.0.0'
 
-gem "scraperwiki", git: "https://github.com/openaustralia/scraperwiki-ruby.git", branch: "morph_defaults"
-gem "pry"
-gem "colorize"
-gem "nokogiri"
-gem "open-uri-cached"
+gem 'scraperwiki', github: 'openaustralia/scraperwiki-ruby',
+                   branch: 'morph_defaults'
+gem 'pry'
+gem 'colorize'
+gem 'nokogiri'
+gem 'open-uri-cached'
 gem 'pdf-reader'
+gem 'scraped_page_archive'
+gem 'rubocop'
48 changes: 44 additions & 4 deletions Gemfile.lock
@@ -1,5 +1,5 @@
 GIT
-  remote: https://github.com/openaustralia/scraperwiki-ruby.git
+  remote: https://github.com/openaustralia/scraperwiki-ruby
   revision: fc50176812505e463077d5c673d504a6a234aa78
   branch: morph_defaults
   specs:
@@ -11,32 +11,64 @@ GEM
   remote: https://rubygems.org/
   specs:
     Ascii85 (1.0.2)
+    addressable (2.5.0)
+      public_suffix (~> 2.0, >= 2.0.2)
     afm (0.2.2)
+    ast (2.3.0)
     coderay (1.1.0)
     colorize (0.7.7)
+    crack (0.4.3)
+      safe_yaml (~> 1.0.0)
+    git (1.3.0)
+    hashdiff (0.3.0)
     hashery (2.1.1)
-    httpclient (2.6.0.1)
+    httpclient (2.8.2.4)
     method_source (0.8.2)
     mini_portile (0.6.2)
     nokogiri (1.6.6.2)
       mini_portile (~> 0.6.0)
     open-uri-cached (0.0.5)
+    parser (2.3.1.4)
+      ast (~> 2.2)
     pdf-reader (1.3.3)
       Ascii85 (~> 1.0.0)
       afm (~> 0.2.0)
       hashery (~> 2.0)
       ruby-rc4
       ttfunk
+    powerpack (0.1.1)
     pry (0.10.1)
       coderay (~> 1.1.0)
       method_source (~> 0.8.1)
       slop (~> 3.4)
+    public_suffix (2.0.4)
+    rainbow (2.1.0)
+    rubocop (0.45.0)
+      parser (>= 2.3.1.1, < 3.0)
+      powerpack (~> 0.1)
+      rainbow (>= 1.99.1, < 3.0)
+      ruby-progressbar (~> 1.7)
+      unicode-display_width (~> 1.0, >= 1.0.1)
+    ruby-progressbar (1.8.1)
     ruby-rc4 (0.1.5)
+    safe_yaml (1.0.4)
+    scraped_page_archive (0.5.0)
+      git (~> 1.3.0)
+      vcr-archive (~> 0.3.0)
     slop (3.6.0)
-    sqlite3 (1.3.10)
-    sqlite_magic (0.0.3)
+    sqlite3 (1.3.12)
+    sqlite_magic (0.0.6)
       sqlite3
     ttfunk (1.4.0)
+    unicode-display_width (1.1.1)
+    vcr (3.0.3)
+    vcr-archive (0.3.0)
+      vcr (~> 3.0.2)
+      webmock (~> 2.0.3)
+    webmock (2.0.3)
+      addressable (>= 2.3.6)
+      crack (>= 0.3.2)
+      hashdiff
 
 PLATFORMS
   ruby
@@ -47,4 +79,12 @@ DEPENDENCIES
   open-uri-cached
   pdf-reader
   pry
+  rubocop
+  scraped_page_archive
   scraperwiki!
+
+RUBY VERSION
+   ruby 2.0.0p648
+
+BUNDLED WITH
+   1.13.5
7 changes: 6 additions & 1 deletion scraper.rb
@@ -2,8 +2,9 @@
 # encoding: utf-8
 
 require 'scraperwiki'
-require 'open-uri'
+# require 'open-uri'
 require 'pdf-reader'
+require 'scraped_page_archive/open-uri'
 
 # require 'colorize'
 # require 'pry'
@@ -42,3 +43,7 @@ def scrape_list(url)
 ScraperWiki.save_sqlite([:id], term, 'terms')
 
 scrape_list('http://www.assemblee-nationale.ga/object.getObject.do?id=190')
+
+# archive some pages for later processing
+open('http://www.assemblee-nationale.ga/34-deputes/168-bureaux-des-commissions/')
+open('http://www.assemblee-nationale.ga/34-deputes/153-les-femmes-deputes/')
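
The two bare `open` calls above fetch pages only for their side effect: as the commit's comment suggests, loading `scraped_page_archive/open-uri` is what lets those responses be archived. A minimal sketch of the same idea written as a loop follows. The names `archive_pages`, `ARCHIVE_URLS` and `fake_fetcher` are hypothetical and not part of the commit, and the fetcher is passed in as an assumption so the sketch can run offline; in the scraper itself it would be open-uri's `open`.

```ruby
# Sketch only: archive_pages, ARCHIVE_URLS and fake_fetcher are hypothetical
# names introduced for illustration; none of them appear in the commit.
ARCHIVE_URLS = [
  'http://www.assemblee-nationale.ga/34-deputes/168-bureaux-des-commissions/',
  'http://www.assemblee-nationale.ga/34-deputes/153-les-femmes-deputes/'
].freeze

# Call the fetcher on every URL purely for its side effect and record
# whether each attempt succeeded (:ok) or which error class it raised.
def archive_pages(urls, fetcher)
  urls.each_with_object({}) do |url, results|
    begin
      fetcher.call(url)
      results[url] = :ok
    rescue StandardError => e
      # A page wanted only for the archive should not abort the whole run.
      results[url] = e.class.name
    end
  end
end

# Offline stand-in for open-uri's `open`, so the sketch needs no network.
fake_fetcher = ->(url) { "<html>#{url}</html>" }
results = archive_pages(ARCHIVE_URLS, fake_fetcher)
results.each { |url, status| puts "#{status}: #{url}" }
```

Rescuing per URL is a design choice of this sketch rather than something the commit does: the scraped data has already been saved by the time these pages are fetched, so a failed archive request need not fail the run.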
