Commit
* Bump version to 0.0.1dev4
* Add .travis.yml
* Add build status to README
* Remove nightly build from travis
* Loosen up scraped items in tests
* Scrape just 1 item for category scrapes in tests
* Split tests into meaningful modules
* Require attrs >= 19.2.0
* Use pytest
* Register marks, double quotes in travis env
* Different kind of quote
* Maybe these quotes
Showing 10 changed files with 130 additions and 95 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
language: python | ||
|
||
os: | ||
- linux | ||
|
||
python: | ||
- "3.6" | ||
- "3.7" | ||
- "3.8" | ||
|
||
env: | ||
- SPIDER=ilarticle | ||
- SPIDER=isarticle | ||
- SPIDER="not spider" | ||
|
||
install: | ||
- pip install -r requirements.txt -r requirements-dev.txt | ||
|
||
script: pytest -m "$SPIDER" |
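As a rough illustration of what this configuration implies, the sketch below enumerates the jobs and the pytest command each one would run. It assumes Travis expands the `python` and `env` lists into a full build matrix; the helper name `build_matrix` is hypothetical and not part of the commit.

```python
from itertools import product

# Values copied from the .travis.yml above.
PYTHON_VERSIONS = ["3.6", "3.7", "3.8"]
SPIDER_VALUES = ["ilarticle", "isarticle", "not spider"]


def build_matrix(versions, spiders):
    """Return one (python_version, command) pair per matrix job."""
    return [
        (version, f'pytest -m "{spider}"')
        for version, spider in product(versions, spiders)
    ]


jobs = build_matrix(PYTHON_VERSIONS, SPIDER_VALUES)
for version, command in jobs:
    print(f"python {version}: {command}")
```

Under that assumption, three Python versions times three `SPIDER` values gives nine jobs, each running a different marker selection.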
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
# finscraper | ||
|
||
[![Build Status](https://travis-ci.com/jmyrberg/finscraper.svg?branch=master)](https://travis-ci.com/jmyrberg/finscraper) [![Documentation Status](https://readthedocs.org/projects/finscraper/badge/?version=latest)](https://finscraper.readthedocs.io/en/latest/?badge=latest) | ||
|
||
![finscraper cover](https://github.com/jmyrberg/finscraper/blob/master/docs/cover.jpg?raw=true) | ||
|
||
The library provides an easy-to-use API for fetching data from various Finnish websites: | ||
|
@@ -37,6 +39,4 @@ repository up-to-date all by myself - pull requests are more than welcome! | |
|
||
--- | ||
|
||
[![Documentation Status](https://readthedocs.org/projects/finscraper/badge/?version=latest)](https://finscraper.readthedocs.io/en/latest/?badge=latest) | ||
|
||
Jesse Myrberg ([email protected]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
[pytest] | ||
markers = | ||
spider: Mark test as a spider test. | ||
isarticle: Mark test as a isarticle test. | ||
ilarticle: Mark test as a ilarticle test. |
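To make the registered markers easier to inspect, here is a minimal sketch that reads them back with the standard-library `configparser`. It assumes the on-disk file indents the marker lines under `markers =` (the indentation is not visible in this diff view), and the helper `registered_markers` is hypothetical. pytest itself uses its own ini parser, so this only approximates how the file is read.

```python
import configparser

# Assumed on-disk layout of pytest.ini: marker lines indented under "markers =".
PYTEST_INI = """\
[pytest]
markers =
    spider: Mark test as a spider test.
    isarticle: Mark test as a isarticle test.
    ilarticle: Mark test as a ilarticle test.
"""


def registered_markers(ini_text):
    """Return a {marker_name: description} dict from pytest.ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # The indented lines are parsed as one multi-line value.
    lines = parser["pytest"]["markers"].splitlines()
    markers = {}
    for line in lines:
        if not line.strip():
            continue  # skip the empty first line after "markers ="
        name, _, description = line.partition(":")
        markers[name.strip()] = description.strip()
    return markers


markers = registered_markers(PYTEST_INI)
print(sorted(markers))
```

Registering the names this way lets pytest recognise `spider`, `isarticle`, and `ilarticle` instead of warning about unknown marks.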
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
nose==1.3.7 | ||
pytest==5.4.2 | ||
twine |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
attrs==19.2.0 | ||
pandas==1.0.3 | ||
selenium==3.141.0 | ||
scrapy==2.1.0 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
"""Module for testing ILArticle.""" | ||
|
||
|
||
import pytest | ||
pytestmark = [pytest.mark.spider, pytest.mark.ilarticle] | ||
|
||
from finscraper.spiders import ILArticle | ||
|
||
|
||
def test_ILArticle_with_category(): | ||
# Test scraping | ||
spider = ILArticle('ulkomaat').scrape(1) | ||
df = spider.get() | ||
assert len(df) >= 1 | ||
assert len(df.columns) == 8 | ||
|
||
# Test continuing scraping | ||
df2 = spider.scrape(1).get() | ||
assert len(df2) >= len(df) + 1 | ||
|
||
# Save and load spider | ||
jobdir = spider.save() | ||
spider = ILArticle.load(jobdir) | ||
|
||
df3 = spider.scrape(1).get() | ||
assert len(df3) >= len(df2) + 1 | ||
|
||
|
||
def test_ILArticle_no_params(): | ||
# Test scraping | ||
spider = ILArticle().scrape(10) | ||
df = spider.get() | ||
assert len(df) >= 10 | ||
assert len(df.columns) == 8 | ||
|
||
# Test continuing scraping | ||
df2 = spider.scrape(10).get() | ||
assert len(df2) >= len(df) + 10 | ||
|
||
# Save and load spider | ||
jobdir = spider.save() | ||
spider = ILArticle.load(jobdir) | ||
|
||
df3 = spider.scrape(10).get() | ||
assert len(df3) >= len(df2) + 10 |
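The tests above rely on a cumulative contract: `scrape(n)` returns the spider itself, `get()` returns everything scraped so far, and a spider restored with `load(jobdir)` continues from the saved state. The sketch below illustrates that contract with a hypothetical in-memory stand-in, `FakeSpider`. It is not finscraper's implementation: it stores plain lists instead of DataFrames and pickles its state to a temporary directory.

```python
import pickle
import tempfile
from pathlib import Path


class FakeSpider:
    """Hypothetical stand-in mimicking the interface the tests exercise."""

    def __init__(self, category=None):
        self.category = category
        self.items = []

    def scrape(self, n):
        # Append n new items and return self so calls can be chained.
        start = len(self.items)
        self.items.extend(f"item-{i}" for i in range(start, start + n))
        return self

    def get(self):
        # Return a copy of everything scraped so far.
        return list(self.items)

    def save(self):
        # Persist state to a fresh directory and return its path.
        jobdir = Path(tempfile.mkdtemp())
        (jobdir / "spider.pkl").write_bytes(pickle.dumps(self.__dict__))
        return jobdir

    @classmethod
    def load(cls, jobdir):
        # Rebuild a spider from the state written by save().
        spider = cls()
        state = pickle.loads((Path(jobdir) / "spider.pkl").read_bytes())
        spider.__dict__.update(state)
        return spider


spider = FakeSpider("ulkomaat").scrape(1)
first = spider.get()
second = spider.scrape(1).get()
restored = FakeSpider.load(spider.save())
third = restored.scrape(1).get()
```

With this stand-in, each result is strictly longer than the previous one, which is the property the `>=` assertions in the real tests check without pinning exact counts.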
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
"""Module for testing ISArticle.""" | ||
|
||
|
||
import pytest | ||
pytestmark = [pytest.mark.spider, pytest.mark.isarticle] | ||
|
||
from finscraper.spiders import ISArticle | ||
|
||
|
||
def test_ISArticle_with_category(): | ||
# Test scraping, no chromedriver | ||
spider = ISArticle('ulkomaat').scrape(1) | ||
df = spider.get() | ||
assert len(df) >= 1 | ||
assert len(df.columns) == 8 | ||
|
||
# Test scraping with chromedriver | ||
spider = ISArticle('ulkomaat', allow_chromedriver=True).scrape(1) | ||
df = spider.get() | ||
assert len(df) >= 1 | ||
assert len(df.columns) == 8 | ||
|
||
# Test continuing scraping | ||
df2 = spider.scrape(1).get() | ||
assert len(df2) >= len(df) + 1 | ||
|
||
# Save and load spider | ||
jobdir = spider.save() | ||
spider = ISArticle.load(jobdir) | ||
|
||
df3 = spider.scrape(1).get() | ||
assert len(df3) >= len(df2) + 1 | ||
|
||
|
||
def test_ISArticle_no_params(): | ||
# Test scraping | ||
spider = ISArticle().scrape(10) | ||
df = spider.get() | ||
assert len(df) == 10 | ||
assert len(df.columns) == 8 | ||
|
||
# Test continuing scraping (poor results, no driver) | ||
df2 = spider.scrape(10).get() | ||
assert len(df2) >= len(df) + 10 | ||
|
||
# Save and load spider | ||
jobdir = spider.save() | ||
spider = ISArticle.load(jobdir) | ||
|
||
df3 = spider.scrape(10).get() | ||
assert len(df3) >= len(df2) + 10 |
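Both test modules set a module-level `pytestmark`, so every test in them carries the `spider` mark plus a spider-specific one. The sketch below approximates how the three `-m` expressions from `.travis.yml` would partition the tests. The `selected` helper is a hypothetical simplification of pytest's marker-expression matching, and `test_unmarked` is an invented placeholder for any test without marks.

```python
import re

# Marks each test inherits from its module-level pytestmark.
TEST_MARKS = {
    "test_ILArticle_with_category": {"spider", "ilarticle"},
    "test_ILArticle_no_params": {"spider", "ilarticle"},
    "test_ISArticle_with_category": {"spider", "isarticle"},
    "test_ISArticle_no_params": {"spider", "isarticle"},
    "test_unmarked": set(),  # hypothetical test with no marks
}


def selected(expression, marks):
    """Evaluate a simple and/or/not marker expression against a set of marks."""
    names = set(re.findall(r"[A-Za-z_]\w*", expression)) - {"and", "or", "not"}
    # Each marker name becomes True if the test carries it, else False.
    namespace = {name: name in marks for name in names}
    # eval is acceptable here only because the expressions are fixed literals.
    return bool(eval(expression, {"__builtins__": {}}, namespace))


for expression in ("ilarticle", "isarticle", "not spider"):
    chosen = [name for name, marks in TEST_MARKS.items() if selected(expression, marks)]
    print(f"-m {expression!r}: {chosen}")
```

Splitting the run this way keeps each slow, network-dependent spider in its own job, while `not spider` covers everything else.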