Unit tests and news source fixes #19
Open: yookoala wants to merge 10 commits into ronnywang:master from yookoala:backport-improvement
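While working on this, I found that the Central News Agency (CNA) and PTS News Network (PTS) sources were no longer being parsed correctly, so I also wrote the corresponding fixes.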
1. Add SimpleTest to stdlibs.
2. Add tests/run-all.php, which automatically finds and runs every *.test file under tests.
3. Modify webdata/init.inc.php to skip the server connection step in unit test mode.
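The discovery step in item 2 could look roughly like the sketch below. This is not the code in tests/run-all.php; `find_test_files` is a hypothetical helper name, and the recursive search is an assumption about how "all *.test files under tests" is interpreted.

```php
<?php
// Hypothetical helper (not taken from tests/run-all.php):
// recursively collect every *.test file below $dir.
function find_test_files($dir)
{
    $found = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // Keep regular files whose name ends in ".test".
        if ($file->isFile() && substr($file->getFilename(), -5) === '.test') {
            $found[] = $file->getPathname();
        }
    }
    // Sort so the run order is stable across platforms.
    sort($found);
    return $found;
}
```

A runner would then loop over the returned paths and hand each file to the test framework.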
Stop using the url-normalizer.js code directly and link it as a submodule instead. This gives more flexibility when working with it.
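The original Crawler_*::crawl methods duplicated a lot of code and were hard to unit test, so the following changes were made.

The original Crawler_*::crawl methods are split up and rewritten:
1. The part that fetches the index page HTML becomes the crawlIndex method.
2. The part that parses the output of crawlIndex and extracts links becomes findLinksIn.
3. All database operations are removed.

A new Crawler::crawl method centralizes the code that previously added links to the database inside each Crawler_*::crawl, and webdata/scripts/crawler-new.php now uses Crawler::crawl to fetch links.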
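The split described above can be pictured with the minimal sketch below. Only the names crawlIndex, findLinksIn and Crawler::crawl come from this pull request. The `Crawler_Example` class, the stubbed HTML, the link pattern and the `$store` callback (standing in for the database insert) are assumptions for illustration, and the actual signatures in the repository may differ.

```php
<?php
// Hypothetical source-specific crawler; real Crawler_* classes fetch over HTTP.
class Crawler_Example
{
    // Step 1: return the index page HTML (stubbed here, no network access).
    public static function crawlIndex()
    {
        return '<a href="/news/1.html">A</a>'
             . '<a href="/about">About</a>'
             . '<a href="/news/2.html">B</a>';
    }

    // Step 2: extract article links from that HTML. No database access,
    // so this can be unit tested against a saved HTML fixture.
    public static function findLinksIn($html)
    {
        preg_match_all('#href="(/news/\d+\.html)"#', $html, $matches);
        return array_values(array_unique($matches[1]));
    }
}

class Crawler
{
    // Single place where links get stored. $store is an assumed callback
    // that stands in for the real database insert.
    public static function crawl($class, $store)
    {
        $html  = call_user_func(array($class, 'crawlIndex'));
        $links = call_user_func(array($class, 'findLinksIn'), $html);
        foreach ($links as $link) {
            call_user_func($store, $link);
        }
        return count($links);
    }
}
```

With this shape, a test can call `Crawler_Example::findLinksIn` directly on fixture HTML, and only `Crawler::crawl` needs a database.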
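The goal of this batch of unit tests is to pin down the current behavior. Some news sources appear to have problems and need maintenance; these are recorded in the relevant README.md files and will be fixed later.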
Split the parts of Crawler::updateContent that are not directly tied to the database into Crawler::prepareContent, so they can be called from unit tests.
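Run Crawler::prepareContent on the content before every parse test. This is closer to the production flow, and it also resolves the earlier problem of Big5 output caused by skipping iconv.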
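The encoding issue mentioned above can be illustrated with a minimal sketch. This is not the body of Crawler::prepareContent, which is not shown in this conversation. `prepare_content_sketch` and its `$source_charset` parameter are hypothetical, and the UTF-8 check is only a heuristic, since some Big5 byte sequences are also valid UTF-8.

```php
<?php
// Hypothetical stand-in for the charset step only; the real
// Crawler::prepareContent may do more and may detect charsets differently.
function prepare_content_sketch($raw, $source_charset = 'BIG5')
{
    // Heuristic: content that is already valid UTF-8 is left untouched.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    // Otherwise convert from the assumed source charset to UTF-8.
    $converted = iconv($source_charset, 'UTF-8', $raw);
    // iconv returns false on failure; fall back to the raw bytes.
    return $converted === false ? $raw : $converted;
}
```

Running the same preparation step in tests and in production keeps the parsers from ever seeing unconverted Big5 bytes.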
Let Travis CI run all unit tests automatically on every push.
CNA redesigned its site on 2014/12/12. This change fixes the news source and corrects the related unit tests.
PTS News Network appears to have changed how it generates og:title on 2014/12/12, which caused errors. This change fixes the problem and updates the related unit tests.
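For context, reading og:title from a page can be sketched as below. This is a generic illustration, not the PTS parser from this pull request; `find_og_title` is a hypothetical name, and the project may extract the title by other means.

```php
<?php
// Hypothetical helper: return the og:title value of an HTML page, or null.
function find_og_title($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed; suppress parser warnings.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:title"]/@content');
    if ($nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}
```

A unit test that runs such a helper against a saved copy of the page makes a change in the site's og:title format show up as a failing assertion.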
yookoala force-pushed the backport-improvement branch from 2084404 to 960d7c3 (December 20, 2014 10:47)
Although the changes are fairly large, I hope you will seriously consider adopting them.
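The `Crawler_*` classes break easily when news sites update, so they need ongoing maintenance. By the time maintenance is needed, however, the old version of the site no longer exists, which makes it hard to tell whether the structure changed or where the change occurred. The existing architecture also makes automated testing difficult, so I would like to restructure the code to allow continuous testing of the current `Crawler_*` classes. There are also a few miscellaneous improvements, including:
- Linking url-normalizer.js through `git submodule`, which makes source control more flexible.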