-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathRELEASE_NOTES
63 lines (54 loc) · 2.21 KB
/
RELEASE_NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
V 1.6-b1
-HerladSun support added
-Logging in using Celenium/PhantomJS, scraping with custom downloader
-Added webdriver option to per-url regexps
V 1.4-b1
-Foxsports support added
-Foxsports concurrent requests per domain set to 1
-log file config removed; All logging is now done through root logger;
-(Foxsports) exceptions in media_ex.py (related to fp presence in containers) fixed
V 1.3-b1
-Watoday support added
-no_duplicate_videos mechanism added
--duplicate (within one session) videos are replaced with symbolic links
--video filename is extracted by YDl in advance, so makes sense to disable the mechanism if no duplicates expected
-added hls_prefer_native option; ffmpeg hangs on network lags;
V 1.2-b1
-WesternBulldogs support added
-refactored Yahoo to comply with new status logging system
-all spiders accept start_url now
V 1.1-b4
*list lost*
V 1.1-b3
-fixed to download all videos, including duplicated in different articles
V 1.1-b2
-per-spider log separation in the same index log file
-per-url xpaths added
-spider name(s) entered as command line parameters
-support for AFL video categories crawling
-support for tricky categories like draft
-major AFL categories tested
-fixed last page loading
-added 2 spiders to setup.sh
V 1.1-b1
-added request/response url assertion check (user to use www. for the start url when needed)
-image download errors reported per-image
-"Spider._active_requests" removed
-added spaces between all text paragraphs
-afl: greedy article text extraction
-afl: videos are loaded with MediaPipeline
-afl: added support for ooyala player videos accessible through content-id html get request
-afl: spider separated
-generic spider separated
V 1.0-b2
-setup script added; creates dot-directory with config inside and schedules runs with cron
-configs logs and index moved to dot-directory at users home
-log directories named with time
V 1.0-b1
-indexing mechanism added; based on crc64 as link fingerprint
-indexing covered with unit tests
-articles already present on disk are overwritten now
-video state (result) added to linkwise log
-per-page duplicates handled
-current page links internal storage changed (cpu efficiency)
-order removed from metadata (as meaningles with regard to multiple runs)