setup: prepare version 1.5.0 (#317)
* prepare version 1.5.0

* complete changelog
adbar authored Mar 30, 2023
1 parent 2014082 commit 2639b24
Showing 4 changed files with 26 additions and 8 deletions.
18 changes: 18 additions & 0 deletions HISTORY.md
@@ -1,6 +1,24 @@
## History / Changelog


### 1.5.0


Extraction:
- fixes for metadata extraction with @felipehertzer (#295, #296), @andremacola (#282, #310), and @edkrueger (#303)
- pagetype and image urls added to metadata by @andremacola (#282, #310)
- add as_dict method to Document class with @edkrueger in #306
- XML output fix with @knit-bee in #315
- various smaller fixes: lists (#309), XPaths, metadata hardening

Navigation:
- transfer URL management to courlan.UrlStore (#232, #312)
- fixes for spider module

Maintenance:
- simplify code and extend tests
- underlying packages htmldate and courlan, update setup and docs


### 1.4.1

4 changes: 2 additions & 2 deletions setup.py
@@ -30,7 +30,7 @@ def get_long_description():
"brotli",
"cchardet >= 2.1.7; python_version < '3.11'", # build issue
"faust-cchardet >= 2.1.18; python_version >= '3.11'", # fix for build
-"htmldate[speed] >= 1.4.1",
+"htmldate[speed] >= 1.4.2",
"py3langid >= 0.2.2",
"pycurl >= 7.45.2",
],
@@ -110,7 +110,7 @@ def get_long_description():
"charset_normalizer >= 3.0.1; python_version < '3.7'",
"charset_normalizer >= 3.1.0; python_version >= '3.7'",
"courlan >= 0.9.0",
-"htmldate >= 1.4.1",
+"htmldate >= 1.4.2",
"justext >= 3.0.0",
"lxml >= 4.9.2",
"urllib3 >= 1.26, < 2",
10 changes: 5 additions & 5 deletions tests/eval-requirements.txt
@@ -1,19 +1,19 @@
-trafilatura==1.4.0
+trafilatura==1.5.0

# alternatives
-beautifulsoup4==4.11.1
+beautifulsoup4==4.12.1
boilerpy3==1.0.6
#dragnet==2.0.4 # unmaintained!
-goose3==3.1.12
+goose3==3.1.13
html2text==2020.1.16
html-text==0.5.2
-inscriptis==2.3.1
+inscriptis==2.3.2
justext==3.0.0
newspaper3k==0.2.8
news-please==1.5.22
readabilipy==0.2.0
readability-lxml==0.8.1
-resiliparse==0.13.7
+resiliparse==0.14.3

# additional data
#jparser==0.0.20
2 changes: 1 addition & 1 deletion trafilatura/__init__.py
@@ -9,7 +9,7 @@
__author__ = 'Adrien Barbaresi and contributors'
__license__ = 'GNU GPL v3+'
__copyright__ = 'Copyright 2019-2023, Adrien Barbaresi'
-__version__ = '1.4.1'
+__version__ = '1.5.0'


import logging
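
Note on the setup.py requirements: the two character-detection entries carry environment markers (`python_version < '3.11'` and `python_version >= '3.11'`), so an installer should pick exactly one of them depending on the interpreter. The snippet below is only a rough sketch of that selection logic, not how pip or setuptools evaluate markers internally; the helper name `pick_chardet_requirement` is hypothetical and does not exist in trafilatura.

```python
import sys


def pick_chardet_requirement(version_info=sys.version_info):
    """Return the requirement string whose marker matches the interpreter.

    Hypothetical helper for illustration: it mirrors the two
    marker-guarded lines in setup.py by comparing (major, minor).
    """
    major_minor = tuple(version_info[:2])
    if major_minor < (3, 11):
        # corresponds to: "cchardet >= 2.1.7; python_version < '3.11'"
        return "cchardet >= 2.1.7"
    # corresponds to: "faust-cchardet >= 2.1.18; python_version >= '3.11'"
    return "faust-cchardet >= 2.1.18"


if __name__ == "__main__":
    print(pick_chardet_requirement())
```

Passing an explicit tuple such as `(3, 10)` or `(3, 11)` shows which of the two lines would apply on that Python version.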