Releases: ilius/pyglossary
4.6.0
Changes since 4.5.0
Dependency change
We now require Python 3.9 or a later version.
Bug fixes
- Fix exception in scripts/plugin-index.py: 8a94b8c
- StarDict: fix writing to a .zip file, which produced an empty zip, and fix a bad test
- dictunformat: fix #367: add option headword_separator, defaulting to ";"
- AppleDict source: fix missing quotes for title, #407, and refactor duplicate code
- DictionaryForMIDs: remove | from word when normalizing, fix punctuation regex, use Unix newlines
- StarDict: use Unix newlines when reading and writing the .ifo file on Windows
- Fix bug of glos.addEntryObj(dataEntry) adding an empty file because tmpDataDir is not set until glos.read()
  - Set and create tmpDataDir on glos.tmpDataDir access, and add test, #424
- Fix scripts/wiki-formats.py, #428
- Dictd / Dict.org: fix exception on Windows
Features
- Support sorting by an ICU locale, see the Sorting section of README
- Add Gtk4 interface: --ui=gtk4 / --gtk4
  - Still buggy and not as functional as the Gtk3 or Tkinter interfaces
- Add flag --optimize-memory, config key optimize_memory
  - To enable entry compression on --indirect
  - Not enabled by default (it was previously always compressed)
- Allow plugin's reader.open() to return an Iterator for the progress bar
  - Implement for Tabfile (reading info/metadata)
  - Implement for AppleDict Binary (reading KeyText.data)
- Add read and write support for StarDict Textual File (.xml), #348
- Add support for writing Yomichan dictionary files, #395 by @tomtung
- StarDict reader: support .syn.dz file, #410
- JMDict: support examples, #383
- Add read support for JMnedict, #386
- Add flag --skip-duplicate-headword, config skip_duplicate_headword, #365
  - Zim reader: remove option skip_duplicate_words, #365
- Add flag --trim-arabic-diacritics, config trim_arabic_diacritics, #366
- Add read support for IUPAC goldbook (.xml), #355
- Add write support for DIKT JSON
- StarDict writer: limit memory usage by using SQLite for idx and syn data, #409
- CSV: add newline option, defaulting to Unix-style
- Aard2 Slob writer: add option file_size_approx_check_num_entries
- Add scripts/diff-glossary and scripts/view-glossary
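To picture the SQLite-backed StarDict writer change (#409), here is a minimal sketch that stages index rows in an SQLite table and reads them back in sorted order to produce .idx bytes. It assumes the standard StarDict .idx record layout (UTF-8 headword, NUL byte, 32-bit big-endian offset and size) and uses SQLite's plain text ordering instead of StarDict's own comparison; build_idx and its table/column names are hypothetical, not pyglossary's code.

```python
import sqlite3
import struct
from typing import Iterable, Tuple


def build_idx(entries: Iterable[Tuple[str, int, int]], db_path: str = ":memory:") -> bytes:
    """Build StarDict-style .idx bytes from (word, offset, size) rows.

    Rows are staged in SQLite and read back sorted. With ":memory:" they
    still live in RAM; passing a temporary file path keeps them on disk.
    """
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE idx (word TEXT, data_offset INTEGER, data_size INTEGER)")
        con.executemany("INSERT INTO idx VALUES (?, ?, ?)", entries)
        out = bytearray()
        # Simplified ordering: SQLite's default binary text order,
        # not StarDict's own comparison function
        for word, offset, size in con.execute(
            "SELECT word, data_offset, data_size FROM idx ORDER BY word"
        ):
            # Record: UTF-8 word, NUL byte, 32-bit big-endian offset and size
            out += word.encode("utf-8") + b"\0" + struct.pack(">II", offset, size)
        return bytes(out)
    finally:
        con.close()
```

The point of the design is that the sort happens inside the database, so the writer never has to hold the whole list of headwords in a Python list just to sort it.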
Improvements
- When removing HTML tags, also replace <div> with \n, #394 by @tomtung
  - Treat <div> the same way <p> is treated.
- Mobi: add mobi7-forcing switch to kindlegen command, #374 by @holyspiritomb
- Octopus MDict: ignore directories with same_dir_data_files, #362
- StarDict reader: handle definitions with mixed types/formats
- Dictfile: strip whitespace from word and defi before going through entry filters
- BGL: strip whitespace from word and defi before going through entry filters
- Improvement in glos.write: avoid printing exception for invalid encoding
- Remove empty logs in glos.convert
- StarDict reader: fix validating sametypesequence, and add test
- glos.convert: allow an existing empty directory as output path
- TextGlossaryReader: replace nextPair method with nextBlock, which returns resource files as the third item
- ui_cmd_interactive: allow converting several times before exiting
- Change title tag for Greek from <big> to <b>
- Update language data set (langs.json)
- ui/main.py: print a 1-line error instead of the full exception on ImportError
- ui/main.py: Windows: try Tkinter before Gtk
- ebook_base.py: avoid shutil.move on Windows, #368
- TextGlossaryReader: fix loading info and some refactoring, #370 36b9cd8
- Entry: allow word to be a tuple in Entry(word=...)
- glos.iterInfo() returns Iterator rather than Iterable
- Zim: change dependency to libzim>=1.0, and some comments
- Mobi: work with kindlegen executable in PATH directories, #401
- ui: limit the length of option comments in Format Options dialog
- ui_gtk: improvement: show (last) critical error on status bar
- ui_gtk: set initial focus
- ui_gtk: improvements in About tab
- ui_tk: revert most ttk widgets to tk because the theme doesn't match
- Add SVG icon, #414 by @proletarius101
- Prevent exception/traceback on Ctrl+C
- Optimize progress bar
- Aard2 slob: show info log before and after slobWriter.finalize(), #437
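The <div> change (#394) means block-level tags become line breaks instead of vanishing, so paragraphs don't run together after tags are stripped. A minimal regex sketch of that idea; the patterns and the remove_html_tags name are assumptions, not pyglossary's actual entry filter.

```python
import re

# Opening <div> or <p> tags (with optional attributes) become line breaks;
# the \b stops <pre> or <param> from matching as <p>
_BLOCK_OPEN = re.compile(r"<(?:div|p)\b[^>]*>", re.IGNORECASE)
# Any tag left over after that is dropped
_ANY_TAG = re.compile(r"<[^>]+>")


def remove_html_tags(html: str) -> str:
    text = _BLOCK_OPEN.sub("\n", html)
    text = _ANY_TAG.sub("", text)
    return text.strip()
```

For example, "<div>a</div><div>b</div>" becomes "a\nb" rather than "ab".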
Removed features
- Remove read support for Wiktionary Dump, #48
- Remove support for Sdictionary Binary and Source
Octopus MDict MDX: features and improvements
- Support MDict V3 format by updating readmdict, #385 by @xiaoqiangwang
- Fix files created without UUID in header, #387 by @xiaoqiangwang
  - MdxBuilder 4.0 RC2 and earlier create files without a UUID header
- Decode MDict title & description if they're bytes, #393 by @tomtung
- readmdict: skip zlib decompress exceptions, #384
- readmdict: use __name__ as logger name, and add 2 debug logs, #384
- readmdict: improve exception message for xxhash, #385
XDXF: fixes / improvements, issue #376
- Support <categ>
- Support embedded tags in <iref>
- Fix ignoring <mrkd>
- Fix extra newlines
- Get rid of warning for <etm>
- Fix/improve newline and space issues
- Fix and improve tests
- Update URL for format description
- Support any tag/string in <ex>, #396
- Support reading compressed files directly (.xdxf.gz, .xdxf.bz2, .xdxf.lzma)
- Allow using XSL with --write-options=xsl=True
- Update XSL
- Other improvements in XDXF to HTML transformation
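Reading .xdxf.gz, .xdxf.bz2 and .xdxf.lzma directly comes down to choosing a decompressing file opener before parsing. A minimal sketch using only the standard library, assuming the compression is identified by the file suffix; open_maybe_compressed is a hypothetical helper, not the XDXF plugin's code.

```python
import bz2
import gzip
import lzma
from typing import IO

# Map a compression suffix to the stdlib opener that handles it
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".lzma": lzma.open}


def open_maybe_compressed(path: str) -> IO[bytes]:
    """Open path for binary reading, decompressing transparently by suffix."""
    for suffix, opener in _OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rb")
    return open(path, "rb")
```

The returned object behaves like an ordinary binary file, so an XML parser such as an iterparse-based reader can consume it without knowing the file was compressed.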
AppleDict Binary: features, bug fixes, improvements, refactoring
- Fix CSS name on html_full=True
- Fix using self._encoding where utf-8 should be used
- Fix internal links, #343
  - Remove x-dictionary:d: prefix from href
  - First fix for x-dictionary:r:: use title if present
  - Add bword:// prefix to href (unless it points to http/https)
  - Read entry IDs on open and fix links with x-dictionary:r:
- Add plistlib to dependencies
- Add tests
- Replace <entry ...> with <div>
- Fix bad exception formatting
- Fixes from PR #436
- Extract AppleDict meta-info (langs, title, author), #418 by @soshial
- Progress bar on open() / loading KeyText.data
- Improve memory usage of loading KeyText.data
- Replace appledict_bin.py with appledict_bin directory and more refactoring
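The internal-link fixes (#343) rewrite AppleDict hrefs into the bword:// form other pyglossary formats use. A minimal sketch of the two simplest rules listed (drop the x-dictionary:d: prefix, add bword:// unless the link is http/https), assuming the remainder of the href is the target headword; fix_href is hypothetical, and x-dictionary:r: links, which need the entry IDs read on open, are not handled here.

```python
def fix_href(href: str) -> str:
    # Drop the AppleDict-specific prefix (str.removeprefix needs Python 3.9+)
    href = href.removeprefix("x-dictionary:d:")
    # Leave web links alone; everything else becomes an internal bword:// link
    if href.startswith(("http://", "https://")):
        return href
    return "bword://" + href
```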
Glossary class (glossary.py)
- Lots of refactoring in glossary.py
  - Improve the design and readability
  - Reduce complexity of methods
  - Move some code into new classes that Glossary inherits from
  - Improve error messages
- Introduce glossary_v2.py, and maintain API backward-compatibility for glossary.py (as far as documented)
  - See README.md for sample code.
Refactoring
- Fix style errors using ruff based on pyproject.toml configuration
- Remove all usages of pyglossary.plugins.formats_common
- Use str.startswith(tuple) and str.endswith(tuple)
- Reduce complexity of Glossary methods
- Rename entry filter strip to trim_whitespaces
- Some refactoring in StarDict reader
- Use f-string equal syntax added in Python 3.8
- Use str.removeprefix and str.removesuffix added in Python 3.9
- langs/writing_system.py:
  - Change iso field to list
  - Add new scripts
  - Add getAllWritingSystemsFromText
  - More refactoring
- Split up TextGlossaryReader.loadInfo method
- plugin_manager.py: make some methods private
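For readers unfamiliar with the newer Python idioms named in this list, a short illustration with made-up values (these are standard language features, not pyglossary code):

```python
path = "dict.xdxf.gz"

# str.endswith accepts a tuple of alternatives, replacing an or-chain
is_compressed = path.endswith((".gz", ".bz2", ".lzma"))

# str.removesuffix / str.removeprefix (Python 3.9+) strip only an exact match
base = path.removesuffix(".gz")
word = "bword://apple".removeprefix("bword://")

# The f-string "=" specifier (Python 3.8+) prints the expression and its repr
print(f"{is_compressed=}, {base=}, {word=}")
# prints: is_compressed=True, base='dict.xdxf', word='apple'
```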
Documentation
- Update plugins' documentation
- Glossary: add comments about entryFilters
- Update config.rst
- Update doc/entry-filters.md
- Update README.md
- Update doc/sort-key.md
- Update doc/pyicu.md
- Update plugins/testformat.py
- Add types for arguments and results of all functions/methods
- Add types for r/w options in reader/writer classes
- Fix a few incorrect type annotations
- README.md: add documentation for adding data entries, #412
- Update bgl_info.md and move it from pyglossary/plugins/babylon_bgl/ to doc/babylon/
Testing
- Add test for DSL -> Tabfile conversion
- dsl_test.py: fix method names not starting with test_
- StarDict reader: better testing for handling definitions with mixed types
- StarDict writer: much better testing, coverage of stardict.py: from 62% to 83%
- Refactoring and improvements in tests of Glossary, along with new tests
- Add test for dictunformat -> Tabfile
- AppleDict (source) tests: validate plist file contents
- Allow forking and branching the pyglossary-test repo
- Fix some failing tests on Windows
- Slob: test file_size_approx
- Test Tabfile -> SQL conversion
- Test StarDict error/warning for sortKeyName with and without locale
- Print useful messages for unhandled warnings
- Improve logs
- Add showDiff=False argument to compareTextFiles and convert
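An opt-in diff like showDiff=False keeps passing tests quiet while still making a failure easy to inspect. A minimal sketch of that pattern with difflib; compare_text_files and its signature are hypothetical, not the test suite's compareTextFiles.

```python
import difflib
from pathlib import Path


def compare_text_files(fpath1: str, fpath2: str, show_diff: bool = False) -> bool:
    """Return True if both files have identical text; optionally print a unified diff."""
    text1 = Path(fpath1).read_text(encoding="utf-8")
    text2 = Path(fpath2).read_text(encoding="utf-8")
    if text1 == text2:
        return True
    if show_diff:
        diff = difflib.unified_diff(
            text1.splitlines(keepends=True),
            text2.splitlines(keepends=True),
            fromfile=fpath1,
            tofile=fpath2,
        )
        print("".join(diff))
    return False
```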
Packaging
- Update and refactor Dockerfile and run-with-docker.sh
- Dockerfile: chan...
4.5.0
Changes since 4.4.1
Bug fixes
- Fix 2 log messages in glos._resolveConvertSortParams
- Fixes and improvements in Dictfile (.df) reader
  - Fix exception: disable loading info (Dictfile does not support info)
  - TextGlossaryReader: prevent producing duplicate data entries
    - This fixes: error in DataEntry.save: [Errno 2] No such file or directory: ... because entry.save() moves the temp file to the output path
    - This bug only existed for the Dictfile (.df) format.
  - Remove extra colon, #358
  - Remove some extra newlines
  - And add test for Dictfile to/from Tabfile
- Fix not cleaning up temp directory on return with error from glos.convert
Features
- ui_gtk: add a "General Options" button that opens a dialog for:
  - Settings for sort and sortKey
  - Checkbox for SQLite mode
  - Check boxes for config params: save_info_json, lower, skip_resources, rtl, enable_alts, cleanup, remove_html_all
- Add support for --sort-key random to shuffle entries
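Shuffling through the sort machinery works because a key that ignores the word and returns a fresh random number yields a random permutation. A minimal sketch, assuming the key function is called once per entry (Python's sorted does exactly that); random_sort_key is a hypothetical name, not pyglossary's implementation.

```python
import random


def random_sort_key(_word: str) -> float:
    # Ignore the word: each call draws a new random number,
    # so sorting by this key produces a random order
    return random.random()


words = ["apple", "banana", "cherry", "date"]
shuffled = sorted(words, key=random_sort_key)
```

On its own, random.shuffle would be the direct way to do this; expressing the shuffle as a sort key lets it plug into code paths that already sort every entry by some key.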
Performance improvements
- Performance improvement: remove gc.collect() calls in Glossary and *EntryList
  - Not needed since Python 3.8
  - Change minimum Python requirement to 3.8 in README.md
- Do not import all plugin modules (only import the two plugins that are used)
  - Load JSON file plugins-meta/index.json instead
  - In debug mode, all plugin modules are still imported and validated
  - User plugins are still imported
Other improvements
- Improve detection of languages from glossary name, and add tests
- Update langs.json: add new 3-letter codes for 25 languages
- glos.preventDuplicateWords and glos.removeHtmlTagsAll: prevent adding filter twice
- glos.cleanup: reset path list to avoid (non-critical) error if called again
- Minor improvements in Glossary.init()
- DataEntry.save: on FileNotFoundError show a 1-line error instead of log.exception
- ui_gtk: create a new Glossary object every time the Convert button is clicked
- Add docstring for Glossary.init
Unit testing
- Update tests/glossary_errors_test.py
- Add missing cleanup for some temp files
- Add test for LDF to/from Tabfile
Refactoring
- Plugins: replace import of formats_common from the current directory with pyglossary.plugins.formats_common
- Fix "logging.warn method is deprecated, use warning instead", PR #360 by @BoboTiG
- Fix DeprecationWarning: invalid escape sequence, PR #361 by @BoboTiG
- Move some functions from glossary_utils.py to compression.py
- Move some methods from Glossary to new parent classes PluginManager and GlossaryInfo
- Some refactoring in plugin_prop.py and plugin_manager.py
  - Rename plugin.pluginModule to plugin.module
  - Minimize direct access to plugin.module, plugin.readerClass or plugin.writerClass
  - Add some new properties to PluginProp
  - Remove a log from glossary.py
  - Disable validation of plugins unless in debug mode
  - plugin_prop.py: fix checking debug level
- sq_entry_list.py: rename sortColumns to sqliteSortKey
- Some refactoring around setSortKey between Glossary, EntryList and SqEntryList
- Remove Entry.sqliteSortKeyFrom and related classmethods
- Some more simplification in glossary.py
- Remove Entry.defaultSortKey
- Some style fixes
- iter_utils.py: remove unused key= argument from unique_everseen
- Refactor ui_gtk and update config comments
- extractInlineHtmlImages: avoid writing file within sub func
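unique_everseen is a well-known recipe from the Python itertools documentation. Without the key= argument it reduces to the sketch below; pyglossary's iter_utils.py version may differ in details.

```python
from typing import Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T", bound=Hashable)


def unique_everseen(iterable: Iterable[T]) -> Iterator[T]:
    """Yield unique elements, preserving the order of first appearance."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element
```

For example, list(unique_everseen("AAAABBBCCDAABBB")) gives ["A", "B", "C", "D"].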
4.4.1
Changes since 4.4.0
Bug fixes
- Automatically create cacheDir on Glossary.init()
  - Fixes exception in SQLite mode
Features
- ui_cmd_interactive: support setting sortKey
Improvements and documentation
- Wiktionary Dump: remove detect-by-extension
- glossary.py: update docstrings for sortKeyName
- sort_keys.py: add desc to NamedSortKey
- Update doc/sort-key.md
4.4.0
Changes since 4.3.0
Breaking changes
- Remove partial sorting support (obsolete feature)
  - Remove --sort-cache-size flag in command line
  - (For library users) Remove sortCacheSize argument to glos.write and glos.convert
- Re-design sorting and sortKey parameters
  - Breaking change for library users, and user plugins that need sorting (sortOnWrite = ALWAYS)
  - Change glos.convert
    - Replace argument sortKey (Callable) with sortKeyName (str)
    - Add argument sortEncoding (str) defaulting to utf-8
  - Change glos.write
    - Replace argument sortKey (Callable) with namedSortKey (sort_keys.NamedSortKey)
    - Add argument sortEncoding (str) defaulting to utf-8
  - Change glos.sortWords
    - Replace argument key (Callable) with sortKeyName (str)
    - Add argument sortEncoding (str) defaulting to utf-8
  - Change API of plugins that use sortOnWrite = ALWAYS
    - Replace writer.sortKey and Writer.sqliteSortKey with sortKeyName in the plugin module.
    - See stardict.py for an example.
  - Note 1: All sortKey and sortEncoding arguments are optional.
  - Note 2: Values of sortKeyName are documented in doc/sort-key.md
- Rename 2 files in doc/:
  - Rename doc/entry_filters.md to doc/entry-filters.md
  - Rename doc/term_colors.md to doc/term-colors.md
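Replacing a sortKey callable with a sortKeyName string means callers choose a key by name, and the library builds the key function for the requested encoding. A minimal sketch of that lookup, assuming a simple name-to-factory registry. SortKeySpec, SORT_KEYS, sort_words and the two key names are illustrative only and are not pyglossary's sort_keys module; doc/sort-key.md lists the real names.

```python
from typing import Callable, NamedTuple

# A key function maps a headword to bytes; a factory builds one for an encoding
KeyFunc = Callable[[str], bytes]


class SortKeySpec(NamedTuple):
    name: str
    desc: str
    factory: Callable[[str], KeyFunc]


def _headword(encoding: str) -> KeyFunc:
    return lambda word: word.encode(encoding)


def _headword_lower(encoding: str) -> KeyFunc:
    return lambda word: word.lower().encode(encoding)


SORT_KEYS = {
    spec.name: spec
    for spec in (
        SortKeySpec("headword", "Headword", _headword),
        SortKeySpec("headword_lower", "Lowercase headword", _headword_lower),
    )
}


def sort_words(
    words: list, sort_key_name: str = "headword", sort_encoding: str = "utf-8"
) -> list:
    # Look the key up by name, then build it for the requested encoding
    key = SORT_KEYS[sort_key_name].factory(sort_encoding)
    return sorted(words, key=key)
```

Because a name and an encoding are plain strings, the same choice can come from a command-line flag such as --sort-key, where a Python callable could not be passed.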
Features
- --sort-key and --sort-encoding command line flags (as part of the above re-design)
  - See README.md and doc/sort-key.md.
- Now SQLite mode works for all output formats.
Bug fixes
- Fix lack of progress bar while writing in indirect or SQLite mode
- Fix misleading log message about SQLite mode
- Fix unclosed files in XDXF and FreeDict plugins
Improvements
- Show a 1-line log instead of a FileNotFoundError traceback in glos.read and glos.write
- Close readers in glos.convert if write failed
- Fix some type annotations and comments
- (For library users) Change Glossary.__str__
- (For library users) glos.setInfo: convert non-str value to str, and add tests
Unit testing
Add new tests and improve existing tests.
- Coverage of glossary.py: 89%
- Overall coverage of codebase + plugins: 58%
Refactoring and design improvements
- Simplify by passing glos object to EntryList()
- Replace SqList with SqEntryList
- Change __iter__ of SqEntryList and EntryList to give entry objects
- Simplify Glossary by moving gc.collect to EntryList and SqEntryList
- Remove unused function xml_unescape
- Remove unused import from FreeDict and JMDict plugins
- Use operator.itemgetter in stardict.py, dict_cc.py, ebook_kobo.py, reverse.py
- glossary.py: cleanup, simplify and optimize generators logic
  - Also remove index argument from entryFilter.run method and add some comments
- Remove redundant check in glos.progress
- Remove redundant check in _getLangByStr
- Remove redundant check in Glossary.detectOutputFormat
4.3.0
Changes since 4.2.1
Bug fixes
- Tabfile writer: fix replacing \ with \\
- --remove-html flag: fix bad regex
- ui_cmd_interactive: fix a few bugs
- Lowercase word/entry links (<a href="bword://...) when --lower flag is passed
- TextGlossaryWriter: do not skip words that start with #
- Fix StdLogHandler: was not applying --no-color
- Fix checking for sys.frozen
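The Tabfile fix concerns escaping: a tab-separated line cannot hold raw tabs or newlines in the definition, so they are written as \t and \n, and the backslash itself must become \\ for the round trip to be lossless. A minimal sketch, assuming \\, \t and \n are the only escapes (the real Tabfile writer may handle more); escape_ntb and unescape_ntb are hypothetical names.

```python
def escape_ntb(text: str) -> str:
    # Escape the backslash first so the later escapes are not doubled
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def unescape_ntb(text: str) -> str:
    # Scan left to right so an escaped backslash followed by "n"
    # is not misread as a newline escape
    mapping = {"\\": "\\", "t": "\t", "n": "\n"}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append(mapping.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A chain of str.replace calls would not work for unescaping: the text c:\\new (an escaped backslash followed by "new") would wrongly gain a newline.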
New features
- Add auto_sqlite config parameter
  - to use SQLite mode for StarDict and EPUB-2 (which require sorting) by default
  - also allow overriding it with --no-sqlite flag
- Add 3 config parameters to allow changing log colors in terminal:
  - color.cmd.critical
  - color.cmd.error
  - color.cmd.warning
- Add 2 keys to config to enable/disable colors on Unix and Windows separately
  - color.enable.cmd.unix: default true
  - color.enable.cmd.windows: default false
New features for library users
- Allow glos.setInfo(key, None) to delete the info / metadata key
- Add glos.alts property as shortcut, and use it internally
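Deleting by setting None keeps a single setter for both updating and removing metadata. A minimal sketch of that behavior on a hypothetical InfoStore class, also converting non-str values to str as glos.setInfo does since 4.4.0; returning "" for a missing key is an assumption of this sketch, not documented pyglossary behavior.

```python
class InfoStore:
    """Hypothetical stand-in for a glossary's info/metadata mapping."""

    def __init__(self) -> None:
        self._info = {}

    def set_info(self, key: str, value: object) -> None:
        if value is None:
            # None means "delete"; a missing key is silently ignored
            self._info.pop(key, None)
            return
        # Store non-str values as str
        self._info[key] = str(value)

    def get_info(self, key: str) -> str:
        return self._info.get(key, "")
```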
Design improvements
Change rawEntry[0] from bytes to List[str] and avoid split/join when converting rawEntry <-> entry.
This fixes some very edge cases involving | in words, but uses more RAM in indirect mode (converting to StarDict), which can be solved with --sqlite.
Documentation
- Replace doc/config.md with doc/config.rst, update comments and other improvements
- Generate doc/entry_filters.md
- Update plugins doc
- Update README.md
Unit testing
Coverage of glossary.py: 75%
There are 2501 lines of test code in the tests directory.
Tests for the Glossary class include:
- Basic functionality
- Error handling
- Sorting and direct / indirect / SQLite modes
- Entry filter config/flags (lower, rtl, remove_html, remove_html_all)
- Resources / data entries
- Convert: Tabfile <-> Aard2 slob
- Convert: Tabfile <-> CSV
- Convert: Tabfile -> EPUB-2
- Convert: Tabfile -> JSON
- Convert: Tabfile <-> StarDict
Other improvements:
- glossary_test.py: check CRC32 of downloaded test files
- glossary_test.py: use a new temp dir for each test method for isolation
- ebook_kobo_test.py: split into several test methods
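Checking a CRC32 catches truncated or corrupted downloads before a test compares against them. A minimal sketch using zlib.crc32 incrementally over file chunks; crc32_hex and verify_download are hypothetical helpers, and the real glossary_test.py may store and compare checksums differently.

```python
import zlib


def crc32_hex(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the CRC32 of a file in chunks, as 8 lowercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # Feed the running value back in to continue the checksum
            crc = zlib.crc32(chunk, crc)
    return f"{crc:08x}"


def verify_download(path: str, expected: str) -> None:
    actual = crc32_hex(path)
    if actual != expected.lower():
        raise ValueError(f"CRC32 mismatch for {path}: {actual} != {expected}")
```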
Improvements
- Zim: make improvements, #352
- Aard2 slob: add 2 mime types, #352
- ui/main.py: do not allow --remove-html and --remove-html-all together
- Glossary: do not allow glos.config to be set twice
- Glossary: change some error logs to critical, and more improvements
- Prevent passing conflicting config flags together, like --lower --no-lower
- Disable utf8_check config parameter by default (not needed since 3.0.0)
Refactoring and cleanup
- Glossary: some refactoring in convert method
- Rename 3 scripts in scripts/ directory
- Remove DataEntry.fromFile and improve behavior of DataEntry.__init__
- Refactoring in ui/
- Rename option.cmdFlag to option.customFlag
- Glossary: add glos.rawEntryCompress property, and use it in entry.py
- Glossary: minor improvement in loadPlugins
- XDXF: remove useless argument in Reader.open
- Remove some unused functions from text_utils.py
- plugin_prop.py: refactor getExtraOptions
- Avoid assigning protected attrs in text_writer.py and plugins/tabfile.py
- Fewer protected attr accesses in entry_filters.py
- Move sortKey and get_prefix implementations from ebook_base.py to epub and mobi plugins
- Change name of 2 entry filters to match the config param
4.2.1
Changes since version 4.2.0
Minor bug fixes and improvements:
- text_utils.py
  - Minor bug: fix legacy function urlToPath using urllib.parse.unquote
  - Minor bug: replacePostSpaceChar: remove trailing space from the output str
  - Cleanup:
    - Remove unused function isControlChar
    - Remove unused function formatByteStr
    - Remove argument exclude from function isASCII
  - Add unit tests
- ui_cmd_interactive.py: fix a minor bug and some small refactoring
- Command line: override input glossary info with --source-lang and --target-lang flags
- Add unit tests for CSV -> Tabfile conversion
- CSV plugin: some refactoring, and rename the module to csv_plugin.py
- Update setup.py: add python_requires=">=3.7.0", update extras_require
- Update README.md
Features:
- Command line: add --name flag for changing glossary name
- Glossary: convert: add infoOverride optional argument
4.2.0
Changes since 4.1.0
- Breaking changes:
  - Replace glos.getAuthor() with glos.author
    - This looks for "author" and then "publisher" keys in info/metadata
  - Rename option apply_css to css for mobi and epub2
  - glos.getInfo and glos.setInfo only accept str as key (or a subclass of str)
- Bug fixes:
  - Indirect mode: fix handling '|' character in words.
    - Escape/unescape | in words when converting entry <-> rawEntry
  - Escape/unescape | in words when writing/reading text-based file formats
  - JSON: prevent duplicate keys in JSON output, #344
    - Add new method glos.preventDuplicateWords()
- Features and improvements
  - Add SQLite mode with --sqlite flag for converting to StarDict.
    - Eliminates the need to load all entries into RAM, limiting RAM usage.
    - You can add --sqlite to your command, even for running the GUI.
      - For example: python3 main.py --tk --sqlite
    - See README.md for more details.
  - Add --source-lang and --target-lang flags
  - XDXF: support more tags and improvements
  - Add unit tests for Glossary class, and some functions in text_utils.py
  - Windows: change cache directory to %LOCALAPPDATA%
  - Some refactoring and optimization
  - Update, improve and re-format documentation
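The '|' fix in the 4.2.0 bug fixes is needed because several words of one entry were stored joined by "|", so a word that itself contains "|" would split wrongly. A minimal sketch of escape-then-join and the matching split, assuming backslash is the escape character; join_words and split_words are hypothetical names, and since 4.3.0 the raw entry keeps words as a list, which avoids this step altogether.

```python
def escape_bar(word: str) -> str:
    # Escape the backslash first, then the separator character
    return word.replace("\\", "\\\\").replace("|", "\\|")


def join_words(words: list) -> str:
    return "|".join(escape_bar(w) for w in words)


def split_words(joined: str) -> list:
    # Walk the string so an escaped "\|" stays a literal bar
    words, current, i = [], [], 0
    while i < len(joined):
        ch = joined[i]
        if ch == "\\" and i + 1 < len(joined):
            current.append(joined[i + 1])
            i += 2
        elif ch == "|":
            words.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    words.append("".join(current))
    return words
```

For example, ["a|b", "c"] is stored as a\|b|c and splits back into the same two words.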
4.1.0
There are a lot of changes since the last release, but here is what I could gather and organize!
Please see the commit list for more!
- Improvements in ui_gtk
- Improvements in ui_tk
- Improvements in ui_cmd_interactive
- Refactoring and improvements in ui-related codebase
- Fix not loading config with --ui=none
- Code style fixes and cleanup
- Documentation
  - Update most documentation.
  - Add comments for read/write options.
  - Generate documentation for all formats
    - Placed in doc/p, linked to in README.md
    - Generated with scripts/plugin-doc-gen.py script
    - Read list of dictionary tools/applications from TOML files in plugins-meta/tools
- Add Dockerfile and run-with-docker.sh script
- New command-line flags:
  - --json-read-options and --json-write-options
    - To allow using ; in option values
    - Example: '--json-write-options={"delimiter": ";"}'
  - --gtk, --tk and --cmd as shortcut for --ui=gtk etc
  - --rtl to change direction of definitions, #268, also added to config.json
- Fix non-working --remove-html flag
- Changes in Glossary class
  - Rename glos.getPref to glos.getConfig
  - Change formatsReadOptions and formatsWriteOptions to Dict[str, OrderedDict[str, Any]]
    - to include default values
  - Remove glos.writeTabfile, replace with a func in pyglossary/text_writer.py
  - Glossary.init: avoid showing error if user plugin directory does not exist
- Fixes and improvements in code base
  - Prevent dataEntry.save() from raising exception because of invalid filename or permission
  - Avoid exception if removing temp file/folder failed
  - Avoid mktemp and more improvements
    - Use ~/.cache/pyglossary/ directory instead of /tmp/
  - Fixes and improvements in runDictzip
  - Raise RuntimeError instead of StopIteration when iterating over a non-open reader
  - Avoid exception if no zip command was found, fix #294
  - Remove directory after creating .zip, and some refactoring, #294
  - DataEntry: replace inTmp argument with tmpPath argument
  - Entry: fix html pattern for hyperlinks, #330
  - Fix incorrect virtual env directory detection
  - Refactor dataDir detection, #307 #316
  - Show warning if failed to create user plugins directory
  - Fix possible exception in log.emit
  - Add support for Conda in dataDir detection, #321
  - Fix f-string in StdLogHandler.emit
- Fixes and improvements in Windows
- Changes in Config:
  - Rename config key skipResources to skip_resources
    - Add it to config.json and configDefDict
  - Rename config key utf8Check to utf8_check
    - User should edit ~/.pyglossary/config.json manually
- Implement direct compression and uncompression, and some refactoring
  - Change glos.detectInputFormat to return (filename, format, compression) or None
  - Remove Glossary.formatsReadFileObj and Glossary.formatsWriteFileObj
  - Remove fileObj= argument from glos.writeTxt
  - Use optional 'compressions' list/tuple from Writer or Reader classes for direct compression/uncompression
  - Refactoring in glossary_utils.py
- Update setup.py
- Show version from 'git describe --always' on --version
- FileSize option (used in many formats):
  - Switch to metric (powers of 1000) for K, M, G units
  - Add KiB, MiB, GiB for powers of 1024
- Add extensionCreate variable (str) to plugins and plugin API
  - Use it to improve ui_tk
- Text-based glossary code base (affecting Tabfile, Kobo Dictfile, LDF)
  - Optimize TextGlossaryReader
  - Change multi-file text glossary file names from .N.txt to .txt.N (where N>=1)
  - Enable reading pyglossary-written multi-file text glossary by adding file_count=-1 to metadata
    - because the number of files is not known when creating the first txt file
- Tabfile
  - Rename option writeInfo to enable_info
  - Reader: read resource files from *.txt_res directory if it exists
  - Add *.txt_res directory to *.zip file
- Zim Reader:
  - Migrate to libzim 1.0
  - Add mimetype image/webp, fix #329
- Slob and Tabfile Writer: add file_size_approx option to allow writing multi-part output
  - Support values like: 5500k, 100m, 1.2g
- Add word_title=False option to some writers
  - Slob Writer: add word_title=False option
  - Tabfile Writer: add word_title=False option
  - CSV Writer: add word_title=False option
  - JSON Writer: add word_title=False option
  - Dict.cc Reader: do not add word title
  - FreeDict Reader: rename keywords_header option to word_title
  - Add glos.wordTitleStr, used in plugins with word_title option
  - Add definition_has_headwords=True info key to avoid adding the title next time we read the glossary
- Aard2 (slob)
  - Writer: add option separate_alternates=False, #270
  - Writer: fix handling content_type option
  - Writer: use ~/.cache/pyglossary/ instead of /tmp
  - Writer: add mp3 to mime types, #289
  - Writer: add support for .ini data file, #289
  - Writer: support .webp files, #329
  - Writer: support .tiff and .tif files
  - Reader: read glossary name/title and creation time from tags
  - Reader: extract all metadata / tags
  - slob.py library: refactoring and cleanup
- StarDict:
- FreeDict Reader
  - Fix two slashes before and after pron
  - Avoid running unescape_unicode by passing encoding="utf-8" arg to ET.htmlfile
  - Fix exception if edition is missing in header, and a few other fixes
  - Support <cit type="example"> with <cit type="trans"> inside it
  - Support <cit type="trans"> inside nested second-level <sense>
  - Add "lang" attribute to html elements
  - Add option "example_padding"
  - Fix rendering <def>, refactoring and improvement
  - Handle <note> inside <sense>
  - Support <note> in <gramGrp>
  - Mark external refs with <a ... class="external">
  - Support comment in <cit>
  - Support <xr> inside <sense>
  - Implement many tags under <sense>
  - Improvements and refactoring
- XDXF
  - Fix not finding xdxf.xsl in installed mode
    - Affecting XDXF and StarDict formats
  - xdxf.xsl: generate <font color=...> instead of <span style=...>
  - StarDict Reader: add xdxf_to_html=True option, #258
  - StarDict Reader: import xdxf_transform lazily
    - Remove forced dependency on lxml, #261
  - XDXF plugin: fix glos.setDefaultDefiFormat call
  - xdxf_transform.py: remove warnings for , #322
  - Merge PR #317
    - Parse sr, gr, ex_orig, ex_transl tags and audio
    - Remove None attribute from audio tag
    - Use unicode symbols for audio and external link
    - Use another speaker symbol for audio
    - Add audio controls
    - Use plain link without an audio tag
- Mobi
- Changes in ebook_base.py (Mobi and EPUB)
  - Avoid exception if removing tmpDir failed
  - Use style.css dataEntry, #299
- DSL Reader:
- AppleDict Source
  - Change path of Dictionary Development Kit, #300
  - Open all text files with encoding="utf-8"
  - Some refactoring
  - Rename 4 options:
    - cleanHTML -> clean_html
    - defaultPrefs -> default_prefs
    - prefsHTML -> prefs_html
    - frontBackMatter -> front_back_matter
- AppleDict Binary
- Octopus MDict (MDX)
- DICT.org plugin:
  - installToDictd: skip if target directory does not exist
  - Make rendering dictd files a bit clearer in pure txt
  - Fix indentation issue and add bword prefix as url
- Fixes and improvements in Dict.cc (SQLite3) plugin:
- JMDict
  - Support reading compressed file directly
  - Show pos before gloss (translations)
  - Avoid running unescape_unicode
- DigitalNK: work around Python's sqlite bug, #282
- Changes in dict_org.py plugin, by Justin Yang
  - Use to replace newline
  - Replace words with {} around to true web link
- CC-CEDICT Reader:
  - Fix import error in conv.py
  - Switch from jinja2 to lxml
    - Fix not escaping <, > and &
    - Note: lxml inserts   instead of
  - Use <font> instead of <span style=...>
  - Add option to use Traditional Chinese for entry name
- Rename read/write options:
  - DSL: rename option onlyFixMarkUp to only_fix_markup
  - SQL: ren...
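The FileSize option and the file_size_approx values described in the 4.1.0 notes (metric K/M/G, binary KiB/MiB/GiB, and inputs like 5500k, 100m, 1.2g) can be parsed with a small table of multipliers. A minimal sketch, assuming case-insensitive suffixes and a bare number meaning bytes; parse_file_size is hypothetical, and pyglossary's own parser may accept other spellings.

```python
import re

# Metric suffixes are powers of 1000; the "i" suffixes are powers of 1024
_UNITS = {
    "": 1,
    "k": 1000, "m": 1000**2, "g": 1000**3,
    "kib": 1024, "mib": 1024**2, "gib": 1024**3,
}

_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$", re.IGNORECASE)


def parse_file_size(text: str) -> int:
    """Convert a size such as '5500k', '1.2g' or '2KiB' to a byte count."""
    match = _SIZE_RE.match(text)
    if not match:
        raise ValueError(f"invalid file size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit in {text!r}") from None
    # round() rather than int() so values like 1.2g don't truncate to ...999
    return round(float(number) * factor)
```

With the metric switch, "100m" means 100,000,000 bytes, while "100MiB" would mean 104,857,600.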
4.0.0
Changes since 3.3.0
- Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6
- Fix / rewrite setup.py
  - Fix python3 setup.py sdist bdist_wheel, and PyPI package
    - Had to move ui/ directory into pyglossary/
  - Switch from distutils to setuptools
  - Remove py2exe
- Add interactive command line user interface
  - Automatically selected if input & output file arguments are not passed and one of these is true:
    - On Linux and $DISPLAY is not set
    - On Mac and no tkinter module is found
    - --ui=cmd flag is passed
- New format support:
  - Add read support for FreeDict, #206
  - Add read support for Zim (Kiwix)
  - Add read and write support for Kobo E-Reader Dictfile (.df)
  - Add write support for DICT.org dictfmt source file
  - Add read support for dictunformat output file
  - Add write support for JSON
  - Add read support for Dict.cc (SQLite3)
  - Add read support for JMDict, #239
  - Add basic read support for Wiktionary Dump (.xml)
  - Add read support for cc-kedict
  - Add read support for DigitalNK (SQLite3)
  - Add read support for Wordset.org JSON directory
- Remove Omnidic write support (Unmaintained J2ME dictionary)
- Remove Octopus MDict Source plugin
- Remove Babylon Source plugin
- BGL Reader: improvements
- DictionaryForMIDs Writer: fix non-working code
- Gettext Source (po) Writer: fix info header
- MOBI E-Book Writer: fix sort order, fix and test kindlegen code, add kindlegen_path option, #112
- EPUB-2 E-Book Writer: fix sort order
- XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM
- Lingoes Source (LDF) Reader: fix ignoring info/metadata header
- dict_org.py: rewrite broken plugin (Reader and Writer)
- DSL Reader: fix losing metadata/info
- Aard 2 (slob) Reader:
  - Fix adding css/js files as normal entries
  - Add bword:// prefix to entry links
  - Fix duplicate entries issue by keeping a set of blob IDs, #224
  - Detect and pass defiFormat
- Aard 2 (slob) Writer:
  - Fix content_type detection
  - Remove bword:// prefix from entry links
  - Add resource files / data entries, #243
  - Fix replacing image paths
  - Show log events from slob.py in debug mode
  - Change default compression to zlib
  - Allow passing empty compression
- Octopus MDict Reader:
  - Read MDX file twice to load links
  - Count data entries as part of len(reader) for progressbar
- StarDict Writer:
  - Copy "copyright" and "publisher" values to "description"
  - Add source and target language codes to the end of bookname
  - Add write-option stardict_client: bool
    - Set True to make glossary more compatible with StarDict 3.x
  - Fix broken result when sametypesequence option is given and a definition contains |
  - Allow sametypesequence=x for xdxf
  - Add merge_syns option
  - Allow sametypesequence=None option
- XDXF Reader:
  - Fix/improve xdxf to html transformation
- Kobo Writer:
  - Fix get_prefix algorithm and sorting order, with tests, #219
  - Replace <img src=... tags with [Image: name.bmp], #219
    - and show a warning about data entries
  - Additional keywords as alternatives, #232
  - Fix support for alternates: duplicate entries based on word prefix, #238
  - Show headword in title of alternate entries, #238, #245
  - Strip full html definition, #246
- CSV:
  - Add delimiter option to Reader and Writer
  - Read and write info
  - Writer: accept bool option add_defi_format=True (default False)
- AppleDict Writer:
  - AppleDict Writer: replace fix_sound_link() code with a single line
  - AppleDict Writer should not call glos.setDefaultDefiFormat
- MDX Reader:
  - Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
  - Fix internal href="x:" and href="d:" links
  - Fix file:// in images path, fix #243
User Interface improvements and fixes:
- ui_gtk: add About tab and more improvements
- ui_tk: replace About dialog with About tab and more improvements
- ui_cmd: improvements in progressbar
- ui_cmd: allow "=" in value of read/write options
- Add a list of 208 languages and ~40 writing systems
  - Detect `sourceLang` and `targetLang` from glossary name/title
  - Auto-select between `<b>` and `<big>` tags depending on writing system
    - Using `glos.titleElement` method, used in FreeDict, JMDict and Dict.cc writers
  - `glos.sourceLang` and `glos.targetLang` properties (with setters) as `Lang` objects
  - `glos.sourceLangName` and `glos.targetLangName` properties (with setters) as `str`
  - Used in several plugins
- Break compatibility of plugins
  - Drop support for read and write functions (outside a class)
  - Now we only support Reader class and Writer class
  - Reader class must have these methods
    - `__init__(self, glos)`
    - `open(self, filename)`
      - Here glossary info must be read from file and set with `glos.setInfo`
    - `__len__(self) -> int`
      - Should return the number of entries, or zero if it's too costly
    - `__iter__(self) -> "Iterator[BaseEntry]"`
      - Can be a generator
    - `close(self)`
  - Writer class must have these methods
    - `__init__(self, glos)`
    - `open(self, filename)`
      - Here glossary info must be read from `glos.getInfo` or `glos.iterInfo` and written to file
    - `write(self) -> "Generator[None, BaseEntry, None]"`
      - Entries must be fetched with `entry = yield` in a `while True` loop:

        ```python
        while True:
            entry = yield
            if entry is None:
                break
            # process and write entry into file(s)
        ```

    - `finish(self)`
  - Read options and write options must be set to their default values as class attributes
    - See `pyglossary/plugins/csv_pyg.py` plugin for example
  - `sortKey` must be an instance method of Writer, instead of a function outside any class
    - Only for plugins that need sorting before write
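The class-based plugin interface listed above can be sketched as follows. This is a self-contained illustration, not pyglossary code: `Entry` is a stand-in for `BaseEntry`, `glos` is unused, the `_separator` option is made up, and `convert` is a hypothetical driver that feeds entries to the `write()` generator by priming it with `next()`, calling `send()` for each entry, then sending `None`.

```python
from typing import Generator, Iterator, List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for pyglossary's entry objects."""

    word: str
    defi: str


class Reader:
    def __init__(self, glos) -> None:
        self._glos = glos
        self._entries: List[Entry] = []

    def open(self, filename: str) -> None:
        # A real plugin would parse `filename` and call glos.setInfo(...) here;
        # this sketch just fakes two entries.
        self._entries = [Entry("hello", "greeting"), Entry("bye", "farewell")]

    def __len__(self) -> int:
        return len(self._entries)

    def __iter__(self) -> Iterator[Entry]:
        yield from self._entries

    def close(self) -> None:
        self._entries = []


class Writer:
    # Write options are class attributes holding their default values
    # (`_separator` is a made-up option for this sketch)
    _separator: str = "\t"

    def __init__(self, glos) -> None:
        self._glos = glos
        self._filename = ""
        self.lines: List[str] = []

    def open(self, filename: str) -> None:
        # A real plugin would open the file and write glossary info here
        self._filename = filename

    def write(self) -> Generator[None, Entry, None]:
        while True:
            entry = yield
            if entry is None:
                break
            # Collected in memory here; a real plugin writes to self._filename
            self.lines.append(f"{entry.word}{self._separator}{entry.defi}")

    def finish(self) -> None:
        self._filename = ""


def convert(reader: Reader, writer: Writer) -> None:
    """Hypothetical driver showing how the write() generator is consumed."""
    gen = writer.write()
    next(gen)  # run up to the first `yield`
    for entry in reader:
        gen.send(entry)
    try:
        gen.send(None)  # signal end of entries
    except StopIteration:
        pass


reader = Reader(glos=None)
reader.open("input.txt")
writer = Writer(glos=None)
writer.open("output.txt")
convert(reader, writer)
writer.finish()
reader.close()
print(writer.lines)  # ['hello\tgreeting', 'bye\tfarewell']
```

Using a generator lets the caller control the loop (progress bar, sorting, filtering) while the plugin only handles one entry at a time.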
- Refactor and cleanup `Glossary` class
  - Removed or replaced most of the class/static attributes of `Glossary`
    - To see the diff, run `git diff 3.3.0..master -- pyglossary/glossary.py`
  - Removed `glos.addEntry` method
    - If you use it in your program, replace with `glos.addEntryObj(glos.newEntry(word, defi, defiFormat))`
  - Removed instance methods:
    - `getMostUsedDefiFormats`
    - `iterEntryBuckets`
    - `zipOutDir` and `archiveOutDir`
      - Moved to `pyglossary/glossary_utils.py`
      - `archiveOutDir` renamed to `compressOutDir`
    - `writeDict`
    - `iterSqlLines` -> moved to `pyglossary/plugins/sql.py`
    - `reverse`, `takeOutputWords`, `searchWordInDef` -> moved to `pyglossary/reverse.py`
  - Values of `Glossary.plugins` are changed to `plugin_prop.PluginProp` instances
  - Change `glos.writeTxt` arguments
    - Replace `sep1` and `sep2` with `entryFmt`
    - Replace `rplList` with `defiEscapeFunc`, `wordEscapeFunc` and `tail`
    - Remove `iterEntries`, `entryFilterFunc`
    - Method returns `Generator[None, BaseEntry, None]` instead of `bool`
    - For usage examples, see:
      - `pyglossary/glossary.py` -> `def writeTabfile`
      - `pyglossary/plugins/dict_org_source.py`
      - `pyglossary/plugins/json_plugin.py`
      - `pyglossary/plugins/lingoes_ldf.py`
      - `pyglossary/plugins/sdict_source.py`
- Refactor, cleanup and fixes in `Entry` and `DataEntry` classes
  - Replace `entry.getWord()` with `entry.word`
  - Replace `entry.getWords()` with `entry.l_word`
  - Replace `entry.getDefi()` with `entry.defi`
  - Remove `entry.getDefis()`
    - Drop handling alternate definitions in `Entry` objects
  - Replace `entry.getDefiFormat()` with `entry.defiFormat`
  - Add `entry.b_word` and `entry.b_defi` shortcuts that give `bytes` (UTF-8)
  - Replace `dataEntry.getData()` with `dataEntry.data`
  - Add `__slots__` to Entry and DataEntry classes
  - Fix `DataEntry` in indirect mode
    - Mistaken for Entry with defi=DATA, and file content discarded
    - Save resource files in user's cache directory when loading input glossary into memory
      - Move file to output glossary on `dataEntry.save(...)`
  - Fix `Entry.getRawEntrySortKey` not being alternates-aware, which broke StarDict Writer
  - `DataEntry`: save: use `shutil.copy` if it has `_tmpPath`, and set `_tmpPath`
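A stripped-down illustration of the property-based entry API and `__slots__`. `MiniEntry` is hypothetical: which word `entry.word` returns (here the first headword), the `"m"` default format code and the internal attribute names are assumptions for this sketch.

```python
from typing import List


class MiniEntry:
    """Hypothetical stand-in for Entry; not pyglossary's implementation."""

    # No per-instance __dict__: saves memory when many entries are alive
    __slots__ = ("_words", "_defi", "_defiFormat")

    def __init__(self, words: List[str], defi: str, defiFormat: str = "m") -> None:
        self._words = words
        self._defi = defi
        self._defiFormat = defiFormat

    @property
    def word(self) -> str:
        # Assumption: the main headword is the first one
        return self._words[0]

    @property
    def l_word(self) -> List[str]:
        # All headwords, including alternates
        return self._words

    @property
    def defi(self) -> str:
        return self._defi

    @property
    def defiFormat(self) -> str:
        return self._defiFormat

    @property
    def b_word(self) -> bytes:
        return self.word.encode("utf-8")

    @property
    def b_defi(self) -> bytes:
        return self._defi.encode("utf-8")


entry = MiniEntry(["café", "cafe"], "coffee house")
print(entry.word, entry.l_word, entry.b_word)
# café ['café', 'cafe'] b'caf\xc3\xa9'
```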
- New features of `Entry`
  - `entry.stripFullHtml()` to remove `<html... <head>...</head>...<body>`
    - Used in Kobo and Kobo Dictfile writers
    - Add tests
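A rough regex sketch of what `entry.stripFullHtml()` is described as doing. `strip_full_html` is hypothetical, and also dropping an optional doctype and the trailing `</body></html>` is an assumption about the real method.

```python
import re

# From an optional <!DOCTYPE ...> and <html ...> up to the opening <body ...> tag
_HEAD_RE = re.compile(
    r"^\s*(?:<!DOCTYPE[^>]*>\s*)?<html\b.*?<body\b[^>]*>",
    re.IGNORECASE | re.DOTALL,
)
# The closing tags at the very end of the definition
_TAIL_RE = re.compile(r"</body>\s*</html>\s*$", re.IGNORECASE)


def strip_full_html(defi: str) -> str:
    """Reduce a full HTML document to the contents of its <body>."""
    match = _HEAD_RE.match(defi)
    if match is None:
        return defi  # not a full document: leave it unchanged
    return _TAIL_RE.sub("", defi[match.end():]).strip()


page = "<html><head><title>t</title></head><body><b>tea</b></body></html>"
print(strip_full_html(page))  # <b>tea</b>
```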
- Fix `glos.writeTabfile`:
  - Remove `\r` from definitions and info values
  - Fix not escaping word
- Fix/improve html detection in definitions
- Switch to lazy imports of non-standard modules in plugins
- Optimize RAM usage of indirect conversion
  - To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
- Other new features of Glossary class
  - `glos.getAuthor()` to get "author", or "publisher" (as fallback)
  - `glos.removeHtmlTagsAll()` method, can be called by plugins' writer
  - `glos.collectDefiFormat(maxCount)` to extract defiFormat counts
    - by reading the first `maxCount` entries (then the iterator will be reset)
    - Used in StarDict Writer
  - Show memory usage in trace mode
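A rough idea of `glos.collectDefiFormat(maxCount)`: count the formats of only the first `maxCount` entries. The hypothetical `collect_defi_format` below takes a re-iterable list instead of resetting the glossary's iterator, and the format codes are assumptions.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    word: str
    defi: str
    defiFormat: str  # assumed codes: "m" plain text, "h" HTML


def collect_defi_format(entries: Iterable[Entry], maxCount: int) -> Counter:
    """Count definition formats among the first `maxCount` entries only."""
    return Counter(entry.defiFormat for entry in islice(entries, maxCount))


entries = [
    Entry("a", "plain", "m"),
    Entry("b", "<b>bold</b>", "h"),
    Entry("c", "plain again", "m"),
    Entry("d", "<i>never counted</i>", "h"),
]
counts = collect_defi_format(entries, maxCount=3)
print(counts.most_common(1))  # [('m', 2)]
```

Sampling a prefix keeps the cost bounded for huge glossaries, at the price of missing formats that only appear later.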
- Bug fixes and improvements in code base
- Apply entry filter when iterating over reader, fix #251
  - Fixes wrong sort order for some glossaries (when converting to StarDict or other formats that need sorting)
- Fixes and improvements in `TextGlossaryReader` class
  - Fix ignoring glossary defaultDefiFormat
- Fix evaluating `None` value in read/write options
- Support reading multi-file Tabfile or other text formats
  - Example: `file.txt`, `file.txt.1`, `file.txt.2`
  - Need to add `file_count` info key, for example: `##file_count 3`
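How a reader could locate the parts of a multi-file glossary: read the `file_count` info key, then derive the extra file names. `parse_file_count` and `part_filenames` are hypothetical helpers, and they assume the `##key value` info lines sit at the top of the first file.

```python
from typing import Iterable, List


def parse_file_count(lines: Iterable[str]) -> int:
    """Return N from a `##file_count N` info line, or 1 if there is none."""
    for line in lines:
        line = line.strip()
        if not line.startswith("##"):
            break  # assumed: info lines come before the first entry
        key, _, value = line[2:].partition(" ")
        if key == "file_count":
            return int(value)
    return 1


def part_filenames(filename: str, file_count: int) -> List[str]:
    """file.txt, file.txt.1, file.txt.2, ... up to `file_count` files."""
    return [filename] + [f"{filename}.{i}" for i in range(1, file_count)]


header = ["##name Test", "##file_count 3", "hello\tworld"]
print(part_filenames("file.txt", parse_file_count(header)))
# ['file.txt', 'file.txt.1', 'file.txt.2']
```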
- Fixes in Tabfile Writer
  - Fix not escaping ""
- Add/update docume...
3.3.0
Changes since 3.2.1
- Require Python 3.6 or higher (mainly because of f-strings)
- New format support
- Glossary: detect and load Writer class from plugins
  - Remove write function from plugin if it has Writer class
- Glossary: call `gc.collect()` in indirect mode after reading/writing each 128 entries
  - To free up memory and avoid running out of RAM for large glossaries
- Glossary: remove empty and duplicate alternate words when converting, using Entry Filter, #188
- Add command line options to remove html tags:
  - `--remove-html=tag1,tag2,tag3`
  - `--remove-html-all`
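A sketch of what `--remove-html=tag1,tag2,tag3` could do to a definition. Both helpers are hypothetical, and removing only the tags while keeping their inner text is an assumption about the actual behavior.

```python
import re
from typing import Iterable, List


def parse_remove_html(value: str) -> List[str]:
    """Split the value of --remove-html=tag1,tag2,tag3 into tag names."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def remove_html_tags(defi: str, tags: Iterable[str]) -> str:
    """Drop opening and closing tags with the given names, keeping their text."""
    names = "|".join(re.escape(tag) for tag in tags)
    if not names:
        return defi
    # `\b` stops "b" from also matching <br> or <big>
    pattern = re.compile(rf"</?(?:{names})\b[^>]*>", re.IGNORECASE)
    return pattern.sub("", defi)


defi = '<font color="red">hot</font> <b>tea</b>'
print(remove_html_tags(defi, parse_remove_html("font,span")))
# hot <b>tea</b>
```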
- Re-design format-specific options
  - Allow specifying format-specific read/write options in ui_gtk and ui_tk
  - Add much better and cleaner codebase for handling options in `option.py`
  - Implement validation of options in command line, GTK and Tkinter interfaces
  - Add tests for `option.py` in `option_test.py`
  - Avoid using None as default value of option argument
  - Check default value of plugin options and show warning if invalid
  - Add IntOption class, use it in Omnidic plugin
  - Add DictOption, use it for appledict defaultPrefs
  - Add `optionsProp` to all plugins
    - Containing value type, allowed values and optional comment
  - Remove `readOptions` and `writeOptions` from all plugins
    - Detect options from functions' signature and `optionsProp` variables
    - Avoid using `**kwargs` in plugin `read`, `Reader.open` or `write` functions
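Detecting options from a function signature can be done with the standard `inspect` module, as sketched below. The `write` function and `detect_options` helper are hypothetical; the sketch also shows why `**kwargs` is avoided: it has no named parameters or defaults to discover.

```python
import inspect
from typing import Any, Dict


def write(glos, filename: str, encoding: str = "utf-8", resources: bool = True) -> None:
    """Hypothetical old-style plugin write() function."""


def detect_options(func, skip=("glos", "filename")) -> Dict[str, Any]:
    """Map each parameter that has a default value to that default."""
    options: Dict[str, Any] = {}
    for name, param in inspect.signature(func).parameters.items():
        # Positional inputs and parameters without defaults are not options;
        # a **kwargs parameter has no default either, so it is skipped
        if name in skip or param.default is inspect.Parameter.empty:
            continue
        options[name] = param.default
    return options


print(detect_options(write))  # {'encoding': 'utf-8', 'resources': True}
```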
- Add `depends` variable to plugins
  - To let GUI install plugin dependencies
  - Type: `dict`, keys are module names, values are pip package names
  - Add `Glossary.formatsDepends`
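A sketch of how a `depends` mapping could be used to find what needs installing. The example mapping and the `missing_packages` helper are hypothetical; `importlib.util.find_spec` is the standard way to check whether a top-level module is importable without importing it.

```python
import importlib.util
from typing import Dict, List

# Hypothetical `depends` of a plugin: importable module name -> pip package name
depends = {
    "lxml": "lxml",
    "yaml": "PyYAML",  # module and package names can differ
}


def missing_packages(depends: Dict[str, str]) -> List[str]:
    """Return the pip package names whose module cannot be imported."""
    return [
        package
        for module, package in depends.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(depends)
if missing:
    print("Missing dependencies, try: pip install " + " ".join(missing))
```

Keying by module name is what makes the check possible, while the values give the name pip actually needs.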
- Minor fixes and improvements in Glossary class:
  - Return with error if output file path is an existing directory
  - Fix empty zip when creating `DIRECTORY.zip` as output glossary
  - Do not uncompress gz/bz2/zip input files automatically
  - Ignore "read" function of plugin if "Reader" class is present
  - Cleaning: Add Glossary.init() classmethod to initialize the class, can be called multiple times
  - Some refactoring and cleaning, and add some logs
  - Small optimization: `index % 100` -> `index & 0x7f`
  - Allow having progressbar by position in file and size of file
    - Used for `appledict_bin.py`
  - Do not write resource file names as entries to text file in `Glossary.writeTxt`
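The bitmask only replaces a modulo by a power of two: for non-negative integers `index & 0x7f` equals `index % 128`, so that periodic check now fires every 128 entries rather than every 100 (matching the "each 128 entries" wording of the `gc.collect()` item). A quick check:

```python
# 0x7f == 127 == 2**7 - 1, so for non-negative ints `index & 0x7f`
# keeps the low 7 bits, which is the same as `index % 128`
for index in range(10_000):
    assert (index & 0x7F) == index % 128

# Hence a check like `(index & 0x7f) == 0` fires every 128 entries, not every 100
hits = [index for index in range(1, 400) if (index & 0x7F) == 0]
print(hits)  # [128, 256, 384]
```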
- StarDict plugin
  - Always open `.ifo` file as UTF-8
  - Fix output filenames without .ifo extension creating hidden files, #187
- Babylon BGL plugin
  - Fix bytes metadata values `b'...'` and some refactoring in readType3
  - Skip empty info values
  - Fix non-string info values written as empty
  - Prefix 3 info keys with `bgl_`
  - Fix NameError in debug mode in `stripHtmlTags`
  - Some refactoring
- Octopus MDict plugin
  - Fix Python 3 bug in `readmdict.py`: https://bitbucket.org/xwang/mdict-analysis/commits/8f66c30
  - Support multiple mdd files (#203)
- Change yes/no options in AppleDict and ABBYY Lingvo DSL plugins to boolean
  - To keep compatibility of command line flags, fix yes/no manually in ui_cmd.py
- AppleDict plugin:
- Fix misspelled "extension" (as "extention") in plugins
- Detect entries with `span` tag as html, #193
- Refactoring in ui_gtk and ui_tk
- Fix some deprecated API in ui_gtk
- Fix minor bugs and improvements in ui_tk and ui_gtk
- Update setup.py to adapt packaging with wheel, #189
- Add type hints to codebase and plugins
- Refactoring and style changes:
  - Rename `pyglossary.pyw` to main.py, add a small `pyglossary.pyw` for compatibility
  - Switch to f-strings in glossary.py and freedict.py
  - main.py: replace single quotes with double quotes
  - PEP-8 style fixes