Releases: amaiya/ktrain
v0.31.9
v0.31.8
v0.31.7
v0.31.6
v0.31.5
v0.31.4
0.31.4 (2022-08-01)
new:
- N/A
changed:
- TextPredictor.explain and ImagePredictor.explain now use a different fork of eli5: pip install https://github.com/amaiya/eli5-tf/archive/refs/heads/master.zip
fixed:
- Fixed the loss_fn_from_model function so that it works properly with DISABLE_V2_BEHAVIOR.
- TextPredictor.explain and ImagePredictor.explain now work with tensorflow>=2.9 and scipy>=1.9 (due to the new eli5-tf fork; see above).
v0.31.3
v0.31.2
0.31.2 (2022-05-20)
new:
- N/A
changed:
- Added the truncate_to argument (default: 5000) and the minchars argument (default: 3) to the KeywordExtractor.extract_keywords method.
- Added the score_by argument to KeywordExtractor.extract_keywords. The default is freqpos, which means keywords are now ranked by a combination of their frequency and their position in the document.
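The notes do not spell out how freqpos combines the two signals. Below is a toy sketch of one way a frequency-plus-position ranking can work. The function name rank_keywords_freqpos, the stopword list, the scoring formula, and the reading of truncate_to as a character count and minchars as a minimum keyword length are all assumptions for illustration, not ktrain's implementation.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real extractor would use a fuller one).
STOPWORDS = {"the", "and", "for", "with", "that", "this", "too"}


def rank_keywords_freqpos(text, truncate_to=5000, minchars=3, top_n=5):
    """Toy frequency-plus-position keyword ranker (hypothetical, not ktrain's code).

    Assumptions: truncate_to counts characters, minchars is the minimum
    keyword length in characters, and candidates are single words.
    """
    text = text[:truncate_to].lower()
    words = [
        w for w in re.findall(r"[a-z]+", text)
        if len(w) >= minchars and w not in STOPWORDS
    ]
    if not words:
        return []
    n = len(words)
    freq = Counter(words)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)  # index of the word's first occurrence
    # Assumed score: share of all candidate occurrences, plus a bonus in (0, 1]
    # that is largest for words appearing earliest in the document.
    scores = {w: freq[w] / n + (1.0 - first_pos[w] / n) for w in freq}
    ranked = sorted(scores, key=lambda w: (-scores[w], first_pos[w]))
    return [(w, round(scores[w], 3)) for w in ranked[:top_n]]


doc = "Keyword extraction ranks keywords. Keyword frequency matters, and position matters too."
print(rank_keywords_freqpos(doc, top_n=3))
# [('keyword', 1.222), ('extraction', 1.0), ('ranks', 0.889)]
```

Adding the two terms means a word that is both frequent and early wins, while an early but rare word can still outrank a frequent word that only appears late.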
fixed:
- N/A
v0.31.1
v0.31.0
0.31.0 (2022-05-07)
new:
- The text.ner.models.sequence_tagger function now supports word embeddings from non-BERT transformer models (e.g., roberta-base, codebert). Thanks to @Niekvdplas.
- Custom tokenization can now be used in sequence tagging even when using transformer word embeddings. See the custom_tokenizer argument to NERPredictor.predict.
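A custom tokenizer is a callable that takes a string and returns a list of token strings. As an illustration, here is the pre-0.31.0 tokenization scheme (the lambda given under "changed" for this release) rewritten as a named function; the name legacy_tokenize is hypothetical. It pads every punctuation character with spaces and then splits on whitespace.

```python
import re
import string

# Legacy (pre-syntok) scheme from these release notes: put spaces around every
# ASCII punctuation character and the extra Unicode symbols listed, then split.
_PUNCT_RE = re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])")


def legacy_tokenize(s):
    """Return the list of tokens produced by the old ktrain tokenization scheme."""
    return _PUNCT_RE.sub(r" \1 ", s).split()


print(legacy_tokenize("Hello, world! It’s ktrain."))
# ['Hello', ',', 'world', '!', 'It', '’', 's', 'ktrain', '.']
```

Such a function could then be supplied as custom_tokenizer=legacy_tokenize when calling NERPredictor.predict. Compiling the pattern once at module level avoids rebuilding the regular expression for every sentence.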
changed:
- [breaking change] In the text.ner.models.sequence_tagger function, the bilstm-bert model is now called bilstm-transformer, and the bert_model parameter has been renamed to transformer_model.
- [breaking change] The syntok package is now used as the default tokenizer for NERPredictor (sequence-tagging prediction). To use the tokenization scheme from older versions of ktrain, import the re and string modules and supply this function to the custom_tokenizer argument: lambda s: re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])").sub(r" \1 ", s).split()
- The code base was reformatted using black and isort.
- ktrain now supports TIKA for text extraction in text.textractor.TextExtractor, with use_tika=True as the default. To use the old-style text extraction based on the textract package, supply use_tika=False to TextExtractor.
- Removed the warning about sentence-pair classification to avoid confusion.
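Code written against earlier releases has to rename both the model name and the keyword argument for the sequence_tagger breaking change. A minimal sketch of that mapping follows; migrate_sequence_tagger_args is a hypothetical helper written for illustration and is not part of ktrain.

```python
def migrate_sequence_tagger_args(name, **kwargs):
    """Map pre-0.31.0 sequence_tagger arguments onto the 0.31.0 names.

    Hypothetical helper for illustration only; ktrain does not ship it.
    """
    renamed_models = {"bilstm-bert": "bilstm-transformer"}
    renamed_params = {"bert_model": "transformer_model"}

    new_name = renamed_models.get(name, name)  # other model names pass through
    new_kwargs = {}
    for key, value in kwargs.items():
        new_key = renamed_params.get(key, key)
        if new_key in new_kwargs:
            # e.g. both bert_model and transformer_model were supplied
            raise ValueError(f"conflicting arguments for {new_key!r}")
        new_kwargs[new_key] = value
    return new_name, new_kwargs


name, kwargs = migrate_sequence_tagger_args("bilstm-bert", bert_model="bert-base-cased")
print(name, kwargs)  # bilstm-transformer {'transformer_model': 'bert-base-cased'}
```

Raising on a conflict, rather than silently letting one value win, surfaces call sites that were only half migrated.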
fixed:
- N/A