- Added Suggests dependency on tibble (#32).
- `tknz_sent()` and `preprocess()` now have separate implementations for Windows and UNIX OSs (since the previous C++ implementation had unpredictable behaviour on Windows, see #30). This fix also included minor changes in the `tknz_sent()` output in some corner cases (e.g. `tknz_sent("")` now returns `character(0)`, whereas it used to return `""`).
- `perplexity()` gets a new argument `exp` that allows returning the cross-entropy per word, rather than perplexity (its exponential). `perplexity.character()` gets a new argument `detailed` that allows returning, alongside the total perplexity of the input document, the cross-entropies and word lengths of individual sentences. Closes #28.
- Minor documentation improvements.
- Removed "Tools for..." at the beginning of package DESCRIPTION, as per CRAN's request.
- Simplified examples in `?kgram_freqs`.
- Updated `R` requirements `3.5 -> 4.0`.
- Removed `SystemRequirements: C++11` (see this tidyverse blog post).
- Removed dependency on external online sources in the vignette.
- The package's test suite has been greatly extended.
- Improved error/warning conditions for wrong arguments.
- Re-enabled compiler diagnostics as per CRAN policy (#19).
- `verbose` arguments now default to `FALSE`.
- `probability()`, `perplexity()` and `sample_sentences()` are restricted to accept only `language_model` class objects as their `model` argument.
- `as_dictionary(NULL)` now returns an empty `dictionary`.
- Fixed bug causing `.preprocess` and `.tknz_sent` arguments to be ignored in `process_sentences()`.
- Fixed previously wrong defaults for `max_lines` and `batch_size` arguments in `kgram_freqs.connection()`.
- Added print method for class `dictionary`.
- Fixed bug causing invalid results in `dictionary()` with batch processing and non-trivial constraints on vocabulary size.
- Maintainer's email updated.
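
A note on the `exp` argument of `perplexity()` listed above: the entry describes perplexity as the exponential of the cross-entropy per word. A minimal sketch of that relationship follows, assuming natural logarithms. The symbols are illustrative and not taken from the package documentation: $N$ is the number of scored tokens (exactly which tokens the package counts is not specified here), and $P(w_i \mid c_i)$ is the model probability of token $w_i$ given its context $c_i$.

```latex
% Requires amsmath. Notation is an assumption made for illustration.
\begin{align}
H &= -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid c_i)
  && \text{(cross-entropy per word)} \\
\mathrm{PP} &= \exp(H)
  = \left( \prod_{i=1}^{N} P(w_i \mid c_i) \right)^{-1/N}
  && \text{(perplexity)}
\end{align}
```

For instance, a model that assigns probability $1/4$ to every token gives $H = \log 4 \approx 1.386$ and $\mathrm{PP} = 4$.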