This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.
Added
- General
  - Raised the minimum Python version to 3.8, since Python 3.7 reached end of life on 2023-06-27.
  - Dropped torchtext as an integration dependency. It can of course still be used for individual use cases.
  - Added environment variables `SMALL_TEXT_PROGRESS_BARS` and `SMALL_TEXT_OFFLINE` to control the default behavior for progress bars and model downloading.
- PoolBasedActiveLearner:
  - `initialize_data()` has been replaced by `initialize()`, which can now also be used to provide an initial model in cold start scenarios. (#10)
- Classification:
  - All PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support `torch.compile()`, which can be enabled on demand. (Requires PyTorch >= 2.0.0.)
  - All PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support Automatic Mixed Precision.
  - `SetFitClassification.__init__()` now has a verbosity parameter (similar to `TransformerBasedClassification`) through which you can control the progress bar output of `SetFitClassification.fit()`.
  - TransformerBasedClassification:
    - Removed unnecessary `token_type_ids` keyword argument in model call.
    - Additional keyword args for config, tokenizer, and model can now be configured.
- Embeddings:
  - Prevented unnecessary gradient computations for some embedding types and unified code structure.
- PyTorch:
  - Added an `inference_mode()` context manager that applies `torch.inference_mode`, or `torch.no_grad` for older PyTorch versions.
- Query Strategies:
  - New strategies: DiscriminativeRepresentationLearning, LabelCardinalityInconsistency, ClassBalancer, and ProbCover.
  - Query strategies now have a tie-breaking mechanism that applies a random permutation when there is a tie in scores.
  - Added `ScoringMixin` to enable a reusable scoring mechanism for query strategies.
  - LightweightCoreset can now process input in batches. (#23)
- Vector Index Functionality:
  - A new vector index API provides a unified interface to different implementations of k-nearest neighbor search.
  - Existing strategies that used a hard-coded vector search ([ContrastiveActiveLearning][contrastive_active_learning], [SEALS][seals], [AnchorSubsampling][anchor_subsampling]) have been adapted and can now be used with different vector index implementations.
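
To illustrate the idea behind a unified vector index interface, here is a minimal sketch in pure Python. All class and function names (`VectorIndex`, `BruteForceIndex`, `nearest_neighbors`) are hypothetical and chosen for illustration only; they are not small-text's actual API. The point is merely that calling code depends on the interface, so the k-nearest neighbor backend can be swapped.

```python
import math
from abc import ABC, abstractmethod


class VectorIndex(ABC):
    """Hypothetical unified interface for k-nearest neighbor search."""

    @abstractmethod
    def build(self, vectors):
        """Index the given vectors and return self."""

    @abstractmethod
    def search(self, queries, k):
        """Return, for each query, the indices of its k nearest vectors."""


class BruteForceIndex(VectorIndex):
    """Exact search by computing all Euclidean distances (illustration only)."""

    def build(self, vectors):
        self._vectors = [tuple(v) for v in vectors]
        return self

    def search(self, queries, k):
        results = []
        for query in queries:
            # Sort (distance, index) pairs; ties are resolved by the lower index.
            ranked = sorted(
                (math.dist(query, vector), i)
                for i, vector in enumerate(self._vectors)
            )
            results.append([i for _, i in ranked[:k]])
        return results


def nearest_neighbors(index, vectors, queries, k=2):
    """Calling code only relies on the VectorIndex interface."""
    return index.build(vectors).search(queries, k)
```

Any other implementation of the same two methods (for example, one wrapping an approximate nearest neighbor library) could be passed to `nearest_neighbors()` without changing the calling code.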
Fixed
- Fixed a bug where the `clone()` operation wrapped the labels, which then raised an error. This affected the single-label scenario for PytorchTextClassificationDataset and TransformersDataset. (#35)
- Fixed a bug where the batching in `greedy_coreset()` and `lightweight_coreset()` resulted in incorrect batch sizes. (#50)
- Fixed a bug where `lightweight_coreset()` failed when computing the norm of the elementwise mean vector.
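
For context on the batching fix, the following is a minimal sketch of index-based batching in which every batch has exactly `batch_size` items, except possibly the last one. The helper `iter_batches` is a hypothetical name used for illustration and does not reflect how small-text implements batching internally.

```python
def iter_batches(n_items, batch_size):
    """Yield (start, end) index pairs that cover range(n_items) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_items, batch_size):
        # The final batch is truncated so that it never exceeds n_items.
        yield start, min(start + batch_size, n_items)
```

With 10 items and a batch size of 4, this yields the slices `(0, 4)`, `(4, 8)`, and `(8, 10)`.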
Changed
- General
  - Moved `split_data()` method from `small_text.data.datasets` to `small_text.data.splits`.
- Dependencies
  - Raised setfit version to 1.1.0.
- Classification:
  - The `initialize()` methods of all PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) are now more unified. (#57)
  - KimCNNClassifier / TransformerBasedClassification: model selection is now disabled by default. Also, it no longer saves models when disabled, thereby greatly reducing the runtime.
- Utils
  - `init_kmeans_plusplus_safe()` now supports weighted kmeans++ initialization for `scikit-learn>=1.3.0`.
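
As background for the weighted kmeans++ initialization mentioned under Utils, here is a minimal pure-Python sketch of the general technique: the first center is drawn proportionally to the sample weights, and each further center proportionally to weight times squared distance to the nearest center chosen so far. This is an illustration under those assumptions, not the scikit-learn or small-text implementation, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans_plusplus(points, weights, k, seed=None):
    """Return the indices of k initial centers (illustrative helper)."""
    rng = random.Random(seed)
    indices = range(len(points))
    # First center: drawn with probability proportional to the sample weight.
    first = rng.choices(indices, weights=weights, k=1)[0]
    centers = [first]
    # Squared distance from each point to its nearest chosen center.
    nearest = [squared_distance(p, points[first]) for p in points]
    while len(centers) < k:
        # Further centers: probability proportional to weight * squared distance.
        scores = [w * d for w, d in zip(weights, nearest)]
        chosen = rng.choices(indices, weights=scores, k=1)[0]
        centers.append(chosen)
        nearest = [
            min(d, squared_distance(p, points[chosen]))
            for p, d in zip(points, nearest)
        ]
    return centers
```

Points with zero weight are never selected, and already chosen points have zero distance to themselves, so they are not drawn twice. The sketch assumes at least `k` distinct points with positive weight.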
Removed
- Deprecated functionality
  - Removed `default_tensor_type()` method.
  - Removed `small_text.utils.labels.get_flattened_unique_labels()`.
  - Removed `small_text.integrations.pytorch.utils.labels.get_flattened_unique_labels()`.
  - Classification
    - Removed early stopping legacy arguments in `__init__()` for KimCNN and TransformerBasedClassification. (Use `fit()` keyword arguments instead.)
    - Removed model selection legacy argument in `TransformerBasedClassification.__init__()`.
- The explicit installation instruction for conda was removed, but the small-text conda-forge package will remain.