Release v2.0.0.dev1 · webis-de/small-text

This intermediate release serves as a preliminary version of the upcoming v2.0.0. Consider it an alpha release, where interface changes are still possible.

Added

General
- Raised the minimum required Python version to 3.8, since Python 3.7 reached end of life on 2023-06-27.
- Dropped torchtext as an integration dependency. For individual use cases it can of course still be used.
- Added environment variables SMALL_TEXT_PROGRESS_BARS and SMALL_TEXT_OFFLINE to control the default behavior for progress bars and model downloading.
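
As a minimal sketch of how these variables might be set from Python: the variable names come from the note above, but the values "0" and "1" are assumptions for illustration only, as is the idea that the variables should be set before small-text is imported. Check the small-text documentation for the accepted values.

```python
import os

# Assumption: "0" / "1" are illustrative values, not confirmed by this release note.
os.environ["SMALL_TEXT_PROGRESS_BARS"] = "0"  # intended to turn progress bars off by default
os.environ["SMALL_TEXT_OFFLINE"] = "1"        # intended to avoid model downloads

# Assumption: setting the variables before importing the library is the safe order.
# import small_text
```

The same variables can also be exported in the shell that launches the Python process.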
PoolBasedActiveLearner:
- initialize_data() has been replaced by initialize() which can now also be used to provide an initial model in cold start scenarios. (#10)
Classification:
- All PyTorch classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support torch.compile(), which can be enabled on demand (requires PyTorch >= 2.0.0).
- All PyTorch classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) now support Automatic Mixed Precision.
- SetFitClassification.__init__() now has a verbosity parameter (similar to TransformerBasedClassification) through which you can control the progress bar output of SetFitClassification.fit().
- TransformerBasedClassification:
  - Removed unnecessary token_type_ids keyword argument in model call.
  - Additional keyword args for config, tokenizer, and model can now be configured.
Embeddings:
- Prevented unnecessary gradient computations for some embedding types and unified code structure.
PyTorch:
- Added an inference_mode() context manager that applies torch.inference_mode, or falls back to torch.no_grad on older PyTorch versions.
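
The fallback described above can be sketched roughly as follows. This is not the library's implementation: `inference_mode_sketch` is a hypothetical name, and the no-op branch for a missing PyTorch installation is an addition that only keeps the example runnable anywhere.

```python
import contextlib

try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None


def inference_mode_sketch():
    """Return a context manager that disables gradient tracking (hypothetical helper)."""
    if torch is None:
        # Assumption for illustration only: do nothing when PyTorch is absent.
        return contextlib.nullcontext()
    if hasattr(torch, "inference_mode"):
        # Preferred on PyTorch versions that provide torch.inference_mode.
        return torch.inference_mode()
    # Older PyTorch versions: fall back to torch.no_grad.
    return torch.no_grad()


with inference_mode_sketch():
    pass  # run model forward passes here
```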
Query Strategies:
- New strategies: DiscriminativeRepresentationLearning, LabelCardinalityInconsistency, ClassBalancer, and ProbCover.
- Query strategies now break ties randomly: instances with equal scores are randomly permuted.
- Added ScoringMixin to enable a reusable scoring mechanism for query strategies.
- LightweightCoreset can now process input in batches. (#23)
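
One common way to implement random tie-breaking is to shuffle the candidate indices first and then apply a stable sort by score, so that equal scores keep their shuffled order. The sketch below illustrates that idea only. It is an assumption about the general technique, not small-text's code, and `top_k_with_random_ties` is a hypothetical name.

```python
import random


def top_k_with_random_ties(scores, k, seed=None):
    """Return the indices of the k highest scores, breaking ties randomly."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    # Shuffle first, so that tied scores end up in random relative order.
    rng.shuffle(indices)
    # list.sort is stable (also with reverse=True), so ties keep the shuffled order.
    indices.sort(key=lambda i: scores[i], reverse=True)
    return indices[:k]


scores = [0.9, 0.5, 0.5, 0.5, 0.1]
selected = top_k_with_random_ties(scores, k=2, seed=42)
# Index 0 is always selected first; the second pick is one of the tied indices 1, 2, or 3.
```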
Vector Index Functionality:
- A new vector index API provides a unified interface over different implementations of k-nearest neighbor search.
- Existing strategies that used a hard-coded vector search (ContrastiveActiveLearning, SEALS, AnchorSubsampling) have been adapted and can now be used with different vector index implementations.
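
To illustrate why a unified interface makes the search backend swappable, here is a minimal sketch. All class and function names (`VectorIndexSketch`, `BruteForceIndex`, `nearest_neighbors`) are hypothetical and do not reflect small-text's actual vector index API. The brute-force search stands in for any real k-nearest neighbor implementation.

```python
import math
from abc import ABC, abstractmethod


class VectorIndexSketch(ABC):
    """Hypothetical unified interface for k-nearest neighbor search."""

    @abstractmethod
    def build(self, vectors):
        """Index the given vectors and return self."""

    @abstractmethod
    def search(self, query, k):
        """Return the indices of the k vectors closest to the query."""


class BruteForceIndex(VectorIndexSketch):
    """Exact search by computing the Euclidean distance to every vector."""

    def build(self, vectors):
        self._vectors = [tuple(v) for v in vectors]
        return self

    def search(self, query, k):
        distances = [(math.dist(query, v), i) for i, v in enumerate(self._vectors)]
        distances.sort()
        return [i for _, i in distances[:k]]


def nearest_neighbors(index, query, k=2):
    # The caller depends only on the interface, so any backend can be passed in.
    return index.search(query, k)


index = BruteForceIndex().build([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)])
neighbors = nearest_neighbors(index, (0.9, 0.1), k=2)  # [1, 0]
```

An approximate backend could implement the same two methods and be passed to `nearest_neighbors` without changing the calling code.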

Fixed

- Fixed a bug where the clone() operation wrapped the labels, which then raised an error. This affected the single-label scenario for PytorchTextClassificationDataset and TransformersDataset. (#35)
- Fixed a bug where the batching in greedy_coreset() and lightweight_coreset() resulted in incorrect batch sizes. (#50)
- Fixed a bug where lightweight_coreset() failed when computing the norm of the elementwise mean vector.

Changed

General
- Moved split_data() method from small_text.data.datasets to small_text.data.splits.
Dependencies
- Raised setfit version to 1.1.0.
Classification:
- The initialize() methods of all PyTorch-classifiers (KimCNN, TransformerBasedClassification, SetFitClassification) are now more unified. (#57)
- KimCNNClassifier / TransformerBasedClassification: model selection is now disabled by default. When it is disabled, models are no longer saved, which greatly reduces the runtime.
Utils
- init_kmeans_plusplus_safe() now supports weighted kmeans++ initialization for scikit-learn>=1.3.0.

Removed

Deprecated functionality
- Removed default_tensor_type() method.
- Removed small_text.utils.labels.get_flattened_unique_labels().
- Removed small_text.integrations.pytorch.utils.labels.get_flattened_unique_labels().
- Classification
  - Removed early stopping legacy arguments in __init__() for KimCNN and TransformerBasedClassification. (Use fit() keyword arguments instead.)
  - Removed model selection legacy argument in TransformerBasedClassification.__init__().
The explicit installation instruction for conda was removed, but the small-text conda-forge package will remain.
