Notes:
- Versioning: The versioning scheme depends on the Khiops version supported (first 3 digits) and
  a Khiops Python Library correlative (4th digit).
  - Example: 10.2.1.4 is the 5th version that supports Khiops 10.2.1.
- Internals: Changes in Internals sections are unlikely to be of interest for data scientists.
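
The versioning note above can be illustrated with a small sketch. It assumes the correlative starts at 0, which is what the example implies (10.2.1.4 being the 5th version), and it ignores any pre-release suffix. The names `LibraryVersion` and `split_version` are hypothetical helpers for this illustration, not part of the library.

```python
from typing import NamedTuple


class LibraryVersion(NamedTuple):
    """Hypothetical container for a four-digit version string."""

    khiops_version: str  # first three digits, e.g. "10.2.1"
    correlative: int  # fourth digit, e.g. 4


def split_version(version: str) -> LibraryVersion:
    """Split 'X.Y.Z.N' into the supported Khiops version and the correlative."""
    parts = version.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four numeric fields, got {version!r}")
    return LibraryVersion(".".join(parts[:3]), int(parts[3]))


version = split_version("10.2.1.4")
# Assumption: the correlative is zero-based, so 4 denotes the 5th release.
release_ordinal = version.correlative + 1
print(version.khiops_version, release_ordinal)
```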
- (General) Support for Python 3.13.
- (General) `visualize_report` helper function to open reports with the Khiops Visualization and Khiops Co-Visualization apps.
- (General) Initialization failing in Conda-based environments.
- (`core`) Support for system parameters has been moved from the `KhiopsLocalRunner` to the `core` API.
- (`core`) System parameter `max_memory_mb` has been renamed to `memory_limit_mb`.
- (`core`) System parameter `khiops_temp_dir` has been renamed to `temp_dir`.
- (General) Khiops Python 9 compatibility.
- (`sklearn`) `train_test_split_dataset` helper function to ease the train/test splitting of multi-table datasets.
- (`sklearn`) Complete support for `core` API function parameters in the `sklearn` estimators.
- (General) The Conda package only depends on the `conda-forge` and `khiops` channels.
- Internals:
  - Improve and simplify the integration with the `khiops-core` package via its `khiops_env` script.
- (`sklearn`) Sklearn's attributes for supervised estimators.
- (`core`) API functions handling of unknown parameters: they now fail.
- Internals:
  - Detection of the path to the MPI command: the real path to the executable is now used.
- (`core`) Documentation of the `specific_pairs` parameter for the `train_predictor` and `train_recoder` core API functions.
- (`core`) The following parameters of the `train_predictor` core API functions:
  - `max_groups`
  - `max_intervals`
  - `min_group_frequency`
  - `min_interval_frequency`
  - `results_prefix`
  - `snb_predictor`
  - `univariate_predictor_number`
  - `discretization_method` for supervised learning
  - `grouping_method` for supervised learning
- Internals:
  - The OpenMPI backend now executes with the `--allow-run-as-root` option.
- (`sklearn`) Support for sparse arrays in sklearn estimators.
- Internals:
  - MPI backend from MPICH to OpenMPI for native + Pip-based Linux installations.
- `core`
  - Metric name search in estimator analysis report.
- (`sklearn`) 1:1 relations to multi-table datasets.
- (`sklearn`) Estimators' `fit` methods now accept single-column pandas dataframes as `y` target.
- (`core`) Improve user error and warning messaging.
- (General) Reinstate Rocky Linux 8 support.
Note: This release marks the open sourcing of Khiops:
- The `khiops` package replaces the old `pykhiops` package. We recommend uninstalling `pykhiops` before installing `khiops`. More information at the Khiops site.
- The `khiops` package uses a new four-digit versioning convention.
- The `khiops` conda package is available for many environments. See the Khiops site for more information.
- General:
  - `khiops-python` is now available as the conda `khiops` package. This package bundles `khiops-python` and the Khiops binaries, so no system-wide Khiops installation is needed. More information at the Khiops website.
  - Support for Python 3.12.
- `sklearn`
  - Estimator classes can now be trained from NumPy arrays in single-table mode.
- `core`
  - `stdout_file_path` and `stderr_file_path` parameters for `khiops.core.api` functions. These parameters allow saving the stdout/stderr output of the Khiops execution.
- `sklearn`
  - Estimator classes now comply with scikit-learn standards.
- `core`
  - The JSON initialization of `AnalysisResults`, `CoclusteringResults` and their component classes is coherent with the empty initialization.
- `core`
  - Wrong default discretization and grouping methods in `train_predictor` and `train_recoder`.
  - `KhiopsDockerRunner` checking the existence of `shared_dir` on remote paths.
- `sklearn`:
  - Direct support for coclustering simplification, via the `KhiopsCoclustering.simplify` method.
- Internals:
  - The `TaskRegistry.set_task_end_version` method for specifying the ending Khiops version for a task.
- `sklearn`:
  - Verbose mode support is now complete for coclustering.
- Internals:
  - User-provided scenario prologue is now taken into account in the tasks.
- General:
  - License has been updated to BSD-3 Clear.
- `sklearn`:
  - `auto_sort` replaces `internal_sort` to control input table sorting in estimators.
  - The multi-table documentation has been streamlined to be more precise and clearer.
- `sklearn`:
  - The `max_part_numbers` parameter of the `KhiopsCoclustering` `fit` method. The `KhiopsCoclustering` `simplify` method now contains the simplification feature.
  - The `internal_sort` estimator parameter. The `auto_sort` estimator parameter replaces it.
- `core`:
  - The `build_multi_table_dictionary` API function. The `build_multi_table_dictionary_domain` helper function provides the same functionality.
- Internals:
  - The `build_multi_table_dictionary` task. This task will not be supported after Khiops 11.
- `sklearn`:
  - Support for snowflake database schemas.
- `core`:
  - Support for Khiops on macOS.
- `core`:
  - Khiops coclustering is not executed with MPI anymore.
  - Bug when the JSON reports had colliding character ranges but no particular colliding character.
- Internals:
  - The transformation of the `core.api` function parameters to scenario files now has an additional layer mediated by the `KhiopsTask` class. These objects have all the necessary information about a particular Khiops task (e.g. `train_predictor`) to transform its parameters into a scenario file. Furthermore, this allows exporting the task signatures to API description languages such as Protocol Buffers.
  - The `core.filesystem` module now exposes its API as a set of functions instead of resource objects. The resource objects are still available, but the function-based API should be preferred.
- General:
  - Support for Python 3.6: pyKhiops 10.1.1 was the last version to support it.
- General:
  - Jupyter notebook tutorials to the documentation site.
  - `pk-status` script to check the pyKhiops installation.
- General:
  - Code samples scripts not being installed: They are located in `<pykhiops_install_dir>/samples`.
- `sklearn`
  - `KhiopsCoclustering` raising an exception instead of a warning when no informative coclustering was found.
  - `internal_sort` parameter being ignored in `KhiopsCoclustering`.
- `core`
  - `detect_format` failing when the Khiops log had extra output.
- `sklearn`:
  - Estimators now accept dataframes with numerical column indexes.
  - `KhiopsClassifier` now accepts integer target vectors.
  - `classes_` estimator attribute for `KhiopsClassifier` (available once fitted).
  - `feature_names_out_` estimator attribute for `KhiopsEncoder` (available once fitted).
  - `export_report_file` and `export_dictionary_file` to export Khiops report and dictionary files once the estimators are fitted.
  - `internal_sort` parameter for estimators, which controls whether pyKhiops sorts the tables in its internal procedures (default is `True`). Disabling it may give speed gains on large datasets.
  - `verbose` flag for debugging estimators: It shows internal information and doesn't erase temporary files.
- `core`:
  - `get_khiops_version` API function.
  - New `LocalTimestamp` rule for AutoML feature generation (requires Khiops 10.1).
  - `max_total_parts` parameter to `simplify_coclustering` core API function (requires Khiops 10.1).
- Internals:
  - Khiops samples directory in Linux now defaults to `/opt/khiops/samples` which is where it is installed by default.
- `sklearn`:
  - Breaking: Estimators return NumPy arrays instead of dataframes in `predict`, `predict_proba`, `transform`, `fit_predict` and `fit_transform` methods.
- `core`:
  - `train_recoder` API function does not build trees by default anymore.
  - When pyKhiops reads a legacy Khiops JSON report/dictionary with Unicode decoding errors it now
    only warns and loads it anyway with the `errors="replace"` setting. Before it raised an exception.
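
The warn-and-load behavior described above can be sketched with the standard library alone. This is an illustration of the general technique (strict UTF-8 decoding first, then `errors="replace"` with a warning), not the library's actual code; `load_json_lenient` is a hypothetical name.

```python
import json
import warnings


def load_json_lenient(raw: bytes) -> dict:
    """Decode raw JSON bytes as UTF-8, falling back to errors="replace"."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as error:
        # Warn instead of failing, then substitute U+FFFD for invalid bytes.
        warnings.warn(f"Invalid UTF-8 in JSON data ({error}); loading anyway")
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


# b"\xe9" encodes "é" in Latin-1 but is not valid UTF-8 on its own.
legacy_bytes = b'{"name": "caf\xe9"}'
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report = load_json_lenient(legacy_bytes)
```

The invalid byte is loaded as the replacement character U+FFFD, so the data is readable but the original character is lost.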
- General:
  - Simpler multi-table samples in the documentation.
- `sklearn`:
  - Datasets based on file paths. From pyKhiops 11 only in-memory datasets will be accepted. File-based datasets can be handled with the `core` API.
  - `max_part_number` as instance parameter of `KhiopsCoclustering`. It is now a `fit` parameter. It will be eliminated in pyKhiops 11.
- `core`:
  - `get_khiops_info` and `get_khiops_coclustering` API functions. From Khiops 10.1 there is no need for a license key, so these functions have no use anymore. They are kept deprecated for backwards compatibility only. They will be eliminated in pyKhiops 11.
- Internals:
  - `legacy_mode` in `PyKhiopsRunner`. In its place there is a generic versioning scheme to handle features and Khiops scenarios.
- `sklearn`:
  - Bug in dataframe-based datasets with numerical key columns
- `sklearn`:
  - A new way to specify multi-table inputs for estimators via a `dict`. From now on it is the standard way to specify multi-table datasets and the others are deprecated. See the documentation for more details.
  - New examples of use of `sklearn` in the script `samples_sklearn.py`. Available also in the documentation.
- `core`:
  - It now fully supports remote filesystems, provided that the extra dependencies are installed (it is still necessary to install Khiops remote filesystem plugins).
- Other:
  - Most methods that accept containers now additionally accept classes implementing their abstract
    interface (e.g. `collections.abc.Sequence`, `collections.abc.Mapping`).
- Internals:
  - The default value of `samples_dir` of the `PyKhiopsLocalRunner` class can now be set via the environment variable `KHIOPS_SAMPLES_DIR`.
  - New classes `Dataset` and `DatasetTable` in `sklearn.tables` to handle sklearn table transformations for Khiops.
- General:
  - Improved documentation completeness and layout.
- `sklearn`
  - Estimators do not depend anymore on local files. This fixes many issues including those related to serialization.
  - `KhiopsRegressor` now warns when `n_trees > 0`.
- `core`
  - Functions `deploy_coclustering` and `deploy_model_for_metrics` are moved from `core.api` to `core.helpers`. The latter module will contain non-basic functionality, whereas `core.api` will contain only the official Khiops API.
- `sklearn`:
  - `tuple` and `list` multi-table input modes in estimators.
  - `key` parameter of estimators.
  - `variables` parameter of `KhiopsCoclustering` estimator.
- `sklearn`:
  - Breaking: `computation_dir` parameter in `sklearn` estimators. Khiops output files can still be saved with the parameter `output_dir`.
- Other:
  - Breaking: Support for Python 2: 10.0.4 was the last version to support it.
- `sklearn`:
  - Data-race when using many estimators in parallel.
  - Bug in `KhiopsCoclustering` when the trained coclustering did not cluster the key variable.
  - Bug in `KhiopsEncoder` caused by bad handling of OS-dependent line separators.
- Other:
  - Bug with `KHIOPS_HOME` environment variable not properly being taken into account when initializing the runner.
- Class `PyKhiopsDockerRunner` in package `pykhiops.extras`, which allows running pyKhiops with a remote Khiops Docker image as backend.
- Bug with `PyKhiopsRunner`'s `scenario_prologue` failing to execute.
- Bug in `pykhiops.sklearn` estimators not taking into account the target variable as `Used`.
- Bug in CentOS not taking into account environment variables and failing to execute.
- `extract_clusters` core API function to extract a dimension's clusters into a file.
- `deploy_predictor_for_metrics` core API function to evaluate performance metrics with third-party libraries.
- `detect_data_table_format` core API function to obtain (heuristically) the `header_line` and `field_separator` from a data file (requires Khiops >= 10.0.1).
- `train_predictor` and `evaluate_predictor` now accept a `main_target_value` parameter
- Various ease-of-access methods:
  - `AnalysisResults`: `get_reports`
  - `EvaluationReport`: `get_snb_performance`, `get_snb_lift_curve` and `get_snb_rec_curve`
  - `PredictorPerformance`: `get_metric` and `get_metric_names`
- New examples to `samples.py`.
- Internals:
  - Support for remote filesystems `s3` and `gcs` in `sklearn` module (installation with extra dependencies required).
  - New `scenario` module containing classes to write templatized-scenarios and that also handle character encoding (see Fixed below).
  - Support for the new `subTool` key of Khiops JSON files.
  - Command-line options for `samples.py` to specify which samples to run.
- `str` parameters of core API functions may now also be `bytes` and `bytearray`
- Changed all `core` module docstrings to the "NumPy" style.
- `write` and `writeln` methods of classes in the `dictionary`, `analysis_results` and `coclustering_results` modules now require a `PyKhiopsOutputFile` object as argument.
- Query methods such as `get_dictionary` from `DictionaryDomain` now raise `KeyError` instead of returning `None` if the query fails.
- Core API functions that use a `field_separator` parameter now accept the string "\t".
- `KhiopsClassifier` and `KhiopsRegressor` now warn of incorrect types of target variable.
- Internals:
  - `PyKhiopsLocalRunner` now calls the `MODL` executables directly instead of the Khiops launch scripts (only for Khiops >= 10.0.1).
  - Specific pair parameter is not handled anymore with a temporary file.
  - Improved temporary file services in `PyKhiopsRunner`.
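
The switch of query methods from returning `None` to raising `KeyError`, noted in the list above, affects calling code that tested the result against `None`. The sketch below shows the pattern with a hypothetical stand-in class (`ToyDomain` is not the library's `DictionaryDomain`, and its constructor is an assumption made for illustration).

```python
class ToyDomain:
    """Hypothetical stand-in for a dictionary container; not the library class."""

    def __init__(self, dictionaries):
        self._dictionaries = dict(dictionaries)

    def get_dictionary(self, name):
        # Raise on a failed query instead of returning None.
        if name not in self._dictionaries:
            raise KeyError(name)
        return self._dictionaries[name]


domain = ToyDomain({"Iris": {"variables": 5}})

# Code that previously compared the result with None now needs try/except.
try:
    missing = domain.get_dictionary("Adult")
except KeyError:
    missing = None

found = domain.get_dictionary("Iris")
```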
- Field separator constructor parameter for estimator classes of `sklearn`
- Dictionary files created with pyKhiops are now guaranteed to be free of character
  encoding errors unless the new JSON field `khiops_encoding` is non-existent or set to `colliding_ansi_utf8`, in which case a warning is emitted
- Khiops execution problems due to the character encoding of certain parameters
- Khiops error reporting problems due to character encoding
- `train_coclustering` now returns the path of the JSON coclustering report (`.khcj`)
- `get_dimensions` not working at all in `CoclusteringReport`
- Some Python 2 incompatibilities in Linux
- `get_samples_dir` core API function (works only with a local runner).
- `train_predictor`, `evaluate_model`, `train_recoder`, `train_coclustering` and `deploy_coclustering` now have return values (paths of relevant output files).
- `transfer_database` core API function renamed to `deploy_model`.
- `build_transferred_dictionary` core API function renamed to `build_deployed_dictionary`.
- In general the "model deployment" concept replaces that of "database transfer" in all code and in particular in the samples scripts.
- It is not necessary to specify a relative path as `./path` for the `results_dir` argument.
- Messages enabled with the `trace` parameter go again to `stdout`.
- `sklearn` sub-module updated for pyKhiops 10.
- `sklearn` samples notebooks.
- `deploy_coclustering` core API function.
- `build_multitable_dictionary` core API function.
- The information messages of `sklearn` are now deactivated by default (they can be reactivated manually).
- `sklearn` dependency on `overrides` package.
- Small transformation bug in `convert-pk10`.
- `detect_format` parameter to API methods that read databases. It is enabled by default and Khiops will try to automatically detect the format of input data tables. See the docstrings for the new behavior of `header_line` and `field_separator`.
- `specific_pairs` option replacing `only_pairs_with`. It gives the methods `train_predictor` and `train_recoder` more options to generate pairs of variables (`only_pairs_with` kept in legacy mode).
- `PyKhiopsRunner` class to extend pyKhiops to different backends. `PyKhiopsLocalRunner` implements the current functionality and is the default runner.
- `dictionary_domain` parameter removed from all relevant API methods. Now methods accepting a dictionary file path as argument also accept a `DictionaryDomain` object.
- Renamed various parameters. Until the next major release pykhiops will warn when these old parameters are used.
- All optional parameters of API methods are now proper named parameters (no kwargs).
- All errors are now handled with custom `PyKhiops*` exceptions.
- Updated default values to those of Khiops 10. Notably `max_trees == 10` by default.
- `tools/convert-pk10.py` script no longer exists. Now when installing pykhiops a `convert-pk10` script will automatically be installed in the user's local Python scripts directory. Optionally, the function `pykhiops.tools.convert_pk10` provides the same functionality.
- `samples.py` script is now in snake case and improved.
- Simplified `samples.ipynb`.
- Messages in `trace` mode now go to `stderr`.
- Naive Bayes classifier option from `train_predictor`.
- `simplify_coclustering`: `results_prefix` now works.
- `subprocess.Popen` returning 1 in Linux even when the Khiops process ended correctly. This made the legacy mode detection fail.
- API functions failing when `stderr` was not empty even though the Khiops process ended correctly. Now it just emits a warning.
- Compatibility for Khiops 10
- Legacy support for Khiops 9
- Partial compatibility for Khiops 10 JSON reports (no tree report)
- Script `tools/convert_pk10.py` to migrate from pyKhiops 9 to 10. See Changed below
- Extraction of dictionary data paths: See `core.DictionaryDomain.extract_data_paths`
- Robust JSON loading: tries `utf-8` encoding, then the system's default.
- Licence file
- Now all variable/method names follow the PEP8 convention: All methods are now in snake_case
- In `core.train_predictor`: `fill_test_database_settings` and `map` (kept in legacy mode).
- Sources (first commit)