
Releases: alexzwanenburg/familiar

Version 1.5.0 (Whole Whale)

24 Sep 06:29
7fe6fb0

Major changes

  • The source code now uses the tidyverse code style.

  • Power transformation is now handled by the power.transform package. This package replaces the internal routines that were previously used.

Future deprecation

  • Functionality reliant on the mboost, VGAM or qvalue packages will be deprecated in version 2.0.0.

  • The count outcome type will be deprecated and merged into the continuous outcome type, starting with version 2.0.0.

Bug fixes

  • Prevent errors due to parsing columns called else, for, function, if, in, or while.

  • The presence of features with integer values no longer leads to rare errors during evaluation.

  • The main panel for composite plots (e.g. calibration plots, Kaplan-Meier curves) is no longer of fixed width when title or subtitles are present.

  • Thresholds for clustering with correlation-based metrics are now computed correctly.

Version 1.4.8 (Valorous Viper)

26 May 18:56
3feebd8

Bug fixes

  • Adapted tests to work when suggested packages are missing (addresses CRAN noSuggests check).

  • Fixed an issue that prevented hyperparameter optimisation of xgboost models for survival tasks.

Version 1.4.7 (Uncertain Unicorn)

15 May 05:05

Bug fixes

  • Computing distance matrices no longer produces an error due to applying rownames to a data.table. The exact cause is unclear, but the issue was introduced by data.table version 1.15.0, R version 4.4.0, or both.

  • Several fixes related to changes introduced in ggplot2 version 3.5 were made:

    • Plot margins are now correctly set in the default familiar plotting theme.
    • Plot elements of composite plots are now correctly set.

  • Fixed an incorrect data.table merge when computing survival predictions from random forests.

Version 1.4.6 (Talented Toad)

24 Jan 12:39
521fcb6

Bug fixes

  • Fixed unused arguments appearing in documentation.

Version 1.4.5 (Reminiscing Rat)

23 Jun 15:24
036ed4c

Bug fixes

  • Creating data objects (as_data_object) using naive learners now works and no longer throws an error.

Version 1.4.4 (Quixotic Quail)

10 May 13:34
4710761

Bug fixes

  • Prevented an error that could occur when computing net benefit for decision curves of models that predict class probabilities of exactly 1. This was a very rare error, as it only occurred if the predicted class probabilities had at most two distinct values, one of which was 1.0.

  • Prevented an issue that could occur when computing linear calibration fits where the fit can be computed without residual errors. This would prevent the t-statistic and p-value from being correctly computed for binomial, multinomial and survival outcomes.

  • Prevented an issue when computing linear calibration fits when all the expected values are the same. The model will then lack a slope. We now add a slope of 0, with an infinite confidence interval, if this is the case.
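
The constant-expected-value case above can be illustrated with a minimal sketch. This is not familiar's implementation: the function name `calibration_fit`, the returned keys, and the choice of the mean observed value as intercept are assumptions made for illustration, and the confidence interval for the regular case is left out.

```python
import math


def calibration_fit(expected, observed):
    """Least-squares fit of observed ~ expected (hypothetical helper)."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n

    # All expected values identical: the slope cannot be estimated.
    # Following the release note, report a slope of 0 with an infinite
    # confidence interval. Using the mean as intercept is an assumption.
    if min(expected) == max(expected):
        return {
            "intercept": mean_y,
            "slope": 0.0,
            "slope_ci": (-math.inf, math.inf),
        }

    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    # Confidence interval of the regular fit is omitted in this sketch.
    return {"intercept": mean_y - slope * mean_x, "slope": slope, "slope_ci": None}


degenerate = calibration_fit([0.3, 0.3, 0.3], [0.0, 1.0, 1.0])
regular = calibration_fit([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```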

Version 1.4.3 (Puzzled Prawn)

25 Apr 08:35
2397e0f

Bug fixes

  • Prevented an error due to an overzealous check for hyperparameters being present for training a model.

Version 1.4.2 (Omnicompetent Owl)

01 Mar 08:40
7541117

Bug fixes

  • Fixed an error that could occur when creating a lasso model for imputation using just a single feature.

Version 1.4.1 (Nefarious Newt)

16 Dec 13:41
13510e5

Minor changes

  • Robust methods for power transformations were added, based on the work of Raymaekers and Rousseeuw (Transforming variables to central normality. Mach Learn. 2021. doi:10.1007/s10994-021-05960-5). These methods are yeo_johnson_robust and box_cox_robust.

  • A robust normalisation method, based on Huber's M-estimators for location and scale, was added: standardisation_robust.

  • Improved efficiency of aggregating and computing point estimates for evaluation steps. For each grouping (e.g. samples for pairwise sample similarity), multiple values may be available that should be aggregated to a point estimate. Previously we split on all unique combinations of the grouping columns, and processed each split separately. This is a valid approach, but can incur significant overhead when it produces a large number (>100k) of splits. We now first determine which data (if any) require computation of a (bias-corrected) point estimate because of grouping. Often, each split only contains a single instance, which forms a point estimate on its own. Extra computation is avoided for these cases.

  • Plots now always show the evaluation time point. This is relevant, for example, for calibration plots, where both the observed and expected (predicted) probabilities are time-dependent, and will change depending on the time point.

  • Improved support for providing a file name for storing a plot. The plotting device is now changed based on the file name, if it has an extension. In case multiple plots would be created, e.g. due to splitting on some grouping variable, such as the underlying dataset, the provided file name is used as a base.

  • Methods for setting labels previously could update the ordering of the labels for familiarCollection objects, which could produce unexpected changes. Setting new labels now does not change the label order. Use the order argument to update the order of the labels.
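
The aggregation shortcut described in this list can be sketched as follows. This is a conceptual illustration only, not the package's code: the function name `point_estimates` is hypothetical, and the median stands in for the (bias-corrected) point estimate that familiar actually computes.

```python
from collections import Counter, defaultdict
from statistics import median


def point_estimates(rows, aggregate=median):
    """rows: iterable of (group_key, value) pairs (hypothetical layout)."""
    rows = list(rows)

    # One cheap pass to find out which groups hold more than one value.
    sizes = Counter(key for key, _ in rows)

    estimates = {}
    pending = defaultdict(list)
    for key, value in rows:
        if sizes[key] == 1:
            # A single instance is already its own point estimate.
            estimates[key] = value
        else:
            pending[key].append(value)

    # Only groups with several values pay for the aggregation step.
    for key, values in pending.items():
        estimates[key] = aggregate(values)
    return estimates


result = point_estimates([("a", 1.0), ("b", 2.0), ("b", 4.0), ("c", 3.0)])
```

When most groups contain a single value, nearly all work happens in the two linear passes, and the aggregation function is called only for the few groups that need it.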

Bug fixes

  • Fixed an error that would occur when attempting to create risk group labels for a familiarCollection object that is composed of externally provided familiarData objects.

  • Fixed an issue that would prevent a familiarCollection object from being returned if an experiment was run using a temporary folder.

  • Fixed an issue with apply functions in familiar taking a long time to aggregate their results.

  • Fixed an issue that would prevent Kaplan-Meier curves from being plotted when more than three risk strata were present.

  • Fixed an error that would occur if Kaplan-Meier curves were plotted for more than one stratification method and different risk groups.

  • Fixed an issue that could potentially cause the wrong transformation and normalisation parameter values to be matched when forming ensemble models. This may have affected sample cluster plots, which use this information.

Version 1.4.0 (Misanthropic Muskrat)

24 Nov 08:12
c9b9f65

Major changes

  • Hyperparameter optimisation now trains naive models if none of the hyperparameter sets lead to models that perform better than these models. Previously a model was trained regardless of whether such a model would actually be better than a naive model. Naive models, for example, predict the majority class or median value, depending on the problem.
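
The fallback logic can be sketched roughly as below. This is an illustration under assumptions, not familiar's code: `naive_prediction` and `select_model` are hypothetical names, scores are assumed to be "higher is better", and the outcome type labels are placeholders.

```python
from collections import Counter
from statistics import median


def naive_prediction(outcomes, outcome_type):
    """Naive model: majority class or median value, depending on the problem."""
    if outcome_type == "categorical":
        return Counter(outcomes).most_common(1)[0][0]
    return median(outcomes)


def select_model(candidate_scores, naive_score):
    """candidate_scores maps a hyperparameter set id to its score."""
    if not candidate_scores:
        return "naive"
    best_id = max(candidate_scores, key=candidate_scores.get)
    # Keep the best hyperparameter set only if it beats the naive model.
    if candidate_scores[best_id] > naive_score:
        return best_id
    return "naive"


chosen = select_model({"set_1": 0.60, "set_2": 0.70}, naive_score=0.65)
fallback = select_model({"set_1": 0.60}, naive_score=0.65)
```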

Minor changes

  • Metrics for assessing performance of regression models, such as mean squared error, can now be computed in winsorised or trimmed (truncated) forms. These can be specified by appending _winsor or _trim as a suffix to the metric name. Winsorising clips the predicted values for 5% of the instances with the most extreme absolute errors prior to computing the performance metric, whereas trimming removes these instances. The result of either option is that for many metrics, the assessed model performance is less skewed by rare outliers.

  • Two additional optimisation functions were defined to assess suitability of hyperparameter sets:

    • model_balanced_estimate: seeks to maximise the estimate of the balanced IB and OOB score. This is similar to the balanced score, and in fact uses a hyperparameter learner to predict said score (not available for random search).

    • model_balanced_estimate_minus_sd: seeks to maximise the estimate of the balanced IB and OOB score, minus its estimated standard deviation. This is similar to the balanced score, but takes into account its estimated spread. Note that like model_estimate_minus_sd, the width of the distribution of balanced scores is more difficult to determine than its estimate.

  • The balanced optimisation function now adds a penalty when the model trained on the training data performs worse than a naive model.

  • A new exploration method for hyperparameter optimisation was added, namely single_shot. As the name suggests, this performs a single pass on the challenger and incumbent models during each intensification iteration. This is also the new default. Extensive tests have shown that the use of single-shot selection led to comparable performance.

  • Convergence checks for hyperparameter sets now depend on the validation optimisation score, as this is more stable than the summary score for some optimisation_function methods, such as model_estimate_minus_sd. Moreover, the tolerance has been changed to allow for values above 0.01 for sample sizes smaller than 100. This prevents convergence issues where the expected statistical fluctuation for small sample sizes would easily break convergence checks, and hence force long searches for suitable hyperparameters.

  • The default familiar plotting theme is now exported as theme_familiar. This allows for tweaking the default theme, for example, setting a larger font size, or selecting a different font family. After making changes to the theme, it can be provided as the ggtheme argument.
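
The difference between the _winsor and _trim metric variants described in this list can be illustrated with mean squared error. This is a minimal sketch, not the package's implementation: the function names are hypothetical, the rounding rule for the 5% fraction is an assumption, and clipping the predicted values is modelled here as capping each absolute error at the largest non-extreme absolute error.

```python
import math


def _n_extreme(n, fraction=0.05):
    # Number of instances treated as extreme (rounding down is an assumption).
    return int(math.floor(n * fraction))


def trimmed_mse(y_true, y_pred, fraction=0.05):
    """Drop the instances with the largest absolute errors, then average."""
    errors = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    k = _n_extreme(len(errors), fraction)
    kept = errors[: len(errors) - k] if k > 0 else errors
    return sum(e * e for e in kept) / len(kept)


def winsorised_mse(y_true, y_pred, fraction=0.05):
    """Cap the largest absolute errors instead of dropping the instances."""
    errors = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    k = _n_extreme(len(errors), fraction)
    if k > 0:
        cap = errors[len(errors) - k - 1]  # largest non-extreme absolute error
        errors = [min(e, cap) for e in errors]
    return sum(e * e for e in errors) / len(errors)


# Twenty instances: eighteen errors of 1, one of 2, and one outlier of 10.
y_true = [0.0] * 20
y_pred = [1.0] * 18 + [2.0, 10.0]

plain = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
trimmed = trimmed_mse(y_true, y_pred)
winsorised = winsorised_mse(y_true, y_pred)
```

In this example the single outlier dominates the plain mean squared error, while both robust variants stay close to the error of the typical instance.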

Bug fixes

  • ggtheme is now checked for completeness, which prevents errors with unclear causes or solutions.

  • We previously checked that any coefficients of a regression model could be estimated. This could lead to large models being formed where all features were insufficiently converged, even if this led to a meaningless model. We now check that all (instead of any) coefficients could be estimated for GLM, Cox and survival regression models.

  • Fixed an error caused by unsuccessfully retraining an anonymous random forest for variable importance estimations.

  • Fixed errors due to introduction of linewidth elements in version 3.4.0 of ggplot2. Versions of ggplot2 prior to 3.4.0 are still supported.