WIP on evaluation tasks.
alexzwanenburg committed Dec 4, 2024
1 parent 482b0ec commit 6b0aeac
Showing 9 changed files with 366 additions and 74 deletions.
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -199,6 +199,7 @@ Collate:
'RankStabilityAggregation.R'
'SocketServer.R'
'StringUtilities.R'
'TaskEvaluate.R'
'TaskFeatureInfo.R'
'TaskLearn.R'
'TaskLearnerHyperparameters.R'
20 changes: 17 additions & 3 deletions NEWS.md
@@ -2,8 +2,12 @@

## Breaking changes

-- Naming and on-disk location of variable importance tables and models has
-changed. These are no longer nested to limit path lengths.
+- Naming and on-disk location of variable importance tables, models, evaluated
+datasets and collections, have changed. These are no longer nested to reduce
+path lengths and avoid issues due to long path lengths, particularly on
+Windows OS.

- Ensembles are now no longer explicitly stored, but are formed at run-time.

## Major changes

@@ -37,6 +41,13 @@

- `parallel_feature_selection` was renamed to `parallel_vimp`.

- It is now possible to build models without explicitly defining a variable
importance (feature selection) step. For example,
`experimental_design = mb + ev` is now valid and will result in training of a
single model on the development dataset with subsequent evaluation on an
external dataset. This is realised by using variable importance data obtained
during hyperparameter optimisation.

## Minor changes

- The `iteration_seed` configuration parameter was added to provide a fixed seed
@@ -73,7 +84,7 @@
lead to too few samples to allow for assessment. This affected
Leave-One-Out-Cross-Validation (LOOCV) schemes in particular.

-## Bug fixes
+## Fixes

- Fixed errors when creating feature or similarity plots caused by sample or
feature names matching internal column names.
@@ -83,6 +94,9 @@
- Variable importance methods and outcome information objects were missing a
familiar version attribute, which has now been added to ensure future
compatibility.

- Some vignettes referred to `experiment_design` where `experimental_design` was
intended.

# Version 1.5.0 (Whole Whale)

12 changes: 12 additions & 0 deletions R/ExperimentSetup.R
@@ -201,6 +201,18 @@ extract_experimental_setup <- function(
section_table[main_data_id == data_id, "n_runs" := length(iteration_list[[as.character(data_id)]]$run)]
}

# Set the (max) number of available validation instances.
for (data_id in section_table$main_data_id) {
section_table[main_data_id == data_id, "max_validation_instances" := max(sapply(
iteration_list[[as.character(data_id)]]$run,
function(x) {
if (is_empty(x$valid_samples)) return(0L)

return(nrow(x$valid_samples))
}
))]
}

return(section_table)
}
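
Note on the hunk above: a minimal, self-contained sketch of what the new `max_validation_instances` column holds, namely the largest number of validation samples over all runs of a dataset, with 0 for runs that have none. The toy `iteration_list` and the helper `is_empty_sketch` are assumptions made for illustration (the latter stands in for familiar's internal `is_empty`), and base R `vapply` is used here instead of the `data.table` update in the commit.

```r
# Hypothetical toy input: each run holds a data.frame of validation samples,
# or NULL when the run has no validation data.
iteration_list <- list(
  "1" = list(run = list(
    list(valid_samples = data.frame(sample_id = 1:3)),
    list(valid_samples = NULL)
  )),
  "2" = list(run = list(
    list(valid_samples = NULL)
  ))
)

# Stand-in for familiar's internal is_empty(); an assumption for this sketch.
is_empty_sketch <- function(x) is.null(x) || NROW(x) == 0L

# For every data id, take the maximum validation sample count over its runs.
max_validation_instances <- vapply(
  iteration_list,
  function(data_entry) {
    max(vapply(
      data_entry$run,
      function(x) {
        if (is_empty_sketch(x$valid_samples)) return(0L)
        nrow(x$valid_samples)
      },
      integer(1L)
    ))
  },
  integer(1L)
)
```

Using `vapply` with an `integer(1L)` template keeps the result type fixed. One thing that may be worth checking in the committed version is the behaviour of `max(sapply(...))` when a dataset has no runs at all, since `max()` of an empty input returns `-Inf` with a warning.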

1 change: 1 addition & 0 deletions R/Familiar.R
@@ -407,6 +407,7 @@ summon_familiar <- function(
optimisation_determine_vimp = settings$hpo$hpo_determine_vimp,
vimp_methods = settings$vimp$vimp_methods,
learners = settings$mb$learners,
pool_only = settings$eval$pool_only,
file_paths = file_paths
)
}