Hello, first of all, thank you for the awesome library! I am a maintainer of the Anomalib library, and we are using TorchMetrics extensively throughout our code base to evaluate our models.
The most recent version of TorchMetrics introduced some changes to the PrecisionRecallCurve metric, which are causing some problems in one of our components. The problems are related to the re-mapping of the prediction values to the [0,1] range by applying a sigmoid function.
Some context
The goal of the models in our library is to detect anomalous samples in a dataset that contains both normal and anomalous samples. The task is similar to a classical binary classification problem, but instead of generating a class label and a confidence score, our models generate an anomaly score, which quantifies the distance of the sample to the distribution of normal samples seen during training. The range of possible anomaly score values is unbounded and may differ widely between models and/or datasets, which makes it tricky to set a good threshold for mapping the raw anomaly scores to a binary class label (normal vs. anomalous). This is why we apply an adaptive thresholding mechanism as a post-processing step. The adaptive threshold mechanism returns the threshold value that maximizes the F1 score over the validation set.
Our adaptive thresholding class inherits from TorchMetrics' PrecisionRecallCurve class. After TorchMetrics computes the precision and recall values, our class computes the F1 score for each precision-recall pair and returns the threshold value that corresponds to the highest observed F1 score.
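For illustration, here is a minimal sketch of what such a subclass could look like (the class name and the small epsilon are made up for this example and are not our actual implementation):

```python
import torch
from torchmetrics.classification import BinaryPrecisionRecallCurve


class AdaptiveThreshold(BinaryPrecisionRecallCurve):
    """Illustrative sketch: return the threshold that maximizes the F1 score."""

    def compute(self) -> torch.Tensor:
        precision, recall, thresholds = super().compute()
        # precision/recall contain one more element than thresholds, so drop the
        # last entry before looking up the best threshold.
        f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
        return thresholds[torch.argmax(f1_score[:-1])]
```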
The problem
In the latest version of the PrecisionRecallCurve metric, the update method now re-maps the predictions to the [0, 1] range by applying a sigmoid function. As a result, the thresholds variable returned by compute is no longer in the same domain as the original predictions, and the values are not usable for our purpose of finding the optimal threshold value.
In addition, the sigmoid function squeezes the higher and lower values, which leads to lower resolution at the extremes of the input range, and in some cases even information loss.
To Reproduce
Here's an example to illustrate the problem. Let's say we have a set of binary targets and a set of model predictions in the range [12, 17]. Previously, the PrecisionRecallCurve metric would return the values of precision and recall for the different thresholds that occur naturally in the data.
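For example (the targets and scores below are made-up values chosen only to illustrate the point), with torchmetrics 0.10.3:

```python
import torch
from torchmetrics import PrecisionRecallCurve

target = torch.tensor([0, 0, 1, 1, 1])
preds = torch.tensor([12.0, 13.0, 15.0, 16.0, 17.0])  # raw anomaly scores in [12, 17]

# torchmetrics==0.10.3: the candidate thresholds are taken from the raw prediction values.
metric = PrecisionRecallCurve(pos_label=1)
precision, recall, thresholds = metric(preds, target)
print(thresholds)  # thresholds stay in the same domain as the anomaly scores (12.0 ... 17.0)
```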
Given these outputs it is straightforward to obtain the F1 scores for the different threshold values and use this to find the optimal threshold that maximizes F1.
After the recent changes, the predictions are now re-mapped by the sigmoid function. While we can still compute the F1 scores, we can no longer find the value of the threshold that yields the highest F1 score, because the values of the thresholds variable are no longer in the same domain as the original predictions.
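With torchmetrics 0.11.1 and the same made-up values:

```python
import torch
from torchmetrics.classification import BinaryPrecisionRecallCurve

target = torch.tensor([0, 0, 1, 1, 1])
preds = torch.tensor([12.0, 13.0, 15.0, 16.0, 17.0])  # raw anomaly scores in [12, 17]

# torchmetrics==0.11.1: predictions outside [0, 1] are passed through a sigmoid first,
# so the returned thresholds are sigmoid(preds) rather than the raw anomaly scores.
metric = BinaryPrecisionRecallCurve()
precision, recall, thresholds = metric(preds, target)
print(thresholds)  # all elements print as 1.0000 at the default precision
```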
Note that the elements of the thresholds variable all print as 1.0000 because the sigmoid squeezes the threshold candidates so close together that the differences fall below the default print precision.
It gets even worse when we increase the absolute values of the predictions to [22, 27]. The output of the sigmoid now evaluates to exactly 1.0 for all predictions due to floating-point rounding, and the metric cannot compute any meaningful precision and recall values.
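For example (same made-up targets, scores shifted to [22, 27]):

```python
import torch
from torchmetrics.classification import BinaryPrecisionRecallCurve

target = torch.tensor([0, 0, 1, 1, 1])
preds = torch.tensor([22.0, 23.0, 25.0, 26.0, 27.0])  # raw anomaly scores in [22, 27]

# In float32, sigmoid(22.0) is already within rounding distance of 1.0, so every
# prediction collapses to exactly 1.0 and the curve degenerates to a single point.
print(torch.sigmoid(preds))  # tensor([1., 1., 1., 1., 1.])

metric = BinaryPrecisionRecallCurve()
precision, recall, thresholds = metric(preds, target)
print(thresholds)  # a single threshold of 1.0 -- precision and recall carry no information
```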
I guess this change was made to accommodate classical binary classification problems, where the predictions are generally confidence scores in the [0, 1] range, but I feel this is too restrictive for other problem classes. Mathematically there is no reason why the precision-recall curve cannot be computed from predictions that fall outside of this range.
Expected behavior
The re-mapping of the prediction values to [0,1] by applying a sigmoid function should be optional.
Environment
TorchMetrics 0.11.1 (pip)
Python 3.8
PyTorch 1.13.1
Hi @djdameln, thanks for reporting this issue. Sorry for not getting back to you sooner.
I opened PR #1676 with a proposed solution: a new format_input=True/False argument that can be used to enable/disable the internal formatting.
So in your case, simply adding format_input=False when you initialize the metric should work.
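Something like this (just a sketch of the intended usage; the exact signature follows whatever lands in the PR):

```python
from torchmetrics.classification import BinaryPrecisionRecallCurve

# Proposed `format_input` argument from PR #1676 (not yet merged at the time of writing):
# disable the internal sigmoid/formatting so the thresholds stay in the raw score domain.
metric = BinaryPrecisionRecallCurve(format_input=False)
```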
As a side note, you are completely right that we introduced this as our standard input formatting, to standardize all possible inputs for further processing. This was especially related to the new thresholds argument for the precision-recall curve: thresholds=100 means we need to pre-set 100 thresholds before seeing the user's input, and in that case the only solution is to standardize everything to one format.