Covariates table draft #1 #3
Conversation
- `$covariateId` one column per distinct covariate

## Covariate table
Atm, I don't see why the covariates could not be hard-coded into the SBML file. Can you show one example that illustrates this? Or was it just that, while developing the model, you might want to try different covariate formulas without having to edit the SBML file?
From the pharmacometrics perspective:
Usually, different information about the patients is available (e.g. demographics, body weight, biochemistry markers, ...) that can serve as potential covariates in the model.
During model development, the inclusion of the different covariates needs to be tested (as a model selection problem). Even the mathematical expression for the covariate may be varied to identify the most suitable one. That is why I kept it out of the SBML model, so the user has high flexibility when defining the covariates and how they are modeled, without needing to edit the SBML every time.
I see the point of adding the formula directly inside the SBML, but I thought this may be more complicated from a user perspective when iterating the structure.
So, I think maybe this depends on the approach we want to take:
(1) use PEtab while doing model development, or
(2) use PEtab to store the final model structure.
OK, thanks for the explanation. I see why this makes the whole thing more user friendly. Here are some additional thoughts:
> the inclusion of the different covariates needs to be tested (as a model selection problem).
Then we need to make sure it is compatible with PEtab select. But my (limited) understanding of PEtab select suggests that whatever we come up with here can be nested inside a PEtab select problem, so we don't have to worry about this now.
Additionally, I am wondering if the `estimate` column in the covariate table should be replaced with sth like `select` (I think the `estimate` column should NOT be used to indicate whether a `covariateFormula` is included, and the parameter table has an `estimate` column already. What are your thoughts?).

The allowed values of such a `select` column would be sth like `include`, `exclude`, `optimize`. But what would you then do if you have like three different formulas you want to try for sth like `WT`? Well, I guess you could just copy the row for `WT`, except for the `covariateFormula`, which would be different. If `select` is `optimize` and no `WT` row is `include`, then this would mean to optimize which expression should be taken, if any (e.g. `WT` might not improve the model selection criterion, so no expression might be taken). If all rows are `exclude`, it is excluded; if exactly one row is `include`, it is included. Else the PEtab is invalid.
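To make these rules concrete, here is a minimal sketch in Python of how a linter could resolve such a `select` column per covariate. The row contents and the helper name are made up for illustration; this is not an agreed-upon spec.

```python
from collections import defaultdict

# Hypothetical covariate-table rows: one entry per (covariateId, covariateFormula).
rows = [
    {"covariateId": "WT", "covariateFormula": "(WT/70)**WTeff", "select": "optimize"},
    {"covariateId": "WT", "covariateFormula": "1 + WTeff*(WT - 70)", "select": "optimize"},
    {"covariateId": "SEX", "covariateFormula": "SEXeff", "select": "include"},
]

def check_select(rows):
    """Apply the proposed rules: more than one 'include' row -> invalid;
    exactly one 'include' -> included; any 'optimize' (and no 'include')
    -> model selection decides; all 'exclude' -> excluded."""
    by_cov = defaultdict(list)
    for row in rows:
        by_cov[row["covariateId"]].append(row["select"])
    for cov, selects in by_cov.items():
        n_include = selects.count("include")
        if n_include > 1:
            raise ValueError(f"{cov}: more than one row marked 'include'")
        if n_include == 1:
            print(f"{cov}: included (single 'include' row)")
        elif "optimize" in selects:
            print(f"{cov}: formula (or none) chosen during model selection")
        else:
            print(f"{cov}: excluded (all rows 'exclude')")

check_select(rows)
```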
> I see the point of adding the formula directly inside the SBML, but I thought this may be more complicated from a user perspective when iterating the structure.
Fair enough. If for whatever reason this is needed (e.g. users prefer MathML over string expressions), users can still specify alternative SBML files as part of a standard PEtab select problem, I think.
> So, I think maybe this depends on the approach we want to take:
I am in favor of (1). PEtab is for model fitting/development, SBML for storing the final model with optimized parameters.
- `covariateParameter` [STRING]

  Defines the parameter where the covariate effect (COVEFF) is modelled. It must be a `parameterId` defined in the parameter table. The COVEFF would be multiplied with the `parameterId`.
Instead of multiplying with a `parameterId`, as an alternative, have you thought about having `covariateId` override SBML `functionDefinition`s? What would be the pros and cons of that?
- `covariateFormula` [STRING: 'exclude' or 'fractional' or 'power' or free text formula]

  Covariate effect function as plain text formula expression. The user can define its own formula or can use predefined models: 'exclude' or 'fractional' or 'power'.
... the user can define their own formula... (or: ...users can define their own formula...)

In the covariate table you provided, I see that `k3` would be multiplied with four different terms (`((WT/70)**WTeff)`, `exclude`, `fractional` and `power`), right? Regarding `exclude`: for which sex would this multiplication with `1` happen? And what would happen for the other sex?
The interpretation is correct.
The behavior for `exclude` would be that, in this case, SEX has no impact at all in the model (regardless of the actual category, male or female). So, the effect of SEX would be none for both categories.
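In formula terms (using the multiplicative convention from the `covariateParameter` description above): with `exclude`, $\text{COVEFF} = 1$ for every individual, so $k_3^{\text{indiv}} = k_3 \times \text{COVEFF} = k_3$ for both categories.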
I suggested this `exclude` behavior thinking about testing the inclusion of different covariates in the model, to try to make this easy for the user.
But, as I indicated in another comment in this PR, the approach for this extension is something we need to discuss 😄
Got it.
> the approach for this extension is something we need to discuss
How about my suggestion above of using a `select` column instead of an `estimate` column?
For continuous covariates:

$\text{COVEFF} = 1 + \theta_{cov} \times \left(\text{covariateId} - \mathrm{median}(\text{covariateId})\right)$
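As a quick numeric illustration of this formula (a sketch only; the base value of `k3`, the weight values, and the effect-parameter value are made up):

```python
# Sketch: continuous covariate effect as defined above,
# COVEFF = 1 + theta_cov * (covariate - median(covariate)).
def coveff_continuous(value, median, theta_cov):
    return 1.0 + theta_cov * (value - median)

# Hypothetical numbers: body weight 80 kg, population median 70 kg,
# estimated effect parameter theta_cov = 0.01.
k3 = 0.5                                       # base parameter from the SBML model
coveff = coveff_continuous(80.0, 70.0, 0.01)   # -> 1.1
k3_individual = k3 * coveff                    # covariate effect multiplies the parameter
print(coveff, k3_individual)                   # 1.1 0.55
```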
What is $\theta_{cov}$?
Sorry, I did not explain this in the README.
I will correct this in the README accordingly 👍🏼
OK. But how about including `covariateFormula`, and following the convention that everything in the formula must be either
- a covariate as specified in the measurement table,
- a valid SBML ID (e.g. a parameter or species), or
- a parameter defined in the parameter table?

This would mean that instead of `power`, you would have to do `power^theta_cov`, and put `theta_cov` in the parameter table, which would give us `lowerBound`, `upperBound`, `nominalValue`, `estimate`, `objectivePrior`, etc. for free.
Which also tells me that my interpretation of `lowerBound`, `upperBound` and `nominalValue` in the covariate table was probably wrong. Now I think that they refer to $\theta_{cov}$, so we could either
- get rid of them in the covariate table, or
- still use `lowerBound` and `upperBound`, but with the meaning I suggested earlier (i.e. just for verification purposes).
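To illustrate the proposed convention, here is a rough sketch of how the allowed symbols in a `covariateFormula` could be checked. The helper name and the set of allowed function names are assumptions; a real implementation would use PEtab's math parser rather than a regex.

```python
import re

def check_formula_symbols(formula, covariate_ids, sbml_ids, parameter_ids):
    """Check that every identifier in a covariateFormula is either a covariate
    from the measurement table, a valid SBML ID, or an ID from the parameter table."""
    known = set(covariate_ids) | set(sbml_ids) | set(parameter_ids)
    allowed_functions = {"median", "mean", "min", "max", "exp", "log"}  # assumed
    identifiers = set(re.findall(r"[A-Za-z_]\w*", formula))
    unknown = identifiers - known - allowed_functions
    if unknown:
        raise ValueError(f"Unknown symbols in covariateFormula: {sorted(unknown)}")

# Example using the EGF formula from the draft table below:
# EGFeffk5 would live in the parameter table, EGF is a covariate column.
check_formula_symbols(
    "((EGF/4.56)**EGFeffk5)",
    covariate_ids=["WT", "SEX", "AGE", "EGF"],
    sbml_ids=["k3", "k5"],
    parameter_ids=["EGFeffk5", "WTeff"],
)
```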
- `covariateType` [STRING]

  Defines whether a covariate is categorical or numerical.
I think this could be inferred from measurements.tsv (the convention could be that if there are only numbers in the column, then it is numerical, if there are only strings then categorical, otherwise error).
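A minimal sketch of that inference rule, assuming the covariates appear as columns in measurements.tsv (pandas used only for illustration):

```python
import pandas as pd

def infer_covariate_type(column: pd.Series) -> str:
    """Infer covariateType from a measurements.tsv column:
    only numbers -> 'numerical', only strings -> 'categorical', mixed -> error."""
    values = column.dropna()
    numeric = pd.to_numeric(values, errors="coerce")
    if numeric.notna().all():
        return "numerical"
    if numeric.isna().all():
        return "categorical"
    raise ValueError("Column mixes numeric and non-numeric covariate values")

measurements = pd.DataFrame({"WT": [70, 82.5, 61], "SEX": ["male", "female", "male"]})
print(infer_covariate_type(measurements["WT"]))   # numerical
print(infer_covariate_type(measurements["SEX"]))  # categorical
```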
I am not sure about using strings for categorical variables. Currently most of the pharmacometrics modeling tools adopt numerical values for these. If we use strings, that would add another layer of processing when compiling the model into the corresponding software tool.
But we can discuss that with the rest of the project team.
Moreover, another thing I can think of is that data anonymization may sometimes be needed, e.g. to exchange data with an external partner; with strings we would be giving away the actual values (e.g. sex, race, ethnicity, positivity for a given chemical assay, ...), and this could sometimes be problematic.
Hmm, OK. Seems to be a tradeoff between conciseness and ease of developing a PEtab export tool. I don't see the anonymization tradeoff, because it is easy to use `one` or any other string instead of `1`, etc. Let's discuss with the others.
- `lowerBound` [NUMERICAL]

  Same use as in the parameter table.
Would that make sense for a categorical covariate? And isn't it the fault of the user if some of the patients have such a low value that the expression suddenly evaluates to a negative number? To avoid that, we could make it "Lower bound of `covariateFormula`", which could be set to zero. But tbh, these kinds of hard caps sound fishy to me. I can see how this is useful to prevent negative values, but it is certainly not biological to suddenly cap them, and therefore it is not representative of the system that should be modelled. Imo, a good standard should not allow users to specify things that do not make sense. But of course, I can be convinced that this makes sense.

What could probably make sense is that, if one or more of the covariateValues set in the measurements.tsv lead to a value outside the boundaries, petablint complains that the model is invalid. So upper and lower bound are optional and would just be used as a sanity check.
- `nominalValue` [NUMERICAL]

  Same use as in the parameter table.
We could additionally allow things like `median`, `mean`, `max`, `min` there, to automatically calculate the value from what is provided in the measurements.tsv.
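A quick sketch of how such keywords could be resolved from measurements.tsv (column and helper names assumed for illustration):

```python
import pandas as pd

def resolve_nominal_value(spec, covariate_column: pd.Series):
    """Resolve a nominalValue entry: a number is used as-is, while the keywords
    'median', 'mean', 'max', 'min' are computed from the measurements.tsv column."""
    keywords = {"median": covariate_column.median,
                "mean": covariate_column.mean,
                "max": covariate_column.max,
                "min": covariate_column.min}
    if isinstance(spec, str) and spec in keywords:
        return keywords[spec]()
    return float(spec)

weights = pd.Series([55.0, 70.0, 120.0])
print(resolve_nominal_value("median", weights))  # 70.0
print(resolve_nominal_value(4.56, weights))      # 4.56
```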
Like that suggestion!
SEX k3 exclude categorical log10 0.1 10 1 0
AGE k3 fractional categorical log10 0.1 1 0.5 1
EGF k3 power continuous log10 0.1 10 1 1
EGF k5 ((EGF/4.56)**EGFeffk5) continuous log10 0.1 10 1 1
I think PEtab uses `^` instead of `**`.
This is an initial draft for the covariates table for the PEtab NLME extension #1
Open to discussions/feedback 😄
EDIT: I included the covariate values in the measurements table as we may have time-varying covariates, which is quite common at least in the pharmacometrics field.