Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Covariates table draft #1 #3

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

eraimundez
Copy link
Collaborator

@eraimundez eraimundez commented Jan 17, 2024

This is an initial draft for the covariates table for the PEtab NLME extension #1
Open to discussions/feedback 😄

EDIT: I included the covariate values in the measurements table as we may have time-varying covariates, which is quite common at least in the pharmacometrics field.

@eraimundez eraimundez added the enhancement New feature or request label Jan 17, 2024
@eraimundez eraimundez requested a review from paulflang January 17, 2024 12:46
@eraimundez eraimundez self-assigned this Jan 17, 2024
- `$covariateId` one column per distinct covariate

## Covariate table
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atm, I don't see why the covariates could not be hard-coded into the SBML file. Can you show one example that illustrates this? Or was it just that, while developing the model, you might want to try different covariate formulas without having to edit the SBML file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the pharmacometrics perspective:
Usually different information from the patients is available (e.g. demographics, body weight, biochemistry markers, ...) that can be potential covariates in the model.

During model development then the inclusion of the different covariates needs to be tested (as a model selection problem). Even the mathematical expression for the covariate may be tested to identify the most suitable one. That is why I kept it out from the SBML model, so the user will have high flexibility when defining the covariates and how they are modeled without needing to work with the SBML every time.

I see the point of adding the formula directly inside the SBML, but I thought this may be more complicated from a user perspective when iterating the structure.

So, I think maybe this depend on the approach we want to take:
(1) use PEtab while doing model development, or
(2) use PEtab to store the final model structure.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for the explanation. I see why this makes the whole thing more user friendly. Here are some additional thoughts:

the inclusion of the different covariates needs to be tested (as a model selection problem).

Then we need to make sure it is compatible with PEtab select. But my (limited) understanding of PEtab select suggests, that whatever we come up with here, can be nested inside a PEtab select problem, so we don't have to worry about this now.

Additionally, I am wondering if the estimate column in the covariate table should be replaced with sth like select (I think the estimate column should NOT be used to indicate whether $\theta_{cov}$, should be estimated or not. As I explain in a comment below, I think all $\theta_{cov}$ s should go explicitly in the covariateFormula and the parameter table (which has an estimate column already. What are your thoughts?).

The allowed values of such a select column would be sth like include, exclude, optimize. But what would you then do, if you have like three different formulas you want to try for sth. like WT? Well, I guess you could just copy the row for WT, except for the covariateFormula, which would be different. If select is optimize and no WT row is include, then this would mean to optimize which expression should be taken, if any (e.g. WT might not improve the model selection criterion, so no expression might be taken). If all rows are exclude, it is excluded, if exactly one row is include, it is included. Else the PEtab is invalid.

I see the point of adding the formula directly inside the SBML, but I thought this may be more complicated from a user perspective when iterating the structure.

Fair enough. If for whatever reason this is needed (e.g. users prefer MathML over string expressions), users can still specify alternative SBML files as part of a stardard PEtab select problem I think.

So, I think maybe this depend on the approach we want to take:

I am in favor of (1). PEtab is for model fitting/development, SBML for storing the final model with optimized parameters.

- `covariateParameter` [STRING]

Defines the parameter where the covariate effect (COVEFF) is modelled. It must be a `parameterId` defined in the parameter table. The COVEFF would be multiplied with the `parameterId`.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of multiplying with a parameterId, as an alternative, have you thought about having covariateId override SBML functionDefinitions? What would be the pros and cons of that?

- `covariateFormula` [STRING: 'exclude' or 'fractional' or 'power' or free text formula]

Covariate effect function as plain text formula expression. The user can define its own formula or can use predefined models: 'exclude' or 'fractional' or 'power'.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... the user can define their own formula... (or: ...users can define their own formula...)

In the covariate table, you provided, I see that k3 would be multiplied with four different terms (((WT/70)**WTeff), exclude, fractional and power), right? Regarding exclude for which sex would this multiplication with 1 happen? And what would happen for the other sex?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interpretation is correct.

The behavior for exclude would be that, in this case, SEX is not having an impact at all in the model (regardless of the actual category male or female). So, the effect of SEX would be none for both categories.

I suggested this exclude behavior thinking about testing the inclusion of different covariates in the model to try to have it in a easy manner for the user.

But, as I indicate in another comment in this PR, the approach about this extension is something we need to discuss 😄

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

the approach about this extension is something we need to discuss

How about my suggestion above of using a select column, instead of an estimate column?


For continuous covariates:
$\text{COVEFF}= 1 + \theta_{cov} \times \left(\text{covariateId} - median(\text{covariateId})\right)$
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is $\theta_{cov}$?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I did not explain this in the README, quickly $\theta_{cov}$ represents the magnitude / effect of the covariate on a specific parameter. Usually needs to be estimated, e.g. "estimated 30% decrease in the drug clearance in female vs male"

I will correct this in the README accordingly 👍🏼

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. But how about including $\theta_{cov}$ in the covariateFormula, and following the convention that everything in the formula must be either

  • a covariate as specified in the measurement table
  • a valid SBML ID (e.g. a parameter or species)
  • a parameter defined in the parameter table.

This would mean, that Instead of power, you would have to do power^theta_cov, and put theta_cov in the parameter table, which would give us lowerBound, upperBound, nominalValue, estimate, objectivePrior, etc. for free.

Which also tells me that my interpretation of lowerBound, upperBound and nominalValue in the covariate table was probably wrong. Now I think that they refer to $\theta_{cov}$. Is this correct? If so, we could either

  • get rid of them in the covariate table
  • still use lowerBound and upperBound, but with the meaning I suggested earlier (i.e. just for verification purposes).

- `covariateType` [STRING]

Defines whether a covariate is categorical or numerical.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be inferred from measurements.tsv (the convention could be that if there are only numbers in the column, then it is numerical, if there are only strings then categorical, otherwise error).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about using strings for categorical variables. Currently most of the pharmacometrics modeling tools adopt numerical values for these. If we use strings, that would add another layer of processing when compiling the model into the corresponding software tool.

But we can discuss that with the rest of the project team.

Moreover another thing I can think of is that sometimes data anonymization may be needed, e.g. to exchange with an external partner, so with that we would be giving the actual, e.g. sex, race, ethnicity, positivity for a given chemical assay... and this could be sometimes problematic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. Seems to be a tradeoff between conciseness and ease of developing a PEtab export tool. I don't see the anonymization tradeoff, cause it is easy to use one or any other string instead of 1, etc. Let's discuss with the others.

- `lowerBound` [NUMERICAL]

Same use as in the parameter table.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that make sense for a categorical covariate? And isn't it the fault of the user if some of the patients has such a low number, that the expression suddenly evaluates to a negative number? To avoid that, we could make it "Lower bound of covariateFormula", which could be set to zero. But tbh, these kind of hard caps sound fishy to me. I can see how this is useful to prevent negative values, but it is certainly not biological to suddenly cap them, and therefore it is not representative of the system that should be modelled. Imo, a good standard should not allow users to specify things that do not make sense. But of course, I can be convinced that this makes sense. What could probably make sense is, that if one or more of the covariateValues set in the measurements.tsv lead to a value outside the boundaries, petablint complains that the model is invalid. So upper and lower bound are optional and would just be used as sanity check.

- `nominalValue` [NUMERICAL]

Same use as in the parameter table.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could additionally allow things like median, mean, max, min their, to automatically calculate the value from what is provided in the measurements.tsv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like that suggestion!

SEX k3 exclude categorical log10 0.1 10 1 0
AGE k3 fractional categorical log10 0.1 1 0.5 1
EGF k3 power continuous log10 0.1 10 1 1
EGF k5 ((EGF/4.56)**EGFeffk5) continuous log10 0.1 10 1 1
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PEtab uses ^ instead of **.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants