Covariates table draft #1 #3
Conversation
- `$covariateId` one column per distinct covariate

## Covariate table
Atm, I don't see why the covariates could not be hard-coded into the SBML file. Can you show one example that illustrates this? Or was it just that, while developing the model, you might want to try different covariate formulas without having to edit the SBML file?
From the pharmacometrics perspective:
Usually, different information about the patients is available (e.g. demographics, body weight, biochemistry markers, ...) that can serve as potential covariates in the model.
During model development, the inclusion of the different covariates needs to be tested (as a model selection problem). Even the mathematical expression for the covariate may be varied to identify the most suitable one. That is why I kept it out of the SBML model, so the user has high flexibility when defining the covariates and how they are modeled, without needing to edit the SBML every time.
I see the point of adding the formula directly inside the SBML, but I thought this may be more complicated from a user perspective when iterating the structure.
So, I think maybe this depends on the approach we want to take:
(1) use PEtab while doing model development, or
(2) use PEtab to store the final model structure.
OK, thanks for the explanation. I see why this makes the whole thing more user friendly. Here are some additional thoughts:
> the inclusion of the different covariates needs to be tested (as a model selection problem).
Then we need to make sure it is compatible with PEtab select. But my (limited) understanding of PEtab select suggests that whatever we come up with here can be nested inside a PEtab select problem, so we don't have to worry about this now.
Additionally, I am wondering if the `estimate` column in the covariate table should be replaced with sth like `select` (I think the `estimate` column should NOT be used to indicate whether a `covariateFormula` is included, and the parameter table has an `estimate` column already. What are your thoughts?).

The allowed values of such a `select` column would be sth like `include`, `exclude`, `optimize`. But what would you then do if you have like three different formulas you want to try for sth like `WT`? Well, I guess you could just copy the row for `WT`, except for the `covariateFormula`, which would be different. If `select` is `optimize` and no `WT` row is `include`, then this would mean to optimize which expression should be taken, if any (e.g. `WT` might not improve the model selection criterion, so no expression might be taken). If all rows are `exclude`, it is excluded; if exactly one row is `include`, it is included. Else the PEtab is invalid.
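To make these rules concrete, here is a minimal sketch in Python of how a linter could resolve such a `select` column per covariate. The row contents and the helper name are made up for illustration; this is not an agreed-upon spec.

```python
from collections import defaultdict

# Hypothetical covariate-table rows: one entry per (covariateId, covariateFormula).
rows = [
    {"covariateId": "WT", "covariateFormula": "(WT/70)**WTeff", "select": "optimize"},
    {"covariateId": "WT", "covariateFormula": "1 + WTeff*(WT - 70)", "select": "optimize"},
    {"covariateId": "SEX", "covariateFormula": "SEXeff", "select": "include"},
]

def check_select(rows):
    """Apply the proposed rules: more than one 'include' row -> invalid;
    exactly one 'include' -> included; any 'optimize' (and no 'include')
    -> model selection decides; all 'exclude' -> excluded."""
    by_cov = defaultdict(list)
    for row in rows:
        by_cov[row["covariateId"]].append(row["select"])
    for cov, selects in by_cov.items():
        n_include = selects.count("include")
        if n_include > 1:
            raise ValueError(f"{cov}: more than one row marked 'include'")
        if n_include == 1:
            print(f"{cov}: included (single 'include' row)")
        elif "optimize" in selects:
            print(f"{cov}: formula (or none) chosen during model selection")
        else:
            print(f"{cov}: excluded (all rows 'exclude')")

check_select(rows)
```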
> I see the point of adding the formula directly inside the SBML, but I thought this may be more complicated from a user perspective when iterating the structure.
Fair enough. If for whatever reason this is needed (e.g. users prefer MathML over string expressions), users can still specify alternative SBML files as part of a standard PEtab select problem, I think.
> So, I think maybe this depends on the approach we want to take:
I am in favor of (1). PEtab is for model fitting/development, SBML for storing the final model with optimized parameters.
- `covariateParameter` [STRING]

  Defines the parameter where the covariate effect (COVEFF) is modelled. It must be a `parameterId` defined in the parameter table. The COVEFF would be multiplied with the `parameterId`.
Instead of multiplying with a `parameterId`, as an alternative, have you thought about having `covariateId` override SBML `functionDefinition`s? What would be the pros and cons of that?
- `covariateFormula` [STRING: 'exclude' or 'fractional' or 'power' or free text formula]

  Covariate effect function as plain text formula expression. The user can define its own formula or can use predefined models: 'exclude' or 'fractional' or 'power'.
... the user can define their own formula... (or: ...users can define their own formula...)

In the covariate table you provided, I see that `k3` would be multiplied with four different terms (`((WT/70)**WTeff)`, `exclude`, `fractional` and `power`), right? Regarding `exclude`: for which sex would this multiplication with `1` happen? And what would happen for the other sex?
The interpretation is correct.
The behavior for `exclude` would be that, in this case, SEX has no impact at all in the model (regardless of the actual category, male or female). So, the effect of SEX would be none for both categories.
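In formula terms (using the multiplicative convention from the `covariateParameter` description above): with `exclude`, $\text{COVEFF} = 1$ for every individual, so $k_3^{\text{indiv}} = k_3 \times \text{COVEFF} = k_3$ for both categories.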
I suggested this `exclude` behavior thinking about testing the inclusion of different covariates in the model, to try to make this easy for the user.
But, as I indicated in another comment in this PR, the approach for this extension is something we need to discuss 😄
Got it.
> the approach for this extension is something we need to discuss
How about my suggestion above of using a `select` column instead of an `estimate` column?
For continuous covariates:

$\text{COVEFF} = 1 + \theta_{cov} \times \left(\text{covariateId} - \mathrm{median}(\text{covariateId})\right)$
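As a quick numeric illustration of this formula (a sketch only; the base value of `k3`, the weight values, and the effect-parameter value are made up):

```python
# Sketch: continuous covariate effect as defined above,
# COVEFF = 1 + theta_cov * (covariate - median(covariate)).
def coveff_continuous(value, median, theta_cov):
    return 1.0 + theta_cov * (value - median)

# Hypothetical numbers: body weight 80 kg, population median 70 kg,
# estimated effect parameter theta_cov = 0.01.
k3 = 0.5                                       # base parameter from the SBML model
coveff = coveff_continuous(80.0, 70.0, 0.01)   # -> 1.1
k3_individual = k3 * coveff                    # covariate effect multiplies the parameter
print(coveff, k3_individual)                   # 1.1 0.55
```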
What is $\theta_{cov}$?
Sorry, I did not explain this in the README.
I will correct this in the README accordingly 👍🏼
OK. But how about including `covariateFormula`, and following the convention that everything in the formula must be either
- a covariate as specified in the measurement table,
- a valid SBML ID (e.g. a parameter or species), or
- a parameter defined in the parameter table?

This would mean that instead of `power`, you would have to do `power^theta_cov`, and put `theta_cov` in the parameter table, which would give us `lowerBound`, `upperBound`, `nominalValue`, `estimate`, `objectivePrior`, etc. for free.
Which also tells me that my interpretation of `lowerBound`, `upperBound` and `nominalValue` in the covariate table was probably wrong. Now I think that they refer to $\theta_{cov}$, so we could either
- get rid of them in the covariate table, or
- still use `lowerBound` and `upperBound`, but with the meaning I suggested earlier (i.e. just for verification purposes).
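To illustrate the proposed convention, here is a rough sketch of how the allowed symbols in a `covariateFormula` could be checked. The helper name and the set of allowed function names are assumptions; a real implementation would use PEtab's math parser rather than a regex.

```python
import re

def check_formula_symbols(formula, covariate_ids, sbml_ids, parameter_ids):
    """Check that every identifier in a covariateFormula is either a covariate
    from the measurement table, a valid SBML ID, or an ID from the parameter table."""
    known = set(covariate_ids) | set(sbml_ids) | set(parameter_ids)
    allowed_functions = {"median", "mean", "min", "max", "exp", "log"}  # assumed
    identifiers = set(re.findall(r"[A-Za-z_]\w*", formula))
    unknown = identifiers - known - allowed_functions
    if unknown:
        raise ValueError(f"Unknown symbols in covariateFormula: {sorted(unknown)}")

# Example using the EGF formula from the draft table below:
# EGFeffk5 would live in the parameter table, EGF is a covariate column.
check_formula_symbols(
    "((EGF/4.56)**EGFeffk5)",
    covariate_ids=["WT", "SEX", "AGE", "EGF"],
    sbml_ids=["k3", "k5"],
    parameter_ids=["EGFeffk5", "WTeff"],
)
```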
- `covariateType` [STRING]

  Defines whether a covariate is categorical or numerical.
I think this could be inferred from measurements.tsv (the convention could be that if there are only numbers in the column, then it is numerical, if there are only strings then categorical, otherwise error).
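A minimal sketch of that inference rule, assuming the covariates appear as columns in measurements.tsv (pandas used only for illustration):

```python
import pandas as pd

def infer_covariate_type(column: pd.Series) -> str:
    """Infer covariateType from a measurements.tsv column:
    only numbers -> 'numerical', only strings -> 'categorical', mixed -> error."""
    values = column.dropna()
    numeric = pd.to_numeric(values, errors="coerce")
    if numeric.notna().all():
        return "numerical"
    if numeric.isna().all():
        return "categorical"
    raise ValueError("Column mixes numeric and non-numeric covariate values")

measurements = pd.DataFrame({"WT": [70, 82.5, 61], "SEX": ["male", "female", "male"]})
print(infer_covariate_type(measurements["WT"]))   # numerical
print(infer_covariate_type(measurements["SEX"]))  # categorical
```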
I am not sure about using strings for categorical variables. Currently most of the pharmacometrics modeling tools adopt numerical values for these. If we use strings, that would add another layer of processing when compiling the model into the corresponding software tool.
But we can discuss that with the rest of the project team.
Moreover, another thing I can think of is that data anonymization may sometimes be needed, e.g. to exchange data with an external partner; with strings we would be giving away the actual values (e.g. sex, race, ethnicity, positivity for a given chemical assay, ...), and this could sometimes be problematic.
Hmm, OK. Seems to be a tradeoff between conciseness and ease of developing a PEtab export tool. I don't see the anonymization tradeoff, because it is easy to use `one` or any other string instead of `1`, etc. Let's discuss with the others.
- `lowerBound` [NUMERICAL]

  Same use as in the parameter table.
Would that make sense for a categorical covariate? And isn't it the fault of the user if some of the patients have such a low value that the expression suddenly evaluates to a negative number? To avoid that, we could make it "Lower bound of `covariateFormula`", which could be set to zero. But tbh, these kinds of hard caps sound fishy to me. I can see how this is useful to prevent negative values, but it is certainly not biological to suddenly cap them, and therefore it is not representative of the system that should be modelled. Imo, a good standard should not allow users to specify things that do not make sense. But of course, I can be convinced that this makes sense.

What could probably make sense is that, if one or more of the covariateValues set in the measurements.tsv lead to a value outside the boundaries, petablint complains that the model is invalid. So upper and lower bound are optional and would just be used as a sanity check.
- `nominalValue` [NUMERICAL]

  Same use as in the parameter table.
We could additionally allow things like `median`, `mean`, `max`, `min` there, to automatically calculate the value from what is provided in the measurements.tsv.
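A quick sketch of how such keywords could be resolved from measurements.tsv (column and helper names assumed for illustration):

```python
import pandas as pd

def resolve_nominal_value(spec, covariate_column: pd.Series):
    """Resolve a nominalValue entry: a number is used as-is, while the keywords
    'median', 'mean', 'max', 'min' are computed from the measurements.tsv column."""
    keywords = {"median": covariate_column.median,
                "mean": covariate_column.mean,
                "max": covariate_column.max,
                "min": covariate_column.min}
    if isinstance(spec, str) and spec in keywords:
        return keywords[spec]()
    return float(spec)

weights = pd.Series([55.0, 70.0, 120.0])
print(resolve_nominal_value("median", weights))  # 70.0
print(resolve_nominal_value(4.56, weights))      # 4.56
```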
Like that suggestion!
SEX k3 exclude categorical log10 0.1 10 1 0
AGE k3 fractional categorical log10 0.1 1 0.5 1
EGF k3 power continuous log10 0.1 10 1 1
EGF k5 ((EGF/4.56)**EGFeffk5) continuous log10 0.1 10 1 1
I think PEtab uses `^` instead of `**`.
This is an initial draft for the covariates table for the PEtab NLME extension #1
Open to discussions/feedback 😄
EDIT: I included the covariate values in the measurements table as we may have time-varying covariates, which is quite common at least in the pharmacometrics field.