diff --git a/CHANGELOG.md b/CHANGELOG.md index 3aec7afef..8a19f4243 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,29 @@ ### Changelogs +#### 0.15.0 + - adding `robust` params to `CoxPHFitter`'s `fit`. This enables at least i) using non-integer weights in the model (these could be sampling weights like IPTW), and ii) mis-specified models (ex: non-proportional hazards). Under the hood it's a sandwich estimator. This does not handle ties, so if there is a high number of ties, results may significantly differ from other software. + - `standard_errors_` is now a property on fitted `CoxPHFitter` which describes the standard errors of the coefficients. + - `variance_matrix_` is now a property on fitted `CoxPHFitter` which describes the variance matrix of the coefficients. + - new criteria for convergence of `CoxPHFitter` and `CoxTimeVaryingFitter` called the Newton-decrement. Tests show it is as accurate (w.r.t. previous coefficients) and typically shaves off a single step, resulting in generally faster convergence. See https://www.cs.cmu.edu/~pradeepr/convexopt/Lecture_Slides/Newton_methods.pdf. Details about the Newton-decrement are added to the `show_progress` statements. + - Minimum support for scipy is 1.0 + - Convergence errors in models that use Newton-Raphson methods now throw a `ConvergenceError`, instead of a `ValueError` (the former is a subclass of the latter, however). + - `AalenAdditiveModel` raises `ConvergenceWarning` instead of printing a warning. + - `KaplanMeierFitter` now has a cumulative plot option. Example `kmf.plot(invert_y_axis=True)` + - a `weights_col` option has been added to `CoxTimeVaryingFitter` that allows for time-varying weights. + - `WeibullFitter` has a new `show_progress` param and additional information if the convergence fails. + - `CoxPHFitter`, `ExponentialFitter`, `WeibullFitter` and `CoxTimeVaryingFitter` method `print_summary` is updated with new fields. 
+ - `WeibullFitter` has renamed the incorrect `_jacobian` to `_hessian_`. + - `variance_matrix_` is now a property on fitted `WeibullFitter` which describes the variance matrix of the parameters. + - The default `WeibullFitter().timeline` has changed from integers between the min and max duration to _n_ floats between the max and min durations, where _n_ is the number of observations. + - Performance improvements for `CoxPHFitter` (~20% faster) + - Performance improvements for `CoxTimeVaryingFitter` (~100% faster) + - In Python3, Univariate models are now serialisable with `pickle`. Thanks @dwilson1988 for the contribution. For Python2, `dill` is still the preferred method. + - `baseline_cumulative_hazard_` (and derivatives of that) on `CoxPHFitter` now correctly incorporate the `weights_col`. + - Fixed a bug in `KaplanMeierFitter` when late entry times lined up with death events. Thanks @pzivich + - Adding `cluster_col` argument to `CoxPHFitter` so users can specify groups of subjects/rows that may be correlated. + - Shifting the "significance codes" for p-values down an order of magnitude. (Example, p-values between 0.1 and 0.05 are not noted at all and p-values between 0.05 and 0.01 are noted with `.`, etc.). This deviates from how they are presented in other software. There is an argument to be made to remove p-values from lifelines altogether (_become the changes you want to see in the world_ lol), but I worry that people could compute the p-values by hand incorrectly, a worse outcome I think. So, this is my stance. P-values between 0.1 and 0.05 offer _very_ little information, so they are removed. There is a growing movement in statistics to shift "significant" findings to p-values less than 0.01 anyways. + - New fitter for cumulative incidence of multiple risks `AalenJohansenFitter`. Thanks @pzivich! See "Methodologic Issues When Estimating Risks in Pharmacoepidemiology" for a nice overview of the model. 
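To illustrate the Newton-decrement entry above, here is a minimal sketch of that kind of stopping rule on a toy one-dimensional convex objective. It is not the lifelines implementation: `newton_minimize`, its `tol` and `max_iter` arguments, and the objective `exp(x) - 2x` are all hypothetical names chosen for illustration. The quantity `g**2 / (2 * h)` is half the squared Newton decrement, which approximates how far the current objective value is from its minimum.

```python
import math


def newton_minimize(grad, hess, x0, tol=1e-9, max_iter=50):
    """Minimize a smooth, strictly convex 1-D function with Newton's method.

    Hypothetical helper for illustration only (not part of lifelines).
    Iteration stops once half the squared Newton decrement, g^2 / (2 h),
    falls below `tol`.
    """
    x = x0
    for step in range(1, max_iter + 1):
        g, h = grad(x), hess(x)
        delta = g / h                        # Newton step
        newton_decrement = g * delta / 2.0   # lambda^2 / 2 = g^2 / (2 h)
        if newton_decrement < tol:
            return x, step
        x -= delta
    raise RuntimeError("Newton's method did not converge")


# Toy objective f(x) = exp(x) - 2x, whose minimizer is x = log(2).
x_hat, n_steps = newton_minimize(
    grad=lambda x: math.exp(x) - 2.0,
    hess=lambda x: math.exp(x),
    x0=0.0,
)
print(x_hat, n_steps)
```

Because the decrement is measured in units of the objective rather than of the coefficients, the same tolerance behaves consistently regardless of how the covariates are scaled, which is one common motivation for this criterion.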
+ #### 0.14.6 - fix for n > 2 groups in `multivariate_logrank_test` (again). - fix bug for when `event_observed` column was not boolean. diff --git a/docs/Examples.rst b/docs/Examples.rst index 1080a6646..2036456f0 100644 --- a/docs/Examples.rst +++ b/docs/Examples.rst @@ -282,6 +282,18 @@ Hide confidence intervals :height: 300 +Invert axis +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: python + + kmf.fit(T, label="kmf.plot(invert_y_axis=True)") + kmf.plot(invert_y_axis=True) + +.. image:: /images/invert_y_axis.png + :height: 300 + + Set the index/timeline of a estimate ############################################## @@ -403,7 +415,7 @@ id T E Example SQL queries and transformations to get time varying data #################################################################### -For Cox time-varying models, we discussed what the dataset should look like in :ref:`Dataset for time-varying regression`. Typically we have a base dataset, and then we fold in the covariate datasets. Below are some SQL queries and Python transformations from end-to-end. +For Cox time-varying models, we discussed what the dataset should look like in :ref:`Dataset creation for time-varying regression`. Typically we have a base dataset, and then we fold in the covariate datasets. Below are some SQL queries and Python transformations from end-to-end. Base dataset: ``base_df`` @@ -482,7 +494,7 @@ Initially, this can't be added to our baseline dataframe. Using ``utils.covariat Example cumulative total using and time-varying covariates ############################################################ -Often we have either __transactional covariate datasets__ or __state covariate datasets__. In a transactional dataset, it may make sense to sum up the covariates to represent administration of a treatment over time. For example, in the risky world of start-ups, we may want to sum up the funding amount recieved at a certain time. 
We also may be interested in the amount of the last round of funding. Below is an example to do just that: +Often we have either transactional covariate datasets or state covariate datasets. In a transactional dataset, it may make sense to sum up the covariates to represent administration of a treatment over time. For example, in the risky world of start-ups, we may want to sum up the funding amount received at a certain time. We also may be interested in the amount of the last round of funding. Below is an example to do just that: Suppose we have an initial DataFrame of start-ups like: @@ -573,6 +585,8 @@ Problems with convergence in the Cox Proportional Hazard Model ################################################################ Since the estimation of the coefficients in the Cox proportional hazard model is done using the Newton-Raphson algorithm, there is sometimes a problem with convergence. Here are some common symptoms and possible resolutions: + 0. First diagnostic: look for ``ConvergenceWarning`` in the output. Most often problems in convergence are the result of problems in the dataset. Lifelines has diagnostic checks it runs against the dataset before fitting and warnings are outputted to the user. + 1. ``delta contains nan value(s). Convergence halted.``: First try adding ``show_progress=True`` in the ``fit`` function. If the values in ``delta`` grow unboundedly, it's possible the ``step_size`` is too large. Try setting it to a small value (0.1-0.5). 2. ``LinAlgError: Singular matrix``: This means that there is a linear combination in your dataset. That is, a column is equal to the linear combination of 1 or more other columns. Try to find the relationship by looking at the correlation matrix of your dataset. @@ -584,7 +598,89 @@ Since the estimation of the coefficients in the Cox proportional hazard model is 3. Related to above, the relationship between a covariate and the duration may be completely determined. 
For example, if the rank correlation between a covariate and the duration is very close to 1 or -1, then the log-likelihood can be increased arbitrarly using just that covariate. Look for a ``ConvergenceWarning`` after the ``fit`` call. 4. Another problem may be a co-linear relationship in your dataset. See point 2. above. - 4. Adding a very small ``penalizer_coef`` significantly changes the results. This probably means that the step size is too large. Try decreasing it, and returning the ``penalizer_coef`` term to 0. + 4. If adding a very small ``penalizer`` significantly changes the results (``CoxPHFitter(penalizer=0.0001)``), then this probably means that the step size in the iterative algorithm is too large. Try decreasing it (``.fit(..., step_size=0.50)`` or smaller), and returning the ``penalizer`` term to 0. 5. If using the ``strata`` arugment, make sure your stratification group sizes are not too small. Try ``df.groupby(strata).size()``. +Adding weights to observations in a Cox model +############################################## + +There are two common uses for weights in a model. The first is as a data size reduction technique (known as case weights). If the dataset has more than one subject with identical attributes, including duration and event, then their likelihood contribution is the same as well. Thus, instead of computing the log-likelihood for each individual, we can compute it once and multiply it by the count of subjects with identical attributes. In practice, this involves first grouping subjects by covariates and counting. For example, using the Rossi dataset, we will use Pandas to group by the attributes (but other data processing tools, like Spark, could do this as well): + +.. code-block:: python + + from lifelines.datasets import load_rossi + + rossi = load_rossi() + + rossi_weights = rossi.copy() + rossi_weights['weights'] = 1. 
+ rossi_weights = rossi_weights.groupby(rossi.columns.tolist())['weights'].sum()\ + .reset_index() + + +The original dataset has 432 rows, while the grouped dataset has 387 rows plus an additional ``weights`` column. ``CoxPHFitter`` has an additional parameter to specify which column is the weight column. + +.. code-block:: python + + from lifelines import CoxPHFitter + + cp = CoxPHFitter() + cp.fit(rossi_weights, 'week', 'arrest', weights_col='weights') + + +The fitting should be faster, and the results identical to the unweighted dataset. This option is also available in the ``CoxTimeVaryingFitter``. + + +The second use of weights is sampling weights. These are typically positive, non-integer weights that represent some artificial under/over sampling of observations (ex: inverse probability of treatment weights). It is recommended to set ``robust=True`` in the call to ``fit`` as the usual standard error is incorrect for sampling weights. The ``robust`` flag will use the sandwich estimator for the standard error. + +.. warning:: The implementation of the sandwich estimator does not handle ties correctly (under the Efron handling of ties), and will give slightly or significantly different results from other software depending on the frequency of ties. + + +Correlations between subjects in a Cox model +################################################### + +There are cases when your dataset contains correlated subjects, which breaks the independent-and-identically-distributed assumption. What are some cases when this may happen? + +1. If a subject appears more than once in the dataset (common when subjects can have the event more than once) +2. If using a matching technique, like propensity-score matching, there is a correlation between pairs. + +In both cases, the reported standard errors from an unadjusted Cox model will be wrong. 
In order to adjust for these correlations, there is a ``cluster_col`` keyword in ``CoxPHFitter.fit`` that allows you to specify the column in the dataframe that contains designations for correlated subjects. For example, if subjects in rows 1 & 2 are correlated, but no other subjects are correlated, then the ``cluster_col`` column should have the same value for rows 1 & 2, and all others unique. Another example: for matched pairs, each subject in the pair should have the same value. + +.. code-block:: python + + from lifelines.datasets import load_rossi + from lifelines import CoxPHFitter + + rossi = load_rossi() + + # this may come from a database, or other libraries that specialize in matching + matched_pairs = [ + (156, 230), + (275, 228), + (61, 252), + (364, 201), + (54, 340), + (130, 33), + (183, 145), + (268, 140), + (332, 259), + (314, 413), + (330, 211), + (372, 255), + # ... + ] + + rossi['id'] = None # we will populate this column + + for i, pair in enumerate(matched_pairs): + subjectA, subjectB = pair + rossi.loc[subjectA, 'id'] = i + rossi.loc[subjectB, 'id'] = i + + rossi = rossi.dropna(subset=['id']) + + cph = CoxPHFitter() + cph.fit(rossi, 'week', 'arrest', cluster_col='id') + +Specifying ``cluster_col`` will handle correlations, and invoke the robust sandwich estimator for standard errors (the same as setting ``robust=True``). \ No newline at end of file diff --git a/docs/Quickstart.rst b/docs/Quickstart.rst index cbf4fa793..a3b1e12f7 100644 --- a/docs/Quickstart.rst +++ b/docs/Quickstart.rst @@ -159,16 +159,23 @@ The input of the ``fit`` method's API in a regression is different. All the data
All the data cph.print_summary() """ - n=200, number of events=189 + duration col = T + event col = E + number of subjects = 200 + number of events = 189 + log-likelihood = -807.620 + time fit was run = 2018-10-23 02:44:18 UTC + --- coef exp(coef) se(coef) z p lower 0.95 upper 0.95 - var1 0.2213 1.2477 0.0743 2.9796 0.0029 0.0757 0.3669 ** - var2 0.0509 1.0522 0.0829 0.6139 0.5393 -0.1116 0.2134 - var3 0.2186 1.2443 0.0758 2.8836 0.0039 0.0700 0.3672 ** + var1 0.2222 1.2488 0.0743 2.9920 0.0028 0.0767 0.3678 ** + var2 0.0510 1.0523 0.0829 0.6148 0.5387 -0.1115 0.2134 + var3 0.2183 1.2440 0.0758 2.8805 0.0040 0.0698 0.3669 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Concordance = 0.580 + Likelihood ratio test = 15.540 on 3 df, p=0.00141 """ cph.plot() diff --git a/docs/Survival Regression.rst b/docs/Survival Regression.rst index 41bb6f710..105af0eaf 100644 --- a/docs/Survival Regression.rst +++ b/docs/Survival Regression.rst @@ -35,7 +35,7 @@ Cox's Proportional Hazard model Lifelines has an implementation of the Cox propotional hazards regression model (implemented in R under ``coxph``). The idea behind the model is that the log-hazard of an individual is a linear function of their static covariates *and* a population-level baseline hazard that changes over time. Mathematically: -.. math:: \lambda(t | x) = \overbrace{b_0(t)}^{\text{baseline}}\underbrace{\exp \overbrace{\left(\sum_{i=1}^n b_i x_i \right)}^{\text{log-partial hazard}}}_ {\text{partial hazard}} +.. math:: \lambda(t | x) = \overbrace{b_0(t)}^{\text{baseline}}\underbrace{\exp \overbrace{\left(\sum_{i=1}^n b_i (x_i - \overline{x_i})\right)}^{\text{log-partial hazard}}}_ {\text{partial hazard}} Note a few facts about this model: the only time component is in the baseline hazard, :math:`b_0(t)`. In the above product, the partial hazard is a time-invariant scalar factor that only increases or decreases the baseline hazard. 
Thus a changes in covariates will only increase or decrease this baseline hazard. @@ -60,16 +60,22 @@ This example data is from the paper `here 4 rows × 6 columns

-From the above output, we can see that subject 1 changed state twice over the observation period, finally expiring at the end of time 10. Subject 2 was a censored case, and we lost track of them after time 2. +From the above output, we can see that subject 1 changed state twice over the observation period, finally expiring at the end of time 10. Subject 2 was a censored case, and we lost track of them after time 12. You may have multiple covariates you wish to add, so the above could be streamlined like so: @@ -889,6 +919,9 @@ Fitting the model Once your dataset is in the correct orientation, we can use ``CoxTimeVaryingFitter`` to fit the model to your data. The method is similar to ``CoxPHFitter``, expect we need to tell the ``fit`` about the additional time columns. +Fitting the Cox model to the data involves an iterative Newton-Raphson procedure. Lifelines takes extra effort to help with convergence, so please be attentive to any warnings that appear. Fixing any warnings will generally help convergence. For further help, see :ref:`Problems with convergence in the Cox Proportional Hazard Model`. + + .. code:: python from lifelines import CoxTimeVaryingFitter @@ -918,7 +951,30 @@ of AUC, another common loss function, and is interpreted similarly: * 1.0 is perfect concordance and, * 0.0 is perfect anti-concordance (multiply predictions with -1 to get 1.0) -The measure is implemented in lifelines under `lifelines.utils.concordance_index` and accepts the actual times (along with any censorships) and the predicted times. +A fitted model's concordance-index is shown in the output of `print_summary()`, and is also available under the `score_` property. Generally, the measure is implemented in lifelines under `lifelines.utils.concordance_index` and accepts the actual times (along with any censorships) and the predicted times. + +.. 
code:: python + + from lifelines import CoxPHFitter + from lifelines.datasets import load_rossi + + rossi = load_rossi() + + cph = CoxPHFitter() + cph.fit(rossi, duration_col="week", event_col="arrest") + + # method one + cph.print_summary() + + # method two + print(cph.score_) + + # method three + from lifelines.utils import concordance_index + print(concordance_index(rossi['week'], -cph.predict_partial_hazard(rossi).values, rossi['arrest'])) + + +However, there are other, arguably better, methods to measure the fit of a model. Included in `print_summary` is the log-likelihood, which can be used in an `AIC calculation `, and the `log-likelihood ratio statistic `. Generally, I personally loved this article by Frank Harrell, `"Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements" `. Cross Validation diff --git a/docs/Survival analysis with lifelines.rst b/docs/Survival analysis with lifelines.rst index 8ebed4dae..92eaf0065 100644 --- a/docs/Survival analysis with lifelines.rst +++ b/docs/Survival analysis with lifelines.rst @@ -319,7 +319,7 @@ probabilities of survival at those points: .. code:: python - ax = subplot(111) + ax = plt.subplot(111) t = np.linspace(0, 50, 51) kmf.fit(T[dem], event_observed=E[dem], timeline=t, label="Democratic Regimes") @@ -452,12 +452,17 @@ keywords to tinker with. Fitting to a Weibull model ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Another very popular model for survival data is the Weibull model. In contrast the the Kaplan Meier estimator, this model is a *parametric model*, meaning it has a functional form with parameters that we are fitting the data to. (The Kaplan Meier estimator has no parameters to fit too). Mathematically, the survival function looks like: +Another very popular model for survival data is the Weibull model. In contrast to the Kaplan Meier estimator, this model is a *parametric model*, meaning it has a functional form with parameters that we are fitting the data to. 
(The Kaplan Meier estimator has no parameters to fit to). Mathematically, the survival function looks like: ..math:: S(t) = \exp\left(-(\lambda t)^\rho\right), \lambda >0, \rho > 0, - Apriori, we do not know what :math:`\lambda` and :math:`\rho` are, but we use the data on hand to estimate these parameters. In lifelines, this is implemented in the ``WeibullFitter``: +*A priori*, we do not know what :math:`\lambda` and :math:`\rho` are, but we use the data on hand to estimate these parameters. In fact, we actually model and estimate the cumulative hazard, :math:`H(t) = -\log S(t)`: + + + .. math:: H(t) = (\lambda t)^\rho, \lambda >0, \rho > 0, + +In lifelines, estimation is available using the ``WeibullFitter`` class: .. code:: python @@ -468,9 +473,33 @@ Another very popular model for survival data is the Weibull model. In contrast t wf = WeibullFitter() wf.fit(T, E) + print(wf.lambda_, wf.rho_) wf.print_summary() + wf.plot() + + + +Other parametric models: Exponential +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Similarly, there are other parametric models in lifelines. Generally, which parametric model to choose is determined by either knowledge of the distribution of durations, or some sort of model goodness-of-fit. Below are two parametric models of the same data. + +.. code:: python + + from lifelines import WeibullFitter + from lifelines import ExponentialFitter + + T = data['duration'] + E = data['observed'] + + wf = WeibullFitter().fit(T, E, label='WeibullFitter') + exf = ExponentialFitter().fit(T, E, label='ExponentialFitter') + + ax = wf.plot() + ax = exf.plot(ax=ax) + Estimating hazard rates using Nelson-Aalen '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' diff --git a/docs/conf.py b/docs/conf.py index 52cf02d79..13885f8f8 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -55,9 +55,9 @@ # built documents. # # The short X.Y version. -version = '0.14.6' +version = '0.15.0' # The full version, including alpha/beta/rc tags. 
-release = '0.14.6' +release = '0.15.0' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/docs/images/invert_y_axis.png b/docs/images/invert_y_axis.png new file mode 100644 index 000000000..86ff80593 Binary files /dev/null and b/docs/images/invert_y_axis.png differ diff --git a/lifelines/__init__.py b/lifelines/__init__.py index 910da8868..206e72312 100644 --- a/lifelines/__init__.py +++ b/lifelines/__init__.py @@ -1,10 +1,10 @@ # -*- coding: utf-8 -*- from .estimation import KaplanMeierFitter, NelsonAalenFitter, \ AalenAdditiveFitter, BreslowFlemingHarringtonFitter, CoxPHFitter, \ - WeibullFitter, ExponentialFitter, CoxTimeVaryingFitter + WeibullFitter, ExponentialFitter, CoxTimeVaryingFitter, AalenJohansenFitter from .version import __version__ __all__ = ['KaplanMeierFitter', 'NelsonAalenFitter', 'AalenAdditiveFitter', 'BreslowFlemingHarringtonFitter', 'CoxPHFitter', 'WeibullFitter', - 'ExponentialFitter', 'CoxTimeVaryingFitter'] + 'ExponentialFitter', 'CoxTimeVaryingFitter', 'AalenJohansenFitter'] diff --git a/lifelines/compat.py b/lifelines/compat.py new file mode 100644 index 000000000..af8b13ae3 --- /dev/null +++ b/lifelines/compat.py @@ -0,0 +1,4 @@ +import sys + +PY2 = sys.version_info[0] == 2 +PY3 = sys.version_info[0] >= 3 diff --git a/lifelines/datasets/__init__.py b/lifelines/datasets/__init__.py index 87686de35..95826655f 100644 --- a/lifelines/datasets/__init__.py +++ b/lifelines/datasets/__init__.py @@ -270,7 +270,8 @@ def load_rossi(**kwargs): def load_regression_dataset(**kwargs): """ - Artificial regression dataset + Artificial regression dataset. Useful since there are no ties in this dataset. + Slightly edited in v0.15.0 to achieve this, however. 
Size: (200,5) Example: @@ -362,7 +363,7 @@ def load_dfcv(): 3 0 1.0 0 6.0 4 True """ from lifelines.datasets.dfcv_dataset import dfcv - return dfcv + return dfcv.copy() def load_lymphoma(**kwargs): diff --git a/lifelines/datasets/regression.csv b/lifelines/datasets/regression.csv index 01d39937a..2afc66848 100644 --- a/lifelines/datasets/regression.csv +++ b/lifelines/datasets/regression.csv @@ -1,201 +1,201 @@ var1,var2,var3,T,E -0.59517,1.143472,1.5710790000000001,14.785479,1 -0.209325,0.184677,0.35698,7.336734,1 -0.693919,0.071893,0.55796,5.271527,1 -0.443804,1.3646459999999998,0.374221,11.684168,1 -1.613324,0.125566,1.921325,7.637764,1 -0.065636,0.098375,0.237896,12.678268,1 -0.386294,1.6630919999999998,0.7903140000000001,6.601660000000001,1 -0.946688,1.345394,3.209113,11.369137,1 -0.11374000000000001,0.40986000000000006,0.064934,14.680468,1 -0.7777930000000001,0.33499,0.411055,10.585059,1 -0.04428,0.305158,0.17648,19.370936999999998,1 -1.03545,3.304733,0.997323,5.558555999999999,1 -0.22919499999999998,0.5813550000000001,0.48479399999999995,11.292129,1 -0.055970000000000006,2.6741349999999997,0.355279,9.919992,0 -1.236583,1.796598,0.179952,9.884988,1 -1.162835,0.46475900000000003,2.028854,6.265626999999999,1 -0.14943599999999999,2.949291,0.277801,13.812381,1 -0.399475,0.822413,0.673405,6.433643,1 -0.762121,0.050407,1.2851629999999998,6.979698,1 -1.239718,1.869215,0.020202,7.742774000000001,1 -0.019221000000000002,1.435543,0.255689,4.70447,1 -0.090253,0.211037,0.372809,11.236124,1 -0.20584899999999998,0.048722,0.00253,6.664666,1 -0.088185,1.319679,0.201675,10.718072,1 -4.629747,0.36352199999999996,1.08207,11.593159,1 -1.6028360000000001,1.217881,0.350837,8.463846,1 -0.014804,0.684737,0.493267,5.432543,1 -0.08402000000000001,1.432093,0.456541,10.277028,1 -2.260223,1.2389299999999999,5.541837999999999,7.217722,1 -0.6219680000000001,0.6844279999999999,0.135933,13.217322,1 -0.013219999999999999,3.280555,1.193551,8.029803,1 
-0.070651,1.430517,0.0052049999999999996,9.807981,1 -0.20598000000000002,0.29064,0.096565,2.632263,1 -1.389882,0.14313299999999998,0.821257,6.160616,1 -0.104143,2.072924,0.449696,6.265626999999999,1 -0.42848100000000006,0.06573899999999999,3.00755,4.606461,0 -1.785151,1.572282,0.475059,11.124112,1 -0.22889299999999999,0.429025,0.60805,6.083608,1 -0.640837,0.311084,3.165658,9.219922,1 -1.450683,0.8470219999999999,2.5211770000000002,7.119712,0 -0.469459,0.318871,0.164498,13.056306,1 -0.36617,0.23499099999999998,0.678709,4.949495,1 -1.4325700000000001,2.668335,0.558046,4.858486,1 -2.696463,0.244077,1.3151110000000001,8.505851,1 -0.152624,0.37950100000000003,0.330164,7.035703999999999,1 -0.27739899999999995,0.871603,1.555185,10.837083999999999,1 -0.353633,0.294236,0.928573,9.919992,1 -0.6209560000000001,0.021884,3.2057599999999997,5.292529,1 -0.0007570000000000001,1.216615,0.8610690000000001,20.981098,1 -0.497674,2.744032,0.47358900000000004,5.810581,1 -1.213709,0.072756,0.09842000000000001,11.292129,1 -0.426255,2.550392,0.16762,8.19782,1 -0.008408,1.132205,1.234917,7.469747,1 -1.207833,0.13335,0.528231,10.669067,1 -0.036975,0.040631,0.2664,10.543054,1 -0.789439,0.669067,1.332697,6.2866290000000005,1 -1.482055,0.627205,0.738271,9.079908,1 -1.028671,0.21520999999999998,0.457692,14.183418,1 -0.521986,2.282683,0.31597600000000003,21.940194,1 -0.26262199999999997,0.345999,0.9210969999999999,8.883888,1 -0.360319,1.001364,0.237533,9.982998,1 -0.362587,0.110046,2.486691,9.555956,1 -1.793598,0.310001,0.26306599999999997,8.659866000000001,1 -0.419275,0.11430799999999999,1.124784,5.642564,1 -1.4702469999999999,0.289054,0.331833,10.9981,1 -0.27476,0.523508,2.139204,8.050805,1 -0.119805,0.7337739999999999,0.21205700000000002,11.250125,1 -0.369294,0.609847,0.89402,10.214021,0 -1.01825,2.119666,0.716002,12.335234,1 -0.607065,2.3501119999999998,0.031389999999999994,15.723572,1 -4.169830999999999,0.316285,0.16935,11.831183,0 -1.483383,2.242744,0.26543,7.364736,1 
-0.32359699999999997,0.165159,0.97204,12.517252000000001,1 -1.82716,0.32779400000000003,0.9415389999999999,8.512851,1 -0.104104,0.9233020000000001,1.22007,12.79728,1 -0.392766,0.42279399999999995,3.4826080000000004,8.540854,1 -2.579629,0.109011,2.2800279999999997,3.5633559999999997,1 -0.775092,0.974519,2.2236990000000003,9.576958,1 -0.5075930000000001,0.917278,0.103131,9.646965,1 -0.13843699999999998,2.474084,1.6350049999999998,11.789178999999999,1 -1.120586,1.480593,0.6382439999999999,4.648465,0 -0.001842,0.6014520000000001,0.40551,14.162416,1 -0.9969790000000001,0.44859,0.782013,7.490749,1 -1.0672629999999999,0.304582,0.795276,10.739074,1 -1.123429,1.4093149999999999,0.090895,8.239824,1 -0.139158,3.203523,0.28734899999999997,7.630763000000001,1 -1.276529,1.039313,1.217827,7.819782000000001,1 -0.175824,1.371635,1.7854880000000002,12.419242,1 -0.24130100000000002,4.048806,0.423415,10.564055999999999,1 -1.8644439999999998,0.821839,0.426364,6.293629,1 -0.34029499999999996,0.727143,0.341437,12.405241,1 -5.130831,0.074513,0.8015260000000001,10.79508,1 -1.404635,0.039251,0.785162,17.09571,1 -0.07394400000000001,0.053314999999999994,0.18626199999999998,15.464545999999999,0 -1.271488,0.10678,0.291883,9.611961,1 -0.781452,1.229076,0.069747,14.407441,1 -0.3909,0.35690700000000003,0.23058,10.088009,1 -2.193825,0.6211840000000001,0.466925,5.817582,1 -2.942882,0.16383,1.040333,7.987799000000001,1 -0.705527,0.592699,0.923248,11.831183,1 -1.662925,2.1851700000000003,0.664273,11.873187,1 -0.407842,1.011611,0.485592,4.144414,1 -0.091321,0.281593,0.153947,8.288829,1 -3.5385089999999995,1.80715,1.336961,4.326433,1 -0.661027,1.171563,0.30091,14.246425,1 -0.106552,0.121843,0.257878,5.663566,1 -0.104327,1.513503,0.314581,7.9177919999999995,1 -0.811837,1.6833240000000003,0.061925,11.845185,1 -0.402495,0.43151999999999996,0.489576,11.075108,1 -1.322155,0.521161,1.859989,6.888689,1 -0.647954,3.243631,0.034075,8.344834,1 -0.851476,0.21736599999999998,0.29733000000000004,4.473447,1 
-0.14999400000000002,3.027889,0.5427489999999999,9.247925,1 -0.381276,1.146927,0.22583000000000003,11.215122000000001,1 -0.019479,1.374707,1.5665950000000002,8.288829,1 -0.806793,0.60941,1.903648,10.074007,1 -0.9268280000000001,1.062158,0.048544,14.484448,1 -0.998282,0.385911,1.403305,11.026102999999999,1 -0.198755,1.668675,0.182337,6.251625,1 -1.668232,0.717113,0.39318000000000003,15.884588,1 -0.903388,0.34757,0.796215,11.341134,1 -3.094217,0.764497,3.063756,7.644764,1 -0.565765,0.8556440000000001,2.4122220000000003,8.365836999999999,1 -0.600544,0.019666,2.356107,11.90119,1 -0.453201,0.24214899999999998,0.7611140000000001,9.912991,1 -0.441605,0.271366,0.9775219999999999,8.323832000000001,1 -0.41135,0.029483999999999996,1.8434580000000003,11.971197,1 -0.5351199999999999,0.045629,0.16006700000000001,11.124112,1 -0.47211899999999996,2.239749,0.148828,6.153615,1 -0.485754,1.464013,0.380293,8.911891,1 -5.353937,0.855298,0.001006,4.879487999999999,1 -0.000974,0.35496500000000003,0.698741,20.666067,1 -0.36145700000000003,2.792862,1.503787,11.082108,1 -1.2026729999999999,1.825852,0.391339,8.008801,1 -0.8530770000000001,0.22137600000000002,1.6355389999999999,9.779978,1 -1.646959,3.3371690000000003,1.262672,5.663566,1 -0.050491,1.0423879999999999,0.040406,10.50105,1 -0.693033,0.067717,1.6319299999999999,9.968997,1 -3.8753610000000003,1.206579,0.6567850000000001,4.837484,1 -0.401754,1.526443,0.449621,7.952795,1 -2.112141,0.994604,0.12592799999999998,4.445444999999999,1 -2.358111,1.411174,4.747023,6.930692999999999,1 -0.406167,0.7479359999999999,1.240233,11.971197,1 -0.9833120000000001,1.330699,0.931057,12.769277,1 -2.8028560000000002,0.141768,0.96447,6.153615,1 -0.22598000000000001,0.156969,0.771678,9.093909,1 -1.0202120000000001,1.338747,1.485407,9.70297,1 -0.737183,0.21196700000000002,1.479703,10.417042,1 -0.694596,0.13306500000000002,1.612199,13.182317999999999,1 -1.614919,1.628414,3.3395629999999996,2.576258,1 -1.263567,0.041625999999999996,0.13448800000000002,7.091709,1 
-1.81759,0.89371,0.256831,5.740574,1 -0.221442,1.00047,0.13556500000000002,12.923292,1 -0.388571,2.331312,0.048117,12.874286999999999,1 -1.365461,0.44473,0.26388,4.725473,1 -0.017446,1.50251,1.859648,9.835984,1 -0.803217,0.259678,0.305695,6.062606,1 -1.153738,2.357565,0.264925,8.092808999999999,1 -0.546425,0.516525,0.05980599999999999,8.043804,1 -0.061367,2.453071,0.234816,8.715872000000001,1 -0.42113599999999995,0.295455,1.117664,13.287329000000001,1 -1.5747790000000002,0.7411220000000001,0.533676,10.515052,1 -1.3943510000000001,0.877793,1.637652,6.426643,1 -0.923441,1.1076139999999999,0.78291,3.640364,1 -0.231346,0.620135,1.8213549999999998,4.746475,1 -0.7357060000000001,3.4050540000000002,3.457625,11.677168,1 -1.748839,1.132628,0.812584,11.558156,1 -0.280291,1.664837,0.051460000000000006,8.757876,1 -0.150857,2.545696,1.456119,12.825283,1 -1.5516809999999999,0.125114,0.148355,15.618561999999999,0 -0.746388,0.267458,0.42003599999999996,11.943194,1 -0.068177,0.19378800000000002,2.693533,7.952795,1 -0.305141,0.858988,3.883753,12.356236,1 -3.614956,0.659784,1.013164,3.5633559999999997,1 -1.9810330000000003,0.7379720000000001,0.272071,8.561856,1 -0.19708,1.164958,0.8204870000000001,4.207421,1 -0.027854000000000004,0.6533260000000001,0.08022,21.030103,1 -1.8066659999999999,3.535072,2.176759,5.810581,1 -0.16528800000000002,1.6233950000000001,1.9945509999999997,8.79988,1 -1.617063,0.49479799999999996,0.131597,7.798780000000001,0 -1.298794,1.778036,0.453693,12.657266,1 -0.707968,1.081388,0.477484,14.30243,1 -0.246455,0.11361800000000001,0.407209,13.329332999999998,1 -0.282453,0.731784,0.002421,6.1256129999999995,1 -0.133855,0.096552,0.152854,4.935494,0 -0.025306,0.07387,0.163927,6.314630999999999,1 -1.017839,0.737884,3.126409,6.573657000000001,0 -0.847491,1.142187,1.342932,8.610861,1 -0.9420930000000001,0.161735,1.388318,9.997,1 -0.38300100000000004,0.006451,0.901114,7.749775,1 -0.011165999999999999,0.220669,0.6917909999999999,7.3437339999999995,1 
-1.5435020000000002,1.472249,0.830817,6.986699000000001,1 -0.168033,3.052163,0.035085000000000005,18.131813,1 -2.1599459999999997,0.001644,1.443158,4.382438,1 -0.249142,0.628992,2.3185130000000003,8.743874,1 -0.137399,0.107748,0.354812,11.446145,1 -0.6373409999999999,2.847188,1.4591370000000001,7.623761999999999,1 -1.109732,0.405561,0.018856,10.634063000000001,1 -0.031865,1.753759,0.25204,8.519852,1 -1.631269,1.5886209999999998,3.7098989999999996,4.480448,1 +0.59517,1.143472,1.571079,14.7856515748,1 +0.209325,0.184677,0.35698,7.33584583652,1 +0.693919,0.071893,0.55796,5.26979701571,1 +0.443804,1.364646,0.374221,11.6840920212,1 +1.613324,0.125566,1.921325,7.63949212526,1 +0.065636,0.098375,0.237896,12.6784581817,1 +0.386294,1.663092,0.790314,6.60166572026,1 +0.946688,1.345394,3.209113,11.3670916491,1 +0.11374,0.40986,0.064934,14.6805866317,1 +0.777793,0.33499,0.411055,10.5854086595,1 +0.04428,0.305158,0.17648,19.3721173864,1 +1.03545,3.304733,0.997323,5.55904466985,1 +0.229195,0.581355,0.484794,11.2924891948,1 +0.05597,2.674135,0.355279,9.92047433529,0 +1.236583,1.796598,0.179952,9.88652411916,1 +1.162835,0.464759,2.028854,6.26643301257,1 +0.149436,2.949291,0.277801,13.8127296,1 +0.399475,0.822413,0.673405,6.43309776107,1 +0.762121,0.050407,1.285163,6.97979741031,1 +1.239718,1.869215,0.020202,7.74300832502,1 +0.019221,1.435543,0.255689,4.70530329608,1 +0.090253,0.211037,0.372809,11.2335841641,1 +0.205849,0.048722,0.00253,6.66273101972,1 +0.088185,1.319679,0.201675,10.7174137318,1 +4.629747,0.363522,1.08207,11.5938047533,1 +1.602836,1.217881,0.350837,8.46420655566,1 +0.014804,0.684737,0.493267,5.43255855841,1 +0.08402,1.432093,0.456541,10.276593667,1 +2.260223,1.23893,5.541838,7.21736226987,1 +0.621968,0.684428,0.135933,13.2176584654,1 +0.01322,3.280555,1.193551,8.0299335416,1 +0.070651,1.430517,0.005205,9.80826804874,1 +0.20598,0.29064,0.096565,2.63226375911,1 +1.389882,0.143133,0.821257,6.16269524116,1 +0.104143,2.072924,0.449696,6.26182607469,1 
+0.428481,0.065739,3.00755,4.6048190364,0 +1.785151,1.572282,0.475059,11.1239077928,1 +0.228893,0.429025,0.60805,6.08263253911,1 +0.640837,0.311084,3.165658,9.22065563383,1 +1.450683,0.847022,2.521177,7.12012891282,0 +0.469459,0.318871,0.164498,13.0551806715,1 +0.36617,0.234991,0.678709,4.95133583707,1 +1.43257,2.668335,0.558046,4.85889534209,1 +2.696463,0.244077,1.315111,8.50543244101,1 +0.152624,0.379501,0.330164,7.03794878748,1 +0.277399,0.871603,1.555185,10.8353931443,1 +0.353633,0.294236,0.928573,9.92109564592,1 +0.620956,0.021884,3.20576,5.29165212342,1 +0.000757,1.216615,0.861069,20.9813809356,1 +0.497674,2.744032,0.473589,5.80982311606,1 +1.213709,0.072756,0.09842,11.2908076197,1 +0.426255,2.550392,0.16762,8.19779319557,1 +0.008408,1.132205,1.234917,7.47213923892,1 +1.207833,0.13335,0.528231,10.670073183,1 +0.036975,0.040631,0.2664,10.543203936,1 +0.789439,0.669067,1.332697,6.28852320244,1 +1.482055,0.627205,0.738271,9.080334859,1 +1.028671,0.21521,0.457692,14.1822947115,1 +0.521986,2.282683,0.315976,21.9399783806,1 +0.262622,0.345999,0.921097,8.88229093093,1 +0.360319,1.001364,0.237533,9.98342539597,1 +0.362587,0.110046,2.486691,9.55638379129,1 +1.793598,0.310001,0.263066,8.65928769028,1 +0.419275,0.114308,1.124784,5.6434659532,1 +1.470247,0.289054,0.331833,10.9977552573,1 +0.27476,0.523508,2.139204,8.05076874352,1 +0.119805,0.733774,0.212057,11.2503684762,1 +0.369294,0.609847,0.89402,10.2129272163,0 +1.01825,2.119666,0.716002,12.3366190173,1 +0.607065,2.350112,0.03139,15.7231628796,1 +4.169831,0.316285,0.16935,11.8302258308,0 +1.483383,2.242744,0.26543,7.36252640465,1 +0.323597,0.165159,0.97204,12.5169357575,1 +1.82716,0.327794,0.941539,8.51322372304,1 +0.104104,0.923302,1.22007,12.7973318626,1 +0.392766,0.422794,3.482608,8.54052212332,1 +2.579629,0.109011,2.280028,3.56358063047,1 +0.775092,0.974519,2.223699,9.57756677618,1 +0.507593,0.917278,0.103131,9.64881133972,1 +0.138437,2.474084,1.635005,11.789052178,1 +1.120586,1.480593,0.638244,4.6478831396,0 
+0.001842,0.601452,0.40551,14.1629885287,1 +0.996979,0.44859,0.782013,7.49170332816,1 +1.067263,0.304582,0.795276,10.7391507856,1 +1.123429,1.409315,0.090895,8.23925481073,1 +0.139158,3.203523,0.287349,7.63092764446,1 +1.276529,1.039313,1.217827,7.8203073558,1 +0.175824,1.371635,1.785488,12.4208671834,1 +0.241301,4.048806,0.423415,10.5633022553,1 +1.864444,0.821839,0.426364,6.29314423772,1 +0.340295,0.727143,0.341437,12.4052831947,1 +5.130831,0.074513,0.801526,10.7964133954,1 +1.404635,0.039251,0.785162,17.0955898105,1 +0.073944,0.053315,0.186262,15.4629258426,0 +1.271488,0.10678,0.291883,9.61273262267,1 +0.781452,1.229076,0.069747,14.4093766173,1 +0.3909,0.356907,0.23058,10.0869453587,1 +2.193825,0.621184,0.466925,5.81874836313,1 +2.942882,0.16383,1.040333,7.98833687105,1 +0.705527,0.592699,0.923248,11.8312524366,1 +1.662925,2.18517,0.664273,11.8731640313,1 +0.407842,1.011611,0.485592,4.14387121547,1 +0.091321,0.281593,0.153947,8.2907236782,1 +3.538509,1.80715,1.336961,4.32535868496,1 +0.661027,1.171563,0.30091,14.2454199869,1 +0.106552,0.121843,0.257878,5.66506177628,1 +0.104327,1.513503,0.314581,7.91846164924,1 +0.811837,1.683324,0.061925,11.8443153255,1 +0.402495,0.43152,0.489576,11.0761501869,1 +1.322155,0.521161,1.859989,6.88916431026,1 +0.647954,3.243631,0.034075,8.34491489657,1 +0.851476,0.217366,0.29733,4.47399978584,1 +0.149994,3.027889,0.542749,9.24722217844,1 +0.381276,1.146927,0.22583,11.2146024447,1 +0.019479,1.374707,1.566595,8.28623107033,1 +0.806793,0.60941,1.903648,10.0738349202,1 +0.926828,1.062158,0.048544,14.4854236558,1 +0.998282,0.385911,1.403305,11.0245383745,1 +0.198755,1.668675,0.182337,6.25197764519,1 +1.668232,0.717113,0.39318,15.8836989034,1 +0.903388,0.34757,0.796215,11.34167126,1 +3.094217,0.764497,3.063756,7.6440979558,1 +0.565765,0.855644,2.412222,8.36665621446,1 +0.600544,0.019666,2.356107,11.9012822052,1 +0.453201,0.242149,0.761114,9.91242335863,1 +0.441605,0.271366,0.977522,8.32289768512,1 
+0.41135,0.029484,1.843458,11.9717347284,1 +0.53512,0.045629,0.160067,11.1247094819,1 +0.472119,2.239749,0.148828,6.15413791004,1 +0.485754,1.464013,0.380293,8.91144648631,1 +5.353937,0.855298,0.001006,4.88104290364,1 +0.000974,0.354965,0.698741,20.6652251814,1 +0.361457,2.792862,1.503787,11.0822662185,1 +1.202673,1.825852,0.391339,8.00771353063,1 +0.853077,0.221376,1.635539,9.78044985255,1 +1.646959,3.337169,1.262672,5.66532639351,1 +0.050491,1.042388,0.040406,10.5017493224,1 +0.693033,0.067717,1.63193,9.96821890209,1 +3.875361,1.206579,0.656785,4.83539084237,1 +0.401754,1.526443,0.449621,7.95252897093,1 +2.112141,0.994604,0.125928,4.44530592881,1 +2.358111,1.411174,4.747023,6.92942222099,1 +0.406167,0.747936,1.240233,11.9701980352,1 +0.983312,1.330699,0.931057,12.7687578904,1 +2.802856,0.141768,0.96447,6.1536973616,1 +0.22598,0.156969,0.771678,9.09300617059,1 +1.020212,1.338747,1.485407,9.70355645693,1 +0.737183,0.211967,1.479703,10.4173960087,1 +0.694596,0.133065,1.612199,13.1829276562,1 +1.614919,1.628414,3.339563,2.57553764605,1 +1.263567,0.041626,0.134488,7.09141708798,1 +1.81759,0.89371,0.256831,5.73937524269,1 +0.221442,1.00047,0.135565,12.9241740121,1 +0.388571,2.331312,0.048117,12.8735020088,1 +1.365461,0.44473,0.26388,4.72642629428,1 +0.017446,1.50251,1.859648,9.83594179776,1 +0.803217,0.259678,0.305695,6.06237621553,1 +1.153738,2.357565,0.264925,8.09338513449,1 +0.546425,0.516525,0.059806,8.04406734742,1 +0.061367,2.453071,0.234816,8.71594708122,1 +0.421136,0.295455,1.117664,13.2869538904,1 +1.574779,0.741122,0.533676,10.5135003978,1 +1.394351,0.877793,1.637652,6.42775744203,1 +0.923441,1.107614,0.78291,3.63878040929,1 +0.231346,0.620135,1.821355,4.74775344975,1 +0.735706,3.405054,3.457625,11.6770199376,1 +1.748839,1.132628,0.812584,11.5594329742,1 +0.280291,1.664837,0.05146,8.75824998324,1 +0.150857,2.545696,1.456119,12.8268461929,1 +1.551681,0.125114,0.148355,15.6204061094,0 +0.746388,0.267458,0.420036,11.9422735844,1 
+0.068177,0.193788,2.693533,7.95449616061,1 +0.305141,0.858988,3.883753,12.3573764607,1 +3.614956,0.659784,1.013164,3.56383007199,1 +1.981033,0.737972,0.272071,8.5619748224,1 +0.19708,1.164958,0.820487,4.20656850475,1 +0.027854,0.653326,0.08022,21.0318230188,1 +1.806666,3.535072,2.176759,5.81052910695,1 +0.165288,1.623395,1.994551,8.79849009986,1 +1.617063,0.494798,0.131597,7.79923023169,0 +1.298794,1.778036,0.453693,12.6551650347,1 +0.707968,1.081388,0.477484,14.3014540711,1 +0.246455,0.113618,0.407209,13.3297030877,1 +0.282453,0.731784,0.002421,6.12506421389,1 +0.133855,0.096552,0.152854,4.93564074908,0 +0.025306,0.07387,0.163927,6.3156952171,1 +1.017839,0.737884,3.126409,6.57321280053,0 +0.847491,1.142187,1.342932,8.61060494656,1 +0.942093,0.161735,1.388318,9.9956084953,1 +0.383001,0.006451,0.901114,7.74825868839,1 +0.011166,0.220669,0.691791,7.34226786253,1 +1.543502,1.472249,0.830817,6.98633720723,1 +0.168033,3.052163,0.035085,18.1313105791,1 +2.159946,0.001644,1.443158,4.38165504789,1 +0.249142,0.628992,2.318513,8.74257448673,1 +0.137399,0.107748,0.354812,11.4454572735,1 +0.637341,2.847188,1.459137,7.62462675408,1 +1.109732,0.405561,0.018856,10.6346199544,1 +0.031865,1.753759,0.25204,8.51971771151,1 +1.631269,1.588621,3.709899,4.47895208711,1 diff --git a/lifelines/estimation.py b/lifelines/estimation.py index 226309ca0..553331cb1 100644 --- a/lifelines/estimation.py +++ b/lifelines/estimation.py @@ -8,3 +8,4 @@ from lifelines.fitters.coxph_fitter import CoxPHFitter from lifelines.fitters.cox_time_varying_fitter import CoxTimeVaryingFitter from lifelines.fitters.aalen_additive_fitter import AalenAdditiveFitter +from lifelines.fitters.aalen_johansen_fitter import AalenJohansenFitter diff --git a/lifelines/fitters/__init__.py b/lifelines/fitters/__init__.py index f3c830a7a..f23859bf9 100644 --- a/lifelines/fitters/__init__.py +++ b/lifelines/fitters/__init__.py @@ -1,12 +1,26 @@ # -*- coding: utf-8 -*- from __future__ import print_function import collections 
+from functools import wraps +import sys import numpy as np import pandas as pd from lifelines.plotting import plot_estimate -from lifelines.utils import qth_survival_times +from lifelines.utils import qth_survival_times, _to_array +from lifelines.compat import PY2, PY3 + +def must_call_fit_first(func): + @wraps(func) + def error_wrapper(*args, **kwargs): + self = args[0] + try: + estimate = self._estimate_name + except AttributeError: + raise RuntimeError("Must call `fit` first!") + return func(*args, **kwargs) + return error_wrapper class BaseFitter(object): @@ -25,93 +39,90 @@ def __repr__(self): s = """""" % classname return s - class UnivariateFitter(BaseFitter): - def _plot_estimate(self, *args): - return plot_estimate(self, *args) - - def _subtract(self, estimate): - class_name = self.__class__.__name__ - doc_string = """ - Subtract the %s of two %s objects. + @must_call_fit_first + def _update_docstrings(self): + # Update their docstrings + if PY2: + self.__class__.subtract.__func__.__doc__ = self.subtract.__doc__.format(self._estimate_name, self.__class__.__name__) + self.__class__.divide.__func__.__doc__ = self.divide.__doc__.format(self._estimate_name, self.__class__.__name__) + self.__class__.predict.__func__.__doc__ = self.predict.__doc__.format(self.__class__.__name__) + self.__class__.plot.__func__.__doc__ = plot_estimate.__doc__.format(self.__class__.__name__, self._estimate_name) + elif PY3: + self.__class__.subtract.__doc__ = self.subtract.__doc__.format(self._estimate_name, self.__class__.__name__) + self.__class__.divide.__doc__ = self.divide.__doc__.format(self._estimate_name, self.__class__.__name__) + self.__class__.predict.__doc__ = self.predict.__doc__.format(self.__class__.__name__) + self.__class__.plot.__doc__ = plot_estimate.__doc__.format(self.__class__.__name__, self._estimate_name) + + @must_call_fit_first + def plot(self, *args, **kwargs): + return plot_estimate(self, *args, **kwargs) + + @must_call_fit_first + def subtract(self, 
other): + """ + Subtract the {0} of two {1} objects. Parameters: - other: an %s fitted instance. - - """ % (estimate, class_name, class_name) - - def subtract(other): - self_estimate = getattr(self, estimate) - other_estimate = getattr(other, estimate) - new_index = np.concatenate((other_estimate.index, self_estimate.index)) - new_index = np.unique(new_index) - return pd.DataFrame( - self_estimate.reindex(new_index, method='ffill').values - - other_estimate.reindex(new_index, method='ffill').values, - index=new_index, - columns=['diff'] - ) - subtract.__doc__ = doc_string - return subtract - - def _divide(self, estimate): - class_name = self.__class__.__name__ - doc_string = """ - Divide the %s of two %s objects. + other: an {1} fitted instance. + """ + self_estimate = getattr(self, self._estimate_name) + other_estimate = getattr(other, other._estimate_name) + new_index = np.concatenate((other_estimate.index, self_estimate.index)) + new_index = np.unique(new_index) + return pd.DataFrame( + self_estimate.reindex(new_index, method='ffill').values - + other_estimate.reindex(new_index, method='ffill').values, + index=new_index, + columns=['diff'] + ) + + @must_call_fit_first + def divide(self, other): + """ + Divide the {0} of two {1} objects. - Parameters: - other: an %s fitted instance. - - """ % (estimate, class_name, class_name) - - def divide(other): - self_estimate = getattr(self, estimate) - other_estimate = getattr(other, estimate) - new_index = np.concatenate((other_estimate.index, self_estimate.index)) - new_index = np.unique(new_index) - return pd.DataFrame( - self_estimate.reindex(new_index, method='ffill').values / - other_estimate.reindex(new_index, method='ffill').values, - index=new_index, - columns=['ratio'] - ) - divide.__doc__ = doc_string - return divide - - def _predict(self, estimate_name_or_function, label): - class_name = self.__class__.__name__ - doc_string = """ - Predict the %s at certain point in time. 
Uses a linear interpolation if - points in time are not in the index. - - Parameters: - time: a scalar or an array of times to predict the value of %s at. - - Returns: - predictions: a scalar if time is a scalar, a numpy array if time in an array. - """ % (class_name, class_name) - - def predict(times): - def _to_array(x): - if not isinstance(x, collections.Iterable): - return np.array([x]) - return np.asarray(x) - - if callable(estimate_name_or_function): - return pd.DataFrame(estimate_name_or_function(_to_array(times)), index=_to_array(times)).loc[times].squeeze() - else: - estimate = getattr(self, estimate_name_or_function) - # non-linear interpolations can push the survival curves above 1 and below 0. - return estimate.reindex(estimate.index.union(_to_array(times))).interpolate("index").loc[times].squeeze() - - predict.__doc__ = doc_string - return predict + Parameters: + other: an {1} fitted instance. + + """ + self_estimate = getattr(self, self._estimate_name) + other_estimate = getattr(other, other._estimate_name) + new_index = np.concatenate((other_estimate.index, self_estimate.index)) + new_index = np.unique(new_index) + return pd.DataFrame( + self_estimate.reindex(new_index, method='ffill').values / + other_estimate.reindex(new_index, method='ffill').values, + index=new_index, + columns=['ratio'] + ) + + @must_call_fit_first + def predict(self, times): + """ + Predict the {0} at certain point in time. Uses a linear interpolation if + points in time are not in the index. + + Parameters: + time: a scalar or an array of times to predict the value of {0} at. + + Returns: + predictions: a scalar if time is a scalar, a numpy array if time in an array. + """ + if callable(self._estimation_method): + return pd.DataFrame(self._estimation_method(_to_array(times)), index=_to_array(times)).loc[times].squeeze() + else: + estimate = getattr(self, self._estimation_method) + # non-linear interpolations can push the survival curves above 1 and below 0. 
+ return estimate.reindex(estimate.index.union(_to_array(times))).interpolate("index").loc[times].squeeze() @property + @must_call_fit_first def conditional_time_to_event_(self): return self._conditional_time_to_event_() + @must_call_fit_first def _conditional_time_to_event_(self): """ Return a DataFrame, with index equal to survival_function_, that estimates the median diff --git a/lifelines/fitters/aalen_additive_fitter.py b/lifelines/fitters/aalen_additive_fitter.py index 5f67ba05a..fbda1437f 100644 --- a/lifelines/fitters/aalen_additive_fitter.py +++ b/lifelines/fitters/aalen_additive_fitter.py @@ -1,5 +1,6 @@ # -*- coding: utf-8 -*- from __future__ import print_function +import warnings import numpy as np import pandas as pd @@ -9,7 +10,7 @@ from lifelines.fitters import BaseFitter from lifelines.utils import _get_index, inv_normal_cdf, epanechnikov_kernel, \ ridge_regression as lr, qth_survival_times, pass_for_numeric_dtypes_or_raise,\ - concordance_index, check_nans + concordance_index, check_nans_or_infs, ConvergenceWarning from lifelines.utils.progress_bar import progress_bar from lifelines.plotting import fill_between_steps @@ -185,7 +186,7 @@ def _fit_static(self, dataframe, duration_col, event_col=None, try: v, V = lr(df.values, relevant_individuals, c1=self.coef_penalizer, c2=self.smoothing_penalizer, offset=previous_hazard) except LinAlgError: - print("Linear regression error. Try increasing the penalizer term.") + warnings.warn("Linear regression error. Try increasing the penalizer term.", ConvergenceWarning) hazards_.loc[time, id_] = v.T variance_.loc[time, id_] = V[:, relevant_individuals][:, 0] ** 2 @@ -278,7 +279,7 @@ def _fit_varying(self, dataframe, duration_col="T", event_col="E", try: v, V = lr(wp[time].values, relevant_individuals, c1=self.coef_penalizer, c2=self.smoothing_penalizer, offset=previous_hazard) except LinAlgError: - print("Linear regression error. Try increasing the penalizer term.") + warnings.warn("Linear regression error. 
Try increasing the penalizer term.", ConvergenceWarning) hazards_.loc[id, time] = v.T variance_.loc[id, time] = V[:, relevant_individuals][:, 0] ** 2 @@ -314,8 +315,8 @@ def _fit_varying(self, dataframe, duration_col="T", event_col="E", def _check_values(self, df, T, E): pass_for_numeric_dtypes_or_raise(df) - check_nans(T) - check_nans(E) + check_nans_or_infs(T) + check_nans_or_infs(E) def smoothed_hazards_(self, bandwidth=1): """ diff --git a/lifelines/fitters/aalen_johansen_fitter.py b/lifelines/fitters/aalen_johansen_fitter.py new file mode 100644 index 000000000..6f6075f0d --- /dev/null +++ b/lifelines/fitters/aalen_johansen_fitter.py @@ -0,0 +1,175 @@ +from __future__ import print_function +from __future__ import division +import numpy as np +import pandas as pd +import warnings + +from lifelines.fitters import UnivariateFitter +from lifelines.utils import _preprocess_inputs, inv_normal_cdf +from lifelines.fitters.kaplan_meier_fitter import KaplanMeierFitter + +class AalenJohansenFitter(UnivariateFitter): + """Class for fitting the Aalen-Johansen estimate for the cumulative incidence function in a competing risks framework. + Treating competing risks as censoring can result in over-estimated cumulative density functions. Using the Kaplan + Meier estimator with competing risks as censored is akin to estimating the cumulative density if all competing risks + had been prevented. If you are interested in learning more, I (Paul Zivich) recommend the following open-access + paper: Edwards JK, Hester LL, Gokhale M, Lesko CR. Methodologic Issues When Estimating Risks in + Pharmacoepidemiology. Curr Epidemiol Rep. 2016;3(4):285-296. + + AalenJohansenFitter(alpha=0.95, jitter_level=0.00001, seed=None) + + Aalen-Johansen cannot deal with tied times. We can get around this by randomly jittering the event times + slightly. This will be done automatically and generates a warning.
+ """ + def __init__(self, jitter_level=0.0001, seed=None, alpha=0.95): + UnivariateFitter.__init__(self, alpha=alpha) + self._jitter_level = jitter_level + self._seed = seed # Seed is for the jittering process + + def fit(self, durations, event_observed, event_of_interest, timeline=None, entry=None, label='AJ_estimate', + alpha=None, ci_labels=None, weights=None): + """ + Parameters: + durations: an array or pd.Series of length n -- duration the subject was observed for + event_observed: an array, or pd.Series, of length n. Integer indicator of distinct events. Must be + only non-negative integers, where 0 indicates censoring. + event_of_interest: integer -- indicator for event of interest. All other integers are considered competing events + Ex) event_observed contains 0, 1, 2 where 0:censored, 1:lung cancer, and 2:death. If event_of_interest=1, then death (2) + is considered a competing event. The returned cumulative incidence function corresponds to risk of lung cancer + timeline: return the best estimate at the values in timeline (positively increasing) + entry: an array, or pd.Series, of length n -- relative time when a subject entered the study. This is + useful for left-truncated (not left-censored) observations. If None, all members of the population + were born at time 0. + label: a string to name the column of the estimate. + alpha: the alpha value in the confidence intervals. Overrides the initializing + alpha for this call to fit only. + ci_labels: add custom column names to the generated confidence intervals + as a length-2 list: [, ]. Default: