Kaplan-Meier survival risk per individual instance

1798

收藏 2014-04-30

Given n-patient records with time and status variables (among others), I would like to obtain their survival risk in the time period they're within ie 2, 4, 6, 8, 10 years.

I have a division of 24 - 47 months (2 years), 48 - 83 months (4 years), 84 - 107 months (6 years), 108 - 119 months (8 years) and 120 - "up to what's available" months (10 years).

In an individual perspective, a patient with survival months of 30 months will be included in the two-year period and along with the other predictive variables I want to know this patient's survival risk within two years.

My method

I'm retrieving survival risk percentages for my data using R

km <- survfit(Surv(time, status)~1, data=mydata)survest <- stepfun(km$time, c(1, km$surv))

The time variable is the survival months and the status has values 1 and 0 for alive and dead respectively.

The code outputs something like this (taken from here):

> survest(0:100) [1] 1.0000000 0.9854015 0.9781022 0.9708029 0.9635036 0.9635036 0.9635036 [8] 0.9416058 0.9124088 0.9124088 0.8978102 0.8905109 0.8759124 0.8613139 [15] 0.8613139 0.8467153 0.8394161 0.8394161 0.8175182 0.8029197 0.7883212 [22] 0.7737226 0.7664234 0.7664234 0.7518248 0.7299270 0.7299270 0.7225540 [29] 0.7225540 0.7151810 0.7004350 0.6856890 0.6856890 0.6783160 0.6783160

My question is:

Are these the actual survival estimates for my 300,000 individual records wherein I need to use survest(0:300000)? I tried survest(0:1000) but the result already converged to some value and this does not answer my problem.
Can we do it using SPSS?

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

全部回复

ReneeBK

2014-4-30 03:01:56

I don't think it is possible to get KM-estimates for individual patients. The KM-estimator gives the observed probability of survival at a certain timepoint on a population level. The observed survival probability for an individual, however, is either 0 (death) or 1 (alive), nothing in between.

Instead of observed survival probabilities you will have to use some sort of model (e.g. Cox PH, accelerated failure time model, neural network etc.) to get predicted survival probabilities. These probabilities inform you about the risk of an individual with that particular variable combination to be alive at a particular timepoint.

UPDATE: with example code based on code the OP provided here

library(pec) ; library(rms)

# Simulate data
set.seed(1)
examp.data <- SimSurv(3000)

# fit a Cox model with predictors X1+X2
coxmodel <- cph(Surv(time,status)~X1+X2, data=examp.data, surv=TRUE)

# predicted survival probabilities can be extracted at selected time-points:
ttt <- quantile(examp.data$time)
ttt
#       0%       25%       50%       75%       100%
#6.959458e-03 9.505409e+00 3.077284e+01 7.384565e+01 7.100556e+02

# Get predicted survival probabilities at selected time-points:
preds <- predictSurvProb(coxmodel, newdata=examp.data, times=ttt)

# Store in original data
examp.data$predict.surv.prob.Q1 <- preds[,1] # pred. surv. prob. at  0.006959458
examp.data$predict.surv.prob.Q2 <- preds[,2] # pred. surv. prob. at  9.505409
examp.data$predict.surv.prob.Q3 <- preds[,3] # pred. surv. prob. at  30.77284
examp.data$predict.surv.prob.Q4 <- preds[,4] # pred. surv. prob. at  73.84565
examp.data$predict.surv.prob.Q5 <- preds[,5] # pred. surv. prob. at  710.0556
Now you have predictions of the survival probabilities at those 5 timepoints for each individual in your data. Of course, you do need to evaluate the predictive performance of your model in terms of discrimination (e.g. with the function cindex in pec-package) and calibration (with calibration plot, see rms-package).

扫码加我拉你入群

请注明：姓名-公司-职位

以便审核进群资格，未注明则拒绝

扫码加我 拉你入群

扫码加我 拉你入群

分享

扫码加好友，拉您进群

扫码加我拉你入群

扫码加我拉你入群