Survival Analysis in R

Quick Look: Survival Analysis

Survival analysis is a type of statistical analysis that studies how long something will persist, and it often considers how different covariates impact the survival rate.

Common applications of survival analysis are in clinical and epidemiological studies, where the researcher studies the time until an event of interest occurs. This could be the time until tumor recurrence, or the time until death after entering a clinical study.

Survival analysis doesn’t have to be medical in nature. For example, you could use survival analysis to study the time it takes for a machine to stop working.

Censoring in Survival Analysis

We may not observe the occurrence of the event for every subject in the data. This is called censoring. There are several types of censoring, but the most common is right censoring, which occurs when the study ends before the event has been observed for one or more subjects. An example is a study measuring time to death in which a patient is still alive when the study ends.
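As a minimal sketch of how right censoring turns into a time and a status column, consider the toy example below. The values, the 10-week study length, and the variable names are hypothetical and are not taken from the data set used in this post.

```r
# Hypothetical toy example: weeks from enrollment to the event, NA when the
# event never happens. The study is assumed to stop at week 10.
study_end  <- 10
event_week <- c(4, NA, 12, 7, NA)

# The event is only observed if it happens before the study ends.
# (In practice we would not know about the week-12 event at all.)
observed <- !is.na(event_week) & event_week <= study_end

# Follow-up time is the event time when observed, otherwise the study end.
time   <- ifelse(observed, event_week, study_end)
status <- as.integer(observed)   # 1 = event observed, 0 = right-censored

print(data.frame(time, status))
print(table(status))
```

Subjects with status 0 still contribute information: we know they survived at least until the end of follow-up.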

We will look at a small data set of patients on a treatment drug to see this analysis in action. The data exhibit right censoring, as can be seen in the Status column. A 1 indicates that the event was observed; a 0 means the event was not observed and the observation is censored. There are 12 censored observations and 30 uncensored ones, as shown using the table function below.

data <- read.csv("data/Data4.csv")
colnames(data) <- c("Time","Status","Treatment")
table(data$Status)
##
##  0  1
## 12 30

The Survival Function

There are two functions that we can estimate. The first, and the one we look at in this post, is the survival function. Not surprisingly, this function gives the probability of surviving, that is, of not having experienced the event, beyond a given time.

The other is the hazard function. This describes the instantaneous risk that the event occurs at a given time, given that the subject has survived up to that time.
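For reference, the two functions are usually written as follows. The notation is an assumption added here, not taken from the post: T is the random time to the event and t is a fixed time point.

```latex
% Survival function: probability that the event has not occurred by time t
S(t) = P(T > t)

% Hazard function: instantaneous event rate at t, given survival up to t
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% For a continuous T, the two are linked through the cumulative hazard
S(t) = \exp\left( -\int_0^t h(u)\, du \right)
```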

We will use the OIsurv R library, which loads the survival package. Functions in the survival package operate on Surv objects, which are created by the Surv() function.
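To get a feel for what a Surv object holds, here is a small sketch that uses the survival package directly. The times and indicators are hypothetical values chosen for illustration, not rows from the data set.

```r
library(survival)

# Hypothetical follow-up times (weeks) and event indicators
# (1 = event observed, 0 = right-censored).
time   <- c(6, 9, 10, 13)
status <- c(1, 0, 1, 1)

# Surv() pairs each time with its status.
s <- Surv(time, status)

print(s)                       # censored times are marked with a "+"
print(class(s))
print(unclass(s)[, "status"])  # the underlying status column
```

Internally the object is a two-column matrix of times and statuses, which is why the modeling functions can treat it as a single response.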

Modeling in R

First, we create a Surv object with the Surv() function.

Next we fit it using the survfit function, which you’ll see looks a lot like a regression function.

Finally, we plot the fitted Kaplan-Meier estimate. The curves show how the probability of survival differs between the groups over time.

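library(OIsurv) # loads the survival package
# Build the survival object from the data frame columns instead of using attach()
recsurv <- Surv(data$Time, data$Status)
# Kaplan-Meier fit for each treatment group, with 95% log-type confidence intervals
fit <- survfit(recsurv ~ Treatment, data = data, conf.int = .95, conf.type = "log")
# Curves are drawn in group order: Control first, then Drug 6-MP
plot(fit, main = "Survival function (Kaplan-Meier estimate)", xlab = "Weeks",
     ylab = "Survival Probability", col = c("red", "blue"), lwd = 3)
legend(x = "topright", col = c("red", "blue"), lwd = 3,
       legend = c("Control", "Treatment"))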

This shows that those in the treatment group were more likely to survive during the time period studied, as their curve lies above that of the control group.
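To make the Kaplan-Meier estimate behind these curves more concrete, the sketch below computes it by hand in base R for a single group. The six observations are hypothetical, and the code is only an illustration of the product-limit idea, not a replacement for survfit.

```r
# Hypothetical toy data: follow-up time in weeks and event indicator
# (1 = event observed, 0 = right-censored).
time   <- c(2, 3, 3, 5, 7, 8)
status <- c(1, 1, 0, 1, 0, 1)

# Distinct times at which at least one event was observed.
event_times <- sort(unique(time[status == 1]))

# Number still at risk just before each event time, and events at that time.
n_risk  <- sapply(event_times, function(tt) sum(time >= tt))
n_event <- sapply(event_times, function(tt) sum(time == tt & status == 1))

# Kaplan-Meier estimate: running product of the conditional survival
# probabilities (1 - events / at risk).
km <- cumprod(1 - n_event / n_risk)

print(data.frame(time = event_times, n_risk, n_event, survival = km))
```

Censored subjects never count as events, but they stay in the at-risk count until their censoring time, which is how the estimator uses partial follow-up.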

Are the curves significantly different?

We perform a log-rank test using survdiff from the R survival package to test whether the survival curves of the Control and Treatment groups are significantly different from one another.

sigtest <- survdiff(Surv(Time, Status)~Treatment, data = data)
sigtest
## Call:
## survdiff(formula = Surv(Time, Status) ~ Treatment, data = data)
##
##                      N Observed Expected (O-E)^2/E (O-E)^2/V
## Treatment=Control   21       21     10.7      9.77      16.8
## Treatment=Drug 6-MP 21        9     19.3      5.46      16.8
##
##  Chisq= 16.8  on 1 degrees of freedom, p= 4.17e-05

We see from these results that the curves are significantly different. Our p-value is 0.0000417, so we can reject the null hypothesis that the two survival curves are the same at any conventional significance level, such as 0.05 or 0.001.
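As a rough check, the p-value can be recovered from the printed test statistic with base R. This sketch uses the rounded chi-square value of 16.8 shown in the output, so the result will be close to, but not exactly, the reported 4.17e-05.

```r
# Values copied from the printed survdiff output (statistic is rounded).
chisq <- 16.8
df    <- 1

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
print(p_value)
```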
