
Kaplan Meier Estimator

Vishwadeep Khatri

Message added by Mayank Gupta

The Kaplan Meier Estimator, also called the Product Limit Estimator, is a non-parametric statistic used to estimate the survival function from lifetime data. It is most commonly used in medical research, where it measures the fraction of patients still living a given amount of time after treatment.


An application-oriented question on the topic, along with responses, can be seen below. The best answer was provided by Mohamed Asif on 21st February 2020.


Applause for all the respondents - Mohamed Asif, Shashikant Adlakha


Also review the answer provided by Mr Venugopal R, Benchmark Six Sigma's in-house expert.


Q 238. The Kaplan Meier Estimator helps Doctors estimate which treatment works best. How does it work?


Note for website visitors - Two questions are asked every week on this platform. One on Tuesday and the other on Friday.


4 answers to this question


The Kaplan Meier Estimator (Product Limit Estimator) is used mostly in medical and pharma research.
It estimates the survival function from recorded data in order to analyze post-treatment performance.
It is also used in non-medical fields to measure time, performance, and other metrics of interest after an event.


Survival analysis is used to analyze time-to-event data, that is, the time from an origin event to the occurrence of the event of interest (which can be improvement, death, etc.).
The estimator is plotted over time, and the resulting curve is referred to as the Kaplan-Meier curve.


Apart from the Kaplan Meier Estimator, the Cox model and the Cochran–Mantel–Haenszel test are also used in survival analysis.


For the data considered, each subject either has a complete time record ending in the event of interest, or the record is censored.

For a censored subject we cannot accurately measure the total survival time; we only know that the patient survived at least until the last observation.


The analysis can be carried out for two groups of subjects, and the statistical significance of the difference in their survival can be assessed.


Some assumptions of the test include the points below:

  • At any time, censored patients have the same survival prospects as patients who continue to be followed
  • Survival probabilities are the same for subjects recruited early and late in the study period
  • Events happen at the times specified


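The survival probability for each time interval, St, is calculated using the formula below. The overall survival estimate at a given time is the product of these interval probabilities up to that time.

St = [No. of subjects living at start - No. of subjects who died] / [No. of subjects living at start]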
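
The formula above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the method used by Minitab or R: the function name `kaplan_meier` and the six-subject data set are made up for the example, and it assumes that a subject censored at a given time is still counted as at risk for events at that same time.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # St for this step: (living at start - died) / living at start
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # drop both deaths and censored subjects
    return curve

# Hypothetical data: six subjects; an event flag of 0 marks a censored subject.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t={t}: S(t)={s:.4f}")
```

Censored subjects never reduce the curve directly; they only shrink the number at risk for later steps.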


Working in Minitab:
This is a non-parametric analysis and can be done in Minitab by following the steps below:
1) Create a Distribution Overview Plot
[Stat > Reliability/Survival > Distribution Analysis (Right Censoring) > Distribution Overview Plot]
Either Right Censoring or Arbitrary Censoring can be used, depending on the data type
2) Select variables
3) Specify the distribution
4) Update the censoring column / value


Let's understand with an example in R:

I would like to walk through the Kaplan Meier Estimator in RStudio using a built-in data set (the lung data set) for easy understanding.
Input: Weight loss between male and female post event
Output: Improved / Not improved

  • Perform the Log-Rank test (for a binary outcome)
  • Load the survival library
  • Create the survival object (Surv())
  • Create the data for the plot (survfit())
  • Perform the test (survdiff())
  • Select parameters [e.g. time, status, sex, weight loss, etc.]
  • Convert the data to binary for censored data (i.e., patients with no outcome/no event)
  • Use Surv(time, event) and run the function to view the plot and interpret the analysis

In the above curve, we can see that weight loss is better for females than for males.
Referring to the p-value of 0.00133, we can say there is a significant difference between the two groups.
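
The R walkthrough relies on the survival package. To show the kind of comparison a two-group log-rank test makes, here is a rough sketch in plain Python. The function name `logrank_test` and both groups of data are hypothetical (they are not the lung data set), and the p-value assumes the usual chi-square approximation with one degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n                 # expected events if groups are alike
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # chi-square tail, 1 df
    return chi_square, p_value

# Hypothetical follow-up times; an event flag of 0 marks a censored subject.
group_a_times, group_a_events = [5, 8, 12, 15, 20, 22], [1, 1, 1, 0, 1, 1]
group_b_times, group_b_events = [10, 18, 25, 30, 34, 40], [1, 0, 1, 1, 0, 0]
chi_square, p_value = logrank_test(group_a_times, group_a_events,
                                   group_b_times, group_b_events)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the two survival curves differ by more than chance would explain, which is how the p-value quoted above is being read.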


A pitfall of the Kaplan Meier Estimator is that not all patients turn up for follow-up after treatment, so the curve has to be analyzed and interpreted cautiously.


Benchmark Six Sigma Expert View by Venugopal R

The Kaplan Meier chart is used to estimate the probability of survival in medical research.

For instance, let us consider that we are interested in studying the effect of a particular drug for the treatment of a life-threatening disease. The study, based on 10 patients who were subjected to this treatment, is plotted as below; the plot is known as a Kaplan Meier chart.




The Y axis represents the probability of survival and the X axis represents time (say no. of years).


As seen, at the start the probability of survival is taken as 1 (or 100%). After two years a patient dies, so the probability of survival drops to 0.9 (90%). At the end of 3 years we have one more death, so we calculate the survival rate as the conditional probability of survival at the end of 3 years for the patients who survived the first step, i.e. 0.9 * (8 / 9) = 0.8 (or 80%). The calculation is continued in the same way for each step of the chart.


However, it may sometimes happen that we lose track of a patient. Such patients are no longer available for the study and are categorized as ‘censored’ patients. A censored patient is represented by a vertical cross line, as seen during the 5th year. Censored patients are removed from the denominator when calculating the survival probability for that year and for subsequent years.
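
The step-by-step calculation described above can be written compactly, with n_i the number of patients still at risk just before time t_i and d_i the number of deaths at t_i. The first two worked lines use the figures from the text. The last line is only an illustrative assumption, since the chart itself is not reproduced here: it supposes that no one else died before the patient censored in year 5, so that 7 patients remain at risk when the next death occurs.

```latex
\begin{align*}
\hat{S}(t) &= \prod_{t_i \le t} \frac{n_i - d_i}{n_i} \\
\hat{S}(2) &= \frac{10 - 1}{10} = 0.9 \\
\hat{S}(3) &= 0.9 \times \frac{9 - 1}{9} = 0.8 \\
% Assumed continuation: 8 survivors, 1 censored, so 7 at risk at the next death
\hat{S}(t_{\text{next}}) &= 0.8 \times \frac{7 - 1}{7} \approx 0.686
\end{align*}
```

The censored patient does not cause a drop in the curve, but the denominator for every later step is smaller by one.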




In the above figure, the red graph represents the Kaplan Meier chart for another drug B for a similar exercise.


If we look at the median survival for both groups, it will be:

Median survival for Drug A = 7 years

Median survival for Drug B = 4 years
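
As a rough sketch of how such figures are read off a Kaplan Meier chart, the Python below looks up the median survival and the survival probability at a given time from a step curve. The helper names `median_survival` and `survival_at` are hypothetical, and the two curves are made-up step values chosen only so that the results match the numbers quoted in this answer; they are not read from the original chart. The median is taken as the first time the curve reaches 0.5 or below, which is one common convention.

```python
def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below; None if it never does."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None

def survival_at(curve, t):
    """Survival probability at time t for a step curve given as [(time, S)]."""
    current = 1.0                 # the curve starts at 1 before the first event
    for time, survival in curve:
        if time > t:
            break
        current = survival
    return current

# Made-up step curves (time in years, survival probability after each drop).
drug_a = [(2, 0.90), (3, 0.80), (6, 0.69), (7, 0.46), (9, 0.23)]
drug_b = [(1, 0.90), (2, 0.72), (3, 0.54), (4, 0.36), (6, 0.18)]

print("Median A:", median_survival(drug_a), "years")
print("Median B:", median_survival(drug_b), "years")
print("3-year survival A:", survival_at(drug_a, 3))
print("3-year survival B:", survival_at(drug_b, 3))
```

If a curve never falls to 0.5 within the study period, the median cannot be read from the chart, which is why the first helper may return None.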


One can also compare the estimates of the survival probabilities for a given period. For instance:

3-year survival probability for Drug A = 0.80

3-year survival probability for Drug B = 0.54


In general, a steeper curve represents a worse situation. Though not discussed in detail here, it should also be noted that there is a confidence interval associated with each estimate, and the width of the confidence interval depends on the number of patients being studied.
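
To illustrate how the interval narrows as more patients are studied, here is a small Python sketch. It assumes Greenwood's variance formula with a plain normal-approximation interval, which is one common choice and not necessarily what any particular package reports; the function name `greenwood_interval` and the patient counts are made up for the example.

```python
import math

def greenwood_interval(steps, z=1.96):
    """steps: [(at_risk, deaths)] at each event time, in time order.

    Returns [(survival, lower, upper)] using Greenwood's formula:
    Var[S(t)] = S(t)^2 * sum of d / (n * (n - d)) over event times up to t.
    """
    survival, running_sum = 1.0, 0.0
    results = []
    for n, d in steps:
        survival *= (n - d) / n
        if n > d:                         # skip the term when nobody survives
            running_sum += d / (n * (n - d))
        se = survival * math.sqrt(running_sum)
        lower = max(0.0, survival - z * se)
        upper = min(1.0, survival + z * se)
        results.append((survival, lower, upper))
    return results

# Same 90% survival after the first step, with 10 versus 100 patients (made-up counts).
small = greenwood_interval([(10, 1)])[0]
large = greenwood_interval([(100, 10)])[0]
print("10 patients :", tuple(round(v, 3) for v in small))
print("100 patients:", tuple(round(v, 3) for v in large))
```

Both estimates are 0.9, but the interval for the larger study is much narrower.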


I hope that this brief discussion of Kaplan Meier charts provides a broad idea of how medical researchers use this tool for estimating and comparing the effectiveness of treatments.


The Kaplan–Meier estimator, named after statisticians Edward L. Kaplan and Paul Meier and also called the product limit estimator, is widely used for estimating the survival function from lifetime data. It belongs to the family of survival models, which model the time to an event. Examples in the medical sector include the time to death after a first heart attack and the proportion of patients living for a particular period of time. Other uses include time to loan repayment, time to get a job after graduating, and, in the insurance, credit and banking sectors, estimating the fraction of delinquency and default.

A plot of the Kaplan–Meier estimator is a series of declining horizontal steps which, with a large enough sample size, approaches the true survival function of the studied population.

The Kaplan–Meier curve takes censored data into account. Censoring occurs if a patient is lost to follow-up, voluntarily withdraws from the study, or is still alive without the event having occurred at the time of last follow-up. Small vertical tick-marks on the plot indicate individual patients with censored survival times. Without censoring, the Kaplan–Meier curve reduces to one minus the empirical distribution function of the survival times.

In medical research, a typical application may be grouping the studied patients into different categories, for example those with one type of gene profile (gene A) and those with another (gene B). Both groups are treated with a similar treatment protocol. On the graph, patients with gene profile B succumb much faster to a disease such as breast cancer than those with gene A. So, if we estimate and compare the 5-year survival rates of the two categories of patients, we might find, for example, 70% survival for gene A compared to 20% for gene B.

Generating a Kaplan–Meier curve requires two pieces of data for each patient: (a) the status at last observation (event occurred or censored) and (b) the time to event (or time to censoring).

Apart from the survival function estimated by the Kaplan-Meier estimator, other quantities that can be derived from the survival function are:

Density function: the probability density of the event occurring at time t

Hazard function: the instantaneous event rate at time t among those still surviving, also called the force of mortality
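
For reference, the standard relationships between these quantities and the survival function S(t) can be written as below. This assumes continuous time and a smooth S(t); the Kaplan–Meier estimate itself is a step function, so in practice the density and hazard are usually obtained with some additional smoothing or through the cumulative hazard.

```latex
\begin{align*}
f(t) &= -\frac{dS(t)}{dt} \\
h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt} \ln S(t) \\
S(t) &= \exp\left( -\int_0^t h(u)\, du \right)
\end{align*}
```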


For example, 50 patients enter the study. One dies at time 1, so the probability of surviving time 1 is P1 = 49/50. Of the 49 patients still alive, 2 more die at time 2, so the conditional probability of surviving time 2 is P2 = 47/49.


So the survival function at time 2 = P1*P2
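
Written out with the numbers from this example, the product works out as follows:

```latex
\begin{align*}
P_1 &= \frac{49}{50}, \qquad P_2 = \frac{47}{49} \\
\hat{S}(2) &= P_1 \times P_2 = \frac{49}{50} \times \frac{47}{49} = \frac{47}{50} = 0.94
\end{align*}
```

Because no one is censored in this example, the result is simply the fraction of the original 50 patients still alive after time 2.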


The survival function is the unconditional probability of surviving to a given time, and it is estimated by multiplying the conditional survival probabilities at each observation time up to that point.


The Kaplan-Meier curve starts with a probability of 1 at time zero and decreases from there. It may reach 0 at a particular time if the study is continued for a long period, or if the disease has a very high hazard function.





This topic is now closed to further replies.
