The survival variables X_1, X_2, ..., X_n are right-censored by a fixed constant C if the observed sample consists of the ordered pairs (T_i, delta_i) for i = 1, ..., n, where T_i = min(X_i, C), C is the fixed censor time, and delta_i = I(X_i <= C) is the censor indicator for X_i. For left-censored data, the observed times are T_i = max(X_i, C_L), where C_L is the left censor time associated with X_i, and delta_i = I(X_i >= C_L). It follows from the above that left-censoring is a special case of right-censoring with the time axis reversed.
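To make the time-reversal concrete, here is a minimal sketch (the helper names and sample values are illustrative, not from the text) showing how latent lifetimes are turned into observed (time, indicator) pairs under right- and left-censoring:

```python
def right_censor(lifetimes, c):
    """Right-censoring at fixed c: t_i = min(x_i, c),
    delta_i = 1 if the event was observed (x_i <= c), else 0."""
    return [(min(x, c), 1 if x <= c else 0) for x in lifetimes]

def left_censor(lifetimes, c):
    """Left-censoring is right-censoring with the time axis reversed:
    t_i = max(x_i, c), delta_i = 1 if x_i >= c."""
    return [(max(x, c), 1 if x >= c else 0) for x in lifetimes]

lifetimes = [2.0, 5.0, 9.0]
print(right_censor(lifetimes, 6.0))  # [(2.0, 1), (5.0, 1), (6.0, 0)]
print(left_censor(lifetimes, 3.0))   # [(3.0, 0), (5.0, 1), (9.0, 1)]
```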
It is because of this phenomenon that few specialist techniques have been developed explicitly for left-censored data [12]. Censoring comes about in the following ways: the study ends without the subject experiencing the event; the subject is lost to follow-up within the study period; the subject deliberately withdraws from the treatment; the subject is obliged to withdraw from the treatment for reasons beyond their control; or the subject withdraws from the study for some other reason.
Reference [13] has observed in a clinical trial that possible causes of patient withdrawal or dropout included death, adverse reactions, unpleasant study procedures, lack of improvement, early recovery, and other factors related or unrelated to trial procedures and treatments.
In other cases, some data may not be collectible, observable, or available for some study subjects. Data can also be missing by the design of the study as a result of resource constraints [14]. Type I censoring occurs when a study is designed to end after a fixed period of time set by the researcher. At the end of the study period, any subject that has not experienced the event is censored.
In Type I censoring, the number of uncensored observations is a random variable. In Type II censoring, the end time is left open at the beginning: the study continues until a prespecified number r of events has occurred. The value of r is fixed before the survival data are seen. This means that the observed data consist of the r smallest observations. In terms of random variables, this may be expressed using order statistics: from the possible responses X_1, X_2, ..., X_n, we observe only the first r ranked responses, that is, the order statistics X_(1) <= X_(2) <= ... <= X_(r). In random censoring, the total period of observation is fixed, but subjects enter the study at different points in time.
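The Type I and Type II schemes described above can be contrasted in a short sketch (the function names and sample data are illustrative):

```python
def type_i_censor(lifetimes, c):
    """Type I: the study ends at the fixed time c;
    the number of observed events is random."""
    return [(min(x, c), int(x <= c)) for x in lifetimes]

def type_ii_censor(lifetimes, r):
    """Type II: the study ends at the r-th event; only the r smallest
    order statistics are observed exactly, the rest are censored there."""
    cutoff = sorted(lifetimes)[r - 1]
    return [(min(x, cutoff), int(x <= cutoff)) for x in lifetimes]

sample = [4.0, 1.0, 3.0, 7.0]
print(type_i_censor(sample, 5.0))  # three events observed, one censored at 5.0
print(type_ii_censor(sample, 2))   # observation stops at the 2nd event time, 3.0
```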
Some subjects experience the event of interest. Others do not, and some are lost to follow-up. Others will still be alive at the end of the study period. In random censoring, the censored subjects do not all have the same censoring time. Random censoring can also be produced when there is a single termination time, but entry times vary randomly across the subjects [14]. Double-censoring occurs as a result of a combination of left- and right-censoring. In this case, we note that T_i = max(min(X_i, C_R), C_L), where C_L and C_R are, respectively, the left and right-censoring times associated with X_i; X_i is only observed if it falls in a window of observation [C_L, C_R].
Otherwise, one of the endpoints of the window is observed and the other endpoint remains unknown. We should also note that double-censoring is not the same as interval-censoring [4, 15]. Independent censoring occurs whenever there exists a subgroup of interest such that the subjects who are censored at time t are representative of all the subjects in that subgroup who remained at risk at time t with respect to their survival experience.
In other words, subjects must not be censored (withdrawn) because they have a higher or lower risk than the average. Reference [16] has noted that in practice this assumption of independent censoring deserves special scrutiny; that is to say, the Kaplan–Meier estimator may overestimate the survival function of X if the survival time and the censoring time are positively correlated, and underestimate the survival function if the survival and censoring times are negatively correlated.
Independent censoring may not hold in all situations, but under some dependence conditions we can still use the likelihood function [17, 18]. With noninformative censoring, participants who drop out of the study must do so for reasons unrelated to the study. Noninformative censoring occurs if the distribution of survival times provides no information about the distribution of censoring times and vice versa.
That is, the reason why the time of the event was not observed was entirely unrelated to the outcome under study. The study simply ended while the observed subjects were alive. One consequence of noninformative censoring is that the underlying probabilities of obtaining the event of interest are the same for both censored and uncensored observations. The difference between the two is that the survival time for obtaining the event of interest for the censored observations is not known.
Informative censoring occurs when subjects are lost to follow-up for reasons related to the study [17-20]. The censoring status is a dichotomous random variable. When a subject obtains the event of interest, we denote the censoring status by 1. If the subject fails to obtain the event of interest, we denote the status by 0. In Figure 2, the bullets at the end of the lines representing Anthony, Bernard, Daniel, and Edward mean that each of them obtained the event of interest.
That is to say, they were uncensored. Anthony got the event before the end of the study; he was, therefore, uncensored. Bernard got the event before the observation period began; therefore, he was left-censored. Charles got enrolled in the study and was observed for some time at the start; he then took a break and was nowhere to be found (lost to follow-up); he resurfaced after some time to continue with the study; he did not get the event of interest before the observation period ended, so his status was interval-censored.
Finally, Edward got the event of interest after the observation period had come to an end. His status was therefore right-censored [19]. Theorem 1. Under Type I right-censoring with fixed censor times, the joint likelihood L(theta) of the observed data (t_i, delta_i), i = 1, ..., n, is given by L(theta) = k * prod_{i=1}^{n} f(t_i)^(delta_i) S(t_i)^(1 - delta_i), where k is a constant.
By a Type I censoring design we mean a study in which every subject is under observation for a specified period or until failure. A slightly more complicated Type I censoring design is one in which each subject has their own fixed censoring time C_i instead of a common censoring time C. In this study design, the likelihood contribution for each subject can be represented by one of the following two probabilities: the probability that the event occurred in a small interval including time t_i [represented by the density f(t_i)], or the probability that the subject had not had the event at C_i [given by S(C_i)].
We start the proof by considering the joint density function of T_i and delta_i, where T_i = min(X_i, C_i) for censoring indicator delta_i = I(X_i <= C_i) and fixed C_i. Equation (12) can be generalized to accommodate other types of censoring, such as interval-censoring. If we assume that the interval-censoring mechanism operates independently of the observed lifetimes, then it follows that if (L_i, R_i] represents an interval-censored observation, the contribution to the likelihood is S(L_i) - S(R_i). This means that terms of this form may be included in the likelihood of Theorem 1 for interval-censored observations.
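For a concrete instance of the Theorem 1 likelihood, here is a hedged sketch that evaluates the censored-data log-likelihood under an assumed exponential model, f(t) = lam*exp(-lam*t) and S(t) = exp(-lam*t); the function names and data are made up for illustration:

```python
import math

def exp_log_likelihood(data, lam):
    """Censored-data log-likelihood: sum of delta*log f(t) + (1-delta)*log S(t)."""
    ll = 0.0
    for t, delta in data:
        ll += delta * (math.log(lam) - lam * t)  # log f(t) for observed events
        ll += (1 - delta) * (-lam * t)           # log S(t) for censored subjects
    return ll

data = [(2.0, 1), (5.0, 1), (6.0, 0)]            # last subject censored at t = 6
# For the exponential model, the MLE has a closed form: events / total time
lam_hat = sum(d for _, d in data) / sum(t for t, _ in data)
print(lam_hat)  # 2/13
```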
The likelihood for random right-censored data is constructed in the same manner; it is constructed so that theta will be estimated by maximum likelihood, with the ordered pairs given by (T_i, delta_i), where T_i = min(X_i, C_i) and delta_i = I(X_i <= C_i), for i = 1, ..., n, the censor times C_i now being random. The likelihood factorizes into two parts.
One relates to the censor times and the other to the survival times or lifetimes. Theorem 2. The joint likelihood of the observed data (t_i, delta_i), i = 1, ..., n, is given by the following: L(theta) = c * prod_{i=1}^{n} f(t_i)^(delta_i) S(t_i)^(1 - delta_i), where c = prod_{i=1}^{n} [1 - G(t_i)]^(delta_i) g(t_i)^(1 - delta_i) is a constant not involving theta, with g and G the density and distribution function of the censoring times.
Consider the joint density of (T_i, delta_i), where T_i = min(X_i, C_i), for the indicator variable delta_i = I(X_i <= C_i) and C_i a random variable independent of X_i. For delta_i = 1, P(T_i <= t, delta_i = 1) = P(X_i <= t, X_i <= C_i) = integral_0^t f(u)[1 - G(u)] du. Differentiating each side of this expression with respect to t gives f(t)[1 - G(t)]. For delta_i = 0, P(T_i <= t, delta_i = 0) = P(C_i <= t, C_i < X_i) = integral_0^t g(u) S(u) du. Differentiating each side with respect to t gives g(t) S(t). Combining the two cases, the joint likelihood is L = prod_{i=1}^{n} {f(t_i)[1 - G(t_i)]}^(delta_i) {g(t_i) S(t_i)}^(1 - delta_i), which upon rearrangement is the product of prod_{i=1}^{n} [1 - G(t_i)]^(delta_i) g(t_i)^(1 - delta_i), a term involving only the censoring distribution, and prod_{i=1}^{n} f(t_i)^(delta_i) S(t_i)^(1 - delta_i). This gives the required result. The implications of Theorem 2 are considerable; firstly, it is important for maximum likelihood estimation involving random censor times.
We note from the theorem that the factor c involves the distribution of the censor times only. If there are no parameters shared between G, g, and theta, that is to say, if G and g are independent of theta, then this term acts as a constant multiple in L when L is maximized with respect to theta, which takes us back to the likelihood of Theorem 1 for fixed censor times.
Secondly, if we regard the observed censor times as conditionally fixed at times c_1, ..., c_n, then the censoring term is factored out of the conditional likelihood, which again takes us to the fixed-censoring likelihood of Theorem 1. The argument above favors the fixed censorship model, which appears in Theorem 1.
The likelihoods constructed above are used primarily for analyzing parametric models; they also serve as the basis for the partial likelihoods used in semiparametric regression methods [17, 21, 22]. Reference [18] has discussed four common statistical methods that can be used in analyzing censored data. The first, complete-data analysis, is adopted when the researcher ignores the censored observations and conducts the analysis on only the uncensored observations.
This method has the advantage of simplicity. Its disadvantages are many, including loss of efficiency and estimation bias. The second approach to analyzing censored data is imputation.
This method is one of the popular methods for handling incomplete data but may not be appropriate for censored data. Additionally, the authors have posited that although this method seems acceptable, it has two underlying disadvantages. For right-censoring, if one assumes that all censored cases failed (that is, got the event of interest) right after the time of censoring, then survival probabilities will be underestimated; on the other hand, if one assumes that censored cases never failed, then the survival probabilities will be overestimated.
For interval-censoring, the appropriateness of imputation is less clear. Another option is to assume that the failure time after censoring follows a specific model and to estimate the model parameters in order to impute the residual survival time (the time from censoring to failure). However, this approach depends on model assumptions, which are very difficult to check without information on survival after censoring. Many researchers use imputation techniques, especially right-point or mid-point imputation, when the observations are interval-censored.
This may be due to a lack of statistical software packages for such analyses. It has been pointed out that both right-point and mid-point imputation may generate biased results. According to reference [9], the third method of analyzing censored data is dichotomizing the data. With this method, the problems of right-censoring and interval-censoring may be avoided if one analyzes the incidence of occurrence versus nonoccurrence of the event within a fixed period of time and disregards the survival times.
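Right-point and mid-point imputation for interval-censored observations (L, R] can be sketched in a few lines; the function name and sample intervals below are illustrative only:

```python
def impute_intervals(intervals, rule="mid"):
    """Collapse interval-censored observations (l, r] to single times."""
    if rule == "mid":
        return [(l + r) / 2 for l, r in intervals]  # mid-point imputation
    return [r for _, r in intervals]                # right-point imputation

intervals = [(0.0, 2.0), (3.0, 5.0)]
print(impute_intervals(intervals))           # [1.0, 4.0]
print(impute_intervals(intervals, "right"))  # [2.0, 5.0]
```

As the text notes, either choice can bias downstream survival estimates.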
In this case, the dichotomized data can be easily analyzed by the standard techniques for binary outcomes, such as contingency tables and logistic regression. However, there are some disadvantages associated with this method: it cannot distinguish between loss to follow-up and end-of-study censoring; the variability in the timing of the event among those who had the event within the observation period cannot be modeled; and no time-dependent covariates, such as age, smoking status, or alcohol consumption status, can be used in modeling.
The authors have further posited that analyzing dichotomized data may be acceptable when the risk of failure is low, risk periods are long, and covariates are associated with preventing the event rather than prolonging survival time. Such situations are common in many epidemiological studies. The fourth and final method is the likelihood-based approach.
This happens to be the most effective method for censoring problems. It uses methods of estimation that adjust for whether or not an individual observation is censored. Many of these methods can be viewed as maximizing the likelihood under certain model assumptions, including assumptions about the censoring mechanism. Likelihood-based methods include, for example, the Kaplan–Meier estimator of the survival function in a one-sample problem, the log-rank test for testing equality of two survival functions in a two-sample problem, and the Cox regression and accelerated-failure-time models for the analysis of time-to-event data with covariates.
The main advantage of the likelihood-based method is that it utilizes all the information available. However, as in the other methods, assumptions about the censoring mechanism are still required.
In constructing the likelihood for censored data, it should be noted that the lifetimes and censoring times are assumed independent; if they are not independent, then special techniques must be used in constructing the likelihood function. An observation corresponding to an exact event time provides information on the probability that the event occurs at this time, which is approximately equal to the density function of X at this time. For a right-censored observation, all we know is that the event time is larger than this time, so the information is the survival function evaluated at the on-study time.
Similarly, for a left-censored observation, all we know is that the event has already occurred, so that the contribution to the likelihood is the cumulative distribution evaluated at the on-study time. For interval-censored data, we know only that the event occurred within the interval. Maximum likelihood estimation is used because it produces estimators that are consistent, asymptotically efficient, and asymptotically normal.
When the probability of each observation is represented by its probability density function, we obtain the likelihood function L = prod_{i=1}^{n} f(t_i), where L represents the probability of the entire data. If censoring is present, then the likelihood function becomes L = prod_{i=1}^{n} f(t_i)^(delta_i) S(c_i)^(1 - delta_i). The likelihood function effectively combines uncensored and censored observations, in that if an individual is not censored, the contribution is f(t_i), and if the individual is censored at c_i, the contribution is S(c_i), the survivorship function evaluated at c_i.
Taking the natural log of L, the objective is to maximize the resulting expression. Once the appropriate distribution has been specified, the process reduces to using a numerical method, such as the Newton-Raphson algorithm, to solve for the parameters. Most computer software packages use the maximum likelihood approach to fit regression models to survival data. Survival models can be usefully viewed as ordinary regression models in which the response variable is time.
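As a sketch of the Newton-Raphson step for a censored-data likelihood, consider again an assumed exponential model, whose score is U(lam) = d/lam - T with derivative U'(lam) = -d/lam^2 (d = number of events, T = total observed time); a simple positivity safeguard is added, and all names and data are illustrative:

```python
def newton_raphson_exp(data, lam=1.0, tol=1e-10, max_iter=200):
    """Maximize the exponential censored-data likelihood by Newton-Raphson."""
    d = sum(delta for _, delta in data)
    T = sum(t for t, _ in data)
    for _ in range(max_iter):
        score = d / lam - T
        info = d / lam ** 2           # observed information (= -U'(lam))
        new_lam = lam + score / info  # Newton step on the score equation
        if new_lam <= 0:
            new_lam = lam / 2         # safeguard: keep the rate positive
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam

data = [(2.0, 1), (5.0, 1), (6.0, 0)]
print(newton_raphson_exp(data))  # agrees with the closed form d/T = 2/13
```

For the exponential model a closed form exists, so the iteration here merely verifies it; for models such as the Weibull, the same scheme is genuinely needed.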
However, computing the likelihood function needed for fitting parameters or making other kinds of inferences is complicated by censoring. The likelihood function for a survival model in the presence of censored data can also be formulated as follows [23-25]: since the hazard function is the ratio of the density function to the survivorship function, we have f(t) = h(t) S(t); substituting into the likelihood yields L = prod_{i=1}^{n} h(t_i)^(delta_i) S(t_i). This requires that we maximize the likelihood function with respect to the parameter of interest and the unspecified baseline hazard and survivorship functions [26-28].
Reference [23] proposed using an expression he called the partial likelihood function, which depends only on the parameter of interest; he posited that the resulting estimates from the partial likelihood function would have the same distributional properties as full maximum likelihood estimators. Mathematical proofs of this conjecture came later, based on the counting process and martingale approach detailed in [29, 30].
The method of partial likelihood begins by assuming that there is a group of individuals, the risk set R(t_(i)), that are at risk of failure just before the i-th ordered event time t_(i). If only one failure occurs at t_(i), the conditional probability that the failure occurs to individual i, given that individual i has a vector of covariates x_i, is given by expression (28): exp(beta' x_i) / sum_{j in R(t_(i))} exp(beta' x_j). Expression (28) is the hazard function for individual i at a specific point in time, divided by the sum of the hazard functions for all individuals in the risk set just before that time.
Because the baseline hazard h_0(t_(i)) is common to every term in the expression, it cancels. The partial likelihood function (29) is obtained by taking the product of expression (28) over all event times, that is, over all i with delta_i = 1. Expression (29) does not depend on h_0(t) and can be maximized to provide an estimate of beta that is consistent and asymptotically normally distributed, on the assumption that there are no tied event times; terms with delta_i = 0 are excluded [22].
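A minimal evaluation of the log partial likelihood for a single covariate, assuming no tied event times (the data, names, and single-covariate restriction are all illustrative):

```python
import math

def log_partial_likelihood(data, beta):
    """data: list of (time, delta, x). Each event contributes
    beta*x_i - log( sum over the risk set {j : t_j >= t_i} of exp(beta*x_j) )."""
    ll = 0.0
    for t_i, delta_i, x_i in data:
        if delta_i == 0:
            continue  # censored subjects enter only through risk sets
        risk_sum = sum(math.exp(beta * x_j)
                       for t_j, _, x_j in data if t_j >= t_i)
        ll += beta * x_i - math.log(risk_sum)
    return ll

data = [(1.0, 1, 0.0), (2.0, 0, 1.0), (3.0, 1, 1.0)]
print(log_partial_likelihood(data, 0.0))  # -log 3: risk sets of size 3 and 1
```

In practice beta would be found by maximizing this function numerically, e.g., with the Newton-Raphson scheme discussed earlier.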
In the study, the time period of interest was the time from the start of follow-up to death. The research question was whether mortality status was affected by the insurance status of the adults.
At the termination of the study, there were some who had not obtained the event of interest and therefore were right-censored. The analysis was adjusted for other factors such as baseline age, gender, race, smoking status, alcohol consumption, obesity, and employment status.
In another development, a sample of 55 women and 45 men attending a smoking-treatment clinic was studied. A number of demographic variables were observed to help determine whether women and men revert to smoking for the same or different reasons.
The length of follow-up was five months, and the variable of interest was the time from the start of follow-up to reverting to smoking. After 5 months, 38 subjects were lost to follow-up (16 women and 22 men), and 62 subjects (39 women and 23 men) had reverted. We note especially that the numbers lost to follow-up were quite different between women and men.
Although this problem of censoring presents an obstacle and a risk of distortion, reference [31] has advanced a multivariate survival analysis method that can handle multiple events with censoring. They have made it possible to estimate a bivariate probability density function for a pair of events.
They proposed a method called censored network estimation to discover partially correlated relationships and construct the corresponding network, composed of edges representing nonzero partial correlations among multiple censored events. To demonstrate its superior performance compared to conventional methods, they evaluated the selection power for partially correlated events in two types of networks with iterative simulation experiments.
Reference [32] has observed that while methods for censored outcomes have become abundant in the literature, methods for censored covariates have received little attention and have dealt only with the issue of limit-of-detection. They have noted in particular that, for randomly censored covariates, an often-used method is the inefficient Complete-Case Analysis (CCA), which consists in deleting censored observations from the data analysis.
It was further noted that when censoring was not completely independent, the CCA method led to biased and spurious results. Additionally, they have noted that methods for missing covariate data, including type I and type II censoring as well as limit-of-detection, did not readily apply due to the fundamentally different nature of randomly censored covariates.
They then developed a novel method for censored covariates using a conditional mean imputation method which was based on either Kaplan—Meier estimates or a Cox proportional hazards model to estimate the effects of these covariates on a time-to-event outcome. They have evaluated the performance of the proposed method through simulation studies and showed that the imputation method provided a good reduction in bias and improved statistical efficiency. Finally, they have illustrated the method using data from the Framingham Heart Study to assess the relationship between offspring and parental age of onset of cardiovascular events.
Reference [33] has worked on a general-purpose approach to account for right-censored outcomes using Inverse Probability of Censoring Weighting (IPCW).
They illustrated how IPCW can easily be incorporated into a number of existing machine learning algorithms used to mine big health care data including Bayesian networks, k-nearest neighbors, decision trees, and generalized additive models. Furthermore, they showed that their approach leads to better calibrated predictions than the three ad hoc approaches when applied to predicting the 5-year risk of experiencing a cardiovascular adverse event.
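The core of IPCW is a Kaplan-Meier-type estimate of the censoring distribution, obtained by treating censorings as the "events", whose left limit G(t-) then inversely weights the observed events. A hedged sketch, with illustrative names and data:

```python
def censoring_km_minus(data, t):
    """KM estimate of P(C >= t): product over censoring times s < t.
    data: list of (time, delta) with delta = 0 marking a censoring."""
    g = 1.0
    for s in sorted({u for u, _ in data}):
        if s >= t:
            break
        at_risk = sum(1 for u, _ in data if u >= s)
        censored = sum(1 for u, d in data if u == s and d == 0)
        g *= 1 - censored / at_risk
    return g

def ipcw_weights(data):
    """Each observed event gets weight 1/G(t-); censored rows get 0."""
    return [d / censoring_km_minus(data, t) for t, d in data]

data = [(1.0, 0), (2.0, 1), (3.0, 1)]
print(ipcw_weights(data))  # approximately [0.0, 1.5, 1.5]
```

In the sample, the one early censoring inflates each later event's weight so that the weights sum back to the sample size.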
Reference [34] has noted that censoring due to a limit of detection or limit of quantification happens quite often in many medical studies, and pointed out the conventional approaches to dealing with censoring when analyzing such data, which include the substitution method and Complete Case Analysis (CCA).
They clearly pointed out that the CCA and the substitution method usually led to biased estimates. It was intimated that the MLE approach appeared to perform well in many situations. They proposed an MLE approach to estimate the association between two measurements in the presence of censoring in one or both quantities.
The central idea was to use a copula function to join the marginal distributions of the two measurements. In various simulation studies, they showed that their approach outperforms existing conventional methods (CCA and substitution analyses). Furthermore, they proposed a straightforward MLE method to fit a multiple linear regression model in the presence of censoring in a covariate, or in both the covariate and the response.
Finally, they compared and discussed the performance of their method with multiple imputation and missing indicator model approaches.
However, survival analysis is plagued by the problem of censoring in the design of clinical trials, which renders routine methods of determining central tendency inadequate for computing average survival time. The present essay attempts to highlight different methods of survival analysis used to estimate time to event in studies based on individual patient-level data in the presence of censoring.
Section 2 highlights the types of censoring encountered in a clinical trial and potential statistical solutions. Survival analysis techniques, their assumptions, and the suitability of methods under different data conditions are illustrated in Sections 3 and 4. Section 5 discusses the importance of techniques to extrapolate estimates of life expectancy over a period of time exceeding the duration of trial follow-up.
Lastly, Section 6 cites limitations and advantages of the different methods and concludes by indicating possible future areas of research and practice for health economists and public health professionals.
Of the articles retrieved, those focusing on the methodology aspect were considered for the present review. Original articles were preferred, followed by subsequent discussion articles that added substantially to the methodology.
Censoring is said to be present when information on time to the outcome event is not available for all study participants. A participant is said to be censored when information on time to event is not available due to loss to follow-up or nonoccurrence of the outcome event before the trial end.
Broadly, two types of censoring are encountered, namely point censoring and interval censoring. Point censoring is said to occur when, despite continuous monitoring of the outcome event, the patient is lost to follow-up or the event does not occur within the study duration. It is also known as right censoring, which can be either end-of-study censoring or loss-to-follow-up censoring. An individual is said to be left censored if the patient had been at risk for the disease for a period before entering the study.
However, left censoring is usually not a problem in clinical trials, since the starting point is defined by an event such as entry of the patient into the trial, randomization, or occurrence of a procedure or treatment. Individuals B and C are right censored, while individual F is left censored [Figure 1].
The problem of interval censoring arises when the time to event is known only up to a time interval. This situation occurs when assessment is done at periodic intervals. Practically, most observational studies dealing with non-lethal outcomes have periodic examination schedules and are thus interval censored. However, if the periodicity of examination is at a justified frequency, interval-censored data can be treated as point censored.
Statisticians have devised various methods to deal with censored data, which include complete-data analysis, imputation techniques, and analysis based on dichotomized data. Detailed discussion of each of these methods is beyond the scope of the present essay due to space constraints; however, it is important to bear in mind the techniques available. The more effective methods that are widely used in survival studies encountering censored data are the likelihood-based approaches (survival analysis methods), which adjust for the occurrence of censoring in each observation and thus have the advantage of using all available information.
Survival analysis techniques used for dealing with censored data can be broadly classified into nonparametric (Kaplan–Meier product-limit method), parametric (Weibull and exponential methods), and semi-parametric (Cox proportional hazards method).
The latter two can also be applied as regression-based models. However, the usual likelihood-based functions of the whole sample, which are products of individual likelihood functions, cannot be applied in the presence of complex censoring mechanisms, especially in the presence of both loss-to-follow-up and end-of-study censoring. In such situations, a joint distribution of survival and censoring times can be used.
However, the distribution of survival time in such a situation is considered non-identifiable. Hence, the assumption of independent censoring is imperative. In other words, censoring is independent of unusually high or low risk for occurrence of the event, which implies that survival prospects for censored and uncensored individuals are the same, and removal of censored individuals from the analysis would yield an unbiased estimate of survival time or time to event.
The Kaplan–Meier method computes the probability of dying at a certain point of time conditional on survival up to that point. It thus maximizes the utilization of available information on time to event in the study sample.
Censored individuals are not included in the numerator at any point. Hypothetical illustration of estimating the probability of survival S(t_i) using the Kaplan–Meier estimator.
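A bare-bones version of the product-limit computation described above; censored subjects appear in the at-risk denominators but never in the numerators (the data are made up for illustration):

```python
def kaplan_meier(data):
    """data: list of (time, delta). Returns [(t, S(t))] at each event time,
    with S(t) the running product of (1 - deaths/at_risk)."""
    curve, s = [], 1.0
    for t in sorted({u for u, d in data if d == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, d in data if u == t and d == 1)
        s *= 1 - deaths / at_risk
        curve.append((t, s))
    return curve

data = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 0)]
for t, s in kaplan_meier(data):
    print(t, round(s, 4))  # the estimated curve drops only at event times
```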
Censoring affects the shape of the survival curve when a large number of individuals are censored at a single point of time, leading to sudden spurious large jumps or large flat sections in the curve. Statistical significance of the difference between two or more survival curves is determined by the log-rank test.
Reliability of the different portions of the survival curve depends on the number of individuals at risk at that stage. The majority of studies dealing with analysis of survival time are likely to have some individuals for whom the outcome event, and thus the time to event, is not recorded.
This is in view of the paucity of resources (money and time) needed to carry the study forward until outcomes, especially fatal outcomes, are recorded for each and every study individual. A measure of the maturity of the data thus becomes imperative to show data quality in terms of the adequacy of individuals for whom the outcome event, and thus the time to event, is recorded.
Simpler measures of maturity include the average or median follow-up period. However, a more robust graphical technique involves constructing a survival curve by reversing censoring, that is, treating censored observations as events and events as censored observations. The median time to follow-up is then estimated using this technique, which deals with censoring in the same manner as for time to survival. As in the nonparametric approaches to the analysis of time-to-event data, the models under the parametric approach derive estimates of failure-time statistics while accounting for the presence of censoring in the data.
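The reverse Kaplan-Meier technique described above can be sketched by flipping the event indicator and reading off where the reversed curve crosses 0.5; the product-limit estimator is restated inline so the sketch is self-contained, and all names and data are illustrative:

```python
def kaplan_meier(data):
    """Product-limit estimate: list of (t, S(t)) at each 'event' time."""
    curve, s = [], 1.0
    for t in sorted({u for u, d in data if d == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, d in data if u == t and d == 1)
        s *= 1 - deaths / at_risk
        curve.append((t, s))
    return curve

def reverse_km_median_followup(data):
    """Flip indicators so censorings become the 'events'; the median of the
    resulting curve estimates the median follow-up time."""
    flipped = [(t, 1 - d) for t, d in data]
    for t, s in kaplan_meier(flipped):
        if s <= 0.5:
            return t
    return None  # median follow-up not reached

data = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 0)]
print(reverse_km_median_followup(data))  # 4.0
```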
The main difference between the two approaches is that the latter attempts to derive estimates using a parametric model, which makes specific assumptions about the distribution of failure time by assuming a particular functional form for the hazard rate.
This functional form can specify the hazard rate as a function of time or can incorporate covariate information, in which case the hazard rate is specified as a function of time and specific covariates. In this way, failure time is related to a set of covariates, thus leading to a regression approach. Parametric methods of survival analysis assume a distribution for the hazard rate as a function of time, besides the assumption of independent censoring. Censoring plays a similar role in these models as in the nonparametric case, under the same condition of independent censoring.
Censoring is adjusted for in parametric models by incorporating the hazard rate into the likelihood, in a manner similar to that of the Kaplan–Meier estimator.