ZHANG Yi-jin,WANG Cheng-yong
(1.School of Mathematics and Statistics,Wuhan University,Wuhan 430072,China)
(2.School of Mathematics and Statistics,Hubei University of Arts and Science,Xiangyang441053,China)
Abstract:Variable selection is commonly employed when the true underlying model has a sparse representation.Identifying signi ficant predictors will enhance the prediction performance of the fitted model.To solve this problem,among others,Zhang and Lu[1]developed a variable selection method under the framework of the proportional hazards model when one observes rightcensored data.In this paper,We consider the variable selection problem for the additive hazards model when one faces current status data.Motivated by Zhang and Lu[1],we develop an adaptive Lasso method for this problem.Some theoretical properties,including consistency and oracle properties are established under some weak regularity conditions.An extensive simulation is performed to show that the method performs competitively.This method is also applied to a real data set from a tumorigenicity study.
Keywords:Additive hazards model;current status data;adaptive Lasso;ADMM algorithm
Current status data or Case I interval-censored failure time data occur frequently in survival analysis when an exact event time of interest is not available,and only whether or not the event has occurred up to a certain random monitoring time.That event’s current status’is known.This kind of data are often encountered in epidemiological studies,carcinogenicity experiments,econometrics and reliability studies among others.Regression analysis of failure time data is one of the main objectives in survival analysis.In regression analysis,an important and challenging task is to identify the risk factors and their risk contributions.Often,not all the collected covariates may contribute to the predication of outcomes and we need to remove some unimportant covariates.
There are many variable selection techniques in linear regression models.Some of them have been extended to the survival analysis,for example,Bayesian variable selection methods for censored survival data were proposed by Faraggi and Simon[2].However,the sampling properties of this selection methods are largely unknown(see Fan and Li[3]).The least absolute shrinkage and selection operator(Lasso),proposed by Tibshirani[4],is a member of variable selection family based on a penalized likelihood approach with theL1-penalty.It can delete insigni ficant variables by estimating their coefficients as 0.Tibshirani[5]proposed using the Lasso for estimation and variable selection under the Cox model.However,the Lasso estimator does not possess the oracle properties(see[3]).Many other variable selection methods have been developed following Tibshirani[4].For example,the smoothly clipped absolute derivation(SCAD)by Fan and Li[6]and the adaptive Lasso(aLasso)by Zou[7].Both of them have nice properties.
So far many literatures have developed variable selection methods for right-censored data(see for example,[3],[5],[8]).In particular,some penalized methods have been established under the Cox’s proportional hazards model.For example,Tibshirani[5]proposed using the Lasso for the variable selection under the Cox model and right-censored data.Fan and Li[3]generalized the SCAD to the Cox model with right-censored data.The aLasso method also has been extended to the context of proportional hazards model when one observes right-censored data by Zhang and Lu[1].Huang et al.[9]studied the Lasso estimator in sparse,high-dimensional Cox model.Zhao et al[10]studied the simultaneous estimation and variable selection for interval-censored data under the Cox model.
The additive hazards model as an alternative model,which describes a different aspect of the association between the failure time and covariates than the proportional hazards model,is another commonly used regression model in survival analysis.A lot of theoretical results of the estimated regression parameters under additive hazards model have been well established(see for example,[11–13]).It is well-known that many efforts have been focused on the methods of variable selection for Cox model with right-censored observation data.However,as mentioned by Zhao et al[10],there exists little literature on variable selection for interval-censored data.There are relatively less studies developed for the additive hazards model with interval-censored data.This paper considers the variable selection method for case I interval-censored data under the additive hazards model.
The remainder of the paper is organized as follows.In Section 2,we will introduce some notations and assumptions that will be used in this paper.In Section 3,we develop an adaptive lasso method,and give its statistical properties.Section 4 gives some details about the ADMM algorithm that will be applied to solve the adaptive lasso.Section 5 provides some numerical results from an extensive simulation study to assess the performance of the proposed method,and Section 6 applies the proposed method to a real data set from a tumorigenicity study.
Consider a random sample ofnindependent subjects.Fori=1,...,n,letTiandCidenote the failure time of interest and censoring time of thei-th subject,andZi(t)=(Zi1(t),...,Zip(t))0be the vector of possibly time-dependent covariates.Furthermore,since only current status data are available for failure timeTi’s,the observed data are given by{Ci,δi=I(Ti≥Ci),Zi(t),i=1,...,n}.In the next section,we present methods for the cases in which the monitoring timeCis independent or dependent ofTandZ.
In this subsection,we suppose thatCis independent ofTandZ.To model the covariate effect,we assume that the hazard function ofTat timet,given the history of ap-dimensional covariate processZ(·)up tot,has the form

whereλ0(t)is an unspeci fied baseline hazard function,andβ0is ap-vector of unknown regression parameters.
Fori=1,...,n,de fineNi(t)=δiI(Ci≤t),andYi(t)=I(Ci≥t).It can be shown that the counting processNi(t)has the Cox type intensity process as follows


are martingales with respect to theσ- filtrationFt=σ{Ni(s),Yi(s),Zi(s):s≤t,i=1,...,n}.Thus,we can make inferences aboutβ0by applying the partial likelihood principle to model(2.2).For this,we first de fine the partial likelihood function as follows

Taking logarithm of it yields that



It can be seen that the Hessian matrix ofl1(β)is negative de finite,sol1(β)is concave inβ,that is,l1(β)has a unique maximizer.The estimateofβ0can be obtained by maximizing the functionl1(β),or solving the equationU1(β)=0.
When the censoring timeCis not independent of the covariate vectorZ,we describe the relationship betweenCandZby the following hazards model,

where Λc0(t)is an unknown cumulative baseline hazard function,andγ0is ap-vector of unknown regression parameters.We assumed thatCis conditionally independent ofTgiven the covariate vectorZ.
By the arguments leading to(2.2),it can be shown that,under model(2.1)and(2.3),the compensated counting processes

are martingales with respect to theσ- filtrationFt.The notationsNi(t)andH0(t)are the same as those de fined in subsection 2.1.We can also apply the partial likelihood principle to model(2.4)to make inferences for the unknown parametersβ0andγ0.That is,we can consider the following partial likelihood function

However,the functionL2(β,γ)above utilizes only the information ofCi’s with non zeroδi’s,and we mainly focus onβ,it would be more efficient to estimateγ0by applying the partial likelihood theory directly to the model(2.3).Hence,for the estimate ofγ0,we first consider the following partial likelihood function

The maximum partial likelihood estimatorofγ0can be obtained by maximizing the functionL3(γ).Of course,can also be obtained by solving the score equationUγ(γ)=0,where





In the following,we will discuss the development of a penalized or regularized procedure for covariate selection based on the functionsl1(β)andl2(β).
We assume that one observes right-censored data,to select and estimate important variables under the proportional hazards model,Zhang and Lu[1]proposed to minimize the penalized log partial likelihood function,

where(β)denotes the log partial likelihood based on the right-censored data and the proportional hazards model,the positive weights=(1,...,p)0is the maximizer of the log partial likelihood,λis a nonnegative penalization tuning parameter.
Consider the current status data under model(2.1),note that the intensity process of the counting processNi(t)also satis fies Cox type.This suggests that we can select variables by employing a similar method of Zhang and Lu[1].We propose the adaptive Lasso estimatornas follows,

or

The values ofωj’s can be chosen by different ways.In this paper,we specifyωj=1/|j|,where=(1,...,p)0is the maximizer of the log partial likelihoodli(β),i=1,2.
To study the oracle properties of the estimators,we first consider the penalized log partial likelihood function



and let ?β,?βγandDγdenote their limits atβ=β0andγ=γ0.

Let ?1(β10)= ?11(β10,0),where ?11(β10,0)is the leadingq×qsubmatrix of ?(β0)withβ20=0 andV1(β10)=V11(β10,0),whereV11(β10,0)is the leadingq×qsubmatrix ofV(β0)withβ20=0.The following theorem shows that?βnis root-nconsistent ifλn→0 at an appropriate rate.

ProofAs mentioned earlier,in the case of independent censoring,the log partial likelihood is

By Theorem 4.1 and Lemma 3.1 of Andersen and Gill[14],it follows that for eachβin a neighbourhood ofβ0,

It is sufficient to show that for any givenε>0,there exists a large constantKsuch that




Then we have

In the case of dependent censoring,we can write

Then we have

Since the maximum partial likelihood estimatorsatis fies,by the Taylor expansion,we have,for 1≤j≤q,


Therefore in(3.6)or(3.7),if we choose a sufficiently largeK,the first term is of the orderK2n?1.The second and third terms are of the orderKn?1,which are dominated by the first term.Therefore(3.5)holds and it completes the proof.
If theλnis chosen properly,the adaptive Lasso estimator has the oracle property.There are the properties we will show next.



We will show that,with probability tending to 1,for anyβ1satisfying||β1?β10||=Op(n?1/2),?Qi(β)/?βjandβjhave different signs forβj∈(?Kn?1/2,Kn?1/2)withj=q+1,...,p.For eachβin a neighbourhood ofβ0,by Taylor expansion,



Note thatn1/2(?0)=Op(1),so that we have




In the case of independent censoring,letU11(β)be the firstqelements ofU1(β)and let11(β)be the firstq×qsubmatrix of?H1(β).Then







In the case of dependent censoring,letU21(β;γ)be the firstqelements ofU2(β;γ)and let(β;γ)be the firstq×qsubmatrix of(β;γ).Then

whereβ?is betweenandβ0.The last equation is implied by sign()=sign(βj0)when n is large.LetM1(β10)=M11(β10,0),whereM11(β10,0)is the leadingq×qsubmatrix ofM(β0)withβ20=0 and ?β1(β10)= ?β11(β10,0),where ?β11(β10,0)is the leadingq×qsubmatrix of ?βwithβ20=0.Sincein distribution andin probability asn→∞.Furthermore,ifa nonnegative constant,we have



in distribution asn→∞.
RemarkIt is worth noting that asngoes to in finity,the adaptive Lasso can perform as well as the correct submodel was known.Since the proofs only require the root-nconsistency of,any root-nconsistent estimator ofβ0can be used as the adaptive weightρwithout changing the asymptotic properties.
The optimization problem(3.1)or(3.2)is strictly convex and therefore can be solved by many convex optimization algorithm.Here we present an algorithm based on the Alternating Direction Method of Multipliers(ADMM)[15].The ADMM algorithm solves problem in the form

with variablesx∈Rnandz∈Rm,whereA∈Rp×n,B∈Rp×m,andc∈Rp.The augmented Lagrangian is

ADMM consists of the iterations

withρ>0.
In ADMM form,the problem(3.1)or(3.2)can be written as



whereSis the soft thresholding operator satisfying


This algorithm gives very small values to the coefficients which should be estimated as zero and it converges quickly based on our empirical experience.
In this section,we examine the performance of the adaptive Lasso method under the additive hazards model and as a comparison,Lasso,smoothly clipped absolute deviation(SCAD),maximum partial likelihood estimators(MPLE)are also considered.For givenp,the covariatesZare assumed to follow the multivariate normal distribution with mean zero,variance one,and the correlation betweenZjandZkbeingρ|j?k|withρ=0.5,j,k=1,...,p.We setβ0j=1 for the first and last two components of the covariates andβ0j=0 for other components.The results given below are based on sample sizen=300 and 500 replications.

Table 1 displays the results on the covariate selection withp=10 or 20 in the case of independent censoring.In this case,the failure timesTiare generated from model(2.1)withλ0=0.5 or 1.For the observation timesCi,we generated it from the uniform distribution over(0,3.5)and the exponential distribution with parameterλ=0.5 or 0.7.One can see from Table 1 that the aLasso approach gives the smallest FP compared with other methods which means the aLasso chooses unimportant variables much less often than the other methods.At the same time,it kept a fairly high TP and low MWSE.The SCAD method gave the largest TP in most cases among the method considered here.
Table 2 displays the results on the covariate selection withp=10 or 20 in the case of dependent censoring.In this case,we consider different combinations ofλ0,λcandγ0.Here,we set all components ofγ0to be the same,for example,in Table 2,γ0=0.1 means=(0.1,0.1,...,0.1,0.1)in model(2.3).Keepingγ0unchanged,we list four combinations ofλ0andλcin each part,which corresponds toλ0=0.5 or 1,λc=0.5 or 0.7.As in the case of independent censoring,the aLasso approach gave the smallest FP in all dependent cases.
Also,it can be seen from Tables 1–2 that,as the number of covariates increases,the aLasso tends to give the smallest MWSE and largest TP among the methods considered.Overall,the adaptive Lasso performs well in terms of both variable selection and prediction accuracy.

Table 1.Results in the case of independent censoring.

Table 2.Results in the case of dependent censoring.

Table 2 Continued.
In this section,we apply the proposed regression selection procedure to a set of data on mice hepatocellular adenoma.This data set arises from a 2-year tumorigenicity study conducted by National Toxicology Program.In the study,groups of mice were exposed to chloroprene at different concentrations by inhalation.Each mouse was examined once for various tumors when it died.Some mice died naturally during the study,and the others who survived at the end of study were sacri ficed for examinations.At each examination time,tumors were observed if have developed,but the exact tumor onset times were unknown,therefore,only current status data can be obtained.
Here we considered the liver tumor data,and the covariates on which the information was collected include the initial weight of the mouse,the body weight change,the weight change rate,the gender of the mouse,the dose.For the analysis below,we will focus on 200 mice that either belong to the control group or belong to the PPM80 group.
To apply the aLasso regression procedure,let IW denote the initial weight of the mouse,BWC denote the body weight change and BWCR denote the weight change rate.We de-fine Gender=1 if the mouse was male and 0 otherwise,PPM80=0 if the mouse was in the control group and 1 otherwise.For the analysis,we performed the standardization on the three continuous covariates IW,BWC and BWCR.The analysis results given by the aLasso procedure are presented in Table 3.As in the simulation study and for comparison,we also include the analysis results obtained by applying the other penalized procddures discussed here.ALasso,Lasso and SCAD all suggest that the Gender and the initial weight of the mouse had no relationship with or signi ficant in fluence on the existence of hepatocellular adenoma.

Table 3 Analysis results of mice hepatocellular adenoma data.
This paper has discussed the variable selection problem for the additive hazards model based on current status data.In order to select important variables,a penalized log partial likelihood method is developed and the oracle properties are provided.The simulated results suggest that the proposed method performs well for dropping the unimportant variables and retaining the important variables.
As mentioned above,the proposed method can be seen as a generalization of the method given in Zhang and Lu[1],for the case that the model is proportional hazards model and the data is right-censored data.Therefore it could be generalized in several directions.For one,note that in the preceding sections,we assume thatCis independent ofZandT,it is straightforward to generalize the proposed method to the case where the censoring timeCis not independent ofZor other type data.
The second direction is that we can change the weightsρjwith other estimators since the proofs only require the root-nconsistency of?β.