You are viewing a javascript disabled version of the site. Please enable Javascript for this site to function properly.
Go to headerGo to navigationGo to searchGo to contentsGo to footer
In content section. Select this link to jump to navigation

A novel Log penalty in a path seeking scheme for biomarker selection

Abstract

Biomarker selection or feature selection from survival data is a topic of considerable interest. Recently various survival analysis approaches for biomarker selection have been developed; however, there are growing challenges to currently methods for handling high-dimensional and low-sample problem. We propose a novel Log-sum regularization estimator within accelerated failure time (AFT) for predicting cancer patient survival time with a few biomarkers. This approach is implemented in path seeking algorithm to speed up solving the Log-sum penalty. Additionally, the control parameter of Log-sum penalty is modified by Bayesian information criterion (BIC). The results indicate that our proposed approach is able to achieve good performance in both simulated and real datasets with other 1 type regularization methods for biomarker selection.

1.Introduction

Biomarker selection or feature selection from survival data is a topic of considerable interest. The regularization methods are group of feature selection methods that embed different penalized methods in the learning procedure into a single process. In this way, the regularization methods could reduce the overfitting problem. The 0-norm is a total number of non-zero elements in a vector, but it faces the problem of combinatory optimization, i.e., 0-norm is too complex and almost impossible to solve. Thus, alternative sparsity-promoting functions which are more computationally efficient in finding the sparse solution are desirable. The most popular alternative is to replace the 0-norm with the 1-norm (or Lasso) penalty function that is the least absolute shrinkage and selection operator [1]. There are also some 1 type regularizations can also take place of 0-norm, like smoothly clipped absolute deviation (or SCAD) [2], minimax concave penalty (or MCP) [3], group lasso [4]. Moreover, the general sparse representation method is to solve a linear representation system with the p-norm minimization problem, especially p= 0.1, 1/2, 2/3 or 0.9 [5, 6]. Recently, Candes et al. [7] proposed the Log-sum penalty that approximated 0-norm much better than other penalties by reweighting the 1-norm of transformed object. As shown in Fig. 1, these five states of art penalty methods satisfy properties of sparsity and continuity. Especially, the Log-sum penalty is sparser than other four penalty methods. Unlike Lasso and Elastic net biased estimators, the 1/2, SCAD and Log-sum penalties have unbiasedness, i.e., they can easily construct unbiased estimators when the coefficient is large. Then, Chartrand and Yin [8] used this iteratively reweighted 1 minimization algorithm in sparse signal recovery. Xia et al. [9] proposed multiple linear regression with Log-sum penalty in a thresholding representation theory for drug discovery. We take the advantage of the Log-sum penalty in strong sparsity and employ it to select the key factors without any prior information in survival data.

Figure 1.

Various penalized function for orthonormal design: (a) Lasso, (b) 1/2, (c) Elastic net, (d) SCAD, (e) Log-sum.

Various penalized function for orthonormal design: (a) Lasso, (b) ℓ1/2, (c) Elastic net, (d) SCAD, (e) Log-sum.

For survival analysis, there are mainly two types of survival model, namely Cox model or Cox proportional hazards model [10] and accelerated failure time (AFT) model [11]. The Cox model usually is used for predicting the hazard rate of a disease, whereas the AFT model is available to estimate survival time of the patients with simply regressing the exponential over the key risk predictors [12]. Besides, the physical interpretation of AFT model is similar with standard regression, so the AFT model comes out as an attractive alternative to the Cox proportional hazard model for censored failure time data [12]. Furthermore, the log-linear form of AFT model increases its robustness to the model misspecification and yield narrower confidence interval for regression coefficients [13]. After the penalized Cox model with 1-norm [14], Datta et al. [15] combined AFT model with 1-norm to predict failure time outcomes. Other penalized AFT models aim to obtain more accurate and sparse predictor for survival analysis, such as AFT via bridge penalization [16], 1/2-norm AFT model [17], etc. In this article, we employ path seeking scheme to accelerate solving the Log-sum penalty for predicting patient survival time with fewer biomakers.

The rest of this article is organized as follows: Section 2 introduces the Log-sum penalized AFT model. In Section 3, we implement our proposed novel Log penalty in path seeking scheme. Finally, we discuss the experimental results in Section 4 and make some conclusions in Section 5.

2.Log-sum penalized AFT model

Suppose the survival data have h patients (τi,δi,Xi)i=1h, where the sample vector of a survival times 𝐓=(τ1,τ2,τh)T and τi=min(ti,ci), here ti is the true survival time and ci is the time to the first censoring event (e.g., study conclusion, date of final follow up) for each sample i. The δ indicates the censoring time, i.e.,

δi={0,the right censoring time1,the completed time.

Xi denotes the gene expression data of i-th patient, i.e., Xi=(xi1,xi2,xik), where k is the number of genes.

The accelerated failure time (AFT) model is used to define the survival time τi as follows:

(1)
τi=exp(β0+XiβT+εi)

where βk is the coefficient vector of k variables, β0 is the intercept, and εiN(0,1) is independent random error. In this article, we employ the mean imputation method [15] that converts the censoring survival time τi to the estimated survival time G(τi) as the following estimated function:

(2)
G(τi)=δilog(τi)+(1-δi){S^(τi)}-1t(r)>tilog(t(r)ΔS^(t(r)))

where r is the amount of individuals at risk of failing just before time t(i) that are different censored survival times in an ascending order, and ΔS^(t(r)) is the step of Kaplan-Meier estimator S^ at time t(r) [18]. Then, we can directly using AFT model via standard least squares approach to minimize the loss function L(β) as follows:

(3)
L(β)=1hi=1h(yi-j=0kβjxij)2

where yiis replaced by the estimated survival time G(τi) in Eq. (2), i.e., the survival times τi logarithmically transformed into yi.

3.Implementation of Log-sum penalized AFT model

The regularization methods are used to reduce the overfitting problem of learning procedure through adding the penalty term, therefore the general regularization can be modeled as:

(4)
β^(λ)=argminβ{L(β)+λP(β)}

where βk is the coefficient of covariate, λ> 0 is a control parameter, L(β) represents the loss term and P(β) is the penalty term. Larger values of λ exert higher penalties on regression coefficients, resulting on inclusion of fewer variables in the model and vice versa. The generalized cross-validation [19] has been widely used for given an appropriate value of the control parameter. Huang et al. [20] used a modified Akaike’s information criterion (AIC) for choosing tuning parameter. Wang and Song [21] used Bayesian information criterion (BIC) for tuning parameter selection under AFT model with adaptive Lasso. Friedman [22] obtained the control parameter by solving the component ratios of the gradient of the loss function and regularization term that is called generalized path seeking scheme. This scheme is much faster than general convex optimizers for squared-error loss.

For the regularization term P(β), many penalties are proposed to bridge the gap between the 0 and 1 minimization. Such a Log-sum penalty function was originally introduced in [23] for basis selection which indicates that Log-sum based methods present uniform superiority over the conventional 1-type methods. The Log-sum sparsity-encouraging functional for survival analysis leads to:

(5)
β^=argminβ{L(β)+λj=1klog(|βj|+ξ)}

where ξ>0 is a positive parameter to ensure that the function is well-defined. Especially, the Log-sum penalty function behaves like the 0-norm when ξ0 [24]. In this article, we employ the path seeking method [22] to solve the Log-sum penalized AFT model through constructing a path directly and successively in parameter space. Let υ measure length along the path and the step size Δυ>0 can be calculated by

(6)
L(β^(υ))-L(β^(υ+Δυ))L(β^(υ))=0.01

Define

(7)
φj(υ)=-[L(β)βj]β=β^(υ)=-[1hi=1h(yi-j=0kβjxij)2βj]β=β^(υ)=[2hi=1hxij(yi-j=0kβjxij)]β=β^(υ)
(8)
ϕj(υ)=[j=1klog(|βj|+ξ)|βj|]β=β^(υ)=[log(|βj|+ξ)|βj|]β=β^(υ)=[1|βj|+ξ]β=β^(υ)
(9)
λj(υ)=φj(υ)ϕj(υ)

where λj(υ) is the value of λ in Eq. (4) corresponding to υ, and is also the ratio of loss functiongradient ϕj(υ) for L(β) in Eq. (3) and penalty function gradient φj(υ) with respect to |βj|. Without estimating λ, this path seeking scheme can accelerate solving the Log-sum penalty. Besides, ξ is chosen by ten-fold cross-validation. The details of the achievementfor Log-sum penalty are represented in Algorithm 1.

Algorithm 1 The algorithm of Log-sum penalty
1.Initialize:υ=0,{βj^(0)=0}1k
2.repeat
3.  Compute {λj(υ)}1k
4.    S={j|λj(υ)βj^(υ)<0}
5.  ifS= empty then
6.    j*=argmaxj|λj(υ)|
7.else
8.    j*=argmaxjS|λj(υ)|
9.end if
10. βj*^(υ+Δυ)=βj*^(υ)+Δυsign(λj*(υ))
11.   {βj^(υ+Δυ)=βj^(υ)}jj*
12.    υυ+Δυ
13.until λj(υ)=0

At first, we initialize the path, and then compute the vector λ(υ) by Eqs (7)–(9) in each step. Subsequently, the non-zero coefficients βj^(υ) are recognized. Those βj^(υ) have a sign opposite to that of their corresponding λj(υ). Generally there are non the coefficient corresponding to the largest component of λj(υ) in absolute value is selected. If one or more λj(υ)βj^(υ)<0, then the coefficient with corresponding large |λj(υ)| within this subset is instead selected. The selected coefficient βj*^(υ) is then incriminated through a small amount in the direction of the sign of its correspond λj*(υ) with all other coefficient residual unchanged, producing the solution for the next path point υ+Δυ. Iterations continue until all components of λ(υ) are zero.

4.Numerical experiments

4.1Simulated datasets

In order to simulate the high-dimensional and low-sample property of gene expression data, we assumed that 20 nonzero factors among k= 2000 variables with different fraction and sample size h= 90, 300 respectively based on the following model:

(10)
Y=u=120βuXu+σε

where Y denotes the vector of survival times logarithmically transformed yi=log(τi) in Eq. (3) without censored data, i.e., Y=(y1,y2,,yh), ε is an independent random noise that is generated from a normal distribution N(0,1), σ controls the noise strength and the coefficients of relevant features are specified as

β=(2,,25,-2,1.5,-1.7,2.5,-1.8,4,,410,0,,01980)

The 𝑿 value is simulated from an array tmpi0,,ik,(i= 1,,h) of independent standard normal distribution:

(11)
xij=ϱ×tmpi0+(1-ϱ)×tmpij

where the correlation coefficient ϱ are 0.1 and 0.3 respectively in our experiment.

Additionally, the both sensitivity and specificity for each procedure are calculated as follows:

(12)
𝑠𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦=#correctly selected genes#non – zero inβ=#correctly selected genes20
(13)
𝑠𝑝𝑒𝑐𝑖𝑓𝑖𝑐𝑖𝑡𝑦=#correctly rejected genes#zero inβ=#correctly rejected genes1980

The optimal combination of ξ is selected under ten-fold cross-validation by minimizing the Bayesian information criterion (BIC) defined as

(14)
BICξ=-2hlog(mse(β))-log(h)2dfξ

where h is the total number of observations; dfξ is the number of nonzero parameters; and mse(β) measures the mean square error that is defined by

(15)
mse(β)=1hi=1h(τi-τi^)2

where the predicted value τi^=exp(j=0kβjxij). In our simulations and application, the optimal ξ is searched on grid points.

We also employ the concordance index (CI) to evaluate the predictive accuracy of survival models. CI or c-index can be interpreted as the fraction of all pairs of subjects whose predicted survival times are correctly ordered among all subjects that can actually be ordered. Therefore, it can be written as:

(16)
ci(β)=ij1(τi^<τj^andδi=1)ij1(τi<τjandδi=1)

Table 1

The results of different penalized methods in simulated data

σ ϱ Penaltyh= 90h= 300
SensitivitySpecificityMSECISensitivitySpecificityMSECI
1.00.1Lasso0.7580.89687.9240.8080.8500.97174.1750.908
1/2 0.8560.90857.7500.8630.9000.97829.8250.913
Elastic net0.7820.731149.3400.6380.8000.754117.3930.720
SCAD0.8450.89167.2540.7920.9000.97743.5020.872
Log-sum 0.902 0.953 54.615 0.871 0.950 0.986 23.845 0.928
0.3Lasso0.6840.78698.1830.7820.7620.86464.5360.858
1/2 0.7930.89462.7860.8090.8500.93559.7470.872
Elastic net 0.912 0.726174.5700.800 0.999 0.757141.6020.722
SCAD0.7600.88274.0090.7650.8000.94547.6810.885
Log-sum0.841 0.931 58.892 0.847 0.872 0.978 46.893 0.918
1.50.1Lasso0.7210.86071.0040.6430.8000.96754.6010.762
1/2 0.8400.90363.5490.7250.9100.97240.2190.887
Elastic net0.7500.683198.6270.5690.8500.705179.1180.641
SCAD0.7670.88969.0730.7030.8500.96551.6690.847
Log-sum 0.880 0.935 62.720 0.804 0.950 0.982 38.358 0.915
0.3Lasso0.6170.756119.4530.6910.6500.86195.2350.811
1/2 0.7500.85782.9440.7140.8000.92860.8120.849
Elastic net 0.860 0.605213.9050.571 0.959 0.621185.9170.645
SCAD0.7510.87986.8920.6090.8000.93768.8160.638
Log-sum0.800 0.917 65.092 0.784 0.850 0.974 48.973 0.890

The simulated experiments are repeated 100 times. From Table 1, we can conclude that the Log-sum penalty using the path seeking algorithm can achieve lower MSE with higher CI than other penalties. Furthermore, this Log-sum penalty results in higher sensitivity for identifying correct genes compared to the other four algorithms. With increasing sample size, the performance of Log-sum penalty is better. For example, when σ= 1.0, ϱ= 0.1, the performance of Log-sum in h= 300 has lower MSE with higher CI than in h= 90. The Elastic Net with 2-norm selected the largest number of genes within the synthetic data with poor performance, e.g., when σ= 1.0, ϱ= 0.3, h= 300 Elastic Net nearly selected 20 non-zero coefficients (i.e., 20 × 0.999), but it selected 481 irrelevant coefficients (i.e., 1980 × 0.243). With the correlation ϱ increasing among genes for various noise levels σ, the 1-norm (Lasso) cannot distinguish the key genes very well, while other 1 type penalties performed effectively, especially the Log-sum penalty.

4.2Real datasets

To further demonstrate the performance of these regularization methods, we compare our proposed method with other four penalties on GSE22210 microarray expression data from NCBI’s gene expression omnibus (GEO). This breast cancer dataset includes 1,452 genes and 167 samples [25]. We divide the data set at random two-thirds samples (117 samples) are training set and the remainders (50 samples) are used to test. Table 2 shows that the Log-sum penalty achieves best predicting survival time just with fewer genes than other 1 type regularization methods.

Table 2

The results in the GSE22210

Penalty# Selected genesMSECI
Lasso4621.0230.776
1/2 2325.2710.783
Elastic net15933.3310.809
SCAD3322.7540.791
Log-sum15 16.814 0.821

Table 3

The selected genes of different penalized methods in the GSE22210

Lasso 1/2 Elastic netSCADLog-sum
1XISTIL1BSERPINB2IL1BXIST
2LATXISTXISTXISTLAT
3IL1BHLA-DQA2IMPACTCCND1DNASE1L1
4DNASE1L1TGFAIL1BHLA-DQA2IL1B
5NFKB1CDKN1ALATNFKB1BCL2L2
6HDAC9GNMTCCND1PTHR1MEST
7BCL2L2LATNFKB1GNMTNFKB1
8ESR2BCL2L2TGFALATDAB2IP
9AFPHDAC9HLA-DQA2CD44ESR2
10LAMC1CD44RASGRF1ESR2APC

As see from the Table 3, some genes are selected by all methods such as XIST, LAT and IL1B. Missing from XIST RNA, the X chromosome causes the basal-like subtype of invasive breast cancer [26]. Furthermore, LAT is short for Linker for Activation of T cells that plays a crucial role in the TCR-mediated signaling pathways. The adoptive transfer of T cells appears to be a promising new treatment for various type-s of cancer [27]. Collado-Hidalgo et al. [28] provided evidence that polymorphisms in IL1B increase the production of proinflammatory cytokines triggered by the treatment, which subsequently affects persistent fatigue in the aftermath of breast carcinoma. There are some unique genes selected by Log-sum, such as MEST, DAB2IP, APC, etc. MEST is also known as paternally expressed gene 1 (PEG1) that are often detected in invasive breast carcinomas [29]. The DAB2IP as a bona fide tumor suppressor that frequently silenced by promoter methylation in aggressive human tumors [30]. Furthermore, aberrant methylation of the APC gene is frequent in breast cancers [31].

5.Conclusion

In this paper, we propose a novel Log-sum regularization estimator with the AFT model in the path seeking scheme. Comparing with other 1 type penalties, the results in both simulated and real datasets indicate that our proposed Log-sum penalty can effectively predict patient survival time with fewer biomakers. Thus, we believe it will be an effective tool for gene selection on high dimensional biological data.

Acknowledgments

The authors thank Dr. Zi-Yi Yang for excellent technical assistance. This work was supported by the Macau Science and Technology Develop Funds (Grant no. 003/2016/AFJ) of Macao SAR of China and China NSFC project (Contract no. 61661166011).

Conflict of interest

None to report.

References

[1] 

Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). 1996; 267-288.

[2] 

Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001; 96(456): 1348-1360.

[3] 

Zhang CH. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics. 2010; 38(2): 894-942.

[4] 

Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006; 68(1): 49-67.

[5] 

Lyu Q, Lin Z, She Y, Zhang C. A comparison of typical Lpminimization algorithms. Neurocomputing. 2013; 119: 413-424.

[6] 

Xu Z, Chang X, Xu F, Zhang H. L1/2regularization: A thresholding representation theory and a fast solver. IEEE Transactions on neural networks and learning systems. 2012; 23(7): 1013-1027.

[7] 

Candes EJ, Wakin MB, Boyd SP. Enhancing sparsity by reweighted L1 minimization. Journal of Fourier Analysis and Applications. 2008; 14(5): 877-905.

[8] 

Chartrand R, Yin W. Iteratively reweighted algorithms for compressive sensing. In: Acoustics, speech and signal processing, 2008; ICASSP 2008. IEEE international conference on. IEEE; 2008. pp. 3869-3872.

[9] 

Xia LY, Wang YW, Meng DY, Yao XJ, Chai H, Liang Y. Descriptor Selection via Log-Sum Regularization for the Biological Activities of Chemical Structure. International Journal of Molecular Sciences. 2017; 19(1): 30.

[10] 

Cox DR. Regression models and life-tables. In: Breakthroughs in statistics. Springer; 1992; p. 527-541.

[11] 

Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. vol. 360; John Wiley & Sons; 2011.

[12] 

Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statistics in Medicine. 1992; 11(14-15): 1871-1879.

[13] 

Hutton J, Monaghan P. Choice of parametric accelerated life and proportional hazards models for survival data: asymptotic results. Lifetime Data Analysis. 2002; 8(4): 375-393.

[14] 

Tibshirani R. The lasso method for variable selection in the Cox model. Statistics in Medicine. 1997; 16(4): 385-395.

[15] 

Datta S, Le-Rademacher J, Datta S. Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO. Biometrics. 2007; 63(1): 259-271.

[16] 

Huang J, Ma S. Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis. 2010; 16(2): 176-195.

[17] 

Chai H, Liang Y, Liu XY. The L1/2 regularization approach for survival analysis in the accelerated failure time model. Computers in Biology and Medicine. 2015; 64: 283-290.

[18] 

Datta S. Estimating the mean life time using right censored data. Statistical Methodology. 2005; 2(1): 65-69.

[19] 

Craven P, Wahba G. Smoothing noisy data with spline functions. Numerische Mathematik. 1978; 31(4): 377-403.

[20] 

Huang J, Ma S, Xie H. Regularized Estimation in the Accelerated Failure Time Model with High-Dimensional Covariates. Biometrics. 2006; 62(3): 813-820.

[21] 

Wang X, Song L. Adaptive Lasso variable selection for the accelerated failure models. Communications in Statistics-Theory and Methods. 2011; 40(24): 4372-4386.

[22] 

Friedman JH. Fast sparse regression and classification. International Journal of Forecasting. 2012; 28(3): 722-738.

[23] 

Coifman RR, Wickerhauser MV. Entropy-based algorithms for best basis selection. IEEE Transactions on information theory. 1992; 38(2): 713-718.

[24] 

Rao BD, Kreutz-Delgado K. An affine scaling methodology for best basis selection. IEEE Transactions on Signal Processing. 1999; 47(1): 187-200.

[25] 

Holm K, Hegardt C, Staaf J, Vallon-Christersson J, Jönsson G, Olsson H, et al. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Research. 2010; 12(3): R36.

[26] 

Richardson AL, Wang ZC, De Nicolo A, Lu X, Brown M, Miron A, et al. X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell. 2006; 9(2): 121-132.

[27] 

June CH. Adoptive T cell therapy for cancer in the clinic. The Journal of Clinical Investigation. 2007; 117(6): 1466-1476.

[28] 

Collado-Hidalgo A, Bower JE, Ganz PA, Irwin MR, Cole SW. Cytokine gene polymorphisms and fatigue in breast cancer survivors: Early findings. Brain, Behavior, and Immunity. 2008; 22(8): 1197-1200.

[29] 

Pedersen IS, Dervan PA, Broderick D, Harrison M, Miller N, Delany E, et al. Frequent loss of imprinting of PEG1/MEST in invasive breast cancer. Cancer Research. 1999; 59(21): 5449-5451.

[30] 

Di Minin G, Bellazzo A, Dal Ferro M, Chiaruttini G, Nuzzo S, Bicciato S, et al. Mutant p53 reprograms TNF signaling in cancer cells through interaction with the tumor suppressor DAB2IP. Molecular Cell. 2014; 56(5): 617-629.

[31] 

Virmani AK, Rathi A, Sathyanarayana UG, Padar A, Huang CX, Cunnigham HT, et al. Aberrant methylation of the adenomatous polyposis coli (APC) gene promoter 1A in breast and lung carcinomas. Clinical Cancer Research. 2001; 7(7): 1998-2004.