Communications in Statistics - Simulation and Computation, ISSN: 0361-0918 (Print), 1532-4141 (Online). Journal homepage: http://www.tandfonline.com/loi/lssp20

To cite this article: B. Yüzbaşı & M. Arashi (2017): Double shrunken selection operator, Communications in Statistics - Simulation and Computation, DOI: 10.1080/03610918.2017.1395040

Accepted author version posted online: 23 Oct 2017.

Double shrunken selection operator

B. Yüzbaşı¹* and M. Arashi²

¹ Department of Econometrics, Inonu University, Malatya, Turkey
² Department of Statistics, Shahrood University of Technology, Iran

Abstract: The least absolute shrinkage and selection operator (LASSO) of Tibshirani (1996) is a prominent estimator which selects significant (in some sense) features and kills insignificant ones. Indeed, the LASSO shrinks features larger than a noise level to zero. In this paper, we force the LASSO to shrink further by proposing a Stein-type shrinkage estimator emanating from the LASSO, namely the Stein-type LASSO. Numerically, the newly proposed estimator exhibits good performance in the risk sense. In the analysis of the prostate cancer data set, variants of this estimator have smaller relative MSE and prediction error than the LASSO.
Key words and phrases: Double shrinking; Linear regression model; LASSO; MSE; Prediction error; Stein-type shrinkage estimator

AMS Classification: 62G08, 62J07, 62G20

* Corresponding author. Email: b.yzb@hotmail.com

1 Introduction

It is well known that the least squares estimator (LSE) in the linear regression model is unbiased with minimum variance. However, in sparse linear models it is deficient in prediction accuracy and/or interpretation. As a remedy, one may use the least absolute shrinkage and selection operator (LASSO) estimator of Tibshirani (1996). It defines a continuous shrinking operation that can produce coefficients that are exactly "zero" and is competitive with subset selection and ridge regression, retaining good properties of both estimators. The LASSO simultaneously estimates and selects the coefficients of a given linear regression model. Recently, Saleh and Raheem (2015a) proposed an improved LASSO estimation technique based on the Stein rule, where they use uncertain prior information on the parameters of interest.

Baranchik (1970) introduced a family of minimax estimators that contains the James and Stein (1961) estimator. Ali and Saleh (1990) considered preliminary and shrinkage estimation under a more general setting of a p-variate normal distribution with unknown covariance matrix. Ahmed et al. (2015) showed how expansions for the coverage probability of a confidence set centered at the James-Stein estimator can be used to construct a confidence region with constant confidence level. Chang (2015) suggested double shrinkage estimators for large sparse covariance matrices. Ahmed (2001) studied the asymptotic properties of Stein-type estimators in various contexts. Ahmed et al. (2007) introduced shrinkage, pretest and absolute penalty estimators in partially linear models.
Ahmed and Fallahpour (2012) considered the estimation problem for the quasi-likelihood model in the presence of non-sample information. See Saleh (2006) and Ahmed (2014) for a comprehensive overview of shrinkage estimation with uncertain prior information. Saleh and Raheem (2015a) illustrated the superiority of a set of LASSO-based shrinkage estimators over the classical LASSO estimator. Hansen (2016) numerically compared the $L_2$-risk of shrinkage and LASSO estimators and highlighted some of the limitations of the LASSO. Roozbeh and Arashi (2016) and Yüzbaşı and Ahmed (2016) developed shrinkage ridge estimators in partial linear models. Very recently, Yüzbaşı et al. (2017) proposed pretest and shrinkage estimators in ridge regression linear models and compared their performance with some penalty estimators, including the LASSO. Other related studies include Asar (2017) and Saleh and Raheem (2015b).

In this paper, we present a Steinian LASSO-type estimator by double shrinking the features. Specifically, following James and Stein (1961) and Stein (1981), we propose a set of Stein-type LASSO estimators. For the definition of the James-Stein estimator, see Groß (2012). We will illustrate how the proposed set of estimators performs well compared to the LASSO. In all comparisons, we use the $L_2$-risk measure of closeness: for any estimator $\hat{\theta}_n$ of the vector parameter $\theta$, the $L_2$-loss function is given by $L(\theta; \hat{\theta}) = \|\hat{\theta} - \theta\|^2$ and the associated $L_2$-risk is evaluated by $\lim_{n\to\infty} E\big[n L(\theta; \hat{\theta})\big]$.

In what follows, we propose the set of Stein-type LASSO estimators and evaluate their performance, compared to the LASSO, via a Monte Carlo simulation study. We further investigate the superiority of the proposed estimators over the LASSO using the prostate cancer data set.

2 Linear Model and Estimators

Consider the linear regression model

$$Y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \epsilon_i = \beta_0 + x_i^\top \beta + \epsilon_i, \quad i = 1, \ldots, n, \qquad (2.1)$$

where $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. random variables with mean 0 and variance $\sigma^2$. Without loss of generality, we assume that the covariates are centered to have mean 0, take $\hat\beta_0 = n^{-1}\sum_{i=1}^{n} Y_i = \bar{Y}$, and replace $Y_i$ in (2.1) by $Y_i - \bar{Y}$ to eliminate $\beta_0$. Then we also assume $\bar{Y} = 0$ to concentrate on the estimation of $\beta = (\beta_1, \ldots, \beta_p)^\top$.

Following Knight and Fu (2000), we consider the bridge estimator of $\beta$ obtained by minimizing the penalized least squares criterion

$$\frac{1}{n}\sum_{i=1}^{n}\big(Y_i - x_i^\top \beta\big)^2 + \frac{\lambda_n}{n}\sum_{j=1}^{p}|\beta_j|^\gamma, \qquad (2.2)$$

for a given $\lambda_n$ with $\gamma > 0$. In the subsequent study, we focus only on the special case $\gamma = 1$, resulting in the LASSO of Tibshirani (1996). We provide some notes about the use of (2.2) for general $\gamma$ in the conclusions.

2.1 Stein-type LASSO

Following Stein (1981), we define the following set of general shrinkage estimators emanating from the LASSO estimator $\hat\beta_n^{L}$:

$$\hat\beta_n^{S} = \hat\beta_n^{L} + g(\hat\beta_n^{L}), \qquad (2.3)$$

for some function $g: \mathbb{R}^p \to \mathbb{R}^p$. Assume the following regularity conditions hold:

(A1) $C_n = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top \to C$, where $C$ is a non-negative definite matrix.

(A2) $\frac{1}{n}\max_{1\le i\le n} x_i^\top x_i \to 0$.

(A3) The function $g(s) = g(s_1, \ldots, s_p)$ is such that $\partial g/\partial s_i$ is continuous almost everywhere and $E\left|(\partial/\partial S_i)\, g(S)\right| < \infty$, $i = 1, \ldots, p$, with $S = (S_1, \ldots, S_p)$.

(A4) The function $g(\cdot)$ is homogeneous of order $-1$.

Proposition 1. Assume (A1)-(A4) and $\lambda_n = O(\sqrt{n})$. Then the shrinkage estimator $\hat\beta_n^{S}$ has smaller $L_2$-risk than the LASSO for all $g(\cdot)$ satisfying the inequality

$$n\|g(\hat\beta_n^{L})\|^2 + 2\sigma^2\,\mathrm{tr}\big\{C\,E[\nabla g(Z)]\big\} < 0 \quad \text{almost everywhere in } g, \qquad (2.4)$$

where $\nabla g(S) = (\partial g(S)/\partial S_1, \ldots, \partial g(S)/\partial S_p)^\top$ and $Z \sim N_p(0, \sigma^2 C^{-1})$.

Proof.
Consider the difference in $L_2$-risks:

$$D = \lim_{n\to\infty} E\big[n\|\hat\beta_n^{L} - \beta\|^2\big] - \lim_{n\to\infty} E\big[n\|\hat\beta_n^{S} - \beta\|^2\big] = -\lim_{n\to\infty}\Big\{E\big[n\|g(\hat\beta_n^{L})\|^2\big] + 2E\big[n(\hat\beta_n^{L} - \beta)^\top g(\hat\beta_n^{L})\big]\Big\}. \qquad (2.5)$$

Since $\lambda_n$ is $\sqrt{n}$-consistent, i.e., $\lambda_n = O(\sqrt{n})$, Theorem 1 of Knight and Fu (2000) gives $\sqrt{n}(\hat\beta_n^{L} - \beta) \xrightarrow{D} N_p(0, \sigma^2 C^{-1})$. Let $Z = \sqrt{n}(\hat\beta_n^{L} - \beta)$. Using the homogeneity of $g(\cdot)$ and Lemma 1 of Liu (1994), we obtain

$$\lim_{n\to\infty} E\big[n(\hat\beta_n^{L} - \beta)^\top g(\hat\beta_n^{L})\big] = \lim_{n\to\infty} \sqrt{n}\, E\big[Z^\top g(\sqrt{n}\, Z)\big] = \lim_{n\to\infty} E\big[Z^\top g(Z)\big] = \sigma^2 \lim_{n\to\infty} \mathrm{tr}\big\{C\,E[\nabla g(Z)]\big\}. \qquad (2.6)$$

Substituting (2.6) into (2.5) gives

$$D = -\lim_{n\to\infty}\Big\{E\big[n\|g(\hat\beta_n^{L})\|^2\big] + 2\sigma^2\,\mathrm{tr}\big\{C\,E[\nabla g(Z)]\big\}\Big\}.$$

Since the expected value of a negative random variable is negative, the result follows.

Remark 1. In Proposition 1, the condition (2.4) can be rewritten as

$$\lim_{n\to\infty} n\|g(\hat\beta_n^{L})\|^2 + 2\sigma^2\,\mathrm{tr}\big\{C\,E[\nabla g(Z)]\big\} < 0 \quad \text{almost everywhere in } g,$$

since $\sigma^2 \lim_{n\to\infty} \mathrm{tr}\{C\,E[\nabla g(Z)]\} = \sigma^2\,\mathrm{tr}\{C\,E[\nabla g(Z)]\}$.

Remark 2. The result of Liu (1994), used in the proof of Proposition 1, is based on the identity of Stein (1981). Hence, the statistical properties of the shrinkage estimator $\hat\beta_n^{S}$ can be derived under $\sqrt{n}$-consistency of $\lambda_n$ and the results of Stein (1981). Therefore, we concentrate on Stein-type shrinkage estimators, including the well-known Baranchik (1970) type, as relevant candidates.

Now let $a = (n-p)(p-2)/(n-p+2)$ and $W_n = (\hat\beta_n^{L})^\top (X^\top X)\,\hat\beta_n^{L}/\hat\sigma^2$, where $\hat\sigma^2$ is a consistent estimator of $\sigma^2$ and $X = (x_1, \ldots, x_n)^\top$. According to Remark 2, the well-known Stein-type shrinkage estimator is obtained if we take $g(\hat\beta_n^{L}) = -a W_n^{-1}\hat\beta_n^{L}$ for small enough $a$. However, incorporating this function in (2.3) gives an estimator with undesirable properties: as soon as $W_n < a$, the proposed estimator changes the sign of the LASSO. On the other hand, the new estimator does not scale the LASSO component-wise. Hence, for $\hat\beta_n^{L} = (\hat\beta_{1n}^{L}, \ldots, \hat\beta_{pn}^{L})^\top$, we define the Stein-type LASSO (SL) estimator as

$$\hat\beta_n^{SL} = \Big(\big(1 - a W_n^{-1}\big)\,\hat\beta_{jn}^{L} \;\big|\; j = 1, \ldots, p\Big)^\top. \qquad (2.7)$$

To avoid negative values, the positive part of SL, namely the positive-rule Stein-type LASSO (PRSL), is defined as

$$\hat\beta_n^{PRSL} = \Big(\big(1 - a W_n^{-1}\big)^{+}\,\hat\beta_{jn}^{L} \;\big|\; j = 1, \ldots, p\Big)^\top, \qquad (2.8)$$

where $b^{+} = \max(0, b)$. Then the $L_2$-risk difference is

$$D_1 = R(\beta; \hat\beta_n^{SL}) - R(\beta; \hat\beta_n^{PRSL}) = -\lim_{n\to\infty} n \sum_j E\Big[\big(1 - a W_n^{-1}\big)^2 I(W_n < a)\,\big(\hat\beta_{jn}^{L}\big)^2\Big] + 2\lim_{n\to\infty} n \sum_j E\Big[\big(1 - a W_n^{-1}\big) I(W_n < a)\,\hat\beta_{jn}^{L}\big(\hat\beta_{jn}^{L} - \beta_j\big)\Big] < 0,$$

since for values $W_n < a$ we have $1 - a W_n^{-1} < 0$, and the expected value of a positive random variable is always positive. Hence the positive part of SL has uniformly smaller $L_2$-risk than SL.

Following Baranchik (1970), we also investigate the performance of the following alternative candidates:

$$\hat\beta_n^{SL2} = \Big(\Big(1 - \frac{a}{W_n + 1}\Big)\,\hat\beta_{jn}^{L} \;\big|\; j = 1, \ldots, p\Big)^\top \qquad (2.9)$$

and

$$\hat\beta_n^{SL3} = \Big(\Big(1 - \frac{a\, r(W_n)}{W_n}\Big)\,\hat\beta_{jn}^{L} \;\big|\; j = 1, \ldots, p\Big)^\top, \qquad (2.10)$$

where $r(x)$ is a concave function of $x$, e.g., $r(x) = \sqrt{x}$ or $r(x) = \log|x|$. These two estimators are considered only in the real-data example. In the forthcoming section, we investigate the performance of the PRSL estimator compared to the LASSO via a Monte Carlo simulation.

3 Simulation

In this section, we conduct a Monte Carlo simulation study to evaluate the performance of the PRSL with respect to the LASSO of Tibshirani (1996). We generate the vector of responses from the model

$$Y_i = \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \epsilon_i, \quad i = 1, \ldots, n, \qquad (3.11)$$

where $E(\epsilon_i \mid x_i) = 0$ and $E(\epsilon_i^2) = 1$. Furthermore, we generate the predictors $x_{ij}$ and errors $\epsilon_i$ from $N(0, 1)$. We consider sample sizes $n \in \{50, 100\}$ and numbers of predictor variables $p \in \{10, 20, 30\}$. We use a scheme similar to that of Hansen (2007, 2016).
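For concreteness, the double-shrinkage construction in (2.7)-(2.8) can be coded in a few lines. The following is a minimal Python sketch (the paper's own computations were done in R); the function name `stein_lasso` is ours, `beta_lasso` stands for a LASSO fit obtained from any solver, and the consistent variance estimate `sigma2_hat` is assumed to be supplied by the user:

```python
def stein_lasso(beta_lasso, X, sigma2_hat, positive_part=True):
    """Double-shrink a LASSO fit as in (2.7)-(2.8): multiply every
    component by (1 - a/W_n), where
        a   = (n - p)(p - 2)/(n - p + 2),
        W_n = beta_L' (X'X) beta_L / sigma2_hat.
    With positive_part=True the factor is truncated at 0 (PRSL),
    which avoids flipping the sign of the LASSO coefficients."""
    n, p = len(X), len(X[0])
    a = (n - p) * (p - 2) / (n - p + 2)
    # beta' (X'X) beta equals ||X beta||^2, so compute X beta once
    Xb = [sum(X[i][j] * beta_lasso[j] for j in range(p)) for i in range(n)]
    Wn = sum(v * v for v in Xb) / sigma2_hat
    factor = 1.0 - a / Wn
    if positive_part:
        factor = max(0.0, factor)  # b+ = max(0, b) as in (2.8)
    return [factor * b for b in beta_lasso]
```

Note that the scaling is global, not component-wise: when $W_n < a$, the PRSL factor is exactly zero, so the whole coefficient vector is set to zero rather than sign-flipped as SL would do.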
Hence, the regression coefficients are set to $\beta_j = c\sqrt{2\alpha}\, j^{-\alpha - 1/2}$, $j = 1, \ldots, p$, with $\alpha = 0.1, 0.5, 1$. Larger values of $\alpha$ indicate that the coefficients $\beta_j$ decline more quickly with $j$. The value of $c$ controls the population $R^2 = c^2/(1 + c^2)$ and is selected so that $R^2$ varies on a 20-point grid in $[0, 0.8]$. The number of simulation runs was initially varied; finally, each configuration is repeated 1000 times to obtain stable results. For each realization, we calculate the MSE of the suggested estimators. All computations were conducted using the software R.

The performance of an estimator $\hat\beta_n^{*}$ is evaluated by the MSE criterion, scaled by the MSE of the LASSO, so that the relative MSE (RMSE) is given by

$$\mathrm{RMSE}\big(\hat\beta_n^{*}\big) = \frac{\mathrm{MSE}\big(\hat\beta_n^{*}\big)}{\mathrm{MSE}\big(\hat\beta_n^{L}\big)}. \qquad (3.12)$$

An RMSE less than one indicates performance superior to the LASSO. The results are reported graphically in Figure 1 for ease of comparison. The figure has six panel plots, corresponding to the three values of $\alpha$ for $n = 50, 100$ and $p = 10, 20, 30$, and presents the RMSE values of the estimators in (3.12) as a function of the population $R^2$. According to these plots, we can see clear trends.
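A stripped-down version of this experiment can be sketched as follows. This Python illustration makes a simplifying assumption the paper does not: an orthonormal design, under which the LASSO has the closed-form soft-thresholding solution and $X^\top X = nI$, so both the LASSO fit and $W_n$ are cheap to compute (the paper instead fits the LASSO to Gaussian predictors in R). The function names and the fixed tuning value `lam` are ours:

```python
import math
import random

def soft(z, t):
    """Soft-thresholding: the per-coordinate LASSO solution
    under an orthonormal design."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def rmse_prsl_vs_lasso(n=50, p=10, alpha=0.5, c=1.0, lam=0.15,
                       reps=1000, seed=1):
    """Monte Carlo estimate of RMSE (3.12) = MSE(PRSL) / MSE(LASSO)."""
    rng = random.Random(seed)
    # Hansen-style decaying coefficients beta_j = c sqrt(2a) j^(-a-1/2)
    beta = [c * math.sqrt(2 * alpha) * j ** (-alpha - 0.5)
            for j in range(1, p + 1)]
    a = (n - p) * (p - 2) / (n - p + 2)
    mse_l = mse_p = 0.0
    for _ in range(reps):
        # orthonormal design: each OLS coordinate is beta_j + N(0, 1/n)
        ols = [b + rng.gauss(0.0, 1.0 / math.sqrt(n)) for b in beta]
        lasso = [soft(z, lam) for z in ols]
        # W_n with X'X = n I and sigma2_hat = 1 (the true error variance)
        wn = n * sum(b * b for b in lasso)
        factor = max(0.0, 1.0 - a / wn) if wn > 0 else 0.0
        prsl = [factor * b for b in lasso]
        mse_l += sum((e - b) ** 2 for e, b in zip(lasso, beta))
        mse_p += sum((e - b) ** 2 for e, b in zip(prsl, beta))
    return mse_p / mse_l
```

Varying `c` (and hence the population $R^2$) in such a loop reproduces the qualitative shape of the RMSE curves discussed below, though the exact values depend on the design and on how $\lambda_n$ is chosen.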
For example, in Figure 1(b), as $R^2$ varies from 0 to 0.1, the PRSL has the smallest RMSE when $\alpha = 0.1$, indicating that it performs better than the LASSO, followed by the PRSL with $\alpha = 0.5$ and $\alpha = 1$. On the other hand, for intermediate values of $R^2$, the PRSL is less efficient than the LASSO. In summary, the PRSL is more efficient than the LASSO for small values of the population $R^2$, loses its efficiency as $R^2$ increases in small amounts, and finally the relative performance of all estimators becomes almost identical as $R^2$ approaches 0.8.

Figure 1: The RMSEs of the suggested estimators for different values of $\alpha$ when $R^2 \in [0, 0.8]$. Panels: (a) $n = 50, p = 10$; (b) $n = 50, p = 20$; (c) $n = 50, p = 30$; (d) $n = 100, p = 10$; (e) $n = 100, p = 20$; (f) $n = 100, p = 30$. [Plots not reproduced.]

4 Prostate Data

The prostate data came from the study of Stamey et al. (1989) on the correlation between the level of prostate specific antigen (PSA) and a number of clinical measures in men who were about to receive radical prostatectomy.
The data consist of 97 measurements on the following variables: log cancer volume (lcavol), log prostate weight (lweight), age (age), log of benign prostatic hyperplasia amount (lbph), log of capsular penetration (lcp), seminal vesicle invasion (svi), Gleason score (gleason), and percent of Gleason scores 4 or 5 (pgg45). The idea is to predict the log of PSA (lpsa) from these measured variables. A description of the variables in this data set is given in Table 1.

Table 1: Description of the variables of the prostate data

Variable   Description                                   Remarks
lpsa       Log of prostate specific antigen (PSA)        Response
lcavol     Log cancer volume
lweight    Log prostate weight
age        Age                                           Age in years
lbph       Log of benign prostatic hyperplasia amount
svi        Seminal vesicle invasion
lcp        Log of capsular penetration
gleason    Gleason score                                 A numeric vector
pgg45      Percent of Gleason scores 4 or 5

Table 2: Estimated coefficients and prediction errors of the variables of the prostate data

           LASSO    PRSL     SL2      SL3 (r(x) = √x)   SL3 (r(x) = log|x|)
coef       2.478    2.294    2.303    0.852             1.691
lcavol     0.472    0.437    0.438    0.162             0.322
lweight    0.186    0.173    0.173    0.064             0.127
age        0.000    0.000    0.000    0.000             0.000
lbph       0.000    0.000    0.000    0.000             0.000
svi        0.368    0.340    0.342    0.126             0.251
lcp        0.000    0.000    0.000    0.000             0.000
gleason    0.000    0.000    0.000    0.000             0.000
pgg45      0.000    0.000    0.000    0.000             0.000
PE         3.316    2.533    2.540    2.338             1.112
RPE        1.000    0.764    0.766    0.705             0.335

Our results are based on 1000 case-resampled bootstrap samples. Since there was no noticeable variation for larger numbers of replications, we did not consider further values. The performance of an estimator is evaluated by its prediction error (PE) via 10-fold cross-validation (CV) for each bootstrap replicate. For ease of comparison, we also calculate the relative prediction error (RPE) of an estimator with respect to the prediction error of the LASSO.
If the RPE of an estimator is less than one, its performance is superior to the LASSO. In Table 2, we report both the estimated coefficients and the PEs of the five methods. According to these results, all suggested estimators outperform the LASSO.

Figure 2 shows each coefficient estimate as a function of the standardized bound $s = |\beta|/\max|\beta|$. The vertical line represents the model for $\hat{s} = 0.44$, the optimal value selected by the "one standard error" rule with 10-fold CV, in which we choose the most parsimonious model whose error is no more than one standard error above the error of the best model. So, all methods gave non-zero coefficients to lcavol, lweight and svi. Also, Figure 3 shows box plots of 1000 bootstrap replications of each method with $\hat{s} = 0.44$; the results are consistent with Tibshirani (1996).

Figure 2: The coefficient estimates versus the tuning parameter $s$ for each method (LASSO, PRSL, SL2, SL3 with $r(x) = \sqrt{x}$, SL3 with $r(x) = \log|x|$). Here $s$ is selected via 10-fold CV; the vertical line $\hat{s} = 0.44$ is selected by the "one standard error" rule. [Plots not reproduced.]
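The PE/RPE computation described above can be sketched generically. In this Python illustration (the paper's analysis was done in R), `fit` stands in for any of the competing estimators, and the function names are ours:

```python
import random

def cv_prediction_error(X, y, fit, K=10, seed=0):
    """Mean squared prediction error of a fitting rule via K-fold CV.
    `fit(X_train, y_train)` must return a callable predict(x_row)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]  # K roughly equal folds
    sse = 0.0
    for k in range(K):
        held = set(folds[k])
        Xtr = [X[i] for i in idx if i not in held]
        ytr = [y[i] for i in idx if i not in held]
        predict = fit(Xtr, ytr)
        sse += sum((y[i] - predict(X[i])) ** 2 for i in folds[k])
    return sse / len(y)

def relative_pe(pe_candidate, pe_lasso):
    """RPE < 1 means the candidate predicts better than the LASSO."""
    return pe_candidate / pe_lasso
```

In the paper's setting this is applied within each of the 1000 bootstrap replicates, with the LASSO's own CV prediction error as the denominator of the RPE.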
Figure 3: Box plots of 1000 bootstrap values of the coefficient estimates of the listed methods for the eight predictors in the prostate cancer example. [Plots not reproduced.]

5 Conclusions

In this paper, we employed the shrinkage idea of Stein (1981) to shrink the LASSO of Tibshirani (1996) further. Under the concept of double shrinking, we proposed a double shrinkage estimator, namely the Stein-type LASSO. Some similar double shrinkage estimators, including the positive part of the Stein-type LASSO, were also proposed as alternatives. The performance of the proposed estimators was investigated through a Monte Carlo simulation as well as a real data analysis. The new set of estimators attains smaller $L_2$-risk compared to the LASSO. Moreover, the prostate cancer data analysis illustrated that the Stein-type LASSO estimators have smaller prediction error than the LASSO. Regarding the function $g(\cdot)$ in (2.3), the numerical analysis illustrated that concave and differentiable functions behave best. Further, our proposal will also work for the minimizer of (2.2) for all values $\gamma > 0$, including the ridge regression estimator and the subset selector.
Hence, the proposed methodology can be applied to other estimators. Apart from this, there are many competitors to the LASSO in the context of variable selection; we focused on the LASSO only for the purpose of defining the double shrinking idea. For further research, one can use this method to define double shrunken estimators other than the Stein-type LASSO. As such, one can define Stein-type SCAD (Fan and Li, 2001), adaptive LASSO (Zou, 2006), MCP (Zhang, 2010) and group LASSO (Yuan and Lin, 2006) estimators.

Acknowledgments

We would like to thank two anonymous referees for their valuable and constructive comments, which significantly improved the presentation of the paper and led us to include many additional details.

References

Ahmed, S.E. (2014). Penalty, Shrinkage and Pretest Strategies: Variable Selection and Estimation, Springer, New York.

Ahmed, S.E. (2001). Shrinkage estimation of regression coefficients from censored data with multiple observations. In Ahmed, S.E. and Reid, N. (Eds.), Lecture Notes in Statistics, 148, 103-120. Springer-Verlag, New York.

Ahmed, S.E., Kareev, I., Suraphee, S., Volodin, A., and Volodin, I. (2015). Confidence sets based on the positive part James-Stein estimator with the asymptotically constant coverage probability, J. Statist. Comp. Sim., 85(12), 2506-2513.

Ahmed, S.E. and Fallahpour, S. (2012). Shrinkage estimation strategy in quasi-likelihood models, Statist. Probab. Lett., 82, 2170-2179.

Ahmed, S.E., Doksum, K.A., Hossain, S. and You, J. (2007). Shrinkage, pretest and absolute penalty estimators in partially linear models, Aust. New Zealand J. Statist., 49(4), 435-454.

Asar, Y. (2017). Some new methods to solve multicollinearity in logistic regression, Comm. Statist. Sim. Comp., 46, 2576-2586.

Ali, A.M. and Saleh, A.K.Md.E. (1990). Estimation of the mean vector of a multivariate normal distribution under symmetry, Journal of Statistical Computation and Simulation, 35(3-4), 209-226.

Baranchik, A.J. (1970). A family of minimax estimators of the mean of a multivariate normal distribution, Ann. Math. Statist., 41(2), 642-645.

Chang, S.-M. (2015). Double shrinkage estimators for large sparse covariance matrices, J. Statist. Comp. Sim., 85(8), 1497-1511.

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties, J. Amer. Statist. Assoc., 96(456), 1348-1360.

Groß, J. (2012). Linear Regression, Springer Science & Business Media.

Hansen, B.E. (2007). Least squares model averaging, Econometrica, 75(4), 1175-1189.

Hansen, B.E. (2016). The risk of James-Stein and Lasso shrinkage, Econometric Rev., 35, 1456-1470.

James, W. and Stein, C. (1961). Estimation with quadratic loss, Proc. Fourth Berkeley Symp. Math. Statist. Prob., 1, 361-379.

Knight, K. and Fu, W. (2000). Asymptotics for LASSO-type estimators, Ann. Statist., 28(5), 1356-1378.

Liu, J.S. (1994). Siegel's formula via Stein's identities, Statist. Prob. Lett., 21, 247-251.

Roozbeh, M. and Arashi, M. (2016). Shrinkage ridge regression in partial linear models, Comm. Statist. Sim. Comp., 45(20), 6022-6044.

Saleh, A.K.Md.E. (2006). Theory of Preliminary Test and Stein-Type Estimation with Applications, Wiley, United States of America.

Saleh, A.K.Md.E. and Raheem, E. (2015a). Improved LASSO, arXiv:1503.05160v1, 1-46.

Saleh, A.K.Md.E. and Raheem, E. (2015b). Penalty, shrinkage, and preliminary test estimators under full model hypothesis, arXiv:1503.06910, 1-28.

Stamey, T.A., Kabalin, J.N., McNeal, J.E., Johnstone, I.M., Freiha, F., Redwine, E.A. and Yang, N. (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate: II. Radical prostatectomy treated patients, J. Urology, 141(5), 1076-1083.

Stein, C. (1981). Estimation of the mean of a multivariate normal distribution, Ann. Statist., 9, 1135-1151.

Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO, J. R. Stat. Soc. Ser. B. Stat. Methodol., 58(1), 267-288.

Yüzbaşı, B., Ahmed, S.E. and Gungor, M. (2017). Improved penalty strategies in linear regression models, REVSTAT-Statist. J., 15(2), 251-276.

Yüzbaşı, B. and Ahmed, S.E. (2016). Shrinkage and penalized estimation in semi-parametric models with multicollinear data, J. Stat. Comput. Simul., 86(17), 3543-3561.

Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B. Stat. Methodol., 68(1), 49-67.

Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty, Ann. Statist., 38(2), 894-942.

Zou, H. (2006). The adaptive lasso and its oracle properties, J. Amer. Statist. Assoc., 101(476), 1418-1429.
