Abstract:
This research compares variable screening by Lasso, Adaptive Lasso, Elastic net (EN) and SCAD within the Multi-Split method for obtaining p-values in regression analysis of high-dimensional data. The comparison is based on simulated data and uses three measures recorded after controlling the False Discovery Rate (FDR): the number of coefficients found to be non-zero by hypothesis testing, the number of false positives (FP) and the number of false negatives (FN). The sample sizes are 10, 100 and 200. The number of truly non-zero coefficients is set to 10, 20 and 50 percent of the sample size, and the correlation among the independent variables is 0, 0.5 and 0.9. The simulation and analysis were carried out in R 3.0.3. For a sample size of 10, the tables for the number of coefficients found to be non-zero after FDR control and for FN point in the same direction: screening by Adaptive Lasso is the most appropriate. The FP table, on the other hand, favors screening by Lasso, although its advantage is not clear-cut. For sample sizes of 100 and 200, screening by Adaptive Lasso and SCAD is the most appropriate, whereas the FP table favors Lasso and EN. This indicates that Lasso and EN are effective for screening, but less so than Adaptive Lasso and SCAD.
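
The three comparison measures can be illustrated with a minimal sketch. The abstract does not name the FDR procedure or show the original R code, so the Benjamini-Hochberg adjustment is assumed here. The function names `bh_adjust` and `screening_errors`, the level `q` and the example p-values are hypothetical and are not results from the study.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

def screening_errors(pvals, true_nonzero, q=0.05):
    """Count selections, false positives and false negatives at FDR level q."""
    selected = bh_adjust(pvals) <= q
    truth = np.asarray(true_nonzero, dtype=bool)
    return {
        "n_selected": int(selected.sum()),       # coefficients declared non-zero
        "FP": int((selected & ~truth).sum()),    # declared non-zero, truly zero
        "FN": int((~selected & truth).sum()),    # declared zero, truly non-zero
    }

# Hypothetical p-values for five variables; variables 1 and 3 are truly non-zero.
example = screening_errors(
    [0.001, 0.008, 0.039, 0.041, 0.6],
    [True, False, True, False, False],
)
```

In a simulation such as the one described, these counts would be computed for each screening method and each combination of sample size, sparsity and correlation, then compared across methods.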