Abstract:
The objectives of this research were:
(1) To modify the Multistage procedure of Marasinghe (1985) into a Two-phase Multistage procedure that can detect outliers in either the response variable or the independent variables. In phase I, suspected outliers were identified using the Multistage procedure for the response variable and Cook's distance for the independent variables. In phase II, each observation in the suspected set was tested for outlyingness using the GESR (max|rᵢ|) statistic.
(2) To compare the percentages of exactitude in outlier detection between the Two-phase GESR procedure (Paul and Fung 1991) and the Two-phase Multistage procedure. A simulation study based on simple linear regression was conducted. The regression coefficient (β₁) was set at 1, 5 and 10, and the sample size (n) at 15, 25 and 50. Outliers were introduced by adding known values δ₁, δ₂ (0, 5, -5, 9 or -9) to two randomly chosen values of the response variable and δ₃, δ₄ (0, 2, -2, 4 or -4) to two randomly chosen values of the independent variable. Each experiment had 1000 replications.
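As a rough illustration of the simulation design and of the two diagnostics named above, the following minimal sketch generates one contaminated sample and computes internally studentized residuals rᵢ and Cook's distances. It is not the procedure used in the study. Several details are assumptions that the abstract does not state: the intercept `beta0`, the error standard deviation `sigma`, the standard normal distribution of the independent variable, the choice of four distinct positions, and the decision to shift the independent variable after the response has been generated. The function names are hypothetical.

```python
import numpy as np

def contaminated_sample(n, beta1, delta_y, delta_x, rng, beta0=0.0, sigma=1.0):
    """One simple-linear-regression sample with shifted y and x values.

    Assumptions (not given in the abstract): x ~ N(0, 1), errors ~ N(0, sigma^2),
    beta0 = 0, distinct positions, and x shifted after y is generated.
    """
    delta_y = np.asarray(delta_y, dtype=float)
    delta_x = np.asarray(delta_x, dtype=float)
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(scale=sigma, size=n)
    # Draw distinct random positions for the shifted observations.
    pos = rng.choice(n, size=delta_y.size + delta_x.size, replace=False)
    y_pos, x_pos = pos[:delta_y.size], pos[delta_y.size:]
    y[y_pos] += delta_y   # outliers in the response variable
    x[x_pos] += delta_x   # outliers in the independent variable
    return x, y, y_pos, x_pos

def diagnostics(x, y):
    """Internally studentized residuals and Cook's distances for y = b0 + b1*x."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
    h = np.diag(H)                            # leverages
    e = y - H @ y                             # ordinary residuals
    s2 = e @ e / (n - p)                      # residual variance estimate
    r = e / np.sqrt(s2 * (1.0 - h))           # studentized residuals
    cook = r**2 * h / (p * (1.0 - h))         # Cook's distance
    return r, cook

rng = np.random.default_rng(0)
x, y, y_pos, x_pos = contaminated_sample(25, 5.0, [9, -9], [4, -4], rng)
r, cook = diagnostics(x, y)
max_abs_r = np.max(np.abs(r))   # the max|r_i| quantity on which GESR is based
```

In a full study, this step would be repeated 1000 times for each combination of β₁, n and δ values, and the detected positions would be compared with `y_pos` and `x_pos`. The critical values needed to turn `max_abs_r` into a test are not given in the abstract and are left out here.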
The results of the study were:
(1) The Two-phase Multistage procedure often gave slightly higher percentages of exactitude in outlier detection than the Two-phase GESR procedure.
(2) When β₁ = 5 and 10, both procedures detected outliers in both the response variable and the independent variable very well (more than 90%). When β₁ = 1, both procedures failed to detect outliers in the independent variable. However, the percentages of outlier detection clearly increased as the sample size increased.
(3) The effects of the position of the outliers (on the same side or on different sides of the mean) and of their distance from the mean were not obvious, except when β₁ = 1. In that case, outliers in the independent variable at a large distance from the mean gave higher percentages of exactitude in outlier detection than those at a small distance from the mean.