In a regression setting, there are several approaches to detecting outliers. One of them is to use the standardized residual or the studentized residual. In linear regression, an outlier is an observation with a large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables.
The studentized residual is the quotient obtained by dividing a residual by an estimate of its standard deviation. Just as the standard deviation helps flag unusual values of a single variable, the studentized residual helps flag outliers in a regression. For a single variable, values lying more than 3, 4, or 5 standard deviations from the mean give us reasonable grounds to suspect that they are outliers. In a regression setting, observations whose studentized residuals exceed 3, 4, or 5 in absolute value are likewise the candidate outliers.
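The definition above can be written out as follows. The notation is standard but not from the original post: e_i = y_i - ŷ_i is the residual of observation i, h_ii is its leverage (the i-th diagonal element of the hat matrix), s is the root mean squared error of the fit, and s_(i) is the root mean squared error of the fit with observation i deleted. SAS's RSTUDENT= statistic is the externally studentized version t_i (the STUDENT= keyword gives the internally studentized r_i). The numbers in the last line are hypothetical and only illustrate the arithmetic.

```latex
r_i = \frac{e_i}{s\,\sqrt{1 - h_{ii}}}, \qquad
t_i = \frac{e_i}{s_{(i)}\,\sqrt{1 - h_{ii}}}

\text{flag observation } i \text{ as a potential outlier if } |t_i| > c, \quad c \in \{3, 4, 5\}

\text{e.g. } e_i = 6,\; s_{(i)} = 2,\; h_{ii} = 0.1:\quad
t_i = \frac{6}{2\sqrt{0.9}} = \sqrt{10} \approx 3.16 > 3
```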
In SAS, two regression procedures make it easy to compute the studentized residual for detecting outliers: PROC REG and PROC GLM. In both, the studentized residual is requested with the RSTUDENT= keyword of the OUTPUT statement, where xxx is the name you choose for the new variable:
output out=newdata rstudent=xxx;
Other procedures, such as PROC MIXED, also compute studentized residuals as part of their influence diagnostics.
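A minimal end-to-end sketch of the OUTPUT statement approach follows. It assumes the SASHELP.CLASS sample data set that ships with SAS; the model (Weight on Height and Age) and the names REG_OUT, GLM_OUT, OUTLIERS and RSTUD are purely illustrative choices. It fits the same model with PROC REG and PROC GLM, saves the RSTUDENT= values, and keeps the observations whose studentized residual exceeds 3 in absolute value.

```sas
/* Illustrative only: SASHELP.CLASS and this model are assumptions,
   not part of the original post. */
proc reg data=sashelp.class;
   model weight = height age;
   /* RSTUDENT= saves the studentized residual of each observation */
   output out=work.reg_out rstudent=rstud;
run;
quit;

/* PROC GLM accepts the same RSTUDENT= keyword in its OUTPUT statement */
proc glm data=sashelp.class;
   model weight = height age;
   output out=work.glm_out rstudent=rstud;
run;
quit;

/* Subsetting IF: keep only observations with |studentized residual| > 3 */
data work.outliers;
   set work.reg_out;
   if abs(rstud) > 3;
run;

proc print data=work.outliers;
run;
```

Changing the cutoff in the subsetting IF to 4 or 5 gives the stricter rules. With only 19 observations in SASHELP.CLASS, the OUTLIERS data set may well be empty.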