Data with complete separation look something like the following:
There is complete separation because every case in which Y is 0 has an X value of 4 or less, and every case in which Y is 1 has an X value of 5 or more. In other words, the maximum value in one group is less than the minimum value in the other group. When the maximum value in one group equals the minimum value in the other group, quasi-complete separation may occur.
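The min/max rule described above can be checked before any model is fitted. The following is only a minimal SAS sketch: the data set name sep and its ten observations are made up for illustration (they are not the original example data) and are merely chosen so that Y is 0 for X up to 4 and Y is 1 for X from 5 upward.

```sas
/* Hypothetical toy data: Y = 0 only for X <= 4, Y = 1 only for X >= 5 */
data sep;
  input y x @@;
  datalines;
0 1  0 2  0 3  0 4  0 4  1 5  1 6  1 7  1 8  1 9
;
run;

/* Minimum and maximum of X within each level of Y */
proc means data=sep min max;
  class y;
  var x;
run;
```

By construction, the largest X in the Y = 0 group is 4 and the smallest X in the Y = 1 group is 5, so the two groups do not overlap.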
If the explanatory variable is categorical, complete separation of data points could be something like this:
             Response
Predictor    Failure    Success
0            25         0
1            0          21
Here there are no successes when the value of the predictor variable is 0, and there are no failures when the value of the predictor variable is 1.
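A table like this can be reproduced from cell counts with PROC FREQ, which makes the empty cells easy to spot. This is a sketch under assumptions: the data set name tab, the variable names x, y and count, and the coding of y (0 = failure, 1 = success) are all chosen here for illustration.

```sas
/* Cell counts from the table above; variable names are hypothetical */
data tab;
  input x y count;   /* y: 0 = failure, 1 = success (assumed coding) */
  datalines;
0 0 25
0 1 0
1 0 0
1 1 21
;
run;

/* ZEROS keeps the zero-count cells in the cross-tabulation */
proc freq data=tab;
  weight count / zeros;
  tables x*y / norow nocol nopercent;
run;
```

Two empty cells on opposite corners of the 2x2 table are the categorical counterpart of non-overlapping groups.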
For maximum likelihood estimates to exist, the distributions of the explanatory variable in the two outcome groups must overlap to some extent. Logistic regression models are fitted by maximum likelihood, so when the data points of the two groups do not overlap, the estimates do not converge to finite values, and the results reported by the model are unreliable and should not be trusted.
Starting with SAS version 9.2, PROC LOGISTIC provides Firth estimation, requested with the FIRTH option of the MODEL statement, for dealing with quasi-complete or complete separation of data points:
model y = x /firth;
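For context, the MODEL statement above might sit in a complete PROC LOGISTIC step along the following lines. This is a hedged sketch rather than a definitive recipe: it reuses the cell counts of the categorical example, while the data set name tab, the variable names, the coding of y (1 = success) and the choice of profile-likelihood confidence limits (CLPARM=PL) are assumptions made here.

```sas
/* Non-empty cells of the categorical example; names are hypothetical */
data tab;
  input x y count;   /* y: 0 = failure, 1 = success (assumed coding) */
  datalines;
0 0 25
1 1 21
;
run;

/* Ordinary maximum likelihood: expected to flag the separation in the log */
proc logistic data=tab;
  freq count;
  model y(event='1') = x;
run;

/* Firth's penalized likelihood with profile-likelihood confidence limits */
proc logistic data=tab;
  freq count;
  model y(event='1') = x / firth clparm=pl;
run;
```

Comparing the two outputs side by side shows how much the penalized fit changes the estimate and its interval for x.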
However, even after Firth estimation, the results should still be interpreted with extreme caution. Complete and quasi-complete separation of the data points tend to occur when the sample size is small, or when the samples are defined by the outcome (i.e., the response) rather than by the explanatory variables. We see many publications where the analysis is based on such a comparison of responders vs. non-responders.
When complete or quasi-complete separation occurs in a regression with several explanatory variables, the explanatory variable causing it should be identified and preferably excluded from the model. In a regression with a single explanatory variable, an alternative statistical test (for example, a two-group t-test) should be used instead.
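For the single-variable case with a continuous X, the group t-test mentioned above could be run as sketched below. The data set grp and its values are hypothetical and serve only to make the step self-contained.

```sas
/* Hypothetical toy data with non-overlapping groups */
data grp;
  input y x @@;
  datalines;
0 1  0 2  0 3  0 4  0 4  1 5  1 6  1 7  1 8  1 9
;
run;

/* Two-group t-test comparing X between the levels of Y */
proc ttest data=grp;
  class y;
  var x;
run;
```

Here the roles are reversed compared with the logistic model: Y defines the groups and X is the variable being compared.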
- Computation of the Odds Ratio with Small or Zero Cell Counts by Dr Robin High
- Convergence Failures in Logistic Regression by Paul Allison
- A tutorial on logistic regression by Ying So
- What is new in SAS 9.2?