
Predictive discriminant analysis versus logistic regression for two-group classification problems in educational settings


Date Issued:
1996
Summary:
The cross-validated classification accuracy of predictive discriminant analysis (PDA) and logistic regression (LR) models was compared for the two-group classification problem. Thirty-four real data sets varying in number of cases, number of predictor variables, degree of group separation, relative group size, and equality of group covariance matrices were employed for the comparison. PDA models were built based on assumptions of multivariate normality and equal covariance matrices, and cases were classified using Tatsuoka's (1988, p. 351) minimum chi-square rule. LR models were built using the International Mathematical and Statistical Library (IMSL) subroutine Categorical Generalized Linear Model (CTGLM), available with the 32-bit Microsoft Fortran v4.0 Powerstation. CTGLM uses a nonlinear approximation technique (Newton-Raphson) to determine maximum likelihood estimates of model parameters. The group with the higher log-likelihood probability was used as the LR prediction. Cross-validated hit-rate accuracy of PDA and LR models was estimated using the leave-one-out procedure. McNemar's (1947) statistic for correlated proportions was used in the statistical comparisons of PDA and LR hit-rate estimates for separate-group and total-sample proportions (z = 2.58, α = .01). Total-sample and separate-group cross-validated classification accuracy obtained by PDA was not significantly different from that obtained by LR in any of the 31 data sets for which maximum likelihood estimates of LR model parameters could be calculated. This was true regardless of assumptions made about population sizes (i.e., equal or unequal). Neither theoretical nor data-based considerations were helpful in predicting these results.
Although it does not appear from these data to make a difference which classification model is used, use of the method described in this study for comparing PDA and LR models will enable researchers to select the optimal classification model for a specific data set, regardless of data conditions.
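Illustrative note (not part of the thesis record): the comparison step described in the summary can be sketched as follows. This is a minimal sketch assuming the common textbook form of McNemar's z for correlated proportions, z = (b - c) / sqrt(b + c), where b and c count the cases classified correctly by only one of the two models. The summary does not say whether the thesis applied a continuity correction, so none is used here. The function name `mcnemar_z` and the hit vectors are hypothetical, made up purely for illustration; in the study the hits would come from leave-one-out classification of each case.

```python
import math


def mcnemar_z(hits_a, hits_b):
    """z statistic for the difference between two correlated proportions.

    hits_a and hits_b hold one entry per case: truthy if that model
    classified the case correctly, falsy otherwise.
    """
    if len(hits_a) != len(hits_b):
        raise ValueError("hit vectors must have the same length")
    # Only discordant cases (correct under exactly one model) carry information.
    only_a = sum(1 for a, b in zip(hits_a, hits_b) if a and not b)
    only_b = sum(1 for a, b in zip(hits_a, hits_b) if b and not a)
    if only_a + only_b == 0:
        return 0.0  # the two models agree on every case
    return (only_a - only_b) / math.sqrt(only_a + only_b)


# Two-tailed critical value quoted in the summary (alpha = .01).
CRITICAL_Z = 2.58

# Hypothetical leave-one-out hits for twelve cases (1 = correct, 0 = wrong).
pda_hits = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0]
lr_hits = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1]

z = mcnemar_z(pda_hits, lr_hits)
significant = abs(z) > CRITICAL_Z
print(f"z = {z:.3f}, significant at .01: {significant}")
```

Running the same function on the hits restricted to one group would give the separate-group comparison; running it on all cases gives the total-sample comparison.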
Title: Predictive discriminant analysis versus logistic regression for two-group classification problems in educational settings.
Name(s): Meshbane, Alice.
Florida Atlantic University, Degree grantor
Morris, John D., Thesis advisor
Type of Resource: text
Genre: Electronic Thesis Or Dissertation
Issuance: monographic
Date Issued: 1996
Publisher: Florida Atlantic University
Place of Publication: Boca Raton, Fla.
Physical Form: application/pdf
Extent: 79 p.
Language(s): English
Identifier: 12461 (digitool), FADT12461 (IID), fau:9355 (fedora)
Collection: FAU Electronic Theses and Dissertations Collection
Note(s): Thesis (Ed.D.)--Florida Atlantic University, 1996.
College of Education
Subject(s): Discriminant analysis
Regression analysis
Logistic distribution
Education, Higher--Research
Held by: Florida Atlantic University Libraries
Persistent Link to This Record: http://purl.flvc.org/fcla/dt/12461
Sublocation: Digital Library
Use and Reproduction: Copyright © is held by the author with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Use and Reproduction: http://rightsstatements.org/vocab/InC/1.0/
Host Institution: FAU
Is Part of Series: Florida Atlantic University Digital Library Collections.