DISCRIMINANT ANALYSIS — A CONCEPTUAL UNDERSTANDING

Srinidhi Devan
Published in Analytics Vidhya
Jan 28, 2021 · 4 min read


Discriminant Analysis is a classification technique for data with a response variable and a set of predictor variables. It is mainly used to assign an observation to a class or category based on the independent variables of the data. There are two types of Discriminant Analysis: Linear Discriminant Analysis and Quadratic Discriminant Analysis.

Linear Discriminant Analysis (LDA):

It is a supervised technique that predicts the class of the dependent variable using a linear combination of the independent variables. It assumes that the independent variables are normally distributed (continuous and numerical) and that the classes have equal variance/covariance. The technique can be used both for classification and for dimensionality reduction. When these assumptions are satisfied, LDA creates a linear decision boundary.

[Figure: Linear Discriminant Analysis — a line separating two classes]

This line can clearly discriminate between the 0s and 1s in the dataset. The objective of LDA is therefore to find the line that best separates the 0s from the 1s. Even when these assumptions are violated, LDA often still performs well.
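As a minimal sketch of this idea, the snippet below fits scikit-learn's LinearDiscriminantAnalysis on a synthetic two-class dataset (the data and parameters are illustrative, not from the article):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two roughly Gaussian classes ("0s and 1s") in two dimensions
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=42)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# coef_ and intercept_ define the separating line  w · x + b = 0
print("weights:", lda.coef_, "intercept:", lda.intercept_)
print("training accuracy:", lda.score(X, y))
```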

LDA Technique:

DS = β0 + β1*X1 + β2*X2 + … + βk*Xk

Where

DS: Discriminant Score

β’s: Discriminant Weights/ Coefficients

X’s: Independent Variables

The weights are estimated so that the groups are separated as clearly as possible on the discriminant function. In other words, LDA constructs an equation that minimizes the possibility of misclassifying cases into their respective classes.
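To make the discriminant score concrete, here is a small hand-rolled check (again on illustrative synthetic data) that the DS formula above, built from the fitted intercept (β0) and coefficients (β1..βk), reproduces scikit-learn's decision function:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Fit LDA on a simple two-class dataset (illustrative data)
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=42)
lda = LinearDiscriminantAnalysis().fit(X, y)

new_point = np.array([[1.5, -0.5]])   # a hypothetical observation

# Manual discriminant score: β0 (intercept_) + Σ βi * Xi (coef_)
ds_manual = lda.intercept_ + new_point @ lda.coef_.T

# Matches the library's decision function; the sign decides the class
ds_sklearn = lda.decision_function(new_point)
print(ds_manual.ravel(), ds_sklearn)  # equal up to floating-point error
```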

Assumptions of LDA:

1. Multivariate normality- Independent variables should be normally distributed for all labels.

2. Equal variance and covariance for all classes.

3. No multicollinearity; if present, it should be treated whenever it affects the outcome.

4. All the samples in the data should be independent of each other.

Criteria for LDA to Perform Well:

1. Minimizes the possibility of misclassifying cases into their respective classes.

2. Distance of points from the line, i.e., how far away the data points are from the separating line.

3. The probability of a point lying on the left-hand side versus the right-hand side of the line.

Standardized, Unstandardized and Structure Coefficients:

The main purpose of standardizing the variables (so that the mean becomes 0, the standard deviation becomes 1, and covariance becomes correlation) is to bring in numerical stability. If the independent variables have units, the βs inherit the reciprocals of the corresponding units; to make the βs unit-free, the original independent variables are standardized. Higher values of the standardized βs imply that the corresponding independent variables are more important for distinguishing between the classes of the dependent variable, so they also serve as a measure of variable importance. Unlike linear regression, LDA is more tolerant of correlated independent variables.
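A short sketch of this standardization step, using the iris dataset purely for illustration: the predictors are scaled to mean 0 and standard deviation 1 before fitting, so the resulting βs are unit-free and can be compared as rough importance indicators.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Mean 0, standard deviation 1 for every independent variable
X_std = StandardScaler().fit_transform(X)

lda = LinearDiscriminantAnalysis().fit(X_std, y)

# Larger |β| on standardized inputs suggests a variable that
# discriminates more strongly between the classes
for name, coefs in zip(load_iris().feature_names, lda.coef_.T):
    print(f"{name}: {coefs}")
```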

When is LDA Used?:

1. When the classes are well separated. Logistic regression becomes unstable when the classes are well separated; that is when LDA comes to the rescue.

2. When the data is small, LDA is more efficient.

3. When we have more than two classes, LDA is a better choice. In the case of binary classification, both logistic regression and LDA can be applied.

Steps to Perform LDA:

1. Compute the d-dimensional mean vectors for the different classes of the dataset.

2. Calculate the between-class variance i.e., the separability between the mean of different classes.

3. Calculate the within-class variance, i.e., the spread of the samples of each class around their class mean.

4. Compute the eigenvectors and the corresponding eigenvalues for the scatter matrices. An eigenvector corresponding to a real non-zero eigenvalue points in a direction that is stretched by the transformation, and the eigenvalue is the factor by which it is stretched. A negative eigenvalue implies the direction is reversed.

5. Sort the eigenvectors by decreasing eigenvalues and choose the k eigenvectors with the largest eigenvalues to form a (d x k)-dimensional matrix.

6. Construct a lower-dimensional space projection using Fisher’s criterion which maximizes the between-class variance and minimizes the within-class variance.
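These six steps can be coded from scratch in a few lines. The sketch below follows them on the iris dataset (chosen only for illustration), building the scatter matrices, eigendecomposing, and projecting:

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

# Step 1: d-dimensional mean vector per class
class_means = {c: X[y == c].mean(axis=0) for c in np.unique(y)}

# Steps 2-3: between-class (S_B) and within-class (S_W) scatter matrices
S_B = np.zeros((n_features, n_features))
S_W = np.zeros((n_features, n_features))
for c, mean_c in class_means.items():
    X_c = X[y == c]
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(X_c) * diff @ diff.T
    S_W += (X_c - mean_c).T @ (X_c - mean_c)

# Step 4: eigenvectors/eigenvalues of inv(S_W) @ S_B
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)

# Step 5: sort by decreasing eigenvalue, keep the top k
k = 2
order = np.argsort(eigvals.real)[::-1]
W = eigvecs[:, order[:k]].real   # (d x k) projection matrix

# Step 6: project the data into the lower-dimensional space
# (Fisher's criterion: max between-class, min within-class variance)
X_lda = X @ W
print(X_lda.shape)               # (150, 2)
```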

How Does LDA Make Predictions?:

The LDA model uses Bayes' Theorem to estimate probabilities: it predicts the probability that a new input belongs to each class, and the class with the highest probability is the output class. Bayes' Theorem estimates the probability of the output class given the input, combining the prior probability of each class with the probability of the data given that class.
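A hand-computed version of this Bayes-rule prediction, assuming Gaussian class densities with a shared covariance matrix (the LDA model), can be checked against scikit-learn's predict_proba; the dataset is again illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)

x_new = X[0]

# Bayes rule: P(class | x) ∝ P(x | class) * P(class)
posteriors = []
for c in range(len(lda.classes_)):
    likelihood = multivariate_normal.pdf(
        x_new, mean=lda.means_[c], cov=lda.covariance_)
    posteriors.append(lda.priors_[c] * likelihood)
posteriors = np.array(posteriors) / sum(posteriors)  # normalize

print(posteriors)                  # hand-computed P(class | x)
print(lda.predict_proba([x_new]))  # library's answer (approximately equal)
```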

Comparison of LDA with Other Techniques:

1. LDA, ANOVA, and Regression analysis express the dependent variable as a linear combination of independent variables.

2. The dependent variable in LDA is Categorical and independent variables are continuous. ANOVA uses a categorical independent variable and a continuous dependent variable.

3. LDA is closely related to PCA and Factor Analysis as both are linear transformation techniques i.e., they look for the linear combination of variables that best explain the data.

4. LDA is a supervised technique and PCA is an unsupervised technique as it ignores class labels.
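The contrast in point 4 is easy to see in code: both methods produce a 2-component linear projection of the same (illustrative) dataset, but only LDA consults the class labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA picks directions of maximum overall variance, never looking at y
X_pca = PCA(n_components=2).fit_transform(X)

# LDA picks directions that best separate the labeled classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2), but different projections
```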

Applications of LDA:

1. Binary classification, i.e., separating 0s from 1s.

2. Recognition of objects

3. Pattern recognition tasks

Quadratic Discriminant Analysis:

This is a variant of LDA and uses quadratic combinations of independent variables to predict the class in the dependent variable. It does not assume equal covariance of the classes, but the assumption of Normal Distribution still holds. QDA creates a quadratic decision boundary.

DS = β1*X1 + β2*X2 + β3*X1² + β4*X2² + β5*X1*X2

However, QDA cannot be used for dimensionality reduction.
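A brief sketch comparing the two on the same synthetic, illustrative data: QDA fits one covariance matrix per class, which lets it draw a curved boundary where LDA is restricted to a line.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

# When the class covariances differ, QDA's quadratic boundary can fit better
print("LDA accuracy:", lda.score(X, y))
print("QDA accuracy:", qda.score(X, y))
```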

[Figure: Quadratic Discriminant Analysis — a curved boundary separating two classes]
