This tutorial is written for both beginners and advanced programmers and contains practical examples and commercial applications. It covers topics from linear algebra, calculus, probability, and statistics as they are applied to machine learning. The purpose of this guide is to fill the gap left by books that are mostly theoretical and therefore difficult for readers who have no prior knowledge of statistics or mathematics.

Machine learning in computer vision relies on a set of classes for statistical classification, regression, and clustering of data. Several of these algorithms are implemented as C++ classes.



OpenCV 2.4 or later

Principal Component Analysis

Principal Component Analysis (PCA) is an algorithm for identifying patterns in data and highlighting their similarities and differences. Recognizing patterns in high-dimensional data can be hard; PCA is a useful tool for analyzing such data and is commonly used for dimensionality reduction. It can be used for image compression, or in face recognition to reduce the dimensionality of the data.


Example in MATLAB/Octave

Create some data in X:
X = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0; 2.3 2.7; 2 1.6; 1 1.1; 1.5 1.6; 1.1 0.9];

plot(X(:,1), X(:,2), '+');

Figure 1: Plot of Actual data

% Adjust data: calculate the mean and subtract it from the data
mu = mean(X, 1);

1.81  1.91

Xm = X - repmat(mu, [size(X, 1) 1])
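
For readers who work in C++ rather than Octave, the mean-adjustment step can be sketched with the standard library alone. This is only an illustration: the names Point2, columnMean, centerData and sampleData are hypothetical and do not come from OpenCV or from the Octave listing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One 2-D observation. The type and function names in this sketch are
// illustrative assumptions, not part of any library.
struct Point2 {
    double x;
    double y;
};

// Column-wise mean of the data set (the counterpart of mean(X, 1)).
Point2 columnMean(const std::vector<Point2>& X) {
    assert(!X.empty());
    Point2 mu = {0.0, 0.0};
    for (std::size_t i = 0; i < X.size(); ++i) {
        mu.x += X[i].x;
        mu.y += X[i].y;
    }
    mu.x /= static_cast<double>(X.size());
    mu.y /= static_cast<double>(X.size());
    return mu;
}

// Subtract the mean from every row (the counterpart of X - repmat(mu, ...)).
std::vector<Point2> centerData(const std::vector<Point2>& X, const Point2& mu) {
    std::vector<Point2> Xm;
    Xm.reserve(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        Point2 centered = {X[i].x - mu.x, X[i].y - mu.y};
        Xm.push_back(centered);
    }
    return Xm;
}

// The ten sample points used in this tutorial.
std::vector<Point2> sampleData() {
    return {{2.5, 2.4}, {0.5, 0.7}, {2.2, 2.9}, {1.9, 2.2}, {3.1, 3.0},
            {2.3, 2.7}, {2.0, 1.6}, {1.0, 1.1}, {1.5, 1.6}, {1.1, 0.9}};
}
```

Calling columnMean(sampleData()) should give a mean of approximately (1.81, 1.91), and the centered data returned by centerData has a mean of zero in both columns.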

Figure 2: Adjusted Data
% Find the covariance matrix
C = (Xm' * Xm) / (size(X, 1) - 1);


0.61656  0.61544
0.61544  0.71656
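
The covariance values above can be checked with a small C++ sketch that follows the same formula, dividing by n - 1. The struct Cov2 and the function sampleCovariance are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Symmetric 2x2 covariance matrix [xx xy; xy yy].
// Cov2 and sampleCovariance are illustrative names, not a library API.
struct Cov2 {
    double xx;
    double xy;
    double yy;
};

// Sample covariance of paired observations x[i], y[i], dividing by n - 1
// as in C = (Xm' * Xm) / (size(X, 1) - 1).
Cov2 sampleCovariance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());
    double mx = 0.0;
    double my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    Cov2 c = {0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx;  // mean-adjusted x
        const double dy = y[i] - my;  // mean-adjusted y
        c.xx += dx * dx;
        c.xy += dx * dy;
        c.yy += dy * dy;
    }
    c.xx /= n - 1.0;
    c.xy /= n - 1.0;
    c.yy /= n - 1.0;
    return c;
}
```

Fed with the two columns of X, this should reproduce the matrix shown above to the printed precision.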


% Compute eigenvectors and eigenvalues

[V, D] = eig(C);

Figure 3: Eigenvectors and eigenvalues

% Sort the eigenvalues in descending order and reorder the eigenvectors to match

[D, order] = sort(diag(D), 'descend');
V = V(:,order);
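
For a 2 x 2 symmetric matrix such as C, the eigen-decomposition has a closed form, so the eig and sort steps can be sketched in plain C++ without a linear algebra library. The names Eigen2 and eigenSymmetric2x2 are hypothetical, and the sign of each eigenvector is arbitrary, so it may differ from what eig returns.

```cpp
#include <cassert>
#include <cmath>

// Eigenvalues (largest first) and unit eigenvectors of a symmetric 2x2 matrix.
// Eigen2 and eigenSymmetric2x2 are illustrative names, not a library API.
struct Eigen2 {
    double value[2];   // value[0] >= value[1]
    double vec[2][2];  // vec[k] is the unit eigenvector belonging to value[k]
};

// Closed-form eigen-decomposition of the matrix [a b; b c].
Eigen2 eigenSymmetric2x2(double a, double b, double c) {
    Eigen2 e;
    const double mid = 0.5 * (a + c);
    const double radius = std::sqrt(0.25 * (a - c) * (a - c) + b * b);
    e.value[0] = mid + radius;
    e.value[1] = mid - radius;
    if (b == 0.0) {
        // Already diagonal: the eigenvectors are the coordinate axes.
        const bool xFirst = (a >= c);
        e.vec[0][0] = xFirst ? 1.0 : 0.0;
        e.vec[0][1] = xFirst ? 0.0 : 1.0;
    } else {
        // (b, lambda - a) solves (A - lambda I) v = 0 when b != 0.
        const double vx = b;
        const double vy = e.value[0] - a;
        const double len = std::sqrt(vx * vx + vy * vy);
        assert(len > 0.0);
        e.vec[0][0] = vx / len;
        e.vec[0][1] = vy / len;
    }
    // The second eigenvector is perpendicular to the first.
    e.vec[1][0] = -e.vec[0][1];
    e.vec[1][1] = e.vec[0][0];
    return e;
}
```

For the covariance matrix of the sample data, the larger eigenvalue should come out near 1.284 and the smaller near 0.049, with the first eigenvector pointing along roughly (0.678, 0.735).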

% Plot mean-adjusted data with eigenvectors overlaid

plot(Xm(:,1), Xm(:,2), '+'); hold on;
xlim([-2 2])
ylim([-2 2])

% Plot the eigenvectors (after the transpose, each row of A is a scaled eigenvector)

A = 10*V';
plot([-A(1,1) A(1,1)],[-A(1,2) A(1,2)],'b-.');
plot([-A(2,1) A(2,1)],[-A(2,2) A(2,2)],'b-.');
plot([0 0],[-2,2],'k:');
plot([-2 2],[0,0],'k:');

Figure 4: Mean-adjusted data with the eigenvectors overlaid.

Data Transformed using Two Eigenvectors

newX = Xm*V(:,1:end);

Figure 5: Transformed data using 2 eigenvectors

plot(newX(:,1), newX(:,2), 'r+');
Cnew = (newX' * newX) / (size(X, 1) - 1);
[Vnew, Dnew] = eig(Cnew);
hold on
A = 10 * Vnew';
plot([-A(1,1) A(1,1)],[-A(1,2) A(1,2)],'g:');
plot([-A(2,1) A(2,1)],[-A(2,2) A(2,2)],'b:');
title('Transformed data with 2 eigenvectors');

Figure 6: Plot of Transformed data using 2 Eigenvectors

Restore data using single eigenvector

% Project onto PC1

z = Xm*V(:,1);
% and reconstruct it

p = z*V(:,1)';
p = p + repmat(mu, [size(X, 1) 1])

 Figure 7: Restored data using PC1


%project on both Eigenvectors

z =Xm*V;

% and reconstruct it
p = z*V';
p = p + repmat(mu, [size(X, 1) 1])
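
The projection and reconstruction steps can also be sketched in C++. The function below handles both cases shown above: k = 1 keeps only the first principal component, while k = 2 restores the original points. Point2 and reconstruct are hypothetical names, and the caller is assumed to pass unit-length eigenvectors sorted by descending eigenvalue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One 2-D point or direction. Point2 and reconstruct are illustrative names.
struct Point2 {
    double x;
    double y;
};

// Rebuilds each point from its scores on the first k principal components.
// pcs[j] is assumed to be the j-th unit eigenvector; mu is the data mean.
std::vector<Point2> reconstruct(const std::vector<Point2>& X, const Point2& mu,
                                const Point2 pcs[2], int k) {
    assert(k >= 0 && k <= 2);
    std::vector<Point2> out;
    out.reserve(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double dx = X[i].x - mu.x;  // mean-adjusted coordinates
        const double dy = X[i].y - mu.y;
        Point2 r = mu;  // start from the mean, then add back each component
        for (int j = 0; j < k; ++j) {
            const double z = dx * pcs[j].x + dy * pcs[j].y;  // score on PC j
            r.x += z * pcs[j].x;
            r.y += z * pcs[j].y;
        }
        out.push_back(r);
    }
    return out;
}
```

With k = 1 every reconstructed point lies on the line through the mean along the first eigenvector, which is the effect visible in Figure 7. With k = 2 no information is discarded, so the output matches the input up to rounding.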

Figure 8: Restored Original data using all Principal Components

Complete Code for PCA function in Octave

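function [V, newX, D] = pca(X)
  % Center the data by subtracting the column means
  mu = mean(X, 1);
  Xm = X - repmat(mu, [size(X, 1) 1]);
  % Covariance matrix of the centered data
  C = (Xm' * Xm) / (size(X, 1) - 1);
  % Eigen-decomposition, sorted by descending eigenvalue
  [V, D] = eig(C);
  [D, order] = sort(diag(D), 'descend');
  V = V(:, order);
  % Project the centered data onto the principal components
  newX = Xm * V(:, 1:end);
end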


The problem with the image representation we are given is its high dimensionality. Two-dimensional p x q grayscale images span an m = pq-dimensional vector space, so an image with 100 x 100 pixels lies in a 10,000-dimensional image space.
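
To make the dimensionality concrete, the sketch below flattens a p x q grayscale image into a single vector of length pq, scaling pixel values to the range 0 to 1. It uses only the standard library; the function name flattenImage and the nested-vector image type are assumptions made for this illustration and are not how OpenCV stores images.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattens a p x q grayscale image, stored as p rows of q bytes, into one
// vector of length p * q with values scaled to [0, 1].
// The function name and the nested-vector image type are illustrative only.
std::vector<float> flattenImage(
        const std::vector<std::vector<unsigned char> >& image) {
    assert(!image.empty());
    const std::size_t q = image[0].size();
    std::vector<float> flat;
    flat.reserve(image.size() * q);
    for (std::size_t r = 0; r < image.size(); ++r) {
        assert(image[r].size() == q);  // every row must have q pixels
        for (std::size_t c = 0; c < q; ++c) {
            flat.push_back(image[r][c] / 255.0f);
        }
    }
    return flat;
}
```

A 100 x 100 image passed to this function yields a vector with 10,000 entries, which is one sample in the 10,000-dimensional image space described above.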


The PCA method finds the directions with the greatest variance in the data, called principal components.
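
As a sketch of what "greatest variance" means, let C be the covariance matrix of the mean-adjusted data and w a unit-length direction. The symbols w, w_1 and \lambda_1 are introduced here for illustration. The first principal component can then be written as:

```latex
w_1 = \arg\max_{\|w\| = 1} \; w^{\top} C \, w,
\qquad
C \, w_1 = \lambda_1 w_1
```

Here \lambda_1 is the largest eigenvalue of C, and the variance of the data along w_1 equals \lambda_1. Each further principal component maximizes the same expression among directions orthogonal to the earlier ones.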

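#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector>
using namespace cv;
using namespace std;

// Normalizes an image into the range between 0 and 255.
Mat normalize(const Mat& src) {
    Mat srcnorm;
    cv::normalize(src, srcnorm, 0, 255, NORM_MINMAX, CV_8UC1);
    return srcnorm;
}

int main(int argc, const char *argv[]) {
    // Holds the images. The listing does not show how db is filled; as an
    // assumption, equally sized images are read from the command line here.
    vector<Mat> db;
    for (int i = 1; i < argc; i++) {
        db.push_back(imread(argv[i], 0)); // flag 0 loads the image as grayscale
    }
    if (db.size() < 3 || db[0].empty()) {
        return 1; // three readable images are needed for three components
    }
    // Create a matrix with one image per column:
    int total = db[0].rows * db[0].cols;
    Mat mat(total, (int)db.size(), CV_32FC1);
    for (int i = 0; i < (int)db.size(); i++) {
        Mat X = mat.col(i);
        db[i].reshape(1, total).col(0).convertTo(X, CV_32FC1, 1/255.);
    }
    // Number of components to keep for the PCA:
    int num_components = 3;
    // Perform a PCA (OpenCV 3 and later spell this flag PCA::DATA_AS_COL):
    PCA pca(mat, Mat(), CV_PCA_DATA_AS_COL, num_components);
    // The mean face:
    imshow("avg", pca.mean.reshape(1, db[0].rows));
    // The first three eigenfaces:
    imshow("pc1", normalize(pca.eigenvectors.row(0)).reshape(1, db[0].rows));
    imshow("pc2", normalize(pca.eigenvectors.row(1)).reshape(1, db[0].rows));
    imshow("pc3", normalize(pca.eigenvectors.row(2)).reshape(1, db[0].rows));
    waitKey(0); // keep the windows open until a key is pressed
    return 0;
}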
Figure 9: Eigenfaces, left to right: average, PC1, PC2, PC3