Links

Git: link

Course webpage: link

Tasks

Task 1 - Classification

Task 1 was based on the Anuran Calls (MFCCs) data set, from which every student received a different subset of the data.

In this task we needed to process the data and classify it into frog species classes. Each component of a feature vector represented some characteristic of a frog's call, and all vectors had 24 dimensions.

Maximum Likelihood Estimation

In order to classify the data, covariance and correlation matrices needed to be estimated; we did so with maximum likelihood estimation (MLE).

That is, I had to calculate the mean and covariance parameters of the underlying Gaussian distribution as the values which maximise the likelihood.
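For a Gaussian, those maximum-likelihood estimates are just the sample mean and the 1/N-normalised sample covariance (standard results, stated here for reference):

\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \hat{\Sigma} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^{\top}

The correlation matrix is then obtained by normalising, R_{ij} = \hat{\Sigma}_{ij} / \sqrt{\hat{\Sigma}_{ii}\,\hat{\Sigma}_{jj}}. In code, this was a pretty simple calculation: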

import numpy as np
import scipy.io
import math

# assumes none of the variables have 0 variance, i.e. no constant variables
def task1_1(X, Y):
    # Input:
    #  X : N-by-D data matrix (np.float64)
    #  Y : N-by-1 label vector (np.int32)
    # Variables to save
    #  S : D-by-D covariance matrix (np.float64) to save as 't1_S.mat'
    #  R : D-by-D correlation matrix (np.float64) to save as 't1_R.mat'

    # get sizes
    (N,D) = X.shape
    
    # mu hat - 1-by-D vector
    mu = np.mean(X,axis = 0)

    # N-by-D matrix
    centeredX = (X - mu)

    # sigma hat - D-by-D matrix
    S = (centeredX.T@centeredX)/N

    # correlation matrix - D-by-D matrix
    diag = S.diagonal().reshape(1,D)
    R = S / (np.sqrt(diag.T@diag))
    
    scipy.io.savemat('t1_S.mat', mdict={'S': S})
    scipy.io.savemat('t1_R.mat', mdict={'R': R})
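
A minimal sketch of how this might be called (the file name 'data.mat' is hypothetical; 'X' and 'Y_species' are the keys used later in this post):

data = scipy.io.loadmat('data.mat')      # hypothetical file name
task1_1(data['X'], data['Y_species'])    # writes t1_S.mat and t1_R.mat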

Resulting Covariance and Correlation matrices (plotted in matplotlib):

correlation.png

I was very pleased with these graphs. There are clear correlations to be found, both positive (red) and negative (blue), all over the matrix, which is pretty interesting.

Principal Component Analysis

This was the final step before I could classify the data points into species! PCA is a very interesting technique which uses the eigenvectors and eigenvalues of our covariance matrix to find the directions that will best serve as a "new plane" for our data, minimising the loss of variance and information. Once we have found those axes, we project the entire data set onto the new plane (in this case 2D) and then analyse it.

There are a couple of steps needed to carry out PCA (a compact numpy sketch follows the list):

  1. Calculate eigenvectors and eigenvalues of your covariance matrix
  2. Sort the eigenvalues and eigenvectors in tandem in descending order.
  3. Project data onto the axes defined by the eigenvectors (the new axes)
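
Put together, this is only a few lines of numpy. A generic sketch rather than the coursework code (note that it centres the data before projecting, which the coursework projection further down does not):

import numpy as np

def pca_project(X, k):
    # 1. covariance of the centred data
    mu = X.mean(axis=0)
    Xc = X - mu
    S = (Xc.T @ Xc) / len(X)
    # 2. eigen-decomposition, reordered to descending eigenvalues
    evals, evecs = np.linalg.eigh(S)   # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # 3. project onto the top-k principal components
    return Xc @ evecs[:, :k]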

So the first two steps were pretty simple again (in addition to the eigenvalues/vectors we also had to find the cumulative variance, and also made sure that the eigenvalues were unique for the purposes of the coursework):

def task1_3(Cov):
    # Input:
    #  Cov : D-by-D covariance matrix (np.float64)
    # Variables to save:
    #  EVecs : D-by-D matrix of column vectors of eigen vectors (np.float64)  
    #  EVals : D-by-1 vector of eigen values (np.float64)  
    #  Cumvar : D-by-1 vector of cumulative variance (np.float64)  
    #  MinDims : 4-by-1 vector (np.int32)  

    # find D
    D = Cov.shape[0]

    # find the eigen values and eigen vectors in ascending order
    EVals,EVecs = np.linalg.eigh(Cov)

    # flip the sign of eigenvectors whose first element is negative (negate the whole column)
    EVecs[:, EVecs[0] < 0] *= -1
    
    # reverse the order of EVals and EVecs
    EVals = EVals[::-1][np.newaxis,:].T
    EVecs = np.flip(EVecs,1)

    # calculate the cumulative variance, which is just the running sum of the eigenvalues
    Cumvar = np.cumsum(EVals,axis=0)

    # calculate MinDims using Cumvar
    minBounds = np.array([70,80,90,95])/100 * Cumvar[-1]  # the minimum variance bounds
    
    currDims = 1 
    i = 0 # current bound

    MinDims = np.zeros((4,1), dtype=np.int32)
    while i < 4:
        if Cumvar[currDims] >= minBounds[i]:
            MinDims[i] = currDims 
            i += 1
        currDims += 1

    scipy.io.savemat('t1_EVecs.mat', mdict={'EVecs': EVecs})
    scipy.io.savemat('t1_EVals.mat', mdict={'EVals': EVals})
    scipy.io.savemat('t1_Cumvar.mat', mdict={'Cumvar': Cumvar})
    scipy.io.savemat('t1_MinDims.mat', mdict={'MinDims': MinDims})

Projecting was a little tricky, but I eventually worked out the linear algebra involved:

outEVe = scipy.io.loadmat("t1_EVecs.mat")["EVecs"]
outEVa = scipy.io.loadmat("t1_EVals.mat")["EVals"]
outCV = scipy.io.loadmat("t1_Cumvar.mat")["Cumvar"]
outMD = scipy.io.loadmat("t1_MinDims.mat")["MinDims"]

# set_size(width) is a small helper defined elsewhere that picks sensible figure dimensions
fig,ax = plt.subplots(1,figsize=set_size(width))

X = np.arange(len(outCV))
ax.set_xticks(X)
ax.set_title("Cumulative Variance")
ax.set_xlabel("Dimensions")
ax.set_ylabel("Total Variance")

ax.bar(X,outCV.flatten())

#plt.savefig('cumvar.png')
plt.show()

# draw the data after PCA using the first 2 eigenvectors as the new basis
fig,ax = plt.subplots(1,figsize=set_size(width))

# get the matrix to transform the data
W = outEVe[:,[0,1]]

# the dot product of each feature vector with each unit eigenvector (1-by-D dot D-by-1) gives
# the scalar projection, i.e. the length of the vector projection along that eigenvector.
# D is then an N-by-2 matrix of the feature vectors expressed as linear combinations of the first 2 principal components.
# 'data' is the dict loaded from the coursework .mat file
D = data["X"]@W

# plot each class in a different colour
for i in range(10):
    idxs = np.argwhere(data["Y_species"] == i)
    X = D[idxs,0]
    Y = D[idxs,1]
    ax.scatter(X,Y,label = data["list_species"][i,0][0],s=3)
ax.legend(loc="upper right",fontsize="x-small")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Data transformed onto the first two principal components")
fig.tight_layout()
# plt.savefig('pca.png')

Result:

pca.png

Pretty neat, eh?

This graph is 24-dimensional data "squashed" onto a 2D plane. I love this stuff.

Classification

This was the hardest bit: we were told to use k-fold cross-validation and classify the data points based on the estimated Gaussians.
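Assuming equal priors (as in the log_posterior helper at the bottom of the code), each test point x is assigned to the class k with the highest Gaussian log-posterior, which up to an additive constant is

\log p(C_k \mid x) = -\tfrac{1}{2}(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) - \tfrac{1}{2}\log|\Sigma_k| + \text{const}

In the end my code looked like this: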

import numpy as np
import scipy.io

def task1_mgc_cv(X, Y, CovKind, epsilon, Kfolds):
    # Input:
    #  X : trainingN-by-D matrix of feature vectors (np.float64)
    #  Y : trainingN-by-1 label vector (np.int32)
    #  CovKind : scalar (np.int32)
    #  epsilon : scalar (np.float64)
    #  Kfolds  : scalar (np.int32)
    #
    # Variables to save
    #  PMap   : trainingN-by-1 vector (np.ndarray) of partition numbers (np.int32)
    #  Ms     : C-by-D matrix (np.ndarray) of mean vectors (np.float64)
    #  Covs   : C-by-D-by-D array (np.ndarray) of covariance matrices (np.float64)
    #  CM     : C-by-C confusion matrix (np.ndarray) (np.float64)

    ## split the test data by classes

    # the number of classes
    C = 10
    D = X.shape[1]
    # the total number of samples
    Xcount = len(X)

    # idxs of elements in each class, C-by-len(C_i indices)
    ClassIdxs = np.zeros((C),dtype=object)
    for c in range(C):
        ClassIdxs[c] = np.nonzero(Y == c+1)[0] # Y ranges between 1 and 10

    # numbers of elements in each class
    vecLen = np.vectorize(len)
    Ncs = vecLen(ClassIdxs)
    # the initial partition lengths for each class, C-by-1
    Mcs = Ncs//Kfolds

    PMap = np.zeros((Xcount,1))
    CM = np.zeros((C,C))

    # assign the elements of each class to the Kfolds partitions
    for p in range(Kfolds):
        a = p * Mcs
        b = (p+1) * Mcs

        # clamp the elements in the last partition
        if p == Kfolds - 1:
            np.minimum(b,Ncs-1,out=b)  # Ncs holds class sizes, b holds end indices; cap b at the last valid index
            b += 1  # we slice exclusively later, so convert the capped index back into a length
        # flattened elements taken from a portion of each class

        idxs = [ item for c,sublist in enumerate(ClassIdxs) for item in sublist[a[c]:b[c]]]
        PMap[idxs] = p

    # perform classification with each partition as a test set 
    for p in range(Kfolds):

        ## estimate the parameters
        Covs = np.zeros((C,D,D)) 
        Ms = np.zeros((C,D))

        testingMask = (PMap == p)
        trainingMask = (PMap != p)

        for c in range(C):
            # training is Msc[c]-by-D in size
            # labels range 1-10, so in referring to Y be careful with off-by-one errors
            testing = X[(testingMask & (Y == c+1)).ravel(),:]
            training = X[(trainingMask & (Y == c+1)).ravel(),:]
            # get size
            trainingN = len(training)

            # mu hat - 1-by-D vector
            mu = np.mean(training,axis = 0)

            Ms[c] = mu
            # trainingN-by-D matrix
            centeredX = (training - mu)

            # we estimate and regularize the matrices
            if CovKind == 1:
                # full covariance matrix calculation
                Covs[c] = (centeredX.T@centeredX)/trainingN
                np.fill_diagonal(Covs[c], Covs[c].diagonal() + epsilon)

            elif CovKind == 2:
                # diagonal covariance matrix: just calculate the per-feature variances
                np.fill_diagonal(Covs[c],np.sum(centeredX**2,axis=0)/trainingN)
                np.fill_diagonal(Covs[c], Covs[c].diagonal() + epsilon)

            else:
                # for shared covariance accumulate the result in Covs[0], and later 'tile' the shared matrix
                S = ((centeredX.T@centeredX)/trainingN)
                np.fill_diagonal(S, S.diagonal() + epsilon)
                Covs[0] = S + Covs[0]

        if CovKind == 3:
            Covs[0] = Covs[0]/C
            Covs = np.tile(Covs[0],(C,1,1))

        ## perform classification and calculate confusion matrix as you go

        CM_p = np.zeros((C,C))

        # classify each sample according to posterior likelihoods
        
        # PLL = np.zeros((trainingN,1))
        for i in np.nonzero(testingMask)[0]:
            x = X[i]
            # pick the class with the highest log-posterior (k indexes classes, i indexes test samples)
            (pll,c) = max([(log_posterior(x,Ms[k],Covs[k]),k) for k in range(C)])

            # PLL[i] = c
            CM_p[Y[i]-1,c] += 1

        # accumulate the total confusion matrix
        CM += CM_p/len(np.nonzero(testingMask)[0]) 
        ## save results
        scipy.io.savemat('t1_mgc_'+str(Kfolds)+'cv'+str(p)+'_Ms.mat', mdict={'Ms':Ms})
        scipy.io.savemat('t1_mgc_'+str(Kfolds)+'cv'+str(p)+'_ck'+str(CovKind)+'_Covs.mat', mdict={'Covs': Covs})
    
        scipy.io.savemat('t1_mgc_'+str(Kfolds)+'cv'+str(p)+'_ck'+str(CovKind)+'_CM.mat', mdict={'CM':CM_p})
    
    CM /= Kfolds
    np.set_printoptions(precision=1)
    scipy.io.savemat('t1_mgc_'+str(Kfolds)+'cv'+str(Kfolds+1)+'_ck'+str(CovKind)+'_CM.mat', mdict={'CM':CM})
    scipy.io.savemat('t1_mgc_'+str(Kfolds)+'cv_PMap.mat', mdict={'PMap':PMap})

def log_posterior(x,mu,si):
    si_det = np.linalg.det(si)
    si_inv = np.linalg.inv(si)
    xCentered = (x - mu)[:,None]
    return ((-(1/2)* ((xCentered.T@si_inv@xCentered))) - ((1/2)*np.log(si_det)))[0,0]  # + np.log(prior) - assumed equal priors
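
A hedged example of how the cross-validation function might be driven (the file name and parameter values here are purely illustrative, not the ones used for marking):

data = scipy.io.loadmat('data.mat')   # hypothetical file name
task1_mgc_cv(data['X'], data['Y_species'], CovKind=1, epsilon=0.01, Kfolds=5)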

It worked! I was quite happy to see the high accuracy values; it felt amazing.

Task 2 - Neural Networks

This part was really interesting: we did not use any training algorithms, but had to work out the weights and network structure ourselves, mostly using logic-gate-style neurons. We also got to try out different activation functions: a sigmoid and a simple "cutoff" (hard threshold) activation function.
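In equation form, with the input augmented by a leading one so that a = W^{\top}[1; x], the two activations are

h(a) = \begin{cases} 1 & a > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad \sigma(a) = \frac{1}{1 + e^{-a}}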

My basic building blocks were hNeurons (simple activation function) and sNeurons (Sigmoid):

def task2_sNeuron(W, X):
    # Input:
    #  X : N-by-D matrix of input vectors (in row-wise) (np.float64)
    #  W : (D+1)-by-1 vector of weights (np.float64)
    # Output:
    #  Y : N-by-1 vector of output (np.float64)

    # the output is defined as y(x_i) = sigmoid(W.T @ x_i), where
    # x_i needs to be augmented with a one at the top

    # find out sizes
    (N,D) = X.shape

    # augment X
    X = np.concatenate((np.ones((N,1)),X),axis=1)
    
    # apply the sigmoid activation function
    return 1/(1+np.exp(-(X@W)))


def task2_hNeuron(W, X):
    # Input:
    #  X : N-by-D matrix of input vectors (in row-wise) (np.double)
    #  W : (D+1)-by-1 vector of weights (np.double)
    # Output:
    #  Y : N-by-1 vector of output (np.double)

    # the output is defined as y(x_i) = step(W.T @ x_i), where
    # x_i needs to be augmented with a one at the top

    # find out sizes
    (N,D) = X.shape

    # augment X
    X = np.concatenate((np.ones((N,1)),X),axis=1)
    # apply activation function and change type
    return ((X@W) > 0).astype(np.double)
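
As a quick sanity check (my own toy example, not part of the coursework), a single hNeuron with weights [-1.5, 1, 1] acts as an AND gate on binary inputs:

W = np.array([[-1.5],[1.0],[1.0]])                         # bias, w1, w2
X = np.array([[0,0],[0,1],[1,0],[1,1]], dtype=np.float64)
print(task2_hNeuron(W, X).ravel())                         # -> [0. 0. 0. 1.]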

We were given a few polygons as decision boundaries, namely:

  1. Polygon A: (-1,-0.5) (6,1.25) (6,6.25) (1,6)
  2. Polygon B: (2.5, 3) (3.5, 3) (3.5, 3.5) (2.5, 3.5)

And were then tasked to plot regions based on the following set notation:

  • Class 1: A, Class 0: ~A
  • Class 1: ~A and B, Class 0: A or ~B

As well as some other similar tasks.

To find the weights for all of the above, I used a script which did the math for me:

if __name__ == "__main__":
    #                 A                 B                   C                   D
    polyA = np.array([[1.65241, 2.222], [1.92806, 1.71377], [2.51323, 2.1643], [2.35589, 2.73902]])

    #Layer 1
    # the 4 neurons in the 1st layer test whether a point is in front of or behind each face of the polygon
    directions = polyA - polyA[[1,2,3,0],:]
    normals = directions[:,[1,0]] * np.array([[-1,1]])

    biases = np.sum(normals*polyA,axis=1)[:,None] * -1
    weightsMatrix1 = np.concatenate((biases,normals),axis=1)
    # normalize the weights

    weightsMatrix1 /= np.max(np.abs(weightsMatrix1),axis=1)[:,None]

    #Layer 2

    # The 4-input NOR neuron
    # this neuron only returns 1 if all the previous neurons return 0, meaning the point is not outside any face

    # any input of 1 subtracts at least 1 from the bias of 1, making the output 0
    # only all four zeros give an output of 1
    weightsMatrix2 = np.array([[1,-1,-1,-1,-1]])

    f = open("task2 hNN A weights.txt",'x')

    L1Neurons,L1Dim = weightsMatrix1.shape
    for j in range(L1Neurons):
        for i in range(L1Dim):
            f.write("W(1,"+str(j+1)+','+str(i)+") : " + str(weightsMatrix1[j,i])+'\n')

    L2Neurons,L2Dim = weightsMatrix2.shape
    for j in range(L2Neurons):
        for i in range(L2Dim):
            f.write("W(2,"+str(j+1)+','+str(i)+") : " + str(weightsMatrix2[j,i])+'\n')
    f.close()

def t2():
    # poly 2.1
    PolyB = np.array([[-0.46478,3.19731], [5.65146,4.97528], [3.08446, 2.35946]]) #, [3.68124, -2.26483]
    directionsB = PolyB[[1,2,0],:] - PolyB
    normalsB = directionsB[:,[1,0]] * np.array([[-1,1]])

    biasesB = np.sum(normalsB*PolyB,axis=1)[:,None] * -1
    weightsMatrix1B = np.concatenate((biasesB,normalsB),axis=1)
    # normalize the weights

    weightsMatrix1B /= np.max(np.abs(weightsMatrix1B),axis=1)[:,None]
    print(weightsMatrix1B)

def t22():
    # poly 2.2
    PolyB = np.array([[3.08446, 2.35946],[3.68124, -2.26483],[-0.46478,3.19731]])
    directionsB = PolyB[[1,2,0],:] - PolyB
    normalsB = directionsB[:,[1,0]] * np.array([[-1,1]])

    biasesB = np.sum(normalsB*PolyB,axis=1)[:,None] * -1
    weightsMatrix1B = np.concatenate((biasesB,normalsB),axis=1)
    # normalize the weights

    weightsMatrix1B /= np.max(np.abs(weightsMatrix1B),axis=1)[:,None]
    print(weightsMatrix1B)
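
For reference, each first-layer neuron these scripts produce is just a half-plane test: using the outward normal n of one polygon edge and a vertex v on that edge, the neuron computes

h(n \cdot x + b), \qquad b = -n \cdot v

which outputs 1 exactly when the point x lies outside that edge (hence the "1 if outwith" comments below). The later layers then combine these per-edge flags with simple AND/OR/NOR logic.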

And then constructed the networks like so:

##Class 1: A,  Class 0: ~A

def task2_hNN_A(X):
    # Input:
    #  X : N-by-D matrix (np.ndarray) of input vectors (in row-wise) (np.float64), where D=2.
    # Output:
    #  Y : N-by-1 vector (np.ndarray) of output (np.float64)

    N,D = X.shape 
    
    weightsLayer1 = np.array([[1.0,-0.34994868685872166,-0.18980256091259212],
                            [0.22933150896320767,0.7699130167301808,-1.0],
                            [-1.0,0.3219838025165198,0.08814889248320788],
                            [-1.0,-0.7294263777054958,0.9924893972926816]])
              
    Nrns1,D1 = weightsLayer1.shape
    weightsLayer1 = weightsLayer1.T

    weightsLayer2 = np.array([[1,-1,-1,-1,-1]])
    Nrns2,D2 = weightsLayer2.shape
    weightsLayer2 = weightsLayer2.T

    # N-by-Nrns1
    layer1Output = np.zeros((N,Nrns1))
    for neuron in range(Nrns1):
        layer1Output[:,neuron] = t2.task2_hNeuron(weightsLayer1[:,neuron,None],X)[:,0]
    
    # N-by-Nrns2 
    layer2Output = np.zeros((N,Nrns2))
    for neuron in range(Nrns2):
        layer2Output[:,neuron] = t2.task2_hNeuron(weightsLayer2[:,neuron,None],layer1Output)[:,0]

    return layer2Output[:,0]

##Class 1: ~A and B, Class 0: A or ~B
def task2_hNN_AB(X):
    # Input:
    #  X : N-by-D matrix of input vectors (in row-wise) (np.float64), where D=2
    # Output:
    #  Y : N-by-1 vector of output (np.float64)

    # weights B
    #     [[-1.         -0.08723287  0.30008223]
    #  [-0.76903546  1.         -0.98133664]
    #  [-1.          0.29507611  0.03808055]
    #  [ 1.         -0.50965075 -0.38684878]
    
    N,D = X.shape 
                            
    # Layer 1
    weightsLayer1 = np.array([  #PolyA - 1 if outwith in each direction
                            [1.0,-0.34994868685872166,-0.18980256091259212],
                            [0.22933150896320767,0.7699130167301808,-1.0],
                            [-1.0,0.3219838025165198,0.08814889248320788],
                            [-1.0,-0.7294263777054958,0.9924893972926816],
                                #PolyB-1 triangle - 1 if outwith in each direction
                            [-1.         ,-0.08723287  ,0.30008223],
                            [-0.76903546  ,1.         ,-0.98133664],
                            [ 1.         ,-0.0764559  ,-0.323877  ],
                                #PolyB-2 triangle - 1 if outwith in each direction
                            [-1.          ,0.29507611  ,0.03808055],
                            [ 1.         ,-0.50965075 ,-0.38684878],
                            [-1.          ,0.0764559   ,0.323877  ]
                            ])
              
    Nrns1,D1 = weightsLayer1.shape
    weightsLayer1 = weightsLayer1.T

    # Layer2
    weightsLayer2 = np.array([  #PolyA - OR
                            [0, 1,1,1,1, 0,0,0, 0,0,0],
                                #PolyB-1 triangle - NOR
                            [1, 0,0,0,0, -1,-1,-1, 0,0,0],
                                #PolyB-2 triangle - NOR
                            [1, 0,0,0,0, 0,0,0, -1,-1,-1,],
                            ])
    Nrns2,D2 = weightsLayer2.shape
    weightsLayer2 = weightsLayer2.T

    # Layer3
    weightsLayer3 = np.array([  # Carry forward w1 
                            [0, 1,0,0],
                                # PolyB OR
                            [0, 0,1,1 ]]) 
    Nrns3,D3 = weightsLayer3.shape
    weightsLayer3 = weightsLayer3.T

    
    # Layer4
    weightsLayer4 = np.array([  # AND 
                            [-1, 0.6,0.6]]) 
    Nrns4,D4 = weightsLayer4.shape
    weightsLayer4 = weightsLayer4.T

    # N-by-Nrns1
    layer1Output = np.zeros((N,Nrns1))
    for neuron in range(Nrns1):
        layer1Output[:,neuron] = t2.task2_hNeuron(weightsLayer1[:,neuron,None],X)[:,0]
    # N-by-Nrns2 
    layer2Output = np.zeros((N,Nrns2))
    for neuron in range(Nrns2):
        layer2Output[:,neuron] = t2.task2_hNeuron(weightsLayer2[:,neuron,None],layer1Output)[:,0]

    # N-by-Nrns3
    layer3Output = np.zeros((N,Nrns3))
    for neuron in range(Nrns3):
        layer3Output[:,neuron] = t2.task2_hNeuron(weightsLayer3[:,neuron,None],layer2Output)[:,0]
    # N-by-Nrns4
    layer4Output = np.zeros((N,Nrns4))
    for neuron in range(Nrns4):
        layer4Output[:,neuron] = t2.task2_hNeuron(weightsLayer4[:,neuron,None],layer3Output)[:,0]
    
    return layer4Output[:,0]
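
To produce decision-region plots like the ones below, one can simply evaluate a network over a grid of points. A rough sketch assuming task2_hNN_A from above is in scope (my actual plotting code differed in the details, and the grid range here is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

# evaluate the network on a dense grid covering the region of interest
xs, ys = np.meshgrid(np.linspace(-2, 8, 400), np.linspace(-2, 8, 400))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
Z = task2_hNN_A(grid).reshape(xs.shape)

plt.pcolormesh(xs, ys, Z)   # class 1 region in one colour, class 0 in the other
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()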

We also had to build the second boundary detector using sigmoid neurons, which just made the polygon edges rounder; with a sufficiently large multiplier on the weights, the sigmoid network approximated the hNeuron one.
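A small sketch of that idea (my own illustration): scaling all the weights, bias included, by a large constant pushes the sigmoid towards the step function, so the sNeuron output approaches the hNeuron output:

W = np.array([[-1.5],[1.0],[1.0]])
X = np.array([[0.0,0.0],[1.0,1.0],[2.0,2.0]])
for scale in (1, 10, 100):
    print(scale, task2_sNeuron(W * scale, X).ravel())
# as the scale grows the outputs tend to 0 or 1, matching task2_hNeuron(W, X).ravel()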

Results:

boundaries1.png

boundaries2.png

And that was pretty much it, apart from some tasks I have left out here because they were very similar or only slightly different in format.

Overall this was a challenging coursework and I had to brush up on my Linear Algebra, but it was extremely rewarding and fun.