Links
Git: link
Course webpage: link
Tasks
Task 1 - Classification
Task 1 was based on the Anuran Calls (MFCCs) data-set, from which every student received a different subset of the data.
In this task we needed to process the data and classify it into frog species classes; each component of a feature vector represented some characteristic of a frog call, and all vectors had 24 dimensions.
Maximum Likelihood Estimation
In order to classify the data, the covariance and correlation matrices needed to be estimated, which we did with the MLE method, i.e. I had to calculate the mean and covariance parameters of the underlying Gaussian distribution using the values which maximise the likelihood.
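For a Gaussian these maximum-likelihood estimates have a simple closed form (standard results, written here in my own notation), and the correlation matrix follows by normalising the covariance:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\hat{\mu})(x_i-\hat{\mu})^\top, \qquad R_{jk} = \frac{\hat{\Sigma}_{jk}}{\sqrt{\hat{\Sigma}_{jj}\,\hat{\Sigma}_{kk}}}$$

This was a pretty simple calculation: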
import numpy as np
import scipy.io

# assumes none of the variables have 0 variance, i.e. no constant variables
def task1_1(X, Y):
    # Input:
    #  X : N-by-D data matrix (np.float64)
    #  Y : N-by-1 label vector (np.int32)
    # Variables to save
    #  S : D-by-D covariance matrix (np.float64) to save as 't1_S.mat'
    #  R : D-by-D correlation matrix (np.float64) to save as 't1_R.mat'

    # get sizes
    (N, D) = X.shape
    # mu hat - 1-by-D vector
    mu = np.mean(X, axis=0)
    # N-by-D matrix
    centeredX = X - mu
    # sigma hat - D-by-D matrix
    S = (centeredX.T @ centeredX) / N
    # correlation matrix - D-by-D matrix
    diag = S.diagonal().reshape(1, D)
    R = S / np.sqrt(diag.T @ diag)
    scipy.io.savemat('t1_S.mat', mdict={'S': S})
    scipy.io.savemat('t1_R.mat', mdict={'R': R})
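As a quick sanity check (my own smoke test on random data, not the coursework data or part of the submission), the saved correlation matrix should have ones on its diagonal:

import numpy as np
import scipy.io

# hypothetical smoke test: random data with the same 24-dimensional shape
X = np.random.default_rng(0).normal(size=(500, 24))
Y = np.ones((500, 1), dtype=np.int32)
task1_1(X, Y)

R = scipy.io.loadmat('t1_R.mat')['R']
print(np.allclose(R.diagonal(), 1.0))  # True: each variable correlates perfectly with itself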
Resulting Covariance and Correlation matrices (plotted in matplotlib):
I was very pleased with these graphs. There are clear correlations to be found, both positive (red) and negative (blue), all over the matrix, which is pretty interesting.
Principal Component Analysis
This was the final step before I could classify the data points into species! PCA is a very interesting technique which uses the eigenvectors of our covariance matrix (ranked by their eigenvalues) to find the directions which will best serve as a "new plane" for our data, minimising the loss of variance and information. Once we find those axes, we project the entire data set onto this new plane (in this case 2D) and then analyse it.
There are a couple of steps needed to carry out PCA (a minimal NumPy sketch of these steps follows the list):
- Calculate the eigenvectors and eigenvalues of your covariance matrix
- Sort the eigenvalues and eigenvectors in tandem in descending order.
- Project the data onto the axes defined by the eigenvectors (the new axes)
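For reference, here is the whole recipe in generic NumPy (a minimal sketch of mine, separate from the coursework functions below; the names are arbitrary):

import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components (illustrative sketch)."""
    Xc = X - X.mean(axis=0)              # centre the data
    cov = (Xc.T @ Xc) / len(Xc)          # MLE covariance estimate
    evals, evecs = np.linalg.eigh(cov)   # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(evals)[::-1]      # sort descending, keeping values and vectors in tandem
    W = evecs[:, order[:k]]              # D-by-k basis of the "new plane"
    return Xc @ W                        # N-by-k projected data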
So the first two pieces of work were pretty simple again (in addition to the eigenvalues/vectors we also had to find the cumulative variance, and we fixed the sign convention of the eigenvectors so that the result was unique for the purposes of the coursework):
def task1_3(Cov):
    # Input:
    #  Cov : D-by-D covariance matrix (np.float64)
    # Variables to save:
    #  EVecs : D-by-D matrix of column vectors of eigen vectors (np.float64)
    #  EVals : D-by-1 vector of eigen values (np.float64)
    #  Cumvar : D-by-1 vector of cumulative variance (np.float64)
    #  MinDims : 4-by-1 vector (np.int32)

    # find D
    D = Cov.shape[0]
    # find the eigen values and eigen vectors (eigh returns them in ascending order)
    EVals, EVecs = np.linalg.eigh(Cov)
    # fix the sign convention: if the first element of an eigenvector is negative, negate that column
    for i in np.argwhere(EVecs[0] < 0).ravel():
        EVecs[:, i] *= -1
    # reverse the order of EVals and EVecs so they are descending
    EVals = EVals[::-1][np.newaxis, :].T
    EVecs = np.flip(EVecs, 1)
    # calculate the cumulative variance, which is the running sum of the eigen values
    Cumvar = np.cumsum(EVals, axis=0)
    # calculate MinDims (the dimensions needed to cover each variance bound) using Cumvar
    minBounds = np.array([70, 80, 90, 95]) / 100 * Cumvar[-1]  # the minimum variance bounds
    currDims = 1
    i = 0  # current bound
    MinDims = np.zeros((4, 1), dtype=np.int32)
    while i < 4:
        if Cumvar[currDims] >= minBounds[i]:
            MinDims[i] = currDims
            i += 1
        else:
            currDims += 1
    scipy.io.savemat('t1_EVecs.mat', mdict={'EVecs': EVecs})
    scipy.io.savemat('t1_EVals.mat', mdict={'EVals': EVals})
    scipy.io.savemat('t1_Cumvar.mat', mdict={'Cumvar': Cumvar})
    scipy.io.savemat('t1_MinDims.mat', mdict={'MinDims': MinDims})
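A quick way to sanity-check the outputs (a sketch of my own, assuming the .mat files above are on disk and `Cov` is the matrix passed to task1_3): the eigenvector columns should be orthonormal, and the last entry of the cumulative variance should equal the trace of the covariance matrix, since the total variance is preserved.

import numpy as np
import scipy.io

EVecs = scipy.io.loadmat('t1_EVecs.mat')['EVecs']
Cumvar = scipy.io.loadmat('t1_Cumvar.mat')['Cumvar']

# the columns of EVecs form an orthonormal basis, so EVecs.T @ EVecs should be the identity
print(np.allclose(EVecs.T @ EVecs, np.eye(EVecs.shape[0])))
# sum of eigenvalues == trace of the covariance matrix (Cov assumed to be in scope)
print(np.isclose(Cumvar[-1, 0], np.trace(Cov)))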
Projecting was a little tricky, but I eventually worked out the linear algebra involved:
import numpy as np
import scipy.io
import matplotlib.pyplot as plt
# `data` (the raw data set), `set_size` and `width` (figure-sizing helpers) are defined elsewhere in my plotting script

outEVe = scipy.io.loadmat("t1_EVecs.mat")["EVecs"]
outEVa = scipy.io.loadmat("t1_EVals.mat")["EVals"]
outCV = scipy.io.loadmat("t1_Cumvar.mat")["Cumvar"]
outMD = scipy.io.loadmat("t1_MinDims.mat")["MinDims"]

fig, ax = plt.subplots(1, figsize=set_size(width))
X = np.arange(len(outCV))
ax.set_xticks(X)
ax.set_title("Cumulative Variance")
ax.set_xlabel("Dimensions")
ax.set_ylabel("Total Variance")
ax.bar(X, outCV.flatten())
#plt.savefig('cumvar.png')
plt.show()

# draw the data after PCA using the first 2 eigenvectors as the new basis
fig, ax = plt.subplots(1, figsize=set_size(width))
# get the matrix which transforms the data
W = outEVe[:, [0, 1]]
# the dot product of each feature vector with each unit eigenvector (1-by-D dot D-by-1) gives
# the scalar projection, i.e. the distance of the vector projection along that eigenvector;
# D is then an N-by-2 matrix of the feature vectors expressed as linear combinations of the first 2 principal components
D = data["X"] @ W
# plot each class in a different colour
for i in range(10):
    idxs = np.argwhere(data["Y_species"] == i)
    X = D[idxs, 0]
    Y = D[idxs, 1]
    ax.scatter(X, Y, label=data["list_species"][i, 0][0], s=3)
ax.legend(loc="upper right", fontsize="x-small")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Data transformed onto the first two principal components")
fig.tight_layout()
# plt.savefig('pca.png')
Result:
Pretty neat, eh?
This graph is 24-dimensional data "squashed" into a 2D plane. I love this stuff.
Classification
This was the hardest bit: we were told to use k-fold cross-validation and classify the data points based on the estimated Gaussians. In the end my code looked like this:
import numpy as np
import scipy.io

def task1_mgc_cv(X, Y, CovKind, epsilon, Kfolds):
    # Input:
    #  X : trainingN-by-D matrix of feature vectors (np.float64)
    #  Y : trainingN-by-1 label vector (np.int32)
    #  CovKind : scalar (np.int32)
    #  epsilon : scalar (np.float64)
    #  Kfolds  : scalar (np.int32)
    #
    # Variables to save
    #  PMap : trainingN-by-1 vector (np.ndarray) of partition numbers (np.int32)
    #  Ms   : C-by-D matrix (np.ndarray) of mean vectors (np.float64)
    #  Covs : C-by-D-by-D array (np.ndarray) of covariance matrices (np.float64)
    #  CM   : C-by-C confusion matrix (np.ndarray) (np.float64)

    ## split the data by classes
    # the number of classes
    C = 10
    D = X.shape[1]
    # the total number of samples
    Xcount = len(X)
    # indices of the elements in each class, C-by-len(C_i indices)
    ClassIdxs = np.zeros((C), dtype=object)
    for c in range(C):
        ClassIdxs[c] = np.nonzero(Y == c + 1)[0]  # Y ranges between 1 and 10
    # numbers of elements in each class
    vecLen = np.vectorize(len)
    Ncs = vecLen(ClassIdxs)
    # the initial partition lengths for each class, C-by-1
    Mcs = Ncs // Kfolds
    PMap = np.zeros((Xcount, 1), dtype=np.int32)
    CM = np.zeros((C, C))
    # partition the elements; the last partition is clamped to the class size
    for p in range(Kfolds):
        a = p * Mcs
        b = (p + 1) * Mcs
        # clamp the elements in the last partition
        if p == Kfolds - 1:
            np.minimum(b, Ncs - 1, out=b)  # Ncs contains lengths, b contains indices, equalize the difference and take the min
            b += 1  # we slice exclusively later, so we need to keep the values as lengths rather than indices
        # flattened elements taken from a portion of each class
        idxs = [item for c, sublist in enumerate(ClassIdxs) for item in sublist[a[c]:b[c]]]
        PMap[idxs] = p
    # perform classification with each partition as the test set
    for p in range(Kfolds):
        ## estimate the parameters
        Covs = np.zeros((C, D, D))
        Ms = np.zeros((C, D))
        testingMask = (PMap == p)
        trainingMask = (PMap != p)
        for c in range(C):
            # training: the class-c samples outside the current fold
            # labels range 1-10, so when referring to Y be careful with off-by-one errors
            testing = X[(testingMask & (Y == c + 1)).ravel(), :]
            training = X[(trainingMask & (Y == c + 1)).ravel(), :]
            # get size
            trainingN = len(training)
            # mu hat - 1-by-D vector
            mu = np.mean(training, axis=0)
            Ms[c] = mu
            # trainingN-by-D matrix
            centeredX = training - mu
            # we estimate and regularise the matrices
            if CovKind == 1:
                # full covariance matrix calculation
                Covs[c] = (centeredX.T @ centeredX) / trainingN
                np.fill_diagonal(Covs[c], Covs[c].diagonal() + epsilon)
            elif CovKind == 2:
                # diagonal covariance matrix, just calculate the per-feature variances
                np.fill_diagonal(Covs[c], np.sum(centeredX**2, axis=0) / trainingN)
                np.fill_diagonal(Covs[c], Covs[c].diagonal() + epsilon)
            else:
                # for the shared covariance, accumulate the result in Covs[0] and later 'tile' the shared matrix
                S = (centeredX.T @ centeredX) / trainingN
                np.fill_diagonal(S, S.diagonal() + epsilon)
                Covs[0] = S + Covs[0]
        if CovKind == 3:
            Covs[0] = Covs[0] / C
            Covs = np.tile(Covs[0], (C, 1, 1))

        ## perform classification and calculate the confusion matrix as you go
        CM_p = np.zeros((C, C))
        # classify each test sample according to its posterior log-likelihood under each class
        for i in np.nonzero(testingMask)[0]:
            x = X[i]
            (pll, c) = max([(log_posterior(x, Ms[k], Covs[k]), k) for k in range(C)])
            CM_p[Y[i, 0] - 1, c] += 1
        # accumulate the total confusion matrix
        CM += CM_p / len(np.nonzero(testingMask)[0])

        ## save results
        scipy.io.savemat('t1_mgc_' + str(Kfolds) + 'cv' + str(p) + '_Ms.mat', mdict={'Ms': Ms})
        scipy.io.savemat('t1_mgc_' + str(Kfolds) + 'cv' + str(p) + '_ck' + str(CovKind) + '_Covs.mat', mdict={'Covs': Covs})
        scipy.io.savemat('t1_mgc_' + str(Kfolds) + 'cv' + str(p) + '_ck' + str(CovKind) + '_CM.mat', mdict={'CM': CM_p})

    CM /= Kfolds
    np.set_printoptions(precision=1)
    scipy.io.savemat('t1_mgc_' + str(Kfolds) + 'cv' + str(Kfolds + 1) + '_ck' + str(CovKind) + '_CM.mat', mdict={'CM': CM})
    scipy.io.savemat('t1_mgc_' + str(Kfolds) + 'cv_PMap.mat', mdict={'PMap': PMap})


def log_posterior(x, mu, si):
    si_det = np.linalg.det(si)
    si_inv = np.linalg.inv(si)
    xCentered = (x - mu)[:, None]
    return (-(1/2) * (xCentered.T @ si_inv @ xCentered) - (1/2) * np.log(si_det))[0, 0]  # + np.log(prior) - assumed equal priors
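For reference, a sketch of how this could be driven from a small run script (the .mat file name and field names here are my assumptions, not the actual coursework data file):

import numpy as np
import scipy.io

# hypothetical driver: adjust the file/field names to the actual data set
data = scipy.io.loadmat('anuran_subset.mat')
X = data['X'].astype(np.float64)
Y = data['Y_species'].astype(np.int32)

# 5-fold CV with a full covariance matrix (CovKind=1) and a small regulariser
task1_mgc_cv(X, Y, CovKind=1, epsilon=0.01, Kfolds=5)

# the final confusion matrix is saved by the function; accuracy is roughly its trace over its sum
CM = scipy.io.loadmat('t1_mgc_5cv6_ck1_CM.mat')['CM']
print('accuracy ~', np.trace(CM) / CM.sum())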
It worked! I was quite happy to see the high accuracy values; it felt pretty amazing.
Task 2 - Neural Networks
This part was really interesting: we did not use any training algorithms, but had to work out the weights and network structure ourselves, mostly using logic-gate neurons. We also got to try out different activation functions: a sigmoid and a simple "cutoff" (step) activation function.
My basic building blocks were hNeurons (step activation) and sNeurons (sigmoid activation):
def task2_sNeuron(W, X):
    # Input:
    #  X : N-by-D matrix of input vectors (in row-wise) (np.float64)
    #  W : (D+1)-by-1 vector of weights (np.float64)
    # Output:
    #  Y : N-by-1 vector of output (np.float64)

    # the output is defined as y(x_i) = sigmoid(W.T @ x_i) where
    # x_i needs to be augmented with a one at the top
    # find out sizes
    (N, D) = X.shape
    # augment X
    X = np.concatenate((np.ones((N, 1)), X), axis=1)
    # apply the sigmoid activation function
    return 1 / (1 + np.exp(-(X @ W)))


def task2_hNeuron(W, X):
    # Input:
    #  X : N-by-D matrix of input vectors (in row-wise) (np.double)
    #  W : (D+1)-by-1 vector of weights (np.double)
    # Output:
    #  Y : N-by-1 vector of output (np.double)

    # the output is defined as y(x_i) = step(W.T @ x_i) where
    # x_i needs to be augmented with a one at the top
    # find out sizes
    (N, D) = X.shape
    # augment X
    X = np.concatenate((np.ones((N, 1)), X), axis=1)
    # apply the step activation function and change type
    return ((X @ W) > 0).astype(np.double)
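To make the geometry concrete (a toy example of mine, not part of the coursework): a single hNeuron with weights [b, w1, w2] fires exactly when w1*x + w2*y + b > 0, i.e. it tests which side of a line a point lies on. The polygon networks below are just several of these half-plane tests combined.

import numpy as np

# hypothetical half-plane test: points above the line y = x (i.e. -x + y > 0)
W = np.array([[0.0], [-1.0], [1.0]])   # [bias, w_x, w_y]
pts = np.array([[0.0, 1.0],            # above the line  -> 1
                [1.0, 0.0],            # below the line  -> 0
                [2.0, 3.0]])           # above the line  -> 1
print(task2_hNeuron(W, pts).ravel())   # [1. 0. 1.]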
We were given a few polygons as decision boundaries, namely:
- Polygon A: (-1,-0.5) (6,1.25) (6,6.25) (1,6)
- Polygon B: (2.5, 3) (3.5, 3) (3.5, 3.5) (2.5, 3.5)
We were then tasked with plotting regions based on the following set notation:
- Class 1: A, Class 0: ~A
- Class 1: ~A and B, Class 0: A or ~B
As well as some other similar tasks.
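The "logic gate" part is that, with a step activation, boolean gates can be encoded directly as weight vectors. These are illustrative values of mine (any weights giving the same sign pattern work), and they are the same trick used in the layer-2/layer-3 weights further down:

import numpy as np

# inputs are 0/1; the bias picks the threshold
AND = np.array([[-1.5], [1.0], [1.0]])    # fires only if both inputs are 1
OR  = np.array([[-0.5], [1.0], [1.0]])    # fires if at least one input is 1
NOR = np.array([[ 0.5], [-1.0], [-1.0]])  # fires only if both inputs are 0

bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
print(task2_hNeuron(AND, bits).ravel())   # [0. 0. 0. 1.]
print(task2_hNeuron(OR,  bits).ravel())   # [0. 1. 1. 1.]
print(task2_hNeuron(NOR, bits).ravel())   # [1. 0. 0. 0.]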
To find the weights for all of the above, I used a script which did the maths for me:
if __name__ == "__main__":
    #                   A                 B                  C                 D
    polyA = np.array([[1.65241, 2.222], [1.92806, 1.71377], [2.51323, 2.1643], [2.35589, 2.73902]])

    # Layer 1
    # the 4 neurons in the 1st layer test whether a point is ahead of or behind each face of the polygon
    directions = polyA - polyA[[1, 2, 3, 0], :]
    normals = directions[:, [1, 0]] * np.array([[-1, 1]])
    biases = np.sum(normals * polyA, axis=1)[:, None] * -1
    weightsMatrix1 = np.concatenate((biases, normals), axis=1)
    # normalize the weights
    weightsMatrix1 /= np.max(np.abs(weightsMatrix1), axis=1)[:, None]

    # Layer 2
    # the 4-input NOR neuron:
    # it only returns 1 if all the previous neurons return 0, meaning the point is not outside any face;
    # any 1 in the input vector contributes -1, making the pre-activation <= 0 and the output 0;
    # only four 0s give an output of 1
    weightsMatrix2 = np.array([[1, -1, -1, -1, -1]])

    f = open("task2 hNN A weights.txt", 'x')
    L1Neurons, L1Dim = weightsMatrix1.shape
    for j in range(L1Neurons):
        for i in range(L1Dim):
            f.write("W(1," + str(j + 1) + ',' + str(i) + ") : " + str(weightsMatrix1[j, i]) + '\n')
    L2Neurons, L2Dim = weightsMatrix2.shape
    for j in range(L2Neurons):
        for i in range(L2Dim):
            f.write("W(2," + str(j + 1) + ',' + str(i) + ") : " + str(weightsMatrix2[j, i]) + '\n')
    f.close()
def t2():
    # poly 2.1 (first triangle of polygon B)
    PolyB = np.array([[-0.46478, 3.19731], [5.65146, 4.97528], [3.08446, 2.35946]])  # , [3.68124, -2.26483]
    directionsB = PolyB[[1, 2, 0], :] - PolyB
    normalsB = directionsB[:, [1, 0]] * np.array([[-1, 1]])
    biasesB = np.sum(normalsB * PolyB, axis=1)[:, None] * -1
    weightsMatrix1B = np.concatenate((biasesB, normalsB), axis=1)
    # normalize the weights
    weightsMatrix1B /= np.max(np.abs(weightsMatrix1B), axis=1)[:, None]
    print(weightsMatrix1B)


def t22():
    # poly 2.2 (second triangle of polygon B)
    PolyB = np.array([[3.08446, 2.35946], [3.68124, -2.26483], [-0.46478, 3.19731]])
    directionsB = PolyB[[1, 2, 0], :] - PolyB
    normalsB = directionsB[:, [1, 0]] * np.array([[-1, 1]])
    biasesB = np.sum(normalsB * PolyB, axis=1)[:, None] * -1
    weightsMatrix1B = np.concatenate((biasesB, normalsB), axis=1)
    # normalize the weights
    weightsMatrix1B /= np.max(np.abs(weightsMatrix1B), axis=1)[:, None]
    print(weightsMatrix1B)
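One quick check I found useful (my own sanity test, run in the same scope as the script above so that polyA, weightsMatrix1 and task2_hNeuron are available): for a point inside the polygon every layer-1 neuron should output 0, and for a point well outside at least one should output 1, which is exactly what the NOR neuron in layer 2 relies on.

import numpy as np

inside = polyA.mean(axis=0)[None, :]   # the centroid of a convex polygon is inside it
outside = np.array([[10.0, 10.0]])     # clearly outside

for point in (inside, outside):
    fired = np.array([task2_hNeuron(w[:, None], point)[0, 0] for w in weightsMatrix1])
    print(point, '->', fired)          # inside: all zeros; outside: at least one 1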
And then constructed the networks like so:
## Class 1: A, Class 0: ~A
def task2_hNN_A(X):
    # Input:
    #  X : N-by-D matrix (np.ndarray) of input vectors (in row-wise) (np.float64), where D=2.
    # Output:
    #  Y : N-by-1 vector (np.ndarray) of output (np.float64)
    N, D = X.shape
    # layer 1: one half-plane test per face of polygon A
    weightsLayer1 = np.array([[1.0, -0.34994868685872166, -0.18980256091259212],
                              [0.22933150896320767, 0.7699130167301808, -1.0],
                              [-1.0, 0.3219838025165198, 0.08814889248320788],
                              [-1.0, -0.7294263777054958, 0.9924893972926816]])
    Nrns1, D1 = weightsLayer1.shape
    weightsLayer1 = weightsLayer1.T
    # layer 2: NOR over the 4 face tests (1 only if the point is inside every face)
    weightsLayer2 = np.array([[1, -1, -1, -1, -1]])
    Nrns2, D2 = weightsLayer2.shape
    weightsLayer2 = weightsLayer2.T
    # N-by-Nrns1
    layer1Output = np.zeros((N, Nrns1))
    for neuron in range(Nrns1):
        layer1Output[:, neuron] = t2.task2_hNeuron(weightsLayer1[:, neuron, None], X)[:, 0]
    # N-by-Nrns2
    layer2Output = np.zeros((N, Nrns2))
    for neuron in range(Nrns2):
        layer2Output[:, neuron] = t2.task2_hNeuron(weightsLayer2[:, neuron, None], layer1Output)[:, 0]
    return layer2Output[:, 0]
## Class 1: ~A and B, Class 0: A or ~B
def task2_hNN_AB(X):
    # Input:
    #  X : N-by-D matrix of input vectors (in row-wise) (np.float64), where D=2
    # Output:
    #  Y : N-by-1 vector of output (np.float64)
    # weights for B (as printed by the weight-finding script):
    # [[-1.         -0.08723287  0.30008223]
    #  [-0.76903546  1.         -0.98133664]
    #  [-1.          0.29507611  0.03808055]
    #  [ 1.         -0.50965075 -0.38684878]]
    N, D = X.shape
    # Layer 1
    weightsLayer1 = np.array([  # PolyA - 1 if outwith in each direction
                               [1.0, -0.34994868685872166, -0.18980256091259212],
                               [0.22933150896320767, 0.7699130167301808, -1.0],
                               [-1.0, 0.3219838025165198, 0.08814889248320788],
                               [-1.0, -0.7294263777054958, 0.9924893972926816],
                               # PolyB-1 triangle - 1 if outwith in each direction
                               [-1., -0.08723287, 0.30008223],
                               [-0.76903546, 1., -0.98133664],
                               [1., -0.0764559, -0.323877],
                               # PolyB-2 triangle - 1 if outwith in each direction
                               [-1., 0.29507611, 0.03808055],
                               [1., -0.50965075, -0.38684878],
                               [-1., 0.0764559, 0.323877]])
    Nrns1, D1 = weightsLayer1.shape
    weightsLayer1 = weightsLayer1.T
    # Layer 2
    weightsLayer2 = np.array([  # PolyA - OR (1 if the point is outside A, i.e. in ~A)
                               [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                               # PolyB-1 triangle - NOR (1 if the point is inside the first triangle)
                               [1, 0, 0, 0, 0, -1, -1, -1, 0, 0, 0],
                               # PolyB-2 triangle - NOR (1 if the point is inside the second triangle)
                               [1, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1]])
    Nrns2, D2 = weightsLayer2.shape
    weightsLayer2 = weightsLayer2.T
    # Layer 3
    weightsLayer3 = np.array([  # carry forward the ~A signal
                               [0, 1, 0, 0],
                               # PolyB OR (inside either triangle, i.e. inside B)
                               [0, 0, 1, 1]])
    Nrns3, D3 = weightsLayer3.shape
    weightsLayer3 = weightsLayer3.T
    # Layer 4
    weightsLayer4 = np.array([  # AND of (~A, B)
                               [-1, 0.6, 0.6]])
    Nrns4, D4 = weightsLayer4.shape
    weightsLayer4 = weightsLayer4.T
    # N-by-Nrns1
    layer1Output = np.zeros((N, Nrns1))
    for neuron in range(Nrns1):
        layer1Output[:, neuron] = t2.task2_hNeuron(weightsLayer1[:, neuron, None], X)[:, 0]
    # N-by-Nrns2
    layer2Output = np.zeros((N, Nrns2))
    for neuron in range(Nrns2):
        layer2Output[:, neuron] = t2.task2_hNeuron(weightsLayer2[:, neuron, None], layer1Output)[:, 0]
    # N-by-Nrns3
    layer3Output = np.zeros((N, Nrns3))
    for neuron in range(Nrns3):
        layer3Output[:, neuron] = t2.task2_hNeuron(weightsLayer3[:, neuron, None], layer2Output)[:, 0]
    # N-by-Nrns4
    layer4Output = np.zeros((N, Nrns4))
    for neuron in range(Nrns4):
        layer4Output[:, neuron] = t2.task2_hNeuron(weightsLayer4[:, neuron, None], layer3Output)[:, 0]
    return layer4Output[:, 0]
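To actually see the regions, you can evaluate the network on a grid of points; the plots under "Results:" below were made along these lines (the grid range and resolution here are my own choices, and this assumes the functions above are importable):

import numpy as np
import matplotlib.pyplot as plt

# evaluate the network over a grid covering the polygons (range chosen by eye)
xx, yy = np.meshgrid(np.linspace(-1, 7, 400), np.linspace(-3, 7, 400))
grid = np.column_stack((xx.ravel(), yy.ravel()))
Z = task2_hNN_AB(grid).reshape(xx.shape)

plt.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], cmap='coolwarm')
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('Decision regions of task2_hNN_AB')
plt.show()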
We also had to build the second boundary detector using sigmoid neurons, which just made the polygon's corners "rounder"; with a sufficiently large multiplier on the weights, though, the sigmoid saturates and the network approximates the hNeuron one.
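In other words (a small illustration of mine, with an arbitrary scaling factor): scaling a weight vector by a large constant does not move the decision boundary, it only steepens the sigmoid until it behaves like a step.

import numpy as np

W = np.array([[0.0], [-1.0], [1.0]])         # the same "above the line y = x" test as earlier
pts = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 1.0]])

print(task2_sNeuron(W, pts).ravel())         # soft values, roughly [0.73 0.27 0.52]
print(task2_sNeuron(100 * W, pts).ravel())   # roughly [1. 0. 1.] - approaches the step output
print(task2_hNeuron(W, pts).ravel())         # [1. 0. 1.]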
Results:
And that was pretty much it; I've left out some tasks as they were very similar or only slightly different in format.
Overall this was a challenging coursework and I had to brush up on my linear algebra, but it was extremely rewarding and fun.