# Multiclass Classification using Clatern

Clatern is a machine learning library for Clojure that is still under development. This is a short tutorial on performing multiclass classification using Clatern.

### Importing the libraries

The libraries required for this tutorial are core.matrix, Incanter and Clatern.
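A sketch of pulling the three libraries into a REPL session could look like the following. The namespace names `clatern.logistic-regression` and `clatern.knn` are assumptions based on the function names used later in this tutorial, so check them against the Clatern version on your classpath.

```clojure
;; core.matrix for matrix manipulation
(require '[clojure.core.matrix :as m])

;; Incanter for loading the dataset and converting it to a matrix
(require '[incanter.core :refer [to-matrix]]
         '[incanter.datasets :refer [get-dataset]])

;; Clatern models (namespace names are assumed, see the note above)
(require '[clatern.logistic-regression :refer [gradient-descent]]
         '[clatern.knn :refer [knn]])
```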

*NOTE: This tutorial requires Incanter 2.0 (aka Incanter 1.9.0). This is because both Incanter 2.0 and Clatern are integrated with core.matrix.*

### Dataset

This tutorial uses the popular Iris flower dataset. The dataset is available here: https://archive.ics.uci.edu/ml/datasets/Iris. For this tutorial, we’ll use Incanter to load the dataset.
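Here is a minimal sketch of loading the data, assuming Incanter's sample-data loader `get-dataset` with the `:iris` key. The var name `iris` is simply the name chosen here, and the loader may fetch the file over the network.

```clojure
(require '[incanter.core :refer [nrow col-names]]
         '[incanter.datasets :refer [get-dataset]])

;; Load the Iris sample dataset as an Incanter dataset
(def iris (get-dataset :iris))

(nrow iris)      ; number of flowers in the dataset
(col-names iris) ; the four measurements plus the species column
```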

Next, convert the dataset into a matrix using the to-matrix function, which converts non-numeric columns to either numeric codes or dummy variables.
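As a sketch, the conversion is a single call. The var name `iris-mat` is illustrative; with the default options the species names should end up as numeric codes in the last column.

```clojure
(require '[incanter.core :refer [to-matrix nrow ncol]]
         '[incanter.datasets :refer [get-dataset]])

;; By default non-numeric columns become numeric codes;
;; passing :dummies true would expand them into dummy variables instead.
(def iris-mat (to-matrix (get-dataset :iris)))

[(nrow iris-mat) (ncol iris-mat)] ; rows and columns of the numeric matrix
```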

Now let’s split the dataset into a training set and a test set.
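One way to do the split is to shuffle the rows and cut the result in two. This is a sketch rather than the only approach: the 120/30 split and the names `shuffled-rows`, `train-mat` and `test-mat` are illustrative choices.

```clojure
(require '[clojure.core.matrix :as m]
         '[incanter.core :refer [to-matrix]]
         '[incanter.datasets :refer [get-dataset]])

(def iris-mat (to-matrix (get-dataset :iris)))

;; Shuffle the rows so that all three species appear in both sets,
;; then keep the first 120 rows for training and the rest for testing.
(def shuffled-rows (shuffle (m/to-nested-vectors iris-mat)))
(def train-mat (m/matrix (vec (take 120 shuffled-rows))))
(def test-mat  (m/matrix (vec (drop 120 shuffled-rows))))
```

Shuffling matters here because the rows of the Iris dataset are grouped by species, so an unshuffled cut would leave one species underrepresented in the training set.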

Then split the training and test sets into features and labels.
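The sketch below shows the idea on two stand-in rows laid out like the Iris matrix: four measurements followed by a numeric class code in column 4 (an assumption about the matrix layout). The same two calls would be applied to both the training and the test matrix.

```clojure
(require '[clojure.core.matrix :as m])

;; Two stand-in rows: four measurements, then a class code
(def train-mat [[5.1 3.5 1.4 0.2 0.0]
                [7.0 3.2 4.7 1.4 1.0]])

(def X-train (m/select train-mat :all [0 1 2 3])) ; feature columns
(def y-train (m/get-column train-mat 4))          ; label column
```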

### Logistic Regression

Here comes the interesting part: training a classifier on the data. First, let’s try the logistic regression model, which Clatern trains using gradient descent. The gradient-descent function takes the following arguments:

*X* is input data,

*y* is target data,

*alpha* is the learning rate,

*lambda* is the regularization parameter, and

*num-iters* is the number of iterations.

*alpha* (default 0.1), *lambda* (default 1) and *num-iters* (default 100) are optional.
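A training call might look like the sketch below. The keyword-argument call shape and the namespace name are assumptions, and the values chosen for `:lambda` and `:num-iters` are purely illustrative. The data preparation repeats the shuffle-and-split idea on the Iris matrix so that the example stands on its own.

```clojure
(require '[clojure.core.matrix :as m]
         '[incanter.core :refer [to-matrix]]
         '[incanter.datasets :refer [get-dataset]]
         '[clatern.logistic-regression :refer [gradient-descent]])

;; Shuffled Iris rows: four measurements followed by a numeric class code
(def rows (shuffle (m/to-nested-vectors (to-matrix (get-dataset :iris)))))
(def train-mat (m/matrix (vec (take 120 rows))))
(def X-train (m/select train-mat :all [0 1 2 3]))
(def y-train (m/get-column train-mat 4))

;; Assumed call shape: (gradient-descent X y :alpha a :lambda l :num-iters n)
;; Options that are left out fall back to their defaults.
(def lr-h (gradient-descent X-train y-train :lambda 1e-4 :num-iters 200))
```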

That’s it. Here, gradient-descent is a function in the clatern.logistic-regression namespace. It trains on the provided data and returns a hypothesis for the logistic regression model. Now, **lr-h** is a function that can classify an input vector.

### K Nearest Neighbors

Next, let’s try the k nearest neighbors model. There is no training phase for this model, so it can be used directly. The knn function takes the following arguments:

*X* is input data,

*y* is target data,

*v* is new input to be classified, and

*k* is the number of neighbours (optional, default 3).

Let’s define a function to perform kNN on our dataset.
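A sketch of such a function follows. The call shape `(knn X y v :k k)`, the namespace name and the choice of `:k 5` are assumptions made for illustration. The data preparation is repeated so that the example stands on its own.

```clojure
(require '[clojure.core.matrix :as m]
         '[incanter.core :refer [to-matrix]]
         '[incanter.datasets :refer [get-dataset]]
         '[clatern.knn :refer [knn]])

;; Shuffled Iris rows: four measurements followed by a numeric class code
(def rows (shuffle (m/to-nested-vectors (to-matrix (get-dataset :iris)))))
(def train-mat (m/matrix (vec (take 120 rows))))
(def X-train (m/select train-mat :all [0 1 2 3]))
(def y-train (m/get-column train-mat 4))

;; Close over the training data so that knn-h takes only the new input
(def knn-h #(knn X-train y-train % :k 5))

(knn-h [5.1 3.5 1.4 0.2]) ; classify a single flower
```

Wrapping knn in a one-argument function gives it the same shape as the logistic regression hypothesis, so the two can be used interchangeably later on.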

Similar to the logistic regression hypothesis, now **knn-h** can be used to classify an input vector.

### Classification

Both **lr-h** and **knn-h** are functions that take input feature vectors and classify them. So to classify a whole dataset, the function is mapped to all rows of the dataset.
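The mapping step can be sketched with a stand-in classifier. Here `toy-h` and `X-test` are hypothetical and exist only to keep the example runnable on its own; in the tutorial the classifier would be **lr-h** or **knn-h**.

```clojure
;; Stand-in classifier: thresholds on the first feature
(defn toy-h [v]
  (if (< (first v) 2.5) 0 1))

;; Three stand-in feature vectors
(def X-test [[1.4 0.2] [4.7 1.4] [1.3 0.3]])

;; Mapping the classifier over the rows yields one predicted label per row
(def preds (map toy-h X-test))
;; => (0 1 0)
```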

Now **lr-preds** and **knn-preds** contain the classifications made by logistic regression and knn on the original dataset, respectively.

### Conclusion

So which model performs better here? Let’s write a function to assess the classification accuracy.
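One possible accuracy function is sketched below. It assumes that labels are numeric codes (so `==` can compare a predicted `1` with a stored `1.0`) and that `X` and `y` can be traversed as sequences. The stand-in classifier `toy-h` is hypothetical.

```clojure
(defn accuracy
  "Percentage of rows in X whose label predicted by h equals the label in y."
  [h X y]
  (let [hits (count (filter true? (map #(== (h %1) %2) X y)))]
    (* 100.0 (/ hits (count y)))))

;; Stand-in classifier: thresholds on the first feature
(defn toy-h [v]
  (if (< (first v) 2.5) 0 1))

;; Three of the four stand-in rows are classified correctly
(accuracy toy-h [[1.4] [4.7] [1.3] [5.0]] [0 1 1 1])
;; => 75.0
```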

Now let’s evaluate both the classifiers:
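An end-to-end evaluation could be sketched as follows. The Clatern namespace names, the keyword-argument call shapes, the 120/30 split and the hyperparameter values are all assumptions, and `evaluate-once` is a hypothetical helper. Because the rows are reshuffled on every call, the numbers will differ from run to run.

```clojure
(require '[clojure.core.matrix :as m]
         '[incanter.core :refer [to-matrix]]
         '[incanter.datasets :refer [get-dataset]]
         '[clatern.logistic-regression :refer [gradient-descent]]
         '[clatern.knn :refer [knn]])

;; Iris as nested vectors: four measurements, then a numeric class code
(def iris-rows (m/to-nested-vectors (to-matrix (get-dataset :iris))))

(defn accuracy
  "Percentage of rows in X whose label predicted by h equals the label in y."
  [h X y]
  (* 100.0 (/ (count (filter true? (map #(== (h %1) %2) X y)))
              (count y))))

(defn evaluate-once
  "Shuffles the rows, trains on 120 of them and scores both models on the rest."
  []
  (let [[train test] (split-at 120 (shuffle iris-rows))
        features     #(mapv (fn [row] (subvec row 0 4)) %)
        labels       #(mapv (fn [row] (nth row 4)) %)
        X-train      (m/matrix (features train))
        y-train      (labels train)
        X-test       (features test)
        y-test       (labels test)
        lr-h         (gradient-descent X-train y-train :lambda 1e-4 :num-iters 200)
        knn-h        #(knn X-train y-train % :k 5)]
    {:logistic-regression (accuracy lr-h X-test y-test)
     :knn                 (accuracy knn-h X-test y-test)}))

(evaluate-once) ; a map with one accuracy percentage per model
```

Calling `evaluate-once` repeatedly and averaging the results is one way to smooth out the effect of the shuffle.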

The accuracy of the models can vary considerably depending on how the dataset is shuffled. These are values I averaged over 100 runs. Both models perform well on this dataset. So, that’s it for multiclass classification using Clatern. More work on Clatern to follow soon. So, keep an eye out :-)