# R Language: Implementing K Nearest Neighbours

## Intro

A generalised function for finding the k nearest neighbours for any value of k. This is a set-based solution that aims to optimise for speed by avoiding loops wherever possible.

## Code

OK, this page is a work in progress, and to begin with I'm just going to dump all my code here.

Now, this is an implementation of KNN that I wrote in Q4 2016, and it was THE first thing I built in R. I tried to make it rely on as few loops as possible since… explicit loops in R are really slow compared with its vectorised operations.

I do think it's imperfect in the way it tries to use `merge` as an analogue to SQL's `CROSS JOIN` and `INNER JOIN`. The problem is that `merge` does not preserve the original row order: by default (`sort = TRUE`) it sorts the result on the key columns. I mean, having to restore the ordering afterwards is just a waste of time.
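To make the `merge` behaviour concrete, here is a small sketch with made-up data. The column names `trainID`, `testID`, `x`, `y`, `dist` and `label` are hypothetical, chosen only to mirror the IDs used later. Passing `by = NULL` makes `merge` return the Cartesian product, like a `CROSS JOIN`; merging on a shared key column behaves like an `INNER JOIN`, and with the default `sort = TRUE` the rows come back ordered by the key, which for character IDs means string order.

```r
# Hypothetical toy data: three training points and one test point
train <- data.frame(trainID = c("1", "2", "10"), x = c(1.0, 5.0, 2.5),
                    stringsAsFactors = FALSE)
test  <- data.frame(testID = "1", y = 2.0, stringsAsFactors = FALSE)

# CROSS JOIN: a zero-length `by` gives every (train, test) pair
pairs <- merge(train, test, by = NULL)
pairs$dist <- abs(pairs$x - pairs$y)   # 1-D distance for each pair

# INNER JOIN on the key column; sort = TRUE (the default) reorders by the key
labels <- data.frame(trainID = c("10", "2", "1"), label = c("b", "a", "a"),
                     stringsAsFactors = FALSE)
joined <- merge(pairs, labels, by = "trainID")
joined$trainID   # "1" "10" "2": character keys sort as strings
```

Passing `sort = FALSE` skips that sort, but R's documentation describes the resulting row order as unspecified, so it is not a reliable way to keep the original order either.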

Well, I had some design constraints, which is why I went about it this way.

Design constraints:

- Not allowed to use `sort` or `order`.
- Minimise the use of for-loops.
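Both constraints bite hardest when picking the k closest training points. One way to respect them (a sketch, not necessarily what the rest of this post does; `k_smallest_idx` is a hypothetical helper) is to loop k times rather than once per training row, using the vectorised `which.min()` on each pass and masking the chosen distance with `Inf`.

```r
# Return the indices of the k smallest distances without sort() or order().
# Assumes finite distances: an existing Inf could be re-selected once masked.
k_smallest_idx <- function(distances, k) {
  stopifnot(k >= 1, k <= length(distances))
  idx <- integer(k)
  for (j in seq_len(k)) {          # k iterations, not one per training row
    i <- which.min(distances)      # position of the current smallest distance
    idx[j] <- i
    distances[i] <- Inf            # mask it; only the local copy is modified
  }
  idx
}

d <- c(4.2, 0.5, 3.1, 0.9, 7.0)
k_smallest_idx(d, 3)   # 2 4 3
```

The cost is roughly k passes over the distance vector, which stays cheap as long as k is small relative to the number of training rows.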

## `knn_general`: Function Definition

First I define the function. It takes four parameters:

- `trainObject`: the training data, one row per *Object* (a data frame, because its row names become the row IDs).
- `testObject`: the test data, in the same format. This would often be just a single row.
- `trainLabel`: a vector of training *Labels*, one per row of `trainObject`.
- `kValue`: the number of nearest neighbours, k.

```
knn_general <- function
( trainObject
, testObject
, trainLabel
, kValue
){
```
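For reference, here is one way to build inputs of the expected shape. This is a hypothetical setup using R's built-in `iris` data set, not data from the original work; labels are coded as numbers because the mode helper below compares labels with `as.numeric()`.

```r
# Hypothetical example inputs built from the built-in iris data set
set.seed(42)
test_row <- sample(nrow(iris), 1)

trainObject <- iris[-test_row, 1:4]                # numeric features, one row per object
rownames(trainObject) <- NULL                      # reset row names to "1".."149"
testObject  <- iris[test_row, 1:4, drop = FALSE]   # a single test row
trainLabel  <- as.numeric(iris$Species[-test_row]) # species coded as 1, 2, 3
kValue      <- 5                                   # number of neighbours

# A call would then look like:
# knn_general(trainObject, testObject, trainLabel, kValue)
```

Resetting the row names keeps them aligned with the `1..n` row names that `data.frame(trainLabel)` produces.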

## Computing Mode

Base R has no built-in function for the statistical mode (`mode()` returns an object's storage mode instead), so this helper returns the most frequent label.

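```
# Return the most frequent (modal) value of xx, compared as numbers.
modlab <- function
(xx){
  # Count each distinct value: column 1 holds the values (ascending),
  # column x holds how often each one occurs
  counts <- aggregate(
      as.numeric(xx)
    , by = list(as.numeric(xx))
    , FUN = length
  )
  # which.max() takes the first maximum, so ties go to the smallest value
  return(counts[which.max(counts$x), 1])
}
```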
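As an aside, the same count-then-pick idea can be written with `table()` instead of `aggregate()`. This is an alternative sketch, not the original code; `modal_value` is a hypothetical name.

```r
# Mode via table(): counts each distinct value (names sorted numerically here),
# then which.max() picks the first maximum, so ties go to the smallest value
modal_value <- function(xx) {
  counts <- table(as.numeric(xx))
  as.numeric(names(counts)[which.max(counts)])
}

modal_value(c(3, 1, 3, 2))   # 3
modal_value(c(2, 1, 2, 1))   # 1 (tie between 1 and 2)
```

It avoids computing the grouped counts twice and returns a plain number.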

## INITIALISE DATA FRAMES AND CREATE IDENTITIES

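```
# Prepend each row name as an explicit ID column, so rows can still be
# identified after merge() reorders them (relies on data frame row names)
trainObject <- data.frame(rownames(trainObject), trainObject)
names(trainObject)[1] <- "trainID"
testObject <- data.frame(rownames(testObject), testObject)
names(testObject)[1] <- "testID"
# Wrap the label vector in a data frame and give it an ID column too
trainLabel <- data.frame(trainLabel)
trainLabel <- data.frame(rownames(trainLabel), trainLabel)
names(trainLabel)[1] <- "labelID"
```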
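The ID columns come straight from row names. If `trainID` and `labelID` are later matched to pair each training row with its label (my assumption, since that code isn't shown here yet), they only agree when `trainObject` has the default row names `1..n`: `data.frame(trainLabel)` always numbers its rows from 1, while a subset of a data frame keeps its original row names. A small hypothetical check:

```r
# Hypothetical training data that has been subset, so its row names skip "2"
trainObject <- data.frame(x1 = c(1.0, 2.0, 3.0), x2 = c(9.0, 8.0, 7.0))[c(1, 3), ]
trainLabel  <- c(1, 2)

trainIDs <- rownames(trainObject)             # "1" "3"
labelIDs <- rownames(data.frame(trainLabel))  # "1" "2"
identical(trainIDs, labelIDs)                 # FALSE

# Resetting the row names restores the 1..n correspondence
rownames(trainObject) <- NULL
identical(rownames(trainObject), labelIDs)    # TRUE
```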

## CREATE DATA FRAME FOR STORING PREDICTED LABEL

This data frame stores, for each row of `testObject`, the label predicted from its nearest neighbours at every level of k.

```
predicted <- data.frame(testID = testObject[, 1])
```
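To illustrate what "a predicted label at every level of k" can mean, here is a sketch under my own assumptions: the neighbour labels are already arranged nearest first and coded as positive integers, and `neighbour_labels` and `predicted_by_k` are hypothetical names. The prediction for a given k is then the mode of the first k labels; `tabulate()` stands in for `modlab` purely to keep the example short.

```r
# Hypothetical labels of the 5 nearest neighbours of one test point, nearest first
neighbour_labels <- c(2, 1, 1, 2, 2)

# For each k, take the modal label of the k closest neighbours.
# tabulate() counts labels 1, 2, ...; which.max() picks the first maximum,
# so ties go to the smaller label.
predicted_by_k <- sapply(seq_along(neighbour_labels), function(k) {
  which.max(tabulate(neighbour_labels[1:k]))
})
predicted_by_k   # 2 1 1 1 2
```

Note how the prediction flips as k grows, which is why it is useful to keep every level of k rather than a single answer.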

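This converts `testID` to numeric so that the IDs keep their numeric order (1, 2, 10) rather than string order (1, 10, 2). Going through `as.character()` works whether `testID` is a factor (the `data.frame()` default before R 4.0.0) or a character vector (the default from R 4.0.0 onwards, when `stringsAsFactors` became `FALSE`); the `levels()` idiom only works on factors.

```
predicted$testID <- as.numeric(as.character(predicted$testID))
```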
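Why the conversion needs care: a factor's levels are sorted as strings, and `as.numeric()` applied directly to a factor returns the internal level codes rather than the values. A made-up factor of row names shows the difference.

```r
# Row names "1".."10" stored as a factor: levels are sorted as strings
ids <- factor(as.character(1:10))
levels(ids)[1:3]                     # "1" "10" "2"

as.numeric(ids)[1:3]                 # 1 3 4: level codes, not the IDs
as.numeric(levels(ids))[ids][1:3]    # 1 2 3: the original numeric IDs
as.numeric(as.character(ids))[1:3]   # 1 2 3: also works on character input
```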