k Nearest Neighbours: k > 1

Intro

This time we want to find the k nearest neighbours of the test object.

For classification, we simply take a majority vote among them. For regression, we predict the average of their labels.

Deciding k

To decide what size k to use, we can use cross validation.

For a large training set, 10-fold cross validation is a common choice.
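The selection procedure can be sketched as follows. This is a minimal illustration, not a production implementation: the toy kNN classifier, the two-blob data set, and all names (`knn_predict`, `cv_accuracy`) are assumptions made for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # one O(p) distance per row
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()  # majority class label

def cv_accuracy(X, y, k, folds=10, seed=0):
    """Mean accuracy of kNN over `folds` cross-validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    correct = 0
    for fold in np.array_split(idx, folds):
        train = np.setdiff1d(idx, fold)          # everything not in this fold
        preds = [knn_predict(X[train], y[train], X[i], k) for i in fold]
        correct += sum(p == y[i] for p, i in zip(preds, fold))
    return correct / len(X)

# Toy data: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Try odd values of k and keep the one with the best cross-validated accuracy
best_k = max(range(1, 12, 2), key=lambda k: cv_accuracy(X, y, k))
print("best k:", best_k)
```

Odd values of k are used here to avoid tied votes in a two-class problem.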

Computational Aspects

kNN with a large training set can be computationally expensive, since every prediction requires comparing the test object against every training object.

Computing one distance takes time \(O(p)\) where \(p\) is the dimension of the objects (i.e. number of numeric attributes).

For each object in the test set, we need to calculate \(n\) distances, one per training object, so the time required to calculate distances for a single test object is \(O(np)\).
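The \(O(np)\) distance step can be written in a few lines. The sizes and array names below are illustrative choices, not part of the original text:

```python
import numpy as np

n, p = 1000, 20                        # n training objects, p attributes
rng = np.random.default_rng(0)
X_train = rng.normal(size=(n, p))
x_test = rng.normal(size=p)

# Each row costs p subtractions, squarings and additions: O(p) per distance,
# so computing all n distances for this one test object costs O(np).
dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))

nearest = int(dists.argmin())          # index of the single closest neighbour
print(dists.shape, nearest)
```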

Curse of Dimensionality

The number of observations needed to cover the input space densely increases exponentially with the number of attributes, so in high dimensions the "nearest" neighbours are no longer meaningfully near.
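One way to see the effect numerically: with a fixed number of random points, as the dimension \(p\) grows, the nearest and farthest distances from a query point become almost indistinguishable. This is an illustrative experiment with arbitrary toy sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for p in (2, 20, 200, 2000):
    X = rng.uniform(size=(1000, p))    # 1000 random points in the unit cube
    q = rng.uniform(size=p)            # a random query point
    d = np.linalg.norm(X - q, axis=1)
    ratios[p] = d.min() / d.max()      # tends towards 1 as p grows

print(ratios)
```

In low dimensions the nearest point is far closer than the farthest (ratio near 0); in high dimensions the ratio approaches 1 and the notion of a "nearest" neighbour loses its discriminative power.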

Examples

Classification
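A minimal sketch of kNN classification with k = 3. The two-class training points and labels are toy data invented for the example:

```python
from collections import Counter

# Toy training set: (point, label) pairs in two clusters
train = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
         ((4.0, 4.2), "blue"), ((4.1, 3.9), "blue"), ((3.8, 4.0), "blue")]

def classify(x, k=3):
    # Sort training pairs by squared Euclidean distance to x
    by_dist = sorted(train, key=lambda t: sum((a - b) ** 2
                                              for a, b in zip(t[0], x)))
    # Majority vote among the labels of the k nearest neighbours
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

print(classify((4.0, 4.0)))   # the three nearest neighbours are all blue
```

Squared distances are used because sorting by them gives the same order as sorting by true distances, without the square root.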

Regression
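A minimal sketch of kNN regression with k = 2: the prediction is the average of the k nearest labels. The one-dimensional training pairs are toy data invented for the example:

```python
# Toy training set: (attribute, label) pairs
train = [(1.0, 10.0), (2.0, 12.0), (3.0, 15.0), (8.0, 40.0)]

def predict(x, k=2):
    # Sort training pairs by distance to x along the single attribute
    by_dist = sorted(train, key=lambda t: abs(t[0] - x))
    # Predict the average of the k nearest labels
    return sum(y for _, y in by_dist[:k]) / k

print(predict(1.5))   # neighbours are x=1.0 and x=2.0, so (10 + 12) / 2 = 11.0
```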
