k Nearest Neighbours: k > 1
Intro
This time we want to find the k nearest neighbours of the test object, rather than just one.
For classification, we take a majority vote among their labels; for regression, we predict the average of their labels.
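A minimal pure-Python sketch of both prediction rules, assuming Euclidean distance (the helper names and the toy data below are my own, chosen for illustration):

```python
import math
from collections import Counter

def knn(train, test_point, k):
    """Return the k training examples nearest to test_point.

    train: list of (features, label) pairs, features as equal-length tuples.
    """
    return sorted(train, key=lambda ex: math.dist(ex[0], test_point))[:k]

def classify(train, test_point, k):
    # Classification: majority vote among the k nearest labels.
    votes = Counter(label for _, label in knn(train, test_point, k))
    return votes.most_common(1)[0][0]

def regress(train, test_point, k):
    # Regression: average of the k nearest labels.
    labels = [label for _, label in knn(train, test_point, k)]
    return sum(labels) / k

train_cls = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(classify(train_cls, (0.5, 0.5), k=3))  # two "a"s among the 3 nearest

train_reg = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 50.0)]
print(regress(train_reg, (1.0,), k=3))  # mean of 1.0, 2.0, 3.0 = 2.0
```

Note that ties are possible in the vote (e.g. k = 2 with one label of each class); `Counter.most_common` breaks them arbitrarily here, which is one reason odd k is often preferred for binary classification.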
Deciding k
To choose k, we can use cross validation: try several candidate values and keep the one with the lowest validation error.
For a large training set, cross validation with 10 folds is a common choice.
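The selection procedure can be sketched in pure Python as follows; the fold-splitting scheme, candidate k values, and toy data are my own assumptions for illustration:

```python
import math
from collections import Counter

def classify(train, point, k):
    # Majority vote among the k nearest training labels (Euclidean distance).
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_error(data, k, folds=10):
    """Estimate the error rate of k-NN by cross validation with `folds` folds."""
    errors = 0
    for i in range(folds):
        test = data[i::folds]  # hold out every folds-th example
        train = [ex for j, ex in enumerate(data) if j % folds != i]
        errors += sum(classify(train, x, k) != y for x, y in test)
    return errors / len(data)

# Toy data: the label is determined by the sign of the first coordinate.
data = [((x, x % 3), "pos" if x > 0 else "neg")
        for x in range(-10, 11) if x != 0]

# Pick the candidate k with the lowest cross-validated error.
best_k = min([1, 3, 5, 7], key=lambda k: cv_error(data, k, folds=5))
print(best_k)
```

With a large dataset, 10 folds keeps each training split close to the full set while still averaging the error estimate over 10 held-out portions.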
Computational Aspects
kNN can be computationally expensive at prediction time, since all the work of finding the neighbours happens when a query arrives.
Computing one distance takes time \(O(p)\) where \(p\) is the dimension of the objects (i.e. number of numeric attributes).
For each object in the test set, we need to calculate \(n\) distances. The total time required to calculate distances for each test object is \(O(np)\).
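The \(O(np)\) cost per test object is visible directly in a brute-force distance computation; the following is an illustrative sketch with made-up data:

```python
def distances(train_X, x):
    """Squared Euclidean distances from x to all n training points.

    Each distance is a sum over p coordinates, so it costs O(p);
    repeating it for all n training points costs O(np) in total.
    """
    return [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]

train_X = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(distances(train_X, (0.0, 0.0)))  # [0.0, 25.0, 2.0]
```

In practice this brute-force scan is often replaced by spatial index structures (e.g. k-d trees) when n is large and p is small, but the naive cost above is the baseline.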
Curse of Dimensionality
The number of observations needed to cover the input space increases exponentially with the number of attributes, so in high dimensions even the nearest neighbours tend to be far away unless the training set is enormous.
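A quick back-of-the-envelope illustration of that exponential growth (the resolution of 0.1 per axis is an arbitrary choice for the example):

```python
# Cells needed to cover the unit cube [0,1]^p at resolution 0.1 per axis:
# 10 cells per dimension, so 10**p cells in total, exponential in p.
for p in (1, 2, 5, 10):
    print(p, 10 ** p)
```

Already at p = 10 we would need ten billion cells, so a training set that densely covers even a modest-dimensional space is usually infeasible.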