This time we want to find the k nearest neighbours of the test object.
For classification, we take a majority vote among them. For regression, we predict the average of their labels.
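The procedure above can be sketched in a few lines. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance and numeric attributes. The function names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) are hypothetical. Ties in the vote go to whichever tied label was seen first among the neighbours.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_X, test_x, k):
    """Indices of the k training objects closest to test_x."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], test_x))
    return order[:k]

def knn_classify(train_X, train_y, test_x, k):
    """Classification: majority vote among the k nearest neighbours' labels."""
    idx = k_nearest(train_X, test_x, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]

def knn_regress(train_X, train_y, test_x, k):
    """Regression: average of the k nearest neighbours' numeric labels."""
    idx = k_nearest(train_X, test_x, k)
    return sum(train_y[i] for i in idx) / k

# Two well-separated clusters of three points each.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_cls = ["a", "a", "a", "b", "b", "b"]
y_reg = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(X, y_cls, [0.2, 0.2], 3))  # a
print(knn_regress(X, y_reg, [0.2, 0.2], 3))   # 2.0
```

Both tasks share the same neighbour search. Only the final step (vote or average) differs.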
To choose the value of k, we can use cross-validation.
For a large training set, we can use 10-fold cross-validation.
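Choosing k by 10-fold cross-validation can be sketched as follows. This is a rough version under several assumptions. It uses Euclidean distance via `math.dist`. Fold j holds the objects whose index i satisfies `i % folds == j`, with no shuffling (in practice you would usually shuffle first). Accuracy is pooled over all folds. The helper names `cv_accuracy` and `choose_k` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, test_x, k):
    # Sort training indices by Euclidean distance to test_x and keep the k nearest.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], test_x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(X, y, k, folds=10):
    """Fraction of objects classified correctly when each one is held out in its fold."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        for i in test_idx:
            if knn_classify(tr_X, tr_y, X[i], k) == y[i]:
                correct += 1
    return correct / n

def choose_k(X, y, candidates, folds=10):
    """Candidate k with the highest cross-validated accuracy (ties: first candidate)."""
    return max(candidates, key=lambda k: cv_accuracy(X, y, k, folds))

# Two separated clusters of 10 points each on a line.
X = [[i * 0.1, 0.0] for i in range(10)] + [[10 + i * 0.1, 0.0] for i in range(10)]
y = [0] * 10 + [1] * 10
print(choose_k(X, y, [1, 3, 5]))
```

Every candidate in the list costs a full cross-validation run. This is one reason why evaluating many values of k can be expensive.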
A large k can be computationally expensive.
Computing one distance takes $O(d)$ time, where $d$ is the dimension of the objects (i.e. the number of numeric attributes).
For each object in the test set, we need to calculate $n$ distances, one to each of the $n$ training objects. The total time required to calculate distances for each test object is therefore $O(nd)$.
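To make the $O(nd)$ cost visible, the sketch below computes brute-force Euclidean distances from one test object to every training object. It also counts the per-coordinate operations. The function name `distances_to_all` and the operation counter are illustrative assumptions. The outer loop runs $n$ times and the inner loop runs $d$ times, so the count is exactly $n \cdot d$.

```python
import math

def distances_to_all(train_X, test_x):
    """Euclidean distances from test_x to every training object.

    Returns (distances, ops), where ops counts per-coordinate
    difference computations, to show the n * d cost.
    """
    distances = []
    ops = 0
    for row in train_X:                # n training objects
        total = 0.0
        for a, b in zip(row, test_x):  # d attributes per object
            total += (a - b) ** 2
            ops += 1
        distances.append(math.sqrt(total))
    return distances, ops

train_X = [[0, 0, 0], [3, 4, 0], [1, 2, 2], [6, 8, 0]]  # n = 4, d = 3
dists, ops = distances_to_all(train_X, [0, 0, 0])
print(dists, ops)  # [0.0, 5.0, 3.0, 10.0] 12
```

Repeating this for every test object multiplies the cost by the size of the test set.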
Curse of Dimensionality
The number of observations needed to keep training objects genuinely close to each test object increases exponentially with each attribute added.
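A simple way to see the exponential growth is to split each attribute into equal intervals and count the resulting grid cells. If we want at least one training observation in every cell, we need at least that many observations. The choice of 10 intervals per attribute and the function name `cells_needed` are illustrative assumptions.

```python
def cells_needed(bins_per_attribute, num_attributes):
    """Number of grid cells when each attribute is split into equal intervals.

    Having at least one observation per cell therefore requires
    at least this many observations.
    """
    return bins_per_attribute ** num_attributes

for d in (1, 2, 3, 5, 10):
    print(d, cells_needed(10, d))
# 1 10
# 2 100
# 3 1000
# 5 100000
# 10 10000000000
```

Each extra attribute multiplies the required number of observations by 10 in this setup.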