## Approximate Nearest Neighbours in R and Spark

### Background

K-Nearest Neighbours (KNN) is a commonly used algorithm, but it is expensive to compute exactly on large datasets. Spark implements two methods for finding approximate nearest neighbours using Locality Sensitive Hashing (LSH): Bucketed Random Projection for Euclidean distance and MinHash for Jaccard distance. The work to add these methods was done in collaboration with Uber, which you can read about here. Whereas traditional KNN algorithms find the exact nearest neighbours, these approximate methods only find the nearest neighbours with high probability.
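To make the idea behind Bucketed Random Projection concrete, here is a minimal pure-Python sketch. It is not Spark's implementation: the function names (`make_projections`, `bucket_ids`, `build_index`, `approx_nearest`) and the sample data are hypothetical, chosen for illustration only. It assumes the usual scheme in which each hash table projects a point onto a random unit vector and floors the result by a bucket length, so that nearby points tend to share a bucket.

```python
import math
import random
from collections import defaultdict


def make_projections(dim, num_tables, seed=0):
    """Draw one random unit vector per hash table (illustrative helper)."""
    rng = random.Random(seed)
    projections = []
    for _ in range(num_tables):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        projections.append([x / norm for x in v])
    return projections


def bucket_ids(point, projections, bucket_length):
    """Hash a point to one bucket per table: floor((point . v) / bucket_length)."""
    return tuple(
        math.floor(sum(p * x for p, x in zip(proj, point)) / bucket_length)
        for proj in projections
    )


def build_index(points, projections, bucket_length):
    """Map each bucket id to the indices of the points that fall in it."""
    tables = [defaultdict(list) for _ in projections]
    for idx, point in enumerate(points):
        ids = bucket_ids(point, projections, bucket_length)
        for table, bucket in zip(tables, ids):
            table[bucket].append(idx)
    return tables


def approx_nearest(query, points, tables, projections, bucket_length, k=1):
    """Rank only the points sharing at least one bucket with the query."""
    candidates = set()
    ids = bucket_ids(query, projections, bucket_length)
    for table, bucket in zip(tables, ids):
        candidates.update(table.get(bucket, []))
    ranked = sorted(candidates, key=lambda i: math.dist(query, points[i]))
    return ranked[:k]


# Made-up sample data: two loose clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (9.0, 1.0)]
projections = make_projections(dim=2, num_tables=4, seed=42)
tables = build_index(points, projections, bucket_length=2.0)
neighbours = approx_nearest((5.1, 5.0), points, tables, projections, 2.0, k=2)
```

Only points that collide with the query in at least one table are compared exactly, which is where the saving over brute-force KNN comes from. It is also why the result is approximate: a true neighbour that lands in a different bucket in every table is missed. More tables or a larger bucket length reduce that risk at the cost of more candidates to check.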