Distance measures

Published

August 29, 2022

Distance measures are critical in many machine learning algorithms: they summarize the relative difference between two objects. K-Nearest Neighbors is an example of an algorithm that uses a distance measure at its core.

Every distance measure doesn’t work for every problem. Instead, we must select the appropriate function depending on the nature of the data.

Here is a list of four distance measures and a quick summary of how they work. Select every one of them that’s correct.

The Euclidean distance between two vectors is the square root of the sum of the squared differences between them.
The Manhattan distance between two vectors is the sum of their absolute differences.
The Hamming distance is a generalization of the Euclidean and Manhattan distances that we can tune depending on which distance measure we need.
The Minkowski distance computes the distance between two binary vectors.

Expand to see the answer

1, 2

While the Euclidean and Manhattan distances summary is correct, the Hamming and Minkowski summaries aren’t.

The Hamming distance computes the distance between two binary vectors. For example, you can calculate the distance between objects using a one-hot encoded feature.

The Minkowski distance is a generalization of the Euclidean and Manhattan distances. Both of these work with real-value vectors, but the Euclidean distance is the shortest path between objects, while the Manhattan distance is the rectilinear distance between them. Using the Minkowski distance, we can control which approach to use depending on the data.

Recommended reading

Check “4 Distance Measures for Machine Learning” for a complete explanation of these four distance measures.
“Five Common Distance Measures in Data Science With Formulas and Examples” is a deeper dive into these distance measures.