How to compute the Euclidean distance in Python numpy
The Euclidean distance is the “as the crow flies” distance, i.e., the straight-line distance between two points. For instance, if you take the latitude and longitude of two cities, say New York and Boston, and treat them as coordinates on a flat map, the Euclidean distance gives you the length of a rope stretched straight between these two cities.
The mathematical definition of the Euclidean distance involves taking the sum of the squared differences between coordinates and then taking the square root of that sum (as shown in the figure below).
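For reference, the definition described above can be written out as a formula. The symbols here are chosen for illustration and are not from the original figure: p and q are the two points, each with n coordinates.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

For two-dimensional points, n = 2 and the formula reduces to the familiar Pythagorean theorem.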
There are three ways to calculate the Euclidean distance using Python numpy. First, we can write the logic of the Euclidean distance in Python using sqrt(), sum(), and square() functions. Second, we can compute the Euclidean distance using dot products with dot(). Third, we can use the np.linalg.norm() function. We will go through these methods one by one.
Method 1: Compute Euclidean distance using sqrt(), sum(), and square() functions
Here is a simple way to compute the Euclidean distance. We simply code up the above formula using basic numpy functions, like so:
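A minimal sketch of this approach, reconstructed from the description in this section, might look like the following. The variable names location1 and location2 follow the names used in the text; the name distance is an assumption made here for illustration.

```python
import numpy as np

# Two example points in two-dimensional space
location1 = np.array([-1, 1])
location2 = np.array([2, -3])

# Square the coordinate differences, add them up, then take the square root
distance = np.sqrt(np.sum(np.square(location1 - location2)))
print(distance)
```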
In the above code, we first create two locations at coordinates (-1,1) and (2,-3). (These are just examples; in practice you can plug in the actual latitude and longitude coordinates of your favorite cities.) Note that the x-coordinates differ by 3 (i.e., 2 minus -1) and the y-coordinates differ by 4 in magnitude (-3 minus 1 is -4). The exact signs don’t matter because we square these differences (using np.square) and add them up (using np.sum). This gives a single number, a scalar, which is then passed to the square root function to return the Euclidean distance.
If we run this program, we get 5.0.
This makes sense because the two differences were 3 and 4, and 3 and 4 are the two shorter sides of a right-angled triangle whose hypotenuse is 5, which is the value returned.
Method 2: Compute Euclidean distance using dot() and sqrt()
A second way to compute the Euclidean distance is to use the property that the Euclidean distance is the square root of the dot product of the difference between the two points (as vectors) with itself. This works as follows:
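A minimal sketch of the dot product approach, assuming the same two example points as in Method 1, could be written like this. The names diff and distance are illustrative choices, not from the original listing.

```python
import numpy as np

location1 = np.array([-1, 1])
location2 = np.array([2, -3])

# Difference between the two points, treated as a vector
diff = location1 - location2

# The dot product of the difference with itself is the sum of squared differences
distance = np.sqrt(np.dot(diff, diff))
print(distance)
```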
Note that in the above code it is important to first compute the difference (location1 - location2), which for our example is (-3,4). When we take the dot product of (-3,4) with itself, we get (-3)*(-3) + 4*4 = 9 + 16 = 25. Finally, we take the square root to obtain 5. The output is the same as before: 5.0.
Method 3: Compute Euclidean distance using the linalg.norm() function
The final approach to compute the Euclidean distance is the easiest and it uses the built-in function called linalg.norm() within the numpy module. Here is how that works:
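A minimal sketch using np.linalg.norm(), again assuming the same two example points, might look like this. With its default arguments, np.linalg.norm() returns the 2-norm of a one-dimensional array, which is the Euclidean length of the vector.

```python
import numpy as np

location1 = np.array([-1, 1])
location2 = np.array([2, -3])

# The default norm of a 1-D array is the Euclidean (2-) norm
distance = np.linalg.norm(location1 - location2)
print(distance)
```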
Note that we compute the difference between the two locations as usual and pass it to numpy’s linalg.norm() function, which does the computation for us directly and yields the same distance, 5.0.
We have seen three different ways to compute the Euclidean distance using the Python numpy module. Which one is your favorite?
Finally, note that we have used points in two-dimensional space (like latitude and longitude) and computed the distance between them. The same code will work unchanged if your points are in a higher-dimensional space, with three or more coordinates.
If you liked this blog post, you should explore a different way to compute distances, such as the Manhattan distance.
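To illustrate the higher-dimensional case, here is a small sketch that applies all three methods to a pair of four-dimensional points. The points and variable names are made up for this example. Each coordinate differs by 2, so the sum of squared differences is 16 and all three methods should report a distance of 4.

```python
import numpy as np

# Two hypothetical points in four-dimensional space
point1 = np.array([1, 1, 1, 1])
point2 = np.array([3, 3, 3, 3])
diff = point1 - point2

# Method 1: sqrt(), sum(), and square()
d_basic = np.sqrt(np.sum(np.square(diff)))

# Method 2: dot product of the difference with itself
d_dot = np.sqrt(np.dot(diff, diff))

# Method 3: built-in norm
d_norm = np.linalg.norm(diff)

print(d_basic, d_dot, d_norm)
```

Only the arrays changed; none of the three distance expressions depends on the number of coordinates.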
Kodeclik is an online coding academy for kids and teens to learn real-world programming. Kids are introduced to coding in a fun and exciting way and are challenged to reach higher levels with engaging, high-quality content.