# Announcing nandist: calculating distances when data is missing

By Wouter Donders, Fri 20 January 2023, in category Blog

Calculating distances is a frequent part of data science when comparing the similarity or dissimilarity of data points. Distances can be measured in more than one metric, such as Euclidean, Cityblock, Minkowski, Chebyshev, Cosine, and many others. In Python, you can calculate these distance metrics using functions in SciPy's subpackage `scipy.spatial.distance`.

But data is rarely perfect, and you may have missing values in the components of your data points. If you have missing data and try to calculate a distance using SciPy's distance functions, they will (correctly) return `nan`, indicating that the distance cannot be calculated. Although correct, that's not very useful.

For many use cases, it doesn't matter that a particular value is missing. It would be much more useful if the distance functions ignored these components, treating each missing value as "any value" that adds zero to the total distance.
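To make the idea concrete, here is a minimal NumPy sketch of a NaN-ignoring Euclidean distance. It only illustrates the principle and is not claimed to be how `nandist` works internally; the function name `nan_euclidean_sketch` is hypothetical.

```python
import numpy as np


def nan_euclidean_sketch(u, v):
    """Euclidean distance in which any component that is NaN in u or v adds zero.

    Hypothetical helper for illustration only; not part of nandist or SciPy.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    diff = u - v  # NaN wherever either input is missing
    # np.nansum treats NaN terms as zero, so missing components are skipped
    return float(np.sqrt(np.nansum(diff ** 2)))


u = np.array([1.0, np.nan, 4.0])
v = np.array([4.0, 2.0, 0.0])
print(nan_euclidean_sketch(u, v))  # 5.0, from sqrt(3**2 + 0 + 4**2)
```

The second component is missing in `u`, so only the first and third components contribute to the result.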

Our open-source Python package `nandist` implements drop-in replacements for SciPy's most frequently used distance functions, including `euclidean`, `cityblock`, `chebyshev`, `cosine`, and `minkowski`, as well as `cdist` and `pdist`, the fast distance functions for collections of data points. The functions in `nandist` use the exact same API as their SciPy counterparts and therefore also support weighted distance calculations.
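As a rough illustration of what a weighted, NaN-ignoring distance means, the sketch below follows SciPy's convention for the weighted city-block distance, the sum of `w[i] * |u[i] - v[i]|`, and skips components where a value is missing. It is written in plain NumPy under that assumption; the helper `nan_weighted_cityblock_sketch` is hypothetical and `nandist`'s own implementation may differ.

```python
import numpy as np


def nan_weighted_cityblock_sketch(u, v, w=None):
    """Weighted city-block distance that skips components containing NaN.

    Hypothetical helper for illustration only; not part of nandist or SciPy.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    terms = np.abs(u - v)  # NaN wherever either input is missing
    if w is not None:
        terms = np.asarray(w, dtype=float) * terms
    # Missing components contribute zero to the total
    return float(np.nansum(terms))


u = np.array([0.0, 1.0, 2.0])
v = np.array([np.nan, 0.0, 5.0])
w = np.array([1.0, 2.0, 0.5])
print(nan_weighted_cityblock_sketch(u, v, w))  # 3.5, from 0 + 2*1 + 0.5*3
```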

## Installation

You can simply install `nandist` using `pip`:

``````
pip install nandist
``````

## Examples

### Calculating cityblock distance between two vectors

A simple example of calculating the cityblock distance between (0, 1) and (NaN, 0) is shown below:

``````
>>> import nandist
>>> import scipy.spatial.distance
>>> import numpy as np

>>> # City-block distance between (0, 1) and (NaN, 0)
>>> u, v = np.array([0, 1]), np.array([np.nan, 0])
>>> scipy.spatial.distance.cityblock(u, v)
nan
>>> nandist.cityblock(u, v)
1.0
``````

### Calculating cityblock distance between collections of data

You can get the pairwise distances between the rows of two matrices using `cdist`. The NaNs do not need to be in the same component.

``````
>>> import nandist
>>> import numpy as np

>>> # City-block distances between vectors A = [(0, 0), (1, NaN)] and vectors B = [(1, NaN), (1, 1)]
>>> XA, XB = np.array([[0, 0], [1, np.nan]]), np.array([[1, np.nan], [1, 1]])
>>> Y = nandist.cdist(XA, XB, metric="cityblock")
>>> Y
array([[1., 2.],
       [0., 0.]])
``````
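To see where these numbers come from, the same matrix can be reproduced with a small NumPy broadcasting sketch. This is only an illustration of the arithmetic under the "missing components add zero" rule, not a description of how `nandist.cdist` is implemented; the helper name `nan_cityblock_cdist_sketch` is hypothetical.

```python
import numpy as np


def nan_cityblock_cdist_sketch(XA, XB):
    """Pairwise city-block distances between rows of XA and XB, skipping NaNs.

    Hypothetical helper for illustration only; not part of nandist or SciPy.
    """
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    # Broadcast to shape (len(XA), len(XB), n_components)
    diff = np.abs(XA[:, None, :] - XB[None, :, :])
    # Components that are NaN in either row contribute zero
    return np.nansum(diff, axis=-1)


XA = np.array([[0, 0], [1, np.nan]])
XB = np.array([[1, np.nan], [1, 1]])
print(nan_cityblock_cdist_sketch(XA, XB))
# [[1. 2.]
#  [0. 0.]]
```

For example, the entry in row 0, column 0 compares (0, 0) with (1, NaN): the first component contributes 1 and the second, being missing, contributes 0.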