Kernel Density Estimation

By: Matthew Conlen

Kernel density estimation is a really useful statistical tool with an intimidating name. Often shortened to KDE, it’s a technique that lets you create a smooth curve from a set of data.

This can be useful if you want to visualize just the “shape” of some data, as a kind of continuous replacement for the discrete histogram. It can also be used to generate points that look like they came from a certain dataset. This behavior can power simple simulations, where simulated objects are modeled on real data.
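As a rough illustration of the "generate points" idea, here is a minimal sketch, assuming a Gaussian kernel. Under that assumption, drawing from the estimate amounts to picking one of the observed points at random and nudging it by normally distributed noise whose spread is the bandwidth (the smoothing parameter introduced further down). The function name `sample_from_kde` and the data values are made up for this example.

```python
import random

def sample_from_kde(points, bandwidth, n, rng):
    """Draw n new values: pick an observed point, then jitter it.

    Assumes a Gaussian kernel, so the jitter is normal with
    standard deviation equal to the bandwidth.
    """
    return [rng.choice(points) + rng.gauss(0.0, bandwidth) for _ in range(n)]

rng = random.Random(42)
observed = [1.2, 1.9, 2.1, 2.4, 5.0]  # made-up measurements
simulated = sample_from_kde(observed, bandwidth=0.3, n=1000, rng=rng)
print(simulated[:5])
```

The simulated values cluster where the observed values cluster, without simply repeating them.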

I hope this article provides some intuition for how KDE works.

To understand how KDE is used in practice, let’s start with some points. The white circles on your screen were sampled from some unknown distribution.

As more points build up, their silhouette will roughly correspond to that distribution, but we have no way of knowing its true shape.

The blue line shows an estimate of the underlying distribution; this is what KDE produces.

The KDE algorithm takes a parameter, bandwidth, that affects how “smooth” the resulting curve is. Use the control below to modify bandwidth, and notice how the estimate changes.

Bandwidth: 0.05

For each location on the blue line, the KDE is calculated by weighting every data point we’ve seen according to its distance from that location. If we’ve seen more points nearby, the estimate is higher, indicating a higher probability of seeing a point at that location.
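To make that concrete, here is a minimal sketch of the calculation at a single location. It assumes a Gaussian kernel and divides by the number of points times the bandwidth so the curve integrates to 1; the names `gaussian_kernel` and `kde_at` and the sample values are invented for illustration, not taken from the graphic.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: the weight given to a scaled distance u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_at(x, points, bandwidth):
    """Estimate the density at x from the observed points."""
    # Each point contributes a weight based on its distance from x.
    total = sum(gaussian_kernel((x - p) / bandwidth) for p in points)
    # Dividing by n * bandwidth makes the whole curve integrate to 1.
    return total / (len(points) * bandwidth)

points = [0.20, 0.22, 0.25, 0.30, 0.80]  # made-up sample
dense = kde_at(0.24, points, bandwidth=0.05)   # many points nearby
sparse = kde_at(0.60, points, bandwidth=0.05)  # few points nearby
print(dense, sparse)
```

The estimate near the cluster of four points comes out much higher than the estimate in the gap between the cluster and the lone point.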

Move your mouse over the graphic to see how the data points contribute to the estimate: the “brighter” a selection is, the more likely that location is. The red curve indicates how the point distances are weighted, and is called the kernel function. The points are colored according to this function.

Click to lock the kernel function to a particular location.

Changing the bandwidth changes the shape of the kernel: a lower bandwidth means only points very close to the current position are given any weight, which leads to the estimate looking squiggly; a higher bandwidth means a wider, shallower kernel where distant points also contribute, which leads to a smoother estimate.
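One rough way to see this numerically is to count the peaks in the estimate at a few bandwidths. The sketch below assumes a Gaussian kernel and a made-up sample drawn from a single bell curve; `kde_curve` and `count_peaks` are hypothetical helpers written for this example.

```python
import math
import random

def kde_curve(points, bandwidth, grid):
    """Gaussian KDE evaluated at every grid location."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm
        for x in grid
    ]

def count_peaks(curve):
    """Number of strict local maxima, a rough measure of squiggliness."""
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    )

random.seed(0)
points = [random.gauss(0.5, 0.15) for _ in range(40)]  # made-up sample
grid = [i / 400 for i in range(401)]  # locations from 0 to 1

for bandwidth in (0.005, 0.05, 0.2):
    peaks = count_peaks(kde_curve(points, bandwidth, grid))
    print(f"bandwidth={bandwidth}: {peaks} peak(s)")
```

The smallest bandwidth should produce many separate bumps, roughly one per isolated point, while the largest smooths them into a single broad hill.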


Next we’ll see how different kernel functions affect the estimate.

The concept of weighting the distances of our observations from a particular point, $x$, can be expressed mathematically as follows:

$$\hat{f}(x) = \sum_{\text{observations}} K\left(\frac{x - \text{observation}}{\text{bandwidth}}\right)$$

The variable $K$ represents the kernel function. Using different kernel functions will produce different estimates. Use the dropdown to see how changing the kernel affects the estimate.
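Here is a small sketch of swapping kernels in and out of that sum. The four kernels below are common textbook choices and may not match the options in the dropdown; the extra division by the number of observations times the bandwidth is my addition so that each estimate integrates to 1, and the sample values are made up.

```python
import math

# Each kernel maps a scaled distance u to a weight and integrates to 1.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "epanechnikov": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0,
    "triangular": lambda u: 1.0 - abs(u) if abs(u) <= 1.0 else 0.0,
    "uniform": lambda u: 0.5 if abs(u) <= 1.0 else 0.0,
}

def kde(x, observations, bandwidth, kernel):
    """Sum of K((x - observation) / bandwidth), scaled to integrate to 1."""
    total = sum(kernel((x - obs) / bandwidth) for obs in observations)
    return total / (len(observations) * bandwidth)

observations = [0.20, 0.22, 0.25, 0.30, 0.80]  # made-up sample
for name, kernel in KERNELS.items():
    print(name, round(kde(0.24, observations, 0.05, kernel), 3))
```

The Gaussian kernel gives every observation some weight, however small, while the other three ignore anything more than one bandwidth away. That is why their estimates drop to exactly zero in gaps between points.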


That’s all for now. Thanks for reading! I’ll be making more of these quick explainer posts, so if you have an idea for a concept you’d like to see, reach out on Twitter.

Here are a few useful links: