Animations are excellent tools for teaching and building intuition. Here we show how a LOESS fit (with a 1-degree polynomial) is constructed, by performing a weighted regression in many overlapping windows.

First, we see how the broom package’s augment function is useful for plotting a LOESS fit.

library(lattice)
library(ggplot2)
library(broom)
theme_set(theme_bw())

mod <- loess(NOx ~ E, ethanol, degree = 1, span = .75)
fit <- augment(mod)

ggplot(fit, aes(E, NOx)) +
  geom_point() +
  geom_line(aes(y = .fitted), color = "red")

We now need to compute a separate regression centered around each point on the graph. broom’s inflate function is useful for this, followed by some dplyr manipulation to filter the distant ones (using a span of .75). Notice that we filter out all but the 75% closest points, and then calculate tricube weights, just like the loess function does, on those points.

library(dplyr)

dat <- ethanol %>%
  inflate(center = unique(ethanol$E)) %>%
  mutate(dist = abs(E - center)) %>%
  filter(rank(dist) / n() <= .75) %>%
  mutate(weight = (1 - (dist / max(dist)) ^ 3) ^ 3)

dat
## Source: local data frame [5,478 x 6]
## Groups: center [83]
## 
##    center   NOx     C     E  dist     weight
##     (dbl) (dbl) (dbl) (dbl) (dbl)      (dbl)
## 1   0.535 3.741  12.0 0.907 0.372 0.38324129
## 2   0.535 2.295  12.0 0.761 0.226 0.82699305
## 3   0.535 1.498  12.0 1.108 0.573 0.00000000
## 4   0.535 2.881  12.0 1.016 0.481 0.06815599
## 5   0.535 3.120   9.0 1.001 0.466 0.09868125
## 6   0.535 2.358  12.0 1.042 0.507 0.02901265
## 7   0.535 3.669  12.0 0.930 0.395 0.30402352
## 8   0.535 1.192  18.0 0.601 0.066 0.99542253
## 9   0.535 0.926   7.5 0.696 0.161 0.93491730
## 10  0.535 1.590  12.0 0.686 0.151 0.94609661
## ..    ...   ...   ...   ...   ...        ...

We are now ready to create an animation, where each frame uses a different center. The most important layer here is the geom_smooth step with the weight = weight aesthetic. This does all the work of adding a best-fit line from the weighted regression. We also add a geom_vline to show the current center.

p <- ggplot(dat, aes(E, NOx)) +
  geom_point(aes(alpha = weight, frame = center)) +
  geom_smooth(aes(group = center, frame = center, weight = weight), method = "lm", se = FALSE) +
  geom_vline(aes(xintercept = center, frame = center), lty = 2) +
  geom_line(aes(y = .fitted), data = fit, color = "red")

library(gganimate)
gg_animate(p)

unnamed-chunk-4

Range of “span” values

The span argument to loess is important: it determines how smooth the resulting curve is. How much? We can see in an animation:

fits <- data_frame(span = seq(.1, 1, .1)) %>%
  group_by(span) %>%
  do(augment(loess(NOx ~ E, ethanol, degree = 1, span = .$span)))

p <- ggplot(fits, aes(E, NOx, frame = span)) +
  geom_point() +
  geom_line(aes(y = .fitted), color = "red") +
  ggtitle("span = ")

gg_animate(p)

span_animation

Notice how dplyr and broom can work together to create a data frame ready for gg_animate.

Through faceting we can now span values at them same time through faceting, and then animate the moving best-fit line across each of them. This lets us create the full animation below.

spans <- c(.25, .5, .75, 1)

# create loess fits, one for each span
fits <- data_frame(span = spans) %>%
  group_by(span) %>%
  do(augment(loess(NOx ~ E, ethanol, degree = 1, span = .$span)))

# calculate weights to reproduce this with local weighted fits
dat <- ethanol %>%
  inflate(span = spans, center = unique(ethanol$E)) %>%
  mutate(dist = abs(E - center)) %>%
  filter(rank(dist) / n() <= span) %>%
  mutate(weight = (1 - (dist / max(dist)) ^ 3) ^ 3)

# create faceted plot with changing points, local linear fits, and vertical lines,
# and constant hollow points and loess fit
p <- ggplot(dat, aes(E, NOx)) +
  geom_point(aes(alpha = weight, frame = center)) +
  geom_smooth(aes(group = center, frame = center, weight = weight), method = "lm", se = FALSE) +
  geom_vline(aes(xintercept = center, frame = center), lty = 2) +
  geom_point(shape = 1, data = ethanol, alpha = .25) +
  geom_line(aes(y = .fitted, frame = E, cumulative = TRUE), data = fits, color = "red") +
  facet_wrap(~span) +
  ylim(0, 5) +
  ggtitle("x0 = ")

gg_animate(p)

multiple_spans