Have you ever heard of confidence intervals? Probably. Have you ever made them? Probably not. If you're like me, you studied confidence intervals in Stats 101, you're convinced of their importance, but when it comes to plotting data they have always seemed like too much trouble.
Getting into R, the popular language for computing statistics, I thought there would be some built-in function to make it easy -- maybe plot.with.confidence. No such luck. But I discovered it is easy in R *if* you learn how to use the language to its best advantage.
Below is my solution for binomial data. It comes from an analysis of clicks and impressions over time. The code could easily be adapted for other distributions, like a normal distribution.
First, here is a plotting function to draw the statistic of interest plus lines above and below for confidence.
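plot.conf = function(x, y, upper, lower, ...) {
  # Main line: the statistic itself. Extra arguments in ... go to plot().
  plot(x, y, type='l', lwd=2, ylim=c(min(lower), mean(y) + 3*sd(y)), ...)
  # Thin lines for the upper and lower confidence bounds.
  lines(x, upper, type='l', lty=1, lwd=.5)
  lines(x, lower, type='l', lty=1, lwd=.5)
}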
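Notice the "..." for passing extra arguments through to plot (similar to Python's *args and **kwargs). Also notice that the upper y-limit is based on the y values (the mean plus three standard deviations) rather than on the upper confidence line. Confidence intervals can become very wide and scale a graph beyond comprehension.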
Now the part that calculates the upper and lower confidence lines. The way to get a binomial confidence interval in R is to perform a binomial test. This returns an object which has a confidence interval attached, a common pattern in R.
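To see what that object looks like, here is a small sketch for a single data point. The numbers (12 clicks out of 400 impressions) are made up for illustration and are not from my data.

```r
# Hypothetical single data point: 12 clicks out of 400 impressions.
b = binom.test(12, 400, conf.level = 0.95)

class(b)          # "htest", the class R uses for test results
b$estimate        # observed proportion of successes, 12/400
b$conf.int        # lower and upper bounds, plus a conf.level attribute
b$conf.int[1:2]   # subsetting keeps just the two numeric bounds
```

The interval binom.test reports is the exact (Clopper-Pearson) one, so the bounds are generally not symmetric around the estimate.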
We can loop implicitly over all our data points, because R lets us pass a function to another function that applies it to each element.
CI = mapply(
  function(s, t) {
    b = binom.test(s, t, conf.level=0.95)
    # Last expression is the return value: c(lower, upper)
    b$conf.int[1:2]
  },
  successes, trials
)
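mapply applies a function of n parameters to n vectors, element by element: the first call gets the first element of each vector, the second call gets the second, and so on. Because each call here returns a vector of length two, the results are collected into a matrix with two rows and one column per data point: row 1 holds the lower bounds and row 2 the upper bounds. Remember that R will return the last expression in a function as its value.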
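If the shape of the result is hard to picture, a toy run helps. This is only a sketch: the three data points below are invented, and the anonymous function is a one-line version of the one above.

```r
# Made-up data: three days of clicks (successes) and impressions (trials).
successes = c(12, 30, 7)
trials    = c(400, 900, 350)

CI = mapply(
  function(s, t) binom.test(s, t, conf.level = 0.95)$conf.int[1:2],
  successes, trials
)

dim(CI)    # two rows (lower, upper), one column per data point
CI[1, ]    # lower bounds
CI[2, ]    # upper bounds
```

Indexing by row is what lets the plotting call pull out all the upper bounds or all the lower bounds in one go.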
Finally we call our plotting function with the appropriate arguments.
plot.conf(xvalues, successes / trials, CI[2,], CI[1,], main='Plot with confidence')
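If you want to try it without your own data, here is everything in one runnable piece. The data is simulated (30 days, 500 impressions a day, a true click rate of 3%); those numbers and the axis labels are my assumptions for the demo, not part of the original analysis.

```r
# Simulated data, for illustration only.
set.seed(1)
xvalues   = 1:30
trials    = rep(500, 30)
successes = rbinom(30, size = trials, prob = 0.03)

# Plotting function: statistic as a thick line, bounds as thin lines.
plot.conf = function(x, y, upper, lower, ...) {
  plot(x, y, type='l', lwd=2, ylim=c(min(lower), mean(y) + 3*sd(y)), ...)
  lines(x, upper, type='l', lty=1, lwd=.5)
  lines(x, lower, type='l', lty=1, lwd=.5)
}

# One binomial confidence interval per day: row 1 lower, row 2 upper.
CI = mapply(
  function(s, t) binom.test(s, t, conf.level=0.95)$conf.int[1:2],
  successes, trials
)

# xlab and ylab travel through "..." to plot().
plot.conf(xvalues, successes / trials, CI[2,], CI[1,],
          main='Plot with confidence', xlab='day', ylab='click rate')
```

Run from a script rather than an interactive session, R writes the plot to its default graphics device, which is usually a file called Rplots.pdf.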