Aggregate Means and Standard Deviations
Imagine that we need to monitor the average response time of a webpage, along with its standard deviation, on a daily basis. This is a simple ask. At the end of the day, we have a list of response times \(x_1, x_2, ..., x_n\), and we use the usual formulas to get the mean \(\mu\) and standard deviation \(\sigma\):
\[\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}.\]
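As a minimal sketch of the daily computation, the snippet below computes the mean and the population standard deviation (dividing by \(n\), which is the form the shortcut formula later in this post relies on). The function name `mean_and_std` and the sample response times are made up for illustration.

```python
import math

def mean_and_std(xs):
    """Return (mean, population standard deviation) of a non-empty list."""
    n = len(xs)
    mu = sum(xs) / n
    # Population form: divide the sum of squared deviations by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

# Hypothetical response times (in milliseconds) collected over one day.
response_times_ms = [120.0, 80.0, 100.0, 140.0, 60.0]
mu, sigma = mean_and_std(response_times_ms)
print(mu, sigma)
```

The standard library functions `statistics.fmean` and `statistics.pstdev` compute the same two quantities.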
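What if at the end of the month we want to get the mean and standard deviation for the whole month? Well, it is also a simple ask. The simplest way is to pull the record of all response times and use the formulas again to compute the mean and standard deviation. But we have already computed the daily means and standard deviations. Can we aggregate the daily means and standard deviations into monthly statistics? More generally, can we aggregate the means and standard deviations of subsets into those of the whole set? The answer is yes. Suppose the set \(S\) is split into \(k\) disjoint subsets \(S_1, ..., S_k\), where \(S_i\) has \(n_i\) elements, mean \(\mu_i\) and standard deviation \(\sigma_i\). Then the mean \(\mu\) and standard deviation \(\sigma\) of \(S\) satisfy
\[\mu = \frac{\sum_{i=1}^{k} n_i \mu_i}{\sum_{i=1}^{k} n_i}, \qquad \sigma^2 = \frac{\sum_{i=1}^{k} n_i (\sigma_i^2 + \mu_i^2)}{\sum_{i=1}^{k} n_i} - \mu^2.\]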
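Proof. The aggregation of means is simple. Let \(\mathrm{sum}(S_i)\) be the sum of all elements in \(S_i\); then \(\mathrm{sum}(S_i) = n_i \mu_i\) by the definition of the mean. Since every element of \(S\) belongs to exactly one subset \(S_i\),
\[\mathrm{sum}(S) = \sum_{i=1}^{k} \mathrm{sum}(S_i) = \sum_{i=1}^{k} n_i \mu_i.\]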
The mean \(\mu\) of \(S\) is
\[\mu = \frac{\mathrm{sum}(S)}{n} = \frac{\sum_{i=1}^{k} n_i \mu_i}{\sum_{i=1}^{k} n_i}.\]
So the mean of the whole set is the weighted average of the means of the subsets, with the subset sizes as weights. The aggregation of standard deviations is a bit trickier. We need to use the "Shortcut Formula for Variance": \(Var X = E(X^2) - (EX)^2.\) Translating it to a set of samples \(\{x_1, ..., x_n\}\) with mean \(\mu\) and standard deviation \(\sigma\), it is
\[n\sigma^2 = \sum_{i=1}^{n} x_i^2 - n\mu^2.\]
We will provide a proof of the "Shortcut Formula for Variance" at the end. For now, let's proceed assuming the shortcut formula. Applying the shortcut formula to a subset \(S_i\) and rearranging:
\[\sum_{x \in S_i} x^2 = n_i \sigma_i^2 + n_i \mu_i^2.\]
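Applying the shortcut formula again, but this time to the whole set \(S\), and splitting the sum of squares over the subsets, we get
\[n\sigma^2 = \sum_{x \in S} x^2 - n\mu^2 = \sum_{i=1}^{k} \left(n_i \sigma_i^2 + n_i \mu_i^2\right) - n\mu^2.\]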
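Dividing both sides by \(n = \sum_{i=1}^{k} n_i\), we get the formula for aggregating standard deviations:
\[\sigma^2 = \frac{\sum_{i=1}^{k} n_i (\sigma_i^2 + \mu_i^2)}{\sum_{i=1}^{k} n_i} - \mu^2.\]
\(\square\)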
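To make the result concrete, here is a minimal sketch that combines per-subset triples \((n_i, \mu_i, \sigma_i)\) into the statistics of the whole set: the overall mean is the size-weighted average of the subset means, and the overall variance is the size-weighted average of \(\sigma_i^2 + \mu_i^2\) minus \(\mu^2\). The function names `aggregate` and `summarize` and the sample data are hypothetical; the result is compared against a direct computation over all values.

```python
import math

def summarize(xs):
    """Return (n, mean, population standard deviation) of a non-empty list."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return n, mu, sigma

def aggregate(stats):
    """Combine per-subset (n_i, mu_i, sigma_i) triples into (n, mu, sigma)."""
    n = sum(n_i for n_i, _, _ in stats)
    # Mean of the whole set: weighted average of the subset means.
    mu = sum(n_i * mu_i for n_i, mu_i, _ in stats) / n
    # Each subset contributes n_i * (sigma_i^2 + mu_i^2) to the sum of squares.
    mean_of_squares = sum(n_i * (s_i ** 2 + mu_i ** 2) for n_i, mu_i, s_i in stats) / n
    # Clamp at zero in case rounding makes the difference slightly negative.
    variance = max(mean_of_squares - mu ** 2, 0.0)
    return n, mu, math.sqrt(variance)

# Hypothetical response times for three days of different lengths.
days = [[120.0, 80.0, 100.0], [140.0, 60.0], [90.0, 110.0, 95.0, 105.0]]
daily_stats = [summarize(day) for day in days]

n, mu, sigma = aggregate(daily_stats)
_, mu_direct, sigma_direct = summarize([x for day in days for x in day])
print(abs(mu - mu_direct) < 1e-9, abs(sigma - sigma_direct) < 1e-9)
```

One caveat with this approach: subtracting \(\mu^2\) from the mean of squares can lose precision in floating point when the mean is large compared to the spread, so the sketch is best suited to values of moderate magnitude.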
For completeness, I will include the proof of the well-known shortcut formula that we used in the proof above.
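Proof. By definition of the mean \(\mu\), we know \(n\mu = \sum_{i=1}^{n} x_i\). By definition of the standard deviation \(\sigma\),
\[n\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2 = \sum_{i=1}^{n} x_i^2 - 2n\mu^2 + n\mu^2 = \sum_{i=1}^{n} x_i^2 - n\mu^2.\]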
That completes the proof of the shortcut formula. Indeed, if we divide both sides by \(n\), we can get the fancy form of the formula: \(Var X = E(X^2) - (EX)^2\). \(\square\)
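As a quick numerical sanity check of the shortcut formula, the sketch below computes the population variance of a small, arbitrarily chosen list in two ways: from the definition, and as the mean of squares minus the square of the mean. The variable names are illustrative only.

```python
xs = [3.0, 5.0, 8.0, 12.0]  # arbitrary sample values
n = len(xs)
mu = sum(xs) / n

# Definition: average squared deviation from the mean.
var_definition = sum((x - mu) ** 2 for x in xs) / n

# Shortcut: E(X^2) - (EX)^2.
var_shortcut = sum(x * x for x in xs) / n - mu ** 2

print(var_definition, var_shortcut)
```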