Say I have a mean and standard deviation for a dataset of 5 elements.
I now add a sixth element. Is there a way to calculate the new mean and standard deviation using the information we had prior (i.e. not just recalculating the whole thing from scratch)?
For the mean, I see that I can just multiply the old one by $5$, add the new element, and divide by $6$.
I'm not sure if there's something I can do with the standard deviation, however.
$$\sigma_{old} = \sqrt{\frac{1}{N}\sum_i (X_i - \mu_{old})^2}$$
$$\sigma_{new} = \sqrt{\frac{1}{N+1}\left(\sum_i (X_i - \mu_{new})^2 + (X_{new} - \mu_{new})^2\right)}$$
$$\mu_{new} = \frac{N\mu_{old} + X_{new}}{N+1}$$
$$(N+1)\,\sigma^2_{new} = N\sigma^2_{old} + \sum_i \left( (X_i - \mu_{new})^2 - (X_i - \mu_{old})^2 \right) + (X_{new} - \mu_{new})^2$$
After putting it in terms of the old stats, this becomes (I think)
$$(N+1)\,\sigma^2_{new} = N\sigma^2_{old} + \sum_i \left(2 X_i - \frac{(2N+1) \mu_{old} + X_{new}}{N+1} \right) \left(\frac{\mu_{old} - X_{new}}{N+1}\right) + \left(X_{new} - \frac{N\mu_{old} + X_{new}}{N+1}\right)^2$$
Is there anything better than this monstrosity?
$\endgroup$ 2 Answers
$\begingroup$Let's say you started with $n$ points and have added an $(n+1)^{st}$. To handle the variance, write $\mu_{new} = \mu_{old} + \delta$. We need to compute $$\sum_{i=1}^{n} (x_i - \mu_{new})^2$$ where the sum is taken over the old $x_i$'s only (the contribution from the $(n+1)^{st}$ sample is easily incorporated afterwards). But $$(x_i - \mu_{new})^2 = (x_i - \mu_{old} - \delta)^2$$ so our sum becomes $$\sum_{i=1}^{n} (x_i - \mu_{new})^2 = \sum_{i=1}^{n} (x_i - \mu_{old})^2 - 2 \delta \sum_{i=1}^{n} (x_i - \mu_{old}) + n \delta^2 = \sum_{i=1}^{n} (x_i - \mu_{old})^2 + n \delta^2$$ where the middle sum vanishes because the old $x_i$'s sum to $n\mu_{old}$, so their deviations from the old mean add up to zero.
Combining all this (and trusting that no algebraic error has been made!) we see that, with $Var$ denoting the variance normalized by the number of points, $$Var_{new} = \frac{(x_{n+1}-\mu_{new})^2}{n+1}+ \frac{n}{n+1}Var_{old} + \frac{n}{n+1}\delta^2$$
Not too terrible!
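The update above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `add_point` and the sample data are made up for the example, and `var` is taken to be the population variance (divide by $n$), as in the derivation.

```python
import statistics


def add_point(n, mean, var, x):
    """Return (n + 1, new_mean, new_var) after adding one value x.

    Hypothetical helper: `var` is the population variance (divide by n).
    """
    new_mean = (n * mean + x) / (n + 1)
    delta = new_mean - mean  # the shift in the mean, called delta above
    # Var_new = [(x - mu_new)^2 + n * Var_old + n * delta^2] / (n + 1)
    new_var = ((x - new_mean) ** 2 + n * var + n * delta ** 2) / (n + 1)
    return n + 1, new_mean, new_var


# Illustrative data: five elements, then a sixth is added.
data = [2.0, 4.0, 4.0, 4.0, 5.0]
n, mean, var = len(data), statistics.fmean(data), statistics.pvariance(data)
n, mean, var = add_point(n, mean, var, 7.0)
```

Comparing `var` against `statistics.pvariance(data + [7.0])` is a simple way to confirm the algebra on any data you like.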
$\endgroup$ $\begingroup$If $\mu_n = \frac1n \sum_{i=1}^n x_i$, $$\mu_{n+1} = \frac{1}{n+1}\sum_{i=1}^{n+1}x_i = \frac{1}{n+1}\big[x_{n+1} + \sum_{i=1}^nx_i\big] = \frac{x_{n+1}}{n+1} + \frac{n}{n+1}\mu_n$$
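As a quick worked instance of this update (the numbers are purely illustrative and not from the question): take the five values $2, 4, 4, 4, 5$, so $\mu_5 = 3.8$, and add $x_6 = 7$. Then $$\mu_6 = \frac{x_6}{6} + \frac{5}{6}\mu_5 = \frac{7}{6} + \frac{5 \cdot 3.8}{6} = \frac{26}{6} \approx 4.333$$ which matches summing all six values ($26$) and dividing by $6$.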
As you stated, the running sample standard deviation is much trickier. Check this link out: Incremental Mean and Standard Deviation Calculation
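They provide derivations for an incremental variance formula. Written for the population variance (normalized by the number of points) and for the step from $n$ to $n+1$ points, it reads: $$\sigma_{n+1}^2 = \frac{1}{n+1}\big[n\sigma_n^2 + n(n+1)(\mu_{n+1} - \mu_n)^2\big]$$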
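Index slips are easy to make in formulas like this, so a numerical check against a from-scratch computation is worthwhile. Below is a minimal sketch, assuming the population variance (divide by $n$). The class name `RunningStats` and the sample values are hypothetical. Each step keeps $n\sigma_n^2$ and adds $n(n+1)(\mu_{n+1}-\mu_n)^2$, which is the form that agrees with recomputing from scratch.

```python
import math
import statistics


class RunningStats:
    """Hypothetical helper: tracks count, mean, and population variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.var = 0.0  # population variance (divide by n)

    def push(self, x):
        n = self.n
        new_mean = (n * self.mean + x) / (n + 1)
        delta = new_mean - self.mean
        # (n + 1) * var_new = n * var_old + n * (n + 1) * delta^2
        self.var = (n * self.var + n * (n + 1) * delta ** 2) / (n + 1)
        self.mean = new_mean
        self.n = n + 1

    @property
    def std(self):
        return math.sqrt(self.var)


stats = RunningStats()
values = [2.0, 4.0, 4.0, 4.0, 5.0, 7.0]  # illustrative data
for v in values:
    stats.push(v)
```

Only three numbers are stored between updates, so the data itself never needs to be kept. For the sample variance (divide by $n-1$), rescale the final result by $n/(n-1)$.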
$\endgroup$