**If you find any of this useful, please consider donating via PayPal to help keep this site going.**

**Email news@statisticool.com to sign up to receive news and updates**

# Streaming mean and standard deviation

**3/25/06**

If you have N numbers and want their mean, how fast can you compute it? For small N, the calculation is very quick. However, when N gets large (as it does in this day and age, with terabytes of real-time Internet, genome, geophysical, and satellite data), the calculation can take much too long, even for something this simple.

Streaming statistics is a way around this. Instead of storing N numbers and then calculating a statistic, T, from the stored data, one calculates the statistic in real time and updates it as each new number arrives. The differences are summarized below:

**Standard:** Get x_{1}, x_{2}, ..., x_{n}, store to vector X, calculate T(X)

**Streaming:** Get x_{1}, calculate T(x_{1}), get x_{2}, calculate f(x_{2},T(x_{1})), get x_{3}, calculate f(x_{3},f(x_{2},T(x_{1}))), etc.

With streaming statistics one can still store the data "for the record", but the statistics are not calculated as T(X).
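The two approaches above can be sketched in Python. This is only an illustration: the helper names `standard`, `streaming`, and `running_max` are assumptions made for this example, and the running maximum is used as a stand-in statistic because its update rule f is easy to see.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def standard(source: Iterable[float], T: Callable[[List[float]], float]) -> float:
    """Standard: store every number in a vector X, then calculate T(X) once."""
    X = list(source)
    return T(X)


def streaming(
    source: Iterable[float],
    f: Callable[[float, Optional[float]], float],
) -> Iterator[float]:
    """Streaming: keep only the latest statistic and update it per number."""
    current: Optional[float] = None  # no statistic before the first number
    for x in source:
        current = f(x, current)
        yield current


def running_max(x: float, previous: Optional[float]) -> float:
    """Hypothetical update rule f: new maximum from the old maximum and x."""
    return x if previous is None else max(x, previous)


data = [3.0, 1.0, 4.0, 1.0, 5.0]
print(standard(data, max))                  # 5.0
print(list(streaming(data, running_max)))   # [3.0, 3.0, 4.0, 4.0, 5.0]
```

The streaming version never holds more than one number and one statistic at a time, which is the point of the approach when N is very large.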

Here is the theory behind a streaming mean and a streaming standard deviation.

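Let the mean of the first t numbers be $\bar{x}_t$. Since $t\,\bar{x}_t$ is the sum of the first t numbers, the mean can be updated from the previous mean and the new number $x_t$:

$$\bar{x}_t = \frac{(t-1)\,\bar{x}_{t-1} + x_t}{t}$$

For the standard deviation, we first have to calculate a streaming mean of the t squared numbers, $\overline{x^2}_t$, using the same update with $x_t^2$ in place of $x_t$. Then the streaming (sample) standard deviation, for $t \ge 2$, is

$$s_t = \left(\frac{t\,\overline{x^2}_t - t\,\bar{x}_t^{\,2}}{t-1}\right)^{1/2}$$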
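As a minimal sketch of the two update rules above, here is one way they might look in Python. The class name `StreamingStats` and its attribute names are assumptions made for this example. The clamp to zero before the square root is also an addition, not part of the formulas: floating-point rounding can make the difference of the two terms slightly negative.

```python
import math


class StreamingStats:
    """Running mean and sample standard deviation, updated one number at a time."""

    def __init__(self) -> None:
        self.t = 0          # how many numbers have arrived so far
        self.mean = 0.0     # streaming mean of the numbers
        self.mean_sq = 0.0  # streaming mean of the squared numbers

    def update(self, x: float) -> None:
        """Fold one new number into both streaming means."""
        self.t += 1
        t = self.t
        self.mean = ((t - 1) * self.mean + x) / t
        self.mean_sq = ((t - 1) * self.mean_sq + x * x) / t

    def stddev(self) -> float:
        """Sample standard deviation from the two streaming means (needs t >= 2)."""
        if self.t < 2:
            raise ValueError("standard deviation needs at least two numbers")
        t = self.t
        variance = (t * self.mean_sq - t * self.mean ** 2) / (t - 1)
        # Assumption: clamp tiny negative values caused by rounding error.
        return math.sqrt(max(variance, 0.0))


stats = StreamingStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)

# The mean is 5.0 up to rounding; the standard deviation is sqrt(32/7), about 2.138.
print(stats.mean, stats.stddev())
```

Only three values are stored no matter how many numbers arrive. One caveat: because the variance is computed as a difference of two possibly large terms, precision can suffer when the mean is large compared with the spread of the data.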

I wrote the following program for my calculator that calculates a streaming mean and standard deviation.

    strmstat()
    Prgm
    ClrIO
    1->t
    Input "Number "&string(t),xt
    xt->xbart1
    xt^2->xbartt1
    Lbl a
    1+t->t
    Input "Number "&string(t),xt
    ((t-1)*xbart1+xt)/t->xbart1
    ((t-1)*xbartt1+xt^2)/t->xbartt1
    Disp "Mean=",xbart1
    Disp "Stddev=",sqrt((t*xbartt1-t*xbart1^2)/(t-1))
    Goto a
    EndPrgm

The procedure above computes the mean and standard deviation by streaming. Therefore, any statistic that is a function of the mean and/or the standard deviation can also be calculated by streaming.
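To illustrate that last claim, here is a hedged sketch that derives two further statistics from the streaming mean and standard deviation. The function names `stream_mean_std` and `derived` are assumptions made for this example, and the standard error and coefficient of variation are simply two convenient choices. Note that the standard error also uses the count t, which the stream already tracks.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple


def stream_mean_std(numbers: Iterable[float]) -> Iterator[Tuple[int, float, float]]:
    """Yield (t, mean, stddev) after each number, starting from the second."""
    t = 0
    mean = 0.0
    mean_sq = 0.0
    for x in numbers:
        t += 1
        mean = ((t - 1) * mean + x) / t
        mean_sq = ((t - 1) * mean_sq + x * x) / t
        if t >= 2:
            variance = (t * mean_sq - t * mean ** 2) / (t - 1)
            # Assumption: clamp tiny negative values caused by rounding error.
            yield t, mean, math.sqrt(max(variance, 0.0))


def derived(t: int, mean: float, stddev: float) -> Dict[str, float]:
    """Statistics that depend only on the count, mean, and standard deviation."""
    return {
        "standard_error": stddev / math.sqrt(t),
        "coefficient_of_variation": stddev / mean if mean != 0 else float("nan"),
    }


latest: Dict[str, float] = {}
for t, mean, sd in stream_mean_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]):
    latest = derived(t, mean, sd)  # refreshed as each new number arrives

print(latest)
```

Nothing here revisits earlier numbers: each derived statistic is recomputed from the three running values alone.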
