Streaming mean and standard deviation

3/25/06

If you have N numbers and want their mean, how fast can you compute it? For small N, the calculation is very quick. However, when N gets large (as it does in this day and age of terabytes of real-time Internet, genome, geophysical, and satellite data), storing all the numbers and then computing over them can take much too long, even for something this simple.

Streaming statistics is a way around this. Instead of storing N numbers and then calculating a statistic T from the stored data, one calculates the statistic in real time and updates it as each new number arrives. The differences are summarized below:

Standard: Get x₁, x₂, ..., x_N, store them in a vector X, calculate T(X)

Streaming: Get x₁, calculate T(x₁), get x₂, calculate f(x₂,T(x₁)), get x₃, calculate f(x₃,f(x₂,T(x₁))), etc.

With streaming statistics one can still store the data "for the record", but the statistics are not calculated as T(X).
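
The two approaches can be sketched side by side in Python. This is only an illustration: the running maximum is used as a stand-in for T because its update is trivial, and the names batch_statistic, update, and streaming_statistic are made up for this example.

```python
def batch_statistic(xs):
    """Standard approach: store everything in X, then compute T(X)."""
    stored = list(xs)
    return max(stored)


def update(x_new, previous):
    """Streaming update f(x_new, T_previous), here for the running maximum."""
    return max(x_new, previous)


def streaming_statistic(xs):
    """Yield the statistic after each arrival, keeping only the current value."""
    it = iter(xs)
    current = next(it)  # T(x1)
    yield current
    for x in it:
        current = update(x, current)  # f(x2, T(x1)), f(x3, f(x2, T(x1))), ...
        yield current


data = [3, 1, 4, 1, 5, 9, 2, 6]
running = list(streaming_statistic(data))
print(running)
print(running[-1] == batch_statistic(data))
```

After the last number has arrived, the streaming value agrees with T(X), but at no point did the streaming version need the whole vector.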

Here is the theory behind a streaming mean and a streaming standard deviation.

Let the mean of the first t numbers be xbar_t, so that xbar_1 = x_1. Then for t > 1

xbar_t = ((t-1)*xbar_(t-1) + x_t)/t
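
As a rough check of the update rule, here is a minimal Python sketch. The function name streaming_mean is hypothetical, and t counts the numbers seen so far, including the newest one, which is how the calculator program further down uses it.

```python
def streaming_mean(xs):
    """Yield the mean of the numbers seen so far, one value per arrival."""
    mean = 0.0
    for t, x in enumerate(xs, start=1):
        # New mean = ((t-1) * old mean + newest number) / t
        mean = ((t - 1) * mean + x) / t
        yield mean


means = list(streaming_mean([2, 4, 6, 8]))
print(means)  # [2.0, 3.0, 4.0, 5.0]
```

Only the previous mean and the counter t are kept between arrivals; none of the earlier numbers are stored.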

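For the standard deviation, we first have to calculate a streaming mean of the squared numbers, xbar2_t, using the same kind of update as for the mean: xbar2_t = ((t-1)*xbar2_(t-1) + x_t²)/t. The streaming (sample) standard deviation, defined for t > 1, is then

stddev_t = ((t*xbar2_t - t*xbar_t²)/(t-1))^.5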
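
The same idea in Python, as a sketch rather than a definitive implementation. The class name StreamingStats and its attribute names are invented for this example; the result is compared against the standard library's statistics.stdev, which computes the sample standard deviation from stored data.

```python
import math
import statistics


class StreamingStats:
    """Keeps only t, the running mean, and the running mean of squares."""

    def __init__(self):
        self.t = 0
        self.mean = 0.0     # xbar_t
        self.mean_sq = 0.0  # xbar2_t

    def update(self, x):
        self.t += 1
        t = self.t
        self.mean = ((t - 1) * self.mean + x) / t
        self.mean_sq = ((t - 1) * self.mean_sq + x * x) / t

    def stddev(self):
        if self.t < 2:
            raise ValueError("need at least two numbers")
        t = self.t
        variance = (t * self.mean_sq - t * self.mean ** 2) / (t - 1)
        # Rounding can push a near-zero variance slightly negative, so clamp it.
        return math.sqrt(max(variance, 0.0))


data = [2, 4, 4, 4, 5, 5, 7, 9]
stats = StreamingStats()
for x in data:
    stats.update(x)

print(stats.mean, stats.stddev())
print(abs(stats.stddev() - statistics.stdev(data)) < 1e-9)
```

One caveat with this formula: it subtracts two quantities that can be large and nearly equal, so in floating point some precision can be lost when the mean is large compared with the spread of the data.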

I wrote the following program for my calculator that calculates a streaming mean and standard deviation.

strmstat()
Prgm
ClrIO
1->t
Input "Number "&string(t),xt
xt->xbart1
xt^2->xbartt1
Lbl a
1+t->t
Input "Number "&string(t),xt
((t-1)*xbart1+xt)/t->xbart1
((t-1)*xbartt1+xt^2)/t->xbartt1
Disp "Mean=",xbart1
Disp "Stddev=",sqrt((t*xbartt1-t*xbart1^2)/(t-1))
Goto a
EndPrgm

The above procedure makes the mean and standard deviation computable by streaming. Therefore, any statistic that is a function of only the mean and/or the standard deviation can also be calculated by streaming.
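
As one example of such a derived statistic, the coefficient of variation (standard deviation divided by mean) can be updated on every arrival. The Python sketch below uses the hypothetical name streaming_cv and assumes the running mean is never zero.

```python
import math


def streaming_cv(xs):
    """Yield stddev/mean after each number, starting with the second one."""
    t = 0
    mean = 0.0
    mean_sq = 0.0
    for x in xs:
        t += 1
        mean = ((t - 1) * mean + x) / t
        mean_sq = ((t - 1) * mean_sq + x * x) / t
        if t >= 2:
            variance = (t * mean_sq - t * mean ** 2) / (t - 1)
            # Assumes mean != 0; clamp guards against tiny negative rounding.
            yield math.sqrt(max(variance, 0.0)) / mean


cvs = list(streaming_cv([10, 12, 8, 10]))
print(cvs)
```

Nothing beyond t, the running mean, and the running mean of squares is needed, so the derived statistic costs no extra storage.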
