MBS Research Course – Statistics Component Part 1       Dr. HN Mayrovitz

 

Standard Deviation

The standard deviation (SDm) of a set of measurements is an index of the scatter among the measured values. It indicates the variability of the values about their mean (average) value Xm.

Calculating Standard Deviation of the Sample of Measured Data

The deviation of each measurement from the mean (Xm) is determined as (Xi – Xm). These deviations are squared, giving (Xi – Xm)². The average of all squared deviations is called the variance. The square root of the variance is SDm.

The standard deviation is expressed mathematically as:

SDm = $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - X_m)^2}$ where Xm = $\frac{1}{N}\sum_{i=1}^{N} X_i$ and N is the number of measurements
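The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the `readings` values are hypothetical and not taken from the course data. Following the text, the squared deviations are averaged over N.

```python
import math

def mean(values):
    """Xm: the sum of the measurements divided by N."""
    return sum(values) / len(values)

def sd_m(values):
    """SDm: square root of the average squared deviation from Xm."""
    xm = mean(values)
    variance = sum((x - xm) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

# Hypothetical measurements, chosen so the arithmetic is easy to follow.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(readings))  # 5.0
print(sd_m(readings))  # 2.0
```

Here the squared deviations sum to 32, so the variance is 32/8 = 4 and SDm is 2.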

If the measured data are approximately Normally distributed, then about 95% of all measured values will lie within 2 SDm of Xm and more than 99.7% will lie within 3 SDm of Xm.
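The 95% and 99.7% figures can be checked for an ideal Normal distribution. The sketch below assumes exact Normality and uses the error function from Python's standard library. The helper name `fraction_within` is introduced here only for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a Normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3):
    print(k, round(fraction_within(k), 4))
# 2 0.9545
# 3 0.9973
```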

Population Standard Deviation

The preceding shows how to calculate SDm for a measured sample set. But that sample set is only one of many possible samples from some universe. That universe is referred to as the population from which a given sample set is taken. So what can be said about the standard deviation of the population (σ) based on the SDm of the measured sample? Stated otherwise, can we estimate the standard deviation of the values found in the population from which our sample was taken? It turns out that σ can be estimated from SDm as σ = SDm × $\sqrt{N/(N-1)}$.
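As a rough sketch, the population estimate can be computed from SDm by assuming the usual N−1 correction, σ ≈ SDm × sqrt(N/(N−1)). That form is an assumption here, chosen because SDm as defined earlier divides by N. The function name and the `readings` values are hypothetical.

```python
import math
import statistics

def population_sd_estimate(values):
    """Estimate sigma from SDm using the assumed N/(N-1) correction."""
    n = len(values)
    sd_m = statistics.pstdev(values)  # divides by N, matching SDm in the text
    return sd_m * math.sqrt(n / (n - 1))

# Hypothetical measurements (SDm = 2.0 for this set).
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

estimate = population_sd_estimate(readings)
print(round(estimate, 3))  # 2.138
# statistics.stdev divides by N-1 directly, so it should agree.
print(math.isclose(estimate, statistics.stdev(readings)))  # True
```

The correction matters most for small N. With N = 120 the factor sqrt(120/119) is about 1.004.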

Confidence in the Sample Mean as an estimate of the Population Mean - Standard Errors

Measurements on a sample of size N yield a mean value Xm. How do we determine how well Xm estimates the population mean, Xp? This can be assessed using the standard error of the mean (SEM), which estimates the standard deviation (σ) of sample means about the population mean and is obtained by dividing SDm by the square root of N.

SEM = SDm / $\sqrt{N}$. The level of confidence that Xm estimates Xp increases as SEM decreases.
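A small sketch shows how SEM shrinks as N grows for a fixed SDm. The SDm of 10.0 and the sample sizes are hypothetical values chosen for round numbers.

```python
import math

def sem(sd_m, n):
    """Standard error of the mean: SDm divided by the square root of N."""
    return sd_m / math.sqrt(n)

# Same hypothetical SDm, increasing sample size.
for n in (25, 100, 400):
    print(n, sem(10.0, n))
# 25 2.0
# 100 1.0
# 400 0.5
```

Each fourfold increase in N halves the SEM.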

Confidence in the Sample SD as an estimate of Population SD

This depends on the standard error of the standard deviation (SESD), which can be calculated as:

SESD = SDm / $\sqrt{2N}$
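The SESD calculation can be sketched the same way, assuming the form SDm / sqrt(2N) that the worked systolic example uses. The input values below are hypothetical.

```python
import math

def sesd(sd_m, n):
    """Standard error of the standard deviation, assuming SESD = SDm / sqrt(2N)."""
    return sd_m / math.sqrt(2 * n)

print(sesd(10.0, 50))   # 1.0
print(sesd(10.0, 200))  # 0.5  (larger N)
print(sesd(5.0, 50))    # 0.5  (smaller SDm)
```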

Confidence in the estimate of the population SD (σ) increases as SESD decreases, which occurs when the number of sample measurements (N) is larger and/or the sample SDm is smaller.

 

Example 1.

A group of 120 young male adult medical students agrees to participate in your research study, in which you will measure their seated blood pressures in one arm after 10 minutes of seated rest. The following average values and standard deviations (mmHg) were obtained: systolic 125.67 (SDm 11.29) and diastolic 82.45 (SDm 6.92).

What is the SEM for the measured data and what does it tell us about the population of all young male adult medical students?

Systolic: SEM = 11.29/sqrt(120) = 1.03 mmHg (estimate of the SD of the sample mean, σ)

Diastolic: SEM = 6.92/sqrt(120) = 0.63 mmHg

Since SEM estimates the standard deviation (σ) of sample means about the population mean, and since more than 99.7% of values are expected to be within ±3σ, we can be very confident* that the average blood pressure of young male adult medical students will be within the following ranges.

Systolic: 125.67 ± 3 x 1.03 mmHg, or between 122.58 and 128.76 mmHg

Diastolic: 82.45 ± 3 x 0.63 mmHg, or between 80.56 and 84.34 mmHg

* How to determine the actual level of confidence comes later.
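The arithmetic of Example 1 can be reproduced with a short script. This is a sketch only: the helper name `three_sem_range` is introduced here for illustration, and the SEM is rounded to two decimals before forming the range because that is how the figures above were obtained.

```python
import math

N = 120
# (mean, SDm) in mmHg, taken from the example.
data = {"Systolic": (125.67, 11.29), "Diastolic": (82.45, 6.92)}

def three_sem_range(mean, sd_m, n):
    """Return (SEM, lower, upper) for mean ± 3 SEM, rounding SEM to 2 decimals as in the text."""
    sem = round(sd_m / math.sqrt(n), 2)
    half_width = 3 * sem
    return sem, round(mean - half_width, 2), round(mean + half_width, 2)

results = {name: three_sem_range(m, sd, N) for name, (m, sd) in data.items()}
for name, (sem, low, high) in results.items():
    print(f"{name}: SEM = {sem:.2f} mmHg, range {low:.2f} to {high:.2f} mmHg")
# Systolic: SEM = 1.03 mmHg, range 122.58 to 128.76 mmHg
# Diastolic: SEM = 0.63 mmHg, range 80.56 to 84.34 mmHg
```

If the SEM is not rounded first, the diastolic limits come out as 80.55 and 84.35, a difference of 0.01 mmHg due only to rounding.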

Example 2.

In Example 1, what changes if we include considerations of the confidence in our estimate of σ?

In Example 1 we used the estimated σ (based on SEM) as a single value and did not consider its estimated variability based on the standard error of the sample standard deviation (SESD).

Systolic: SESD = 11.29/sqrt(2 x 120) = 0.73 mmHg

So we can be confident that σ is between 1.03 ± 3 x 0.73 mmHg, or between -1.16 and 3.22 mmHg (since a standard deviation cannot be negative, the effective lower bound is 0).

How does this affect the result of Example 1? The short (simplified) answer is that it widens the estimated confidence interval, because the single estimated SEM used in Example 1 (1.03 mmHg) needs to be adjusted to take into account our confidence in the estimated population σ. For example, if the actual σ were at the estimated upper bound (3.22 mmHg), then 3σ = 9.66 mmHg and not the 3 x 1.03 = 3.09 mmHg used in Example 1. The actual method of taking into account the confidence intervals of the estimates is more complicated than this and will be considered later.

For now it is sufficient to realize that confidence in using sample measurements to estimate the population depends on the SD of the measured values and on the SESD.