MBS Research Course – Statistics Component Part 1
Dr. HN Mayrovitz
Standard Deviation
The standard deviation (SDm) of a set of measurements is an index of the scatter among the measured values. It indicates how much the values vary about their mean (average) value, Xm.
Calculating Standard Deviation of the Sample of Measured Data
The deviation of each measurement Xi from the mean (Xm) is determined as (Xi – Xm). These deviations are squared, giving (Xi – Xm)². The average of all squared deviations is then calculated, yielding a quantity called the variance. The square root of the variance is the SDm.
The standard deviation is expressed mathematically as:
SDm = sqrt[ (1/N) Σ (Xi – Xm)² ], where Xm = (1/N) Σ Xi,
the sums run over i = 1 to N, and N is the number of measurements.
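As a minimal sketch of this calculation, the following Python computes Xm, the variance, and SDm for a small set of measurements. The data values and the function name sample_sd are hypothetical and chosen only for illustration. The function divides by N, matching the definition of the variance as the average of the squared deviations.

```python
import math

def sample_sd(values):
    """Return (mean, variance, SDm), dividing by N as in the definition above."""
    n = len(values)
    mean = sum(values) / n
    # Average of the squared deviations from the mean
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance, math.sqrt(variance)

# Hypothetical measurements, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
xm, var, sdm = sample_sd(data)
print(xm, var, sdm)  # 5.0 4.0 2.0
```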
If the measured data are approximately Normally distributed, then about 95% of all measured values will lie within 2 SDm of Xm and about 99.7% will lie within 3 SDm of Xm.
Population Standard Deviation
The preceding shows how to calculate SDm for a measured sample set. But that sample set is only one of many possible samples from some larger universe. That universe is referred to as the population from which a given sample set is taken. So what can be said about the standard deviation of the population (σ) based on the SDm of the measured sample? Stated otherwise, can we estimate the standard deviation of the values found in the population from which our sample was taken? It turns out that σ can be estimated from SDm as
σ = SDm × sqrt[N/(N – 1)].
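A minimal sketch of this estimate follows. It assumes the usual correction factor sqrt[N/(N – 1)], applied to an SDm computed with N in the denominator. The data values and the function name population_sd_estimate are hypothetical. Python's statistics.stdev, which divides by N – 1 directly, is printed alongside for comparison.

```python
import math
import statistics

def population_sd_estimate(values):
    """Estimate the population SD from SDm (assumed correction: sqrt(N/(N-1)))."""
    n = len(values)
    sdm = statistics.pstdev(values)        # SD computed with N in the denominator
    return sdm * math.sqrt(n / (n - 1))

# Hypothetical measurements, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
estimate = population_sd_estimate(data)
print(estimate, statistics.stdev(data))    # the two values should agree
```

For large N the factor sqrt[N/(N – 1)] is very close to 1, so the corrected and uncorrected values differ little.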
Confidence in the Sample Mean as an estimate of the Population Mean - Standard Errors
A sample of N measurements yields a mean value Xm. How do we determine how well Xm estimates the population mean, Xp? This can be judged using the standard error of the mean (SEM), which estimates the standard deviation of sample means about the population mean and is obtained by dividing SDm by the square root of N:
SEM = SDm / sqrt(N).
The level of confidence that Xm estimates Xp increases as SEM decreases.
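To illustrate how SEM shrinks as the sample size grows, here is a short sketch. The SD of 10.0 and the sample sizes are hypothetical values chosen so the arithmetic is easy to follow, and the function name sem is an illustrative choice.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical sample SD
for n in (25, 100, 400):
    print(n, sem(sd, n))  # 2.0, then 1.0, then 0.5
```

Each fourfold increase in N halves the SEM, because N enters through its square root.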
Confidence in the Sample SD as an estimate of Population SD
This depends on the standard error of the standard deviation (SESD), which can be calculated as:
SESD = SDm / sqrt(2N).
Confidence in the estimate of the population SD (σ) increases as SESD decreases, which occurs when the number of sample measurements (N) is larger and/or the sample SDm is smaller.
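The dependence of SESD on N and on the sample SD can be sketched as follows, using SESD = SD / sqrt(2N) as in the worked example later in these notes. The input values and the function name sesd are hypothetical.

```python
import math

def sesd(sd, n):
    """Standard error of the standard deviation: SD / sqrt(2N)."""
    return sd / math.sqrt(2 * n)

print(sesd(10.0, 50))   # 1.0
print(sesd(10.0, 200))  # 0.5  (larger N gives a smaller SESD)
print(sesd(5.0, 50))    # 0.5  (smaller SD gives a smaller SESD)
```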
Example 1.
A group of 120 young adult male medical students agrees to participate in your research study, in which you will measure their seated blood pressure in one arm after 10 minutes of seated rest. The following mean values and standard deviations (mmHg) were obtained.
Systolic: mean = 125.67, SD = 11.29
Diastolic: mean = 82.45, SD = 6.92
What is the SEM for the measured data and what does it tell us about the population of all young male adult medical students?
Systolic: SEM = 11.29/sqrt(120) = 1.03 mmHg (estimate of the SD of sample means about the population mean)
Diastolic: SEM = 6.92/sqrt(120) = 0.63 mmHg
Since SEM estimates the standard deviation of sample means about the population mean, and since about 99.7% of Normally distributed values are expected to lie within ±3 standard deviations, we can be very confident* that the average blood pressures of young male adult medical students will be within the following ranges.
Systolic: 125.67 ± 3 x 1.03 mmHg, i.e., between 122.58 and 128.76 mmHg
Diastolic: 82.45 ± 3 x 0.63 mmHg, i.e., between 80.56 and 84.34 mmHg
* How to determine the actual level of confidence comes later.
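The arithmetic of Example 1 can be reproduced with a short sketch. The means and SDs are those used in the calculations above. The variable names are illustrative, and the SEM is rounded to two decimals before forming the range, as in the text.

```python
import math

N = 120
# (mean, SD) in mmHg, as used in the Example 1 calculations
measurements = {"Systolic": (125.67, 11.29), "Diastolic": (82.45, 6.92)}

results = {}
for name, (mean, sd) in measurements.items():
    sem = round(sd / math.sqrt(N), 2)          # rounded to 2 decimals, as in the text
    low, high = mean - 3 * sem, mean + 3 * sem  # mean ± 3 SEM
    results[name] = (sem, round(low, 2), round(high, 2))
    print(f"{name}: SEM = {sem:.2f} mmHg, range {low:.2f} to {high:.2f} mmHg")
```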
Example 2.
In Example 1, what changes if we also consider the confidence in our estimate of the population SD (σ)?
In Example 1 we used the SEM, computed from the sample SD, as a single value and did not consider the variability of that SD estimate, which is given by the standard error of the sample standard deviation (SESD).
Systolic: SESD = 11.29/sqrt(2 x 120) = 0.73 mmHg
So we can be confident that σ is within 11.29 ± 3 x 0.73 mmHg, i.e., between 9.10 and 13.48 mmHg.
How does this affect the result of Example 1? The short (simplified) answer is that it extends the estimated confidence interval, because the single SEM used in Example 1 (1.03 mmHg) needs to be adjusted to take into account our confidence in the estimated σ. For example, if the actual σ were at the estimated upper bound (13.48 mmHg), then SEM = 13.48/sqrt(120) = 1.23 mmHg and 3 x SEM = 3.69 mmHg, not the 3 x 1.03 = 3.09 mmHg used in Example 1. The actual method of taking into account the confidence intervals of the estimates is more complicated than this and will be considered later.
For now it is sufficient to realize that confidence in using sample measurements to estimate the population depends on the SD of the measured values and on the SESD.
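A rough sketch of the simplified reasoning in Example 2 is given below for the systolic data. It rests on an interpretive assumption: the ±3 SESD band is applied to the sample SD itself (11.29 mmHg), and the SEM is then recomputed at the upper bound of that band. This is only an illustration of why the interval widens, not the fuller method deferred to later. All variable names are illustrative.

```python
import math

N = 120
sd = 11.29  # systolic sample SD (mmHg)

sesd = sd / math.sqrt(2 * N)                     # standard error of the SD
sd_low, sd_high = sd - 3 * sesd, sd + 3 * sesd   # assumed ±3 SESD band on the SD

sem = sd / math.sqrt(N)                          # SEM used in Example 1
sem_high = sd_high / math.sqrt(N)                # SEM if the SD were at its upper bound

print(f"SESD = {sesd:.2f} mmHg")
print(f"SD range: {sd_low:.2f} to {sd_high:.2f} mmHg")
print(f"3 x SEM = {3 * sem:.2f} mmHg; at upper SD bound: {3 * sem_high:.2f} mmHg")
```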