Overview
This lesson shows how to calculate scores and percentiles with the normal distribution.
Objectives
After completing this module, students should be able to:
Reading
Schumacker, Ch 4-6
Let’s pick up where we left off, with the normal distribution. We will begin by thinking about normally distributed populations, and then move on to thinking about how we can learn about large populations using random samples from those populations.
First, consider a normally distributed population, like the scores a large number of students got on an exam. Often we are interested in going back and forth between percentiles and distances from the mean in units of standard deviations. For instance, if your score is 2 standard deviations above the mean, what percentile are you in (that is, what percentage of the population did worse than you)? If you got an 82 on the exam and the mean was 75 with a standard deviation of 7, what percentile are you in? Or if you were in the 15th percentile (i.e., 85% of everyone did better than you) and the mean was a 75 with a standard deviation of 7, what score did you get?
These are all just basic questions about the normal distribution that are easily solved if you just visualize things correctly and do a little simple algebra.
Here is the pdf and cdf of a normal distribution with a mean of 75 and a standard deviation of 7:
library(ggplot2)
# pdf of a normal distribution with mean 75 and sd 7
normfun <- function(x){dnorm(x,75,7)}
ggplot(data=data.frame(x=c(50, 100)),aes(x)) + ylab("density") + xlab("outcome") +
  stat_function(fun=normfun)
# cdf of the same distribution
cumlfun <- function(x){pnorm(x,75,7)}
ggplot(data=data.frame(x=c(50, 100)),aes(x)) + ylab("cumulative distribution") + xlab("outcome") +
  stat_function(fun=cumlfun)
Since all normal distributions are fundamentally the same shape, we can answer any question about the cdf by thinking in terms of how many standard deviations an observation is from the mean – its “z-score”. A person with a score of 82 in an exam where the mean was 75 with a standard deviation of 7 is in the same percentile as a person with a score of 59 in an exam with a mean of 50 and a standard deviation of 9: both are one sd above the mean.
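As a quick check of that claim, the sketch below computes the number of standard deviations above the mean for both exams and then the percentile for each. The helper name `z_score` is made up for this illustration and is not part of base R.

```r
# Hypothetical helper (not from base R): how many sds x lies from the mean
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_a <- z_score(82, 75, 7)   # exam with mean 75, sd 7
z_b <- z_score(59, 50, 9)   # exam with mean 50, sd 9
print(c(z_a, z_b))          # both are 1

# Percentiles from the cdf of each exam's own normal distribution
pct_a <- pnorm(82, 75, 7)
pct_b <- pnorm(59, 50, 9)
print(c(pct_a, pct_b))      # both are 0.8413447
```

The two percentiles agree because each score sits exactly one sd above its own mean.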
If the mean is 0 and the standard deviation is 1, we can calculate the percentile for being 1 sd above the mean with pnorm as usual:
pnorm(1,0,1)
[1] 0.8413447
If we want the percentile in an exam where the mean was 75, sd was 7, and you got an 82:
pnorm(82,75,7)
[1] 0.8413447
Conversely, if you know you are in the 15th percentile (i.e., cdf(x) = 0.15) and the mean is 75 and sd is 7, then what score did you get?
qnorm(0.15,75,7)
[1] 67.74497
You can visually confirm these last two results by looking at the previous graphs. Note how a score of 82 is a bit above 0.75 on the cdf (0.8413 to be precise), and that the 15th percentile (0.15 on the y axis) corresponds to a score a bit under 70 (67.74 to be precise).
Again, the z-score is just the term for how many standard deviations one is above or below the mean; as you might guess, a negative z-score means the value lies that many standard deviations below the mean. (A z-score can be computed for any distribution, but translating it into a percentile the way we do here only works for normal distributions.)
Whenever we want to move back and forth between percentiles and raw scores (or whatever our measurement is), we go via the z-score:
\[z = \frac{x-\mu}{\sigma}\]
Again, the z-score is just how many \(\sigma\)s a value (\(x\)) is above or below the mean (\(\mu\)). An \(x\) that is two sd’s above the mean means that \(x - \mu\) is \(2\sigma\), which yields a z-score of \(2\).
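To make the route via the z-score concrete, here is a minimal sketch that goes from a raw score to a percentile and back again, using the exam numbers from earlier (mean 75, sd 7). The variable names are chosen for this illustration only. The reverse direction rearranges the formula to \(x = \mu + z\sigma\).

```r
mu <- 75
sigma <- 7

# Raw score -> z-score -> percentile
x <- 82
z <- (x - mu) / sigma     # 1: one sd above the mean
p <- pnorm(z)             # standard normal cdf, same as pnorm(82, 75, 7)
print(p)                  # 0.8413447

# A score two sds above the mean has z = 2
print(pnorm(2))           # 0.9772499

# Percentile -> z-score -> raw score
z15 <- qnorm(0.15)        # z-score of the 15th percentile, about -1.036
x15 <- mu + z15 * sigma   # rearranged formula: x = mu + z * sigma
print(x15)                # 67.74497, same as qnorm(0.15, 75, 7)
```

With no mean or sd supplied, `pnorm` and `qnorm` default to the standard normal (mean 0, sd 1), which is why they can be applied directly to z-scores.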
It used to be that instead of having nice computers to calculate percentiles from scores or vice versa, people had to rely on tables composed by those few who did have computers (or lots of patience). But since all normal distributions are essentially the same, and since it would be nuts to print out a different table for every combination of mean and sd under the sun, there was only one table you had to carry with you: one that translated between z-scores and percentiles. If you know the mean and sd, you can calculate the z-score from a given \(x\) value and thence the percentile, or if you know the percentile, you can calculate the z-score and thence the raw \(x\). Here’s an example of part of a z-table:
To get the percentile, look up the z-score to its first decimal place on the left, and the second decimal place along the top (e.g., row 11 column 1, z=1.00, corresponds to 0.8413); or for the reverse, find the percentile and match to the z-score. Luckily we don’t need to bother with such things any more, but it’s worth knowing – especially the part about z-scores just being a measure of sd’s above or below the mean.
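A table of this kind can be regenerated with pnorm. The sketch below is one possible layout, not a copy of any printed table: rows for the z-score to one decimal place, columns for the second decimal place. The range 0.0 to 1.2 is an assumption, chosen so that row 11, column 1 is z=1.00.

```r
# Row labels: z to one decimal place; column labels: second decimal place
tenths <- seq(0, 1.2, by = 0.1)
hundredths <- seq(0, 0.09, by = 0.01)

# Each cell holds the standard normal cdf at (row value + column value)
z_table <- outer(tenths, hundredths, function(a, b) pnorm(a + b))
dimnames(z_table) <- list(sprintf("%.1f", tenths), sprintf("%.2f", hundredths))

print(round(z_table, 4))

# Look up z = 1.00: row "1.0", column "0.00" (row 11, column 1)
print(round(z_table["1.0", "0.00"], 4))   # 0.8413
```

Reading the table in reverse (percentile to z-score) amounts to finding the cell closest to the target percentile, which is what qnorm does exactly.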