The Central Limit Theorem (Part 3)
In the activity The Central Limit Theorem (Part 1), we concluded with the following observations on the Central Limit Theorem.
- If you draw samples from a normal distribution, then the distribution of sample means is also normal.
- The mean of the distribution of sample means is identical to the mean of the "parent population," the population from which the samples are drawn.
- The higher the sample size that is drawn, the "narrower" will be the spread of the distribution of sample means.
To provide visual evidence of the narrowing spread, the following code will produce the image shown in Figure 1. The four examples pictured use sample sizes of 5, 10, 20, and 30. (Note: It is very difficult to type the following commands without error. In a moment, we'll provide a more efficient way to enter the commands. In the meantime, give the commands a try, if only to appreciate how difficult it is to get them correctly entered in sequence.)
> par(mfrow=c(2,2))
>
> n=5
> xbar_5=rep(0,500)
> for (i in 1:500){ xbar_5[i]=mean(rnorm(n,mean=100,sd=10))}
> hist(xbar_5,prob=TRUE,breaks=12,xlim=c(70,130))
>
> n=10
> xbar_10=rep(0,500)
> for (i in 1:500) { xbar_10[i]=mean(rnorm(n,mean=100,sd=10))}
> hist(xbar_10,prob=TRUE,breaks=12,xlim=c(70,130))
>
> n=20
> xbar_20=rep(0,500)
> for (i in 1:500) { xbar_20[i]=mean(rnorm(n,mean=100,sd=10))}
> hist(xbar_20,prob=TRUE,breaks=12,xlim=c(70,130))
>
> n=30
> xbar_30=rep(0,500)
> for (i in 1:500) { xbar_30[i]=mean(rnorm(n,mean=100,sd=10))}
> hist(xbar_30,prob=TRUE,breaks=12,xlim=c(70,130))
Some comments are in order:
- There is a new command, par(mfrow=c(2,2)), which divides the graphics window into two rows and two columns (see Figure 1). The par stands for "parameter," and there are a whole host of that can alter the figure window (type ?par for more information).
- Think of mfrow as meaning "multiple figures entered by rows." Because there are two rows and two columns, the first histogram command creates a plot in row 1, column 1, the second in row 1, column 2, the third in row 2, column 1, and the fourth in row 2 column 2. There is also a command mfcol that enters the figures by columns.
- The for loops are explained in the activity The Central Limit Theorem (Part 1).
- Note that xlim=c(70,130) creates a common axis so that a proper comparison of the spreads can be made.
In Figure 1, note that the spread narrows as the sample size increases from n = 5 to n = 10, n = 20, and n = 30.

Figure 1. A visual comparison of the distribution of sample means as the sample size increases.
Note that the mean of each distribution of sample means is 100 (each of the histograms in Figure 1 balances about 100), the same as the mean of the parent population from which the samples are drawn.
Sourcing Files
If you were successful in typing the lines that produced Figure 1, then you are far better typist than me. When commands must be correctly entered in sequence, such as the lines of code that produced Figure 1, it can be exasperating to make a mistake, knowing that you must retype the entire sequence anew.
A far more efficient way to proceed is to use R's editor to enter the lines of code, save the file, then execute the code in the file. This way, if you make a simple typing mistake, you make a change to the file, save, then execute the file anew. R calls these files of code "source files," and executing the file is called "sourcing the file."
On the Mac operating system (Windows and Linux should be similar), you open the editor by selecting New Document from the File menu (there is also an icon for new documents on the console toolbar). When the editor opens, type the following lines.
par(mfrow=c(2,2))
n=5
xbar_5=rep(0,500)
for (i in 1:500){ xbar_5[i]=mean(rnorm(n,mean=100,sd=10))}
hist(xbar_5,prob=TRUE,breaks=12,xlim=c(70,130))
n=10
xbar_10=rep(0,500)
for (i in 1:500) { xbar_10[i]=mean(rnorm(n,mean=100,sd=10))}
hist(xbar_10,prob=TRUE,breaks=12,xlim=c(70,130))
n=20
xbar_20=rep(0,500)
for (i in 1:500) { xbar_20[i]=mean(rnorm(n,mean=100,sd=10))}
hist(xbar_20,prob=TRUE,breaks=12,xlim=c(70,130))
n=30
xbar_30=rep(0,500)
for (i in 1:500) { xbar_30[i]=mean(rnorm(n,mean=100,sd=10))}
hist(xbar_30,prob=TRUE,breaks=12,xlim=c(70,130))
Save the file as central3_1.R. The .R extension reminds us that the file contains R source code.
To execute the code in the file central3_1.R, select Source File from the File menu (there is also an icon for "sourcing" files on the console toolbar). This opens a dialog where you can browse your system in the usual manner and select the saved source file (central3_1.R in our case). Clicking the Open button will execute the source code in the file and produce the image in Figure 1. If you have errors, or wish to make adaptions, correct your source code, save, then "source" the file again.
On our system, "sourcing" the file central3_1.R sends the following command to the console:
> source("/Users/darnold/Documents/MathDept/trunk/MathDept/html/R/code/central3_1.R")
This line shows both the path and filename of the file containing our source code. We've found that should errors occur, or should we wish to add further code or make changes, we can save our changes, then hit the "up-arrow" on our keyboard to replay this line in our console. Hitting enter executes the line and "sources" our saved changes, a most efficient way to work.
Examining the Workspace
Once you've "sourced" your code, the workspace will reflect variables used in your code. (Note: Your workspace may differ.)
> ls() [1] "i" "n" "xbar_10" "xbar_20" "xbar_30" "xbar_5"
The variable i should contain the last value of iteration.
> i [1] 500
The variable n should contain the last sample size used in your code.
> n [1] 30
The variables xbar_5, xbar_10, xbar_20, and xbar_30 should contain 500 sample means with the indicated sample size. For example, xbar_5 contains the means of 500 samples of size n = 5 drawn from a normal population having mean μ = 100 and standard deviation σ = 10.
> xbar_5 [1] 94.47642 98.71575 106.02464 104.54210 102.46225 95.55821 [7] 106.01677 103.29282 101.07612 93.36717 97.08930 99.53715 [13] 98.72121 91.59586 100.12131 100.20003 99.66180 97.42841 [19] 102.15585 93.75008 98.89875 103.13431 101.45754 97.65430 [25] 103.02266 97.22582 95.37828 96.91194 98.07255 96.93296 ... ... [493] 102.79047 92.16152 103.85065 95.74779 92.83653 97.43172 [499] 99.70777 106.23438
The histograms in Figure 1 indicate that the spread of the distributions of sample means narrows with increasing sample size. Let's examine the standard deviations of xbar_5, xbar_10, xbar_20, and xbar_30. The sd command is used to find the standard deviation of any collection of data.
> sd(xbar_5) [1] 4.52229 > sd(xbar_10) [1] 3.179132 > sd(xbar_20) [1] 2.165316 > sd(xbar_30) [1] 1.820735
Sure enough, as the sample size increases, the standard deviation, which is a measure of the spread of the distribution, clearly decreases.
Adding Data Points
Let's add a couple more data points. Save the following lines in the file central3_2.R.
n=5
xbar_5=rep(0,500)
for (i in 1:500){ xbar_5[i]=mean(rnorm(n,mean=100,sd=10))}
n=10
xbar_10=rep(0,500)
for (i in 1:500) { xbar_10[i]=mean(rnorm(n,mean=100,sd=10))}
n=20
xbar_20=rep(0,500)
for (i in 1:500){ xbar_20[i]=mean(rnorm(n,mean=100,sd=10))}
n=30
xbar_30=rep(0,500)
for (i in 1:500) { xbar_30[i]=mean(rnorm(n,mean=100,sd=10))}
n=40
xbar_40=rep(0,500)
for (i in 1:500) { xbar_40[i]=mean(rnorm(n,mean=100,sd=10))}
n=50
xbar_50=rep(0,500)
for (i in 1:500) { xbar_50[i]=mean(rnorm(n,mean=100,sd=10))}
Now, collect the sample sizes in a vector ssizes and standard deviations of the distributions of sample means in a vector sdevs, then plot the standard deviations versus the sample sizes. Add these lines to the code in the file central3_2.R.
ssizes=c(5,10,20,30,40,50) sdevs=c(sd(xbar_5),sd(xbar_10),sd(xbar_20),sd(xbar_30),sd(xbar_40),sd(xbar_50)) plot(ssizes,sdevs)
Save the file central3_2.R then "source" the file to produce the scatterplot shown in Figure 2.

Figure 2. Plotting the standard deviations of the distributions of sample means versus the sample size.
Modeling Standard Deviation versus Sample Size
Let's examine the current workspace in R. Note that this next command is executed at the command line.
> ls() [1] "i" "n" "sdevs" "ssizes" "xbar_10" "xbar_20" [7] "xbar_30" "xbar_40" "xbar_5" "xbar_50"
Good! All the data we need exists in the current workspace.
We will now try to find the specific relationship between the standard deviations of the distributions of sample means and the corresponding sample size. In the activity Transforming Data in R, we examined two models: power functions and exponential functions. Let's plot the logarithm of sdevs versus ssizes.
> plot(log(ssizes),log(sdevs))
The above command produces the scatterplot shown in Figure 3.

Figure 3. Plotting the logarithm of sdevs versus the logarithm of ssizes.
The fact that the plot of the logarithm of sdevs versus ssizes in Figure 3 is linear indicates that a power function would be a good model for the relationship between sdevs and ssizes.
As we learned in the activity Transforming Data in R, we first fit a line to the logarithm of sdevs versus the logarithm of ssizes in Figure 1.
> res=lm(log(sdevs)~log(ssizes))
> res
Call:
lm(formula = log(sdevs) ~ log(ssizes))
Coefficients:
(Intercept) log(ssizes)
2.2962 -0.4949
Let's add the line of best fit to the data in Figure 3.
> abline(res)
This produces the line of best fit in Figure 4.

Figure 4. Adding the line of best fit..
The fact that the line fits the data in Figure 4 so well is a strong indication that a power function is the relationship we seek between the standard deviations of the distribution of sample means and the sample sizes.
The output in the variable res tells us that the line of best fit in Figure 4 has the following equation.
log sdevs = 2.2962 - 0.4949 log ssizes
We exponentiate both sides of the last equation.
elog sdevs = e2.2962 - 0.4949 log ssizes
On the left, the exponential and the logarithm are inverses. On the right, the exponential of a sum is the product of the exponentials.
sdevs = e2.2962 e - 0.4949 log ssizes
We compute e2.2962.
> exp(2.2962) [1] 9.936352
Thus, our last equation becomes:
sdevs = 9.936352 e - 0.4949 log ssizes
We use a property of logarithms to move - 0.4949 into the exponent.
sdevs = 9.936352 elog ssizes - 0.4949
Again, the logarithm and exponential are inverses, which leads to the following result.
sdevs = 9.936352 ssizes - 0.4949
Conclusions
Suppose that we round the numbers in our last result to obtain:
sdevs = 10 ssizes - 0.5
Note the number 10. It is not a coincidence that this number is the standard deviation of the parent population. Remember, we drew our samples from a normal distribution with mean μ = 100 and standard deviation σ = 10.
Indeed, here is the final conclusion of the Central Limit Theorem.
Central Limit Theorem. Let X be a random variable having normal distribution with mean μ and standard deviation σ. Let X be the random variable representing the mean of a sample of size n drawn from the distribution of X. Then:
- The distribution of X will be normal.
- The mean of the distribution of X is identical to the mean of the distribution of X. In symbols, we write:

- The standard deviation of the distribution of X is related to the standard deviation of the distribution of X by the following equation:

Note that the result in item #3 is identical to our experimental result, repeated here for convenience.
sdevs = 10 ssizes - 0.5
The comparison is better seen if we use a little algebra to change the form of the result in item #3.
![]()
Thus, to find the standard deviation of the distribution of sample means, take the standard deviation of the parent population and divide it by the square root of the sample size.
Sampling from Distributions that are not Normal
Here is the key fact: The hypotheses and conclusions of the Central Limit Theorem as stated above hold regardless of the distribution of the parent population, provided the sample size is large enough. We now will examine this statement through examples.
Drawing from a Uniform Distribution
Consider the uniform distribution on the interval [0,10].
> x=seq(0,10,0.1) > plot(x,y,type="l") > plot(x,y,type="l",lwd=2,col="red")
These commands produce the uniform density function on [0,10] in Figure 5.
![The uninform density function on [0,10].](graphics/central3_8.gif)
Figure 5. The uniform density function on [0,10].
The mean of the uniform distribution on the interval [a, b] is given by the formula:
![]()
Thus, the mean of the uniform distribution on [0,10] is:
![]()
You need to use a little calculus to determine the standard deviation of the uniform distribution on the interval [a, b], but the result is:
![The standard deviation of the uniform distribution on the interval [a,b] is found by taking the difference of b and a, squaring, dividing by 12, then taking the square root of the result.](graphics/central3_11.png)
Thus, the standard deviation of the uniform distribution on the interval [0,10] is:
![The standard deviation of the uniform distribution on [0,10] is iapproximately 2.886751.](graphics/central3_12.png)
Now, let's draw 500 samples of size n=100 from the uniform distribution on [0,10].
> n=100
> xbar=rep(0,500)
> for (i in 1:500) { xbar[i]=mean(runif(n,min=0,max=10))}
> hist(xbar,prob=TRUE)
These commands produce the histogram in Figure 6.
![Histogram of means of samples from the uniform distribution o [0,10] having sample size n = 100.](graphics/central3_13.gif)
Figure 6. Histogram of means of samples of size n = 10 drawn from the uniform distribution [0,10].
Now, let's check the conclusions of the Central Limit Theorem.
- The first conclusion of the Central Limit Theorem is the fact that the distribution of sample means is normal. This is evident in Figure 6.
- The second conclusion of the Central Limit Theorem states that the mean of the sample means is the same as the mean of the distribution from which the samples are drawn. In Figure 6, the mean of the distribution of sample means appears to be about 5 (estimate the balance point in Figure 6), which is the same as the mean of the uniform distribution on [0,10]. However, let's do one more check:
> mean(xbar) [1] 5.002197
Looks good!
- The third conclusion of the Central Limit Theorem states that the standard deviation of the distribution of sample means is calculated with:

Let's check this prediction.
> sd(xbar) [1] 0.2936821
That's pretty good agreement!
Drawing from an Exponential Distribution
In the activity Continuous Distributions in R, we examined the exponential function having the following probability density function:
![]()
The mean and standard deviation of this exponential distribution are μ = 1/λ and σ = 1/λ, respectively.
Let's consider the exponential distribution with λ = 1.
> lambda=1 > y=dexp(x,rate=lambda) > plot(x,y,type="l",lwd=2,col="red")
These commands produce the histogram in Figure 7.

Figure 7. The exponential density function for λ = 1.
The mean and standard deviation of this distribution are:
μX = 1/λ = 1/1 = 1 and σX = 1/λ = 1/1 = 1
Now, let's draw 500 samples of size n=100 from the exponential distribution with λ = 1.
> n=100
> xbar=rep(0,500)
> for (i in 1:500) { xbar[i]=mean(rexp(n,rate=lambda))}
> hist(xbar,prob=TRUE)
These commands produce the histogram of the sample means shown in Figure 8.

Figure 8. Distribution of sample means of size n = 100 from the exponential distribution with λ = 1.
Now, let's check the conclusions of the Central Limit Theorem.
- The first conclusion of the Central Limit Theorem is the fact that the distribution of sample means is normal. This is evident in Figure 8.
- The second conclusion of the Central Limit Theorem states that the mean of the sample means is the same as the mean of the distribution from which the samples are drawn. In Figure 8, the mean of the distribution of sample means appears to be about 1 (estimate the balance point in Figure 8), which is the same as the mean of the exponential distribution with λ = 1. However, let's do one more check:
> mean(xbar) [1] 0.9997962
Looks good!
- The third conclusion of the Central Limit Theorem states that the standard deviation of the distribution of sample means is calculated with:

Let's check this prediction.
> sd(xbar) [1] 0.09948667
That's pretty good agreement!
Enjoy!
We hope you enjoyed this third activity on the Central Limit Theorem system. We encourage you to explore further. You might try repeating the experiments in this activity with different "parent" distributions.
