Theory Stuff 1

I have a distribution of p values, one from each of my sample draws of marbles from the bag. (Each p value is the proportion of red marbles in one sample.)

 

I have computed the mean and the standard deviation of the p values, and consumed a nice cup of coffee with a donut in the process.

 

Just for convenience (my convenience), let's say that the mean of the p values was 0.60, and the standard deviation was 0.10.

 

Now I take down one of my statistics books. How can I use all of these numbers, all these p values and the corresponding distribution of them, to get an estimate of the true proportion of red marbles in the bag?

 

The book tells me to think binomial, to consider use of the binomial distribution.

 

According to binomial theory, the distribution of p values will come close to following a normal curve. If the mean is 0.60 and the standard deviation is 0.10, about 68% of the samples of size n=20 will have a p value between 0.50 and 0.70; about 95% of them will have a p value between 0.40 and 0.80.
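
Here is a rough sketch of the arithmetic behind those figures. The function names are made up for this page, and the formula is the usual binomial one for the standard deviation of a sample proportion, sqrt(p(1-p)/n), which the text above does not spell out. With p = 0.60 and n = 20 it gives roughly 0.11, close to the convenient 0.10 used above.

```python
import math

def binomial_sd_of_p(p, n):
    """Standard deviation of a sample proportion under the binomial model."""
    return math.sqrt(p * (1.0 - p) / n)

def band(mean, sd, k):
    """Range of mean plus or minus k standard deviations, rounded to 2 places."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))

# What binomial theory gives for samples of 20 when the proportion is 0.60.
sd_20 = binomial_sd_of_p(0.60, 20)
print("binomial sd for n=20:", round(sd_20, 3))

# The ranges in the text use the rounded values: mean 0.60, sd 0.10.
print("about 68% of p values:", band(0.60, 0.10, 1))
print("about 95% of p values:", band(0.60, 0.10, 2))
```

The "68%" and "95%" are the familiar normal-curve rules of thumb for one and two standard deviations either side of the mean.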

 

Okay then, what was my objective?

 

It was originally to find the true proportion of red marbles in the bag. That won't be possible unless I count all of the marbles, but there are thousands of them and my time is limited. So my objective has changed: how can I get the best possible estimate of the proportion of red marbles in the bag?

 

I have used an old book, the binomial distribution, plus a calculator, a cup of coffee and a donut, and ended up with a "confidence interval", a range within which I believe the true proportion will lie:

 

I can be 95% confident that the proportion is between 0.40 and 0.80.

 

But wait just a minute. That's really a pretty big range. Couldn't I be more precise, more exact?

 

Another cup of coffee and: I remember my Statistics 101 class. One of the big messages was: draw large samples whenever you can.

 

So. I start over.

 

This time I will draw samples of 100 marbles at a time.

 

I draw lots of them, lots and lots. Each time I compute p, the proportion of red marbles in the sample of 100.
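
If you would rather let a computer do the drawing, the procedure can be imitated with a small simulation. This is only a sketch under stated assumptions: I pretend the true proportion of red marbles is 0.60 (in real life I would not know it), I treat the bag as so large that each marble drawn is red with that same probability, and the function name and the choice of 2000 samples are my own inventions.

```python
import random
import statistics

def simulate_p_values(true_proportion, sample_size, n_samples, seed=1):
    """Draw n_samples samples and return the proportion of reds in each.

    Assumes a very large bag, so every marble drawn is red with
    probability true_proportion regardless of earlier draws.
    """
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    p_values = []
    for _ in range(n_samples):
        reds = sum(1 for _ in range(sample_size) if rng.random() < true_proportion)
        p_values.append(reds / sample_size)
    return p_values

# Hypothetical bag: 60% red, samples of 100 marbles, 2000 samples.
p_values = simulate_p_values(true_proportion=0.60, sample_size=100, n_samples=2000)
print("mean of p values:", round(statistics.mean(p_values), 3))
print("sd of p values:  ", round(statistics.pstdev(p_values), 3))
```

The mean of the simulated p values should land close to the assumed 0.60, and their standard deviation close to what binomial theory predicts for samples of 100.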

 

Suppose (for convenience) that the mean p value turns out to be 0.60 again. But now, using binomial theory once more, the standard error, that is, the standard deviation of the p values, will drop way down to about 0.05 because I have a larger sample.

 

Now I'm in better shape. Now my 95% interval will be 0.50 to 0.70 -- this is better. After doing all this work, I now have quite a bit of confidence in saying that the true proportion of red marbles is between these two values, that is, in percentage terms, 50% to 70%. If I had time I might give it another go, using samples of 200 marbles. But I won't. I just want this message to be out there:
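
To see how the interval shrinks as the samples get bigger, here is a small sketch that repeats the calculation for samples of 20, 100 and 200 marbles. It assumes the binomial formula sqrt(p(1-p)/n) for the standard error and takes "95%" to mean two standard errors either side of the mean; the function name is my own.

```python
import math

def interval_95(p, n):
    """Approximate 95% interval: p plus or minus two binomial standard errors."""
    se = math.sqrt(p * (1.0 - p) / n)
    return (p - 2 * se, p + 2 * se)

# Same mean p value each time, only the sample size changes.
for n in (20, 100, 200):
    low, high = interval_95(0.60, n)
    print(f"n={n:3d}: {low:.2f} to {high:.2f}")
```

For n = 100 this comes out at about 0.50 to 0.70, matching the figures above. For n = 20 it is a touch wider than 0.40 to 0.80, because the unrounded standard deviation is nearer 0.11 than 0.10.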

 

So: my estimate gets more precise, and I can be more confident in it, if I draw larger samples from the bag.

 

Keep this in mind as I now switch back to the matter of developing a test with a number of items.

 

Suppose I took all the word processing questions you wrote just now and turned my attention to one of the students in your word processing class, the class you've been teaching for weeks now.

 

Your student has (supposedly) learned almost all of the material required to answer each and every question. I might see all those questions as marbles, and the student's brain as the "bag".

 

How can I estimate the proportion of questions which the student will get correct when tested? What proportion of the questions does the student know the answer to?

 

Hmmm .... reaching into the bag now will not be so easy -- it has become a brain, but ....

 

---

Footnote: in real life I can get all the information above by drawing just a single sample -- this bit about drawing lots and lots of samples is theory-talk. I'll make the sample as large as possible, and I will draw just one. A sample of 100 marbles would be a reasonable size. I can use the binomial just as I have above, even with only one sample.
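
A minimal sketch of that single-sample calculation follows. The count of 60 red marbles out of 100 is invented for the example, the function name is my own, and the method shown is the simple textbook one: use the sample's own p in the binomial formula for the standard error, then go two standard errors either side.

```python
import math

def single_sample_interval(reds, n, k=2.0):
    """Estimate a proportion and a rough interval from one sample.

    reds: number of red marbles seen in the sample
    n:    sample size
    k:    number of standard errors either side (2 gives roughly 95%)
    """
    p = reds / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se, (p - k * se, p + k * se)

# Hypothetical single draw: 60 red marbles in a sample of 100.
p, se, (low, high) = single_sample_interval(reds=60, n=100)
print(f"p = {p:.2f}, standard error = {se:.3f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
```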