# To Halve and Hold

This option is used to create two random samples of data records, dividing a data set into halves on a random basis.

How does it do it?  It begins by making a copy of the original Data and CCs worksheets, placing them in a new workbook.  For convenience, assume that Excel calls this new workbook "Book1".

Halve&Hold then uses two standard Visual Basic for Applications (VBA) routines, the Randomize statement and the Rnd function, to generate random numbers between 1 and the number of data records in the original Data worksheet. That number is called "ArraySize" in the code below:

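```vba
Randomize    ' seed VBA's Rnd generator from the system timer

' {... more code ...}

' Rnd returns a value >= 0 and < 1, so this yields a whole number from 1 to ArraySize
RandomValue = Int((ArraySize * Rnd) + 1)

' {... more code ...}
```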

The Randomize statement supplies a seed to VBA's Rnd function. When called without an argument, as here, it uses the computer's system timer as the seed, so the random numbers generated, and therefore the two samples, will almost certainly differ each time Halve&Hold is run.
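To make the arithmetic of `Int((ArraySize * Rnd) + 1)` concrete, here is a minimal Python sketch of the same idea. It is not Lertap's VBA. The function name `random_record_number` is hypothetical, and Python's `random.Random` stands in for VBA's Rnd. Like Rnd, `rng.random()` returns a value in [0, 1), so scaling by the record count, adding 1, and truncating gives a record number from 1 to the count inclusive.

```python
import random

def random_record_number(array_size: int, rng: random.Random) -> int:
    """Python mirror of the VBA expression Int((ArraySize * Rnd) + 1).

    rng.random() lies in [0.0, 1.0), so array_size * rng.random() lies in
    [0, array_size); adding 1 and truncating gives 1..array_size inclusive.
    """
    return int(array_size * rng.random() + 1)

# Seeding without an argument, in the spirit of VBA's Randomize: Python seeds
# from operating-system randomness (or the current time if that is unavailable).
clock_seeded = random.Random()

# A fixed seed makes a run repeatable instead, which is convenient for testing.
repeatable = random.Random(2003)

draws = [random_record_number(3000, repeatable) for _ in range(10)]
print(draws)
```

The fixed-seed generator is shown only for reproducibility. Halve&Hold relies on clock seeding precisely so that repeated runs produce different halves.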

Random numbers are generated until half of the original data records have been fingered (that is, identified).  The unfingered records are then deleted from Book1's Data worksheet.
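The "draw until half are fingered" step can be sketched in Python as follows, assuming records are identified by their row number (1 to n). The names `finger_half` and `keep_fingered` are hypothetical. This sketch simply ignores a record number drawn more than once. That is an assumption about how repeats are handled, not a description of Lertap's code. The target is rounded up for an odd count, which matches Book1 receiving the extra record.

```python
import random

def finger_half(n_records: int, rng: random.Random) -> set[int]:
    """Draw record numbers 1..n_records until half have been fingered.

    Assumption: repeated draws are ignored (a set holds each number once).
    For an odd n_records the target rounds up, so Book1 gets the extra record.
    """
    target = (n_records + 1) // 2
    fingered: set[int] = set()
    while len(fingered) < target:
        fingered.add(int(n_records * rng.random() + 1))
    return fingered

def keep_fingered(records: list, fingered: set[int]) -> list:
    """Book1: delete the unfingered records, keeping the original order."""
    return [rec for i, rec in enumerate(records, start=1) if i in fingered]

rng = random.Random(42)
data = [f"record {i}" for i in range(1, 12)]   # 11 records
book1 = keep_fingered(data, finger_half(len(data), rng))
print(len(book1))  # 6
```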

Then another copy of the original Data and CCs worksheets is made, and placed in a second new workbook, which we may call "Book2" for purposes of this discussion.

Next, the data records kept in Book1's Data worksheet are deleted from Book2's Data worksheet. The result is two non-overlapping, essentially random samples of the original data, with the original left untouched.
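Book2 is simply the complement of Book1. The Python sketch below assumes records are matched by their original row number. The helper name `complement` and the example selection `{2, 3, 5}` are hypothetical. With five records, it also shows the odd-count case: Book1 ends up with one record more than Book2, and the original list is not modified.

```python
def complement(records: list, book1_numbers: set[int]) -> list:
    """Book2: start from a fresh copy and drop every record kept in Book1."""
    return [rec for i, rec in enumerate(records, start=1) if i not in book1_numbers]

data = ["r1", "r2", "r3", "r4", "r5"]
book1_numbers = {2, 3, 5}   # hypothetical outcome of the random draw
book1 = [rec for i, rec in enumerate(data, start=1) if i in book1_numbers]
book2 = complement(data, book1_numbers)
print(book1, book2)  # ['r2', 'r3', 'r5'] ['r1', 'r4']
```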

When the original Data worksheet holds an odd number of data records, Book1 will have one more data record than Book2.

How can a smaller random sample be generated? Halve&Hold always creates halves: workbooks whose Data worksheets each hold 50% of the records in the original Data worksheet. To get a 25% sample, run Halve&Hold again on one of the 50% samples. For example, if Book1 contains 50% of the original Data records, running Halve&Hold on Book1 produces two new random samples, each with about 25% of the original Data records.
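Repeated halving can be sketched in Python as below. The function `halve` is hypothetical, and for brevity it uses Python's `random.sample` to choose the first half rather than the draw-until-half loop described earlier. The two approaches give the same kind of result: a random subset of ceil(n/2) records plus its complement. Running it once on 3,000 records and then again on one half gives two 25% samples.

```python
import random

def halve(records: list, rng: random.Random) -> tuple[list, list]:
    """One Halve&Hold-style pass: the first half gets the extra record if odd."""
    target = (len(records) + 1) // 2
    chosen = set(rng.sample(range(len(records)), target))
    first = [r for i, r in enumerate(records) if i in chosen]
    second = [r for i, r in enumerate(records) if i not in chosen]
    return first, second

rng = random.Random(7)
data = list(range(1, 3001))               # 3,000 records
book1, book2 = halve(data, rng)           # 50% each
quarter_a, quarter_b = halve(book1, rng)  # 25% of the original each
print(len(book1), len(book2), len(quarter_a), len(quarter_b))  # 1500 1500 750 750
```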

Who uses Halve&Hold? Researchers and teachers, often people who go on to undertake some form of item response theory (IRT) analysis. At times one wants two samples of the original data: one might be used to calibrate an IRT model, and the second then used to validate the calibration.

Teachers might use Halve&Hold to demonstrate sampling variance -- how do Lertap's scores and item statistics vary as we compare one of the samples with the other?

In time trials run in September 2003 on a 2 GHz Pentium 4, the two halves of a 3,000-record Data worksheet were created in 18.8 seconds. With a bit over 11,000 original records, the two halves were ready in 4 minutes 18.4 seconds.