Sorting Huge Datasets with Limited Disc Space

When you are selecting and sorting records from a large file, Suprtool does not know how many records might be selected by the If command, so it has to reserve enough sort scratch space in case every record is selected. On a system with an insufficient amount of free transient disc space, there may not be enough space for this temporary scratch file to be built. Suprtool allows you to reduce the size of the scratch file using the Numrecs command, but you have to know what value to use, either a specific number of records or a percentage of the input file. What do you do if you cannot predict how many records may be selected?
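
For example, if you can estimate that only about a third of the records will pass the selection, you could tell Suprtool to reserve sort scratch space for just that fraction of the input. The sketch below is only illustrative, assuming the usual Numrecs syntax of either a record count or a percentage; the base, dataset, selection criteria, and the 30% figure are placeholders:

    :run suprtool.pub.robelle
    >base mybase
    >get bigdataset
    >if {...selection criteria...}
    >numrecs 30%                   {reserve sort space for about 30% of the input}
    >sort key-field
    >output myfile
    >xeq

This works well when you have a reliable estimate; the technique below handles the case where you cannot make one.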

One approach, in use at Redwood Health Services in Santa Rosa, CA, is to forget about Numrecs and just use two Suprtool passes. The first pass selects the records without sorting them and writes them to a file; the second pass then sorts that file. By default, the sort scratch file in the second pass is built exactly large enough for the number of records being read.

    :run suprtool.pub.robelle
    >base mybase
    >get bigdataset                {pass 1: select the records, no sort}
    >if {...selection criteria...}
    >output myfile
    >xeq
    >input myfile                  {pass 2: sort the extracted file}
    >sort key-field
    >output=input                  {overwrite myfile with the sorted records}
    >xeq

The resulting file, myfile, has the selected records in the desired sequence.

[Mike Shumko]
