Sort

PURPOSE:  The SORT command arranges all the records or selected records in the current Data Set (DS) on the basis of data in the field(s) specified.  The sorted records can be stored on the current DS or written to a separate DS so that the current DS is unchanged.

Syntax

SORT ON \\[-]fields\\

[[IF; UNLESS clause] TO ds_name [IN DBL dbl_name]]

[SHOWING clause]

[VIA clause]

[STOP; END clause]

[/BY RECORD/; BY KEY]

[IN n PAGES]

[n WAY]

\\[-]fields\\

is one or more field names, separated by commas, specifying the sort sequence with the major field first.  The sort is normally in ascending order.  To sort in descending order, precede the field name with a minus sign (-).
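The ordering rule can be illustrated outside ACCENT R.  The Python sketch below is an analogy only, not ACCENT R code: the helper name sort_on, the record layout, and the sample data are all hypothetical.  It sorts on the major field first and treats a leading minus sign as a request for descending order on that field.

```python
from functools import cmp_to_key

def sort_on(records, fields):
    """Return records sorted on `fields`, major field first.

    A leading "-" on a field name requests descending order for that
    field (hypothetical helper that mimics the [-]fields notation).
    """
    specs = [(f[1:], True) if f.startswith("-") else (f, False) for f in fields]

    def compare(a, b):
        for name, descending in specs:
            if a[name] != b[name]:
                result = -1 if a[name] < b[name] else 1
                return -result if descending else result
        return 0  # records are equal on every sort field

    return sorted(records, key=cmp_to_key(compare))

# Hypothetical sample records.
authors = [
    {"AUTHOR_LNAME": "Smith", "STATE": "CA"},
    {"AUTHOR_LNAME": "Jones", "STATE": "NY"},
    {"AUTHOR_LNAME": "Smith", "STATE": "WA"},
]

# Ascending on AUTHOR_LNAME, then descending on STATE within each name.
result = sort_on(authors, ["AUTHOR_LNAME", "-STATE"])
for rec in result:
    print(rec["AUTHOR_LNAME"], rec["STATE"])
```

Here the two Smith records stay together because AUTHOR_LNAME is the major field, and WA precedes CA because STATE is marked descending.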

IF; UNLESS clause

limits the sort to records meeting the condition(s) specified.  If a conditional clause is used, a TO ds_name clause must also be used.  The TO ds_name can be the current DS, in which case the current DS is replaced by the sorted records meeting the specified condition(s).  Described in detail in Chapter 7.

TO ds_name

specifies the destination of the sorted data.  If TO ds_name is omitted, the sorted data will overwrite the current DS.  If TO ds_name is included, the destination DS must already exist.  The current DS and destination DS must have identical Schema Definitions (SD).  If the destination DS already contains data, the SORT command will clear it.

IN DBL dbl_name

sends the sorted data to a DS in another Data Base Library (DBL).

SHOWING clause

displays (or saves) the specified fields in the order in which they appear after the sort.  Described in detail in Chapter 7.

VIA clause

If a Process Module (PM) is called by the SORT command, an output DS must be named, and any changes made in the PM are applied only to the output DS.  The sort of the keys is completed first; the processing is then done as the records are written out in sorted order.  Described in detail in Chapter 7.

STOP; END clause

described in detail in Chapter 7.

BY KEY

causes only the keys to be carried through the sort operations.  If the sort key is small in relation to the size of the record, the sort will take less time if the BY KEY clause is specified.

BY RECORD

causes the record to be carried along with the key in the sort operations. BY RECORD is the default if no BY option is expressed.

IN n PAGES

determines the amount of memory allocated to the first pass of the sort.  The default is 4 pages, or 10,240 characters.  Maximum is 100 pages.  The more pages allocated, the faster the sort will be performed.

n WAY

determines how many strings are merged in each pass through the data.  The default is 4; maximum is 16.  This setting affects the efficiency of the sort.

EXAMPLE

*USE DS AUTHOR
*SORT ON AUTHOR_LNAME,STATE IF CONTRACT="1" TO CAUT &
SHOW AUTHOR_LNAME,STATE,ZIP

After the records for contracted authors (IF CONTRACT="1") are selected and sorted on AUTHOR_LNAME, STATE, the fields AUTHOR_LNAME, STATE, ZIP are displayed at the terminal in the new order.  The DS CAUT now contains the same records shown at the terminal, in the same order.  Meanwhile, the current DS, AUTHOR, has not been changed or affected in any way.
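The same select, sort, and show sequence can be mimicked in Python to make the data flow explicit.  This is a sketch by analogy, not ACCENT R code: the lists author and caut stand in for the two Data Sets, and the sample records are invented for illustration.

```python
# Hypothetical stand-in for the current DS.
author = [
    {"AUTHOR_LNAME": "Young", "STATE": "TX", "ZIP": "75001", "CONTRACT": "1"},
    {"AUTHOR_LNAME": "Adams", "STATE": "OR", "ZIP": "97201", "CONTRACT": "0"},
    {"AUTHOR_LNAME": "Baker", "STATE": "NY", "ZIP": "10001", "CONTRACT": "1"},
    {"AUTHOR_LNAME": "Baker", "STATE": "CA", "ZIP": "90001", "CONTRACT": "1"},
]
original_order = [r["AUTHOR_LNAME"] for r in author]

# Select the contracted authors, sort them on AUTHOR_LNAME then STATE,
# and write copies to a separate destination so the source is untouched.
caut = sorted(
    (dict(r) for r in author if r["CONTRACT"] == "1"),
    key=lambda r: (r["AUTHOR_LNAME"], r["STATE"]),
)

# Display the chosen fields in the new order.
shown = [(r["AUTHOR_LNAME"], r["STATE"], r["ZIP"]) for r in caut]
for row in shown:
    print(*row)
```

The destination holds only the selected records, in sorted order, while the source list keeps its original contents and sequence.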

NOTES:  Fields can be used in the SORT command in any order; i.e., they need not follow the order in which they appear in the SD.

Note when sorting binary records: all fields of new binary records are initialized to null characters (@CHR 0).  A character field that has been set to blanks will sort differently from a field that has not received any data at all.
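The difference arises because a null character and a blank are different character codes.  The Python sketch below assumes a plain comparison by character code, which this section does not confirm for ACCENT R; the field width and labels are hypothetical.

```python
never_set = "\x00" * 5   # field still holding its initial null characters (code 0)
blanked = " " * 5        # field explicitly set to blanks (code 32)

values = [("blanked", blanked), ("never_set", never_set)]

# Under a comparison by character code, code 0 sorts ahead of code 32.
ordered = [label for label, value in sorted(values, key=lambda pair: pair[1])]
print(ordered)
```

Two fields that look equally empty on a display can therefore land in different places in the sorted output.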

The sequence of the sorted records can be affected by the ENABLE SEQUENCE LOWER command.  By default, ACCENT R considers the upper and lower case of a given letter as identical, and would produce the following sort:

Mackay, B

MacKay, T

MacLeon, M

MacTavish, C

To obtain a true ASCII sort, i.e., all upper case letters preceding all lower case letters, give the command ENABLE SEQUENCE LOWER before sorting.  The four names would then be sorted:

MacKay, T

MacLeon, M

MacTavish, C

Mackay, B
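Both orderings can be reproduced with a short Python sketch.  It is an analogy, not ACCENT R code: folding case with str.upper stands in for the default sequence, and Python's plain string comparison stands in for the ASCII sequence.

```python
names = ["MacTavish, C", "Mackay, B", "MacLeon, M", "MacKay, T"]

# Upper and lower case treated as identical (the default behavior described).
folded = sorted(names, key=str.upper)

# Comparison by character code: every upper case letter precedes
# every lower case letter.
ascii_order = sorted(names)

print(folded)
print(ascii_order)
```

In the second ordering "Mackay, B" moves to the end because the lower case k has a higher character code than the upper case K, L, and T in the other names.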

The default sorting algorithm in ACCENT R has been designed to use the minimum amount of memory.  The options BY RECORD; BY KEY, IN n PAGES, and n WAY allow the amount of memory allocated to the sort to be varied and its efficiency to be controlled.  As a general rule, allocating more memory speeds up the operation.  The following simplified explanation of how default sorting works shows where the three options come into play.

When the SORT command is given, ACCENT R copies out the key fields and addresses for the records in blocks, or pages, of 2560 characters.  Each block is sorted into one string.  In the second pass, ACCENT R merges these strings, four at a time.  Thus, if the first pass creates 64 strings, the next pass creates 16 strings.  The third pass creates four strings.  On the final pass, when these four strings are merged, the whole record is read by using the record's address and copied to the output phase.
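The passes described above follow the general shape of a merge sort built from sorted strings.  The Python sketch below illustrates that shape under stated assumptions and is not the ACCENT R implementation: the function name, the run_size parameter (counted in records rather than characters), and the use of list positions as record addresses are all hypothetical.  As a simplification, the whole records are fetched after the last merge rather than during it.

```python
import heapq

def merge_sort_sketch(records, key, run_size=4, ways=4):
    """Sort `records` by forming sorted strings and merging them `ways` at a time."""
    # First pass: copy out (key, address) pairs and sort each block into a string.
    pairs = [(key(rec), addr) for addr, rec in enumerate(records)]
    strings = [sorted(pairs[i:i + run_size]) for i in range(0, len(pairs), run_size)]
    passes = 1

    # Later passes: merge `ways` strings at a time until one string remains.
    while len(strings) > 1:
        strings = [
            list(heapq.merge(*strings[i:i + ways]))
            for i in range(0, len(strings), ways)
        ]
        passes += 1

    # Fetch each whole record through its address, in sorted order.
    ordered = strings[0] if strings else []
    return [records[addr] for _, addr in ordered], passes

# Hypothetical data: 20 records in scrambled order.
scrambled = [(n * 7) % 20 for n in range(20)]
sorted_records, passes_used = merge_sort_sketch(scrambled, key=lambda r: r, run_size=2)
print(sorted_records, passes_used)
```

With 20 records and strings of 2 records each, the first pass yields 10 strings, the second merges them into 3, and the third produces the single sorted string.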

With the BY RECORD option (the default), the initial copy takes the whole record into the sort phase instead of just the key fields.  With small records it is more efficient to sort the entire records rather than just the keys.  With large records, sort only the keys (BY KEY); the sort then rearranges the records at the end of the sort operation.

If IN n PAGES is set to a value other than 4, the size of the first strings created changes accordingly, in multiples of 2560 characters.  For example, IN 8 PAGES would create initial strings of 20,480 characters each.

If n WAY is set to a value other than 4, each subsequent pass would merge that number of strings.  For example, if the option is 2 WAY, an initial set of 64 strings would be merged, 2 at a time, into 32 strings, then 2 at a time into 16 strings, and those 16 into 8 strings, etc.
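The arithmetic behind IN n PAGES and n WAY can be checked with a few lines of Python.  The page size of 2560 characters and the worked figures come from the text; the function names are hypothetical.

```python
PAGE_CHARS = 2560  # characters per page, as stated in the text

def initial_string_chars(pages=4):
    """Size in characters of each string created by the first pass."""
    return pages * PAGE_CHARS

def strings_after_each_pass(initial_strings, ways=4):
    """Number of strings left after each merge pass."""
    if ways < 2:
        raise ValueError("a merge needs at least 2 strings per pass")
    counts = []
    remaining = initial_strings
    while remaining > 1:
        remaining = -(-remaining // ways)  # ceiling division
        counts.append(remaining)
    return counts

print(initial_string_chars(4))            # default allocation
print(initial_string_chars(8))            # IN 8 PAGES
print(strings_after_each_pass(64, 4))     # default 4 WAY
print(strings_after_each_pass(64, 2))     # 2 WAY
```

A 4 WAY merge reduces 64 strings in three merge passes, while a 2 WAY merge needs six, which is one reason the merge width affects efficiency.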

The most efficient sorting algorithm depends on many factors, among them the size of the DS, the record length, the number of key fields, the order before sorting, and the order after sorting.  The defaults of all three options will serve for most applications.  However, a combination of three factors makes it practical to examine the effect of these options on the tradeoff between processing time and cost: the DS contains several thousand records, sorting is a frequent operation, and the sorts tend to follow a repetitive pattern.

ENABLE CPU TRACE shows the CPU seconds used by each command.  By enabling CPU TRACE and then executing the normal pattern of SORT commands for the application, while varying the options discussed above, the most efficient and the most cost-effective algorithms for the sorts involved can be determined.
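The same measure-and-compare approach can be sketched in Python, using time.process_time to read CPU seconds.  This is an analogy to the procedure, not a model of CPU TRACE: the two sort variants, the record layout, and the data volume are hypothetical, and the timings depend on the machine.

```python
import time

def cpu_seconds(func, *args):
    """Return (result, CPU seconds used) for one call of func."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start

# Hypothetical data: 20,000 records, each a short unique key and a long payload.
records = [((n * 7919) % 20000, "x" * 200) for n in range(20000)]

def sort_whole_records(recs):
    # Carry each whole record through the sort; the key is the first element.
    return sorted(recs)

def sort_keys_then_fetch(recs):
    # Sort record positions by key only, then fetch the records in that order.
    order = sorted(range(len(recs)), key=lambda i: recs[i][0])
    return [recs[i] for i in order]

timings = {}
for label, variant in [("whole records", sort_whole_records),
                       ("keys then fetch", sort_keys_then_fetch)]:
    output, timings[label] = cpu_seconds(variant, records)
    print(f"{label}: {timings[label]:.3f} CPU seconds")
```

Repeating the measurement over the application's usual sorts, one option at a time, shows which settings are worth changing from their defaults.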

SEE ALSO:  ENABLE CPU TRACE