JCL

XSUM – Remove Duplicate Records using SORT

XSUM Control Statement captures the dropped duplicate records which are eliminated out of DFSORT in a separate file. The SUM Control Statement deletes records with equal control fields and optionally summarizes specified numeric fields on those records. The SUM control statement specifies that, whenever two records are found with equal sort or merge control fields, the contents of their summary fields are to be added, the sum is to be placed in one of the records, and the other record is to be deleted.

1. If the EQUALS option is in effect the first record of summed records is kept.

2. If the NOEQUALS option is in effect, the record to be kept is unpredictable.

3. If the ZDPRINT option is in effect, positive summed ZD values are printable.

4. If the NZDPRINT option is in effect, positive summed ZD values are not printable.

The way in which DFSORT processes short SUM summary fields depends on whether the VLSHRT or NOVLSHRT option is in effect. A short field is one where the variable-length record is too short to contain the entire field; that is, the field extends beyond the record. For details about sorting, merging, and summing short records.

Syntax:

The format of the SUM statement is:

   SUM {FIELDS={NONE}                         }    
   SUM {FIELDS=(p1,l1,f1 {,p2,l2,f2) ... )    }   
   SUM {FIELDS=(p1,l1 {,p2,l2) ... ),FORMAT=f } {,XSUM}

Where XSUM means the dropped records are written in the dataset specified by SORTXSUM data set.

//SORTXSUM DD DSN=XXXXXX.OUTPUT.SORTOUT,
//         DISP=OLD

SORTXSUM: Output file for a SORT or MERGE function. The records eliminated by SUM processing are written to the SORTXSUM DD.

SUM FIELDS=(5,5,ZD,12,6,PD,21,3,PD,35,7,ZD)
SUM FORMAT=ZD,FIELDS=(5,5,12,6,PD,21,3,PD,35,7)
SUM FIELDS=(5,5,ZD,12,6,21,3,35,7,ZD),FORMAT=PD

Summary Field Formats	Length	Description
BI	2, 4, or 8 bytes	Unsigned binary
FI	2, 4, or 8 bytes	Signed fixed-point
FL	4, 8, or 16 bytes	Signed hexadecimal floating-point
PD	1 to 16 bytes	Signed packed decimal
ZD	1 to 31 bytes	Signed zoned decimal

XSUM examples

In below example SORTXSUM will contain the duplicate records present in the start position 5 for 4 bytes. Unique record will be written to SORTOUT file.

//STEP01 EXEC PGM=SORT
//SYSOUT     DD SYSOUT=*
//SORTIN     DD DSN=INPUT.FILE1,DISP=SHR
//SORTOUT    DD DSN=INPUT.FILE1,
//           DISP=(NEW,CATLG,DELETE),UNIT=3390, 
//           SPACE=(CYL,(5,1)),DCB=(LRECL=22)
//SORTXSUM   DD DSN=INPUT.FILE1,
//           DISP=(NEW,CATLG,DELETE),UNIT=3390,
//           SPACE=(CYL,(5,1)),DCB=(LRECL=22)
//SYSIN      DD *
  SORT FIELDS=(5,4,CH,A)
  SUM FIELDS=NONE,XSUM
/*

SORTXSUM will contain the duplicate records present in the start position 5 for 4 bytes for record having value of ‘XYZ’ in position 45. Unique record having value of ‘XYZ’ in position 45 will be written to SORTOUT file.

//SYSIN      DD *
  SORT FIELDS=(5,4,CH,A)
  INCLUDE COND=(45,3,CH,EQ,C'XYZ')
  SUM FIELDS=NONE,XSUM
/*

With SUM FIELDS=NONE and EQUALS in effect, DFSORT eliminates “duplicate records” by writing the first record with each key to the SORTOUT data set and deleting subsequent records with each key. A competitive product offers an XSUM operand that allows the deleted duplicate records to be written to a SORTXSUM dataset. While DFSORT does not support the XSUM operand, DFSORT does provide the equivalent function and a lot more with the SELECT operator of ICETOOL. SELECT lets you put the records that are selected in the TO data set and the records that are not selected in the DISCARD data set. So an ICETOOL SELECT job to do the XSUM function might look like this:

//XSUM JOB ...
//DOIT EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=... input data set
//OUT DD DSN=... first record with each key
//SORTXSUM DD DSN=... subsequent records with each key
//TOOLIN DD * 
  SELECT FROM(IN) TO(OUT) ON(1,3,CH) FIRST DISCARD(SORTXSUM)
/*

This will put the first occurrence of each ON field (sort key) in the OUT data set and the rest of the records in the SORTXSUM data set.

If IN contained the following records:

If IN contained the following records:
 J03 RECORD 1
 M72 RECORD 1
 M72 RECORD 2
 J03 RECORD 2
 A52 RECORD 1
 M72 RECORD 3
 
OUT would contain the following records:
 A52 RECORD 1
 J03 RECORD 1
 M72 RECORD 1
 
SORTXSUM would contain the following records:
 J03 RECORD 2
 M72 RECORD 2
 M72 RECORD 3

SELECT also allows you to use multiple ON fields (that is, multiple keys), and DFSORT control statements (for example, INCLUDE or OMIT), such as in this SELECT statement:

//TOOLIN DD *
  SELECT FROM(IN) TO(OUT1) ON(1,3,CH) ON(25,3,PD) FIRST -
  DISCARD(XSUM1) USING(CTL1)
//CTL1CNTL DD *
  INCLUDE COND=(11,7,CH,EQ,C'PET-RAT')
/*

And SELECT can do much more than that. Besides FIRST, it also lets you use FIRST(n), FIRSTDUP, FIRSTDUP(n), LAST, LASTDUP, ALLDUPS, NODUPS, HIGHER(x), LOWER(y) and EQUAL(v). You can use TO(outdd) alone, DISCARD(savedd) alone, or TO(outdd) and DISCARD(savedd) together, for any of these operands. So you can create data sets with just the selected records, just non-selected records, or with both the selected records and non-selected records, for all of these cases. Here’s a few more SELECT statements to show some of its capabilities:

* Put duplicates in DUPS and non-duplicates in NODUPS

SELECT FROM(DATA) TO(DUPS) ON(5,8,CH) ALLDUPS DISCARD(NODUPS)

* Put records with 5 occurrences (of the key) in EQ5

SELECT FROM(DATA) TO(EQ5) ON(5,8,CH) EQUAL(5)

* Put records with more than 3 occurrences (of the key) in GT3, and

* records with 3 or less occurrences in LE3.

SELECT FROM(DATA) TO(GT3) ON(5,8,CH) HIGHER(3) DISCARD(LE3)

* Put records with 9 or more occurrences in OUT2.

SELECT FROM(DATA) ON(5,8,CH) LOWER(9) DISCARD(OUT2)

* Put last of each set of duplicates in DUP1

SELECT FROM(DATA) TO(DUP1) ON(5,8,CH) LASTDUP

Admin

Next SQL Predicates: use in COBOL DB2 Queries »

Previous « Split dataset using Mainframe SORT utility

Published by

Admin

Tags: Remove Duplicate RecordsSORTXSUMSUMSUM FIELDS=NONEXSUM

6 years ago

OUTER JOIN Queries: Common Errors and Resolutions
OUTER JOIN Queries are a valuable tool in SQL, allowing you to retrieve data from…
OUTER JOIN – step by step walkthrough with examples
The JOIN operation allows you to combine rows from two or more tables based on…
OUTFIL control statement in SORT JCL
The OUTFIL control statement describes the output file(s) and the processing to be done on…