JCL

XSUM – Remove Duplicate Records using SORT

XSUM Control Statement captures the dropped duplicate records which are eliminated out of DFSORT in a separate file. The SUM Control Statement deletes records with equal control fields and optionally summarizes specified numeric fields on those records. The SUM control statement specifies that, whenever two records are found with equal sort or merge control fields, the contents of their summary fields are to be added, the sum is to be placed in one of the records, and the other record is to be deleted.

1. If the EQUALS option is in effect the first record of summed records is kept.

2. If the NOEQUALS option is in effect, the record to be kept is unpredictable.

3. If the ZDPRINT option is in effect, positive summed ZD values are printable.

4. If the NZDPRINT option is in effect, positive summed ZD values are not printable.

The way in which DFSORT processes short SUM summary fields depends on whether the VLSHRT or NOVLSHRT option is in effect. A short field is one where the variable-length record is too short to contain the entire field; that is, the field extends beyond the record. For details about sorting, merging, and summing short records.

Syntax:    

The format of the SUM statement is:

   SUM {FIELDS={NONE}                         }    
   SUM {FIELDS=(p1,l1,f1 {,p2,l2,f2) ... )    }   
   SUM {FIELDS=(p1,l1 {,p2,l2) ... ),FORMAT=f } {,XSUM}

Where XSUM means the dropped records are written in the dataset specified by SORTXSUM data set.

//SORTXSUM DD DSN=XXXXXX.OUTPUT.SORTOUT,
//         DISP=OLD

SORTXSUM: Output file for a SORT or MERGE function.  The records eliminated by SUM processing are written to the SORTXSUM DD.

SUM FIELDS=(5,5,ZD,12,6,PD,21,3,PD,35,7,ZD)
SUM FORMAT=ZD,FIELDS=(5,5,12,6,PD,21,3,PD,35,7)
SUM FIELDS=(5,5,ZD,12,6,21,3,35,7,ZD),FORMAT=PD
Summary Field FormatsLengthDescription
BI2, 4, or 8 bytesUnsigned binary
FI2, 4, or 8 bytesSigned fixed-point
FL4, 8, or 16 bytesSigned hexadecimal floating-point
PD1 to 16 bytesSigned packed decimal
ZD1 to 31 bytesSigned zoned decimal

XSUM examples

In below example SORTXSUM will contain the duplicate records present in the start position 5 for 4 bytes. Unique record will be written to SORTOUT file.

//STEP01 EXEC PGM=SORT
//SYSOUT     DD SYSOUT=*
//SORTIN     DD DSN=INPUT.FILE1,DISP=SHR
//SORTOUT    DD DSN=INPUT.FILE1,
//           DISP=(NEW,CATLG,DELETE),UNIT=3390, 
//           SPACE=(CYL,(5,1)),DCB=(LRECL=22)
//SORTXSUM   DD DSN=INPUT.FILE1,
//           DISP=(NEW,CATLG,DELETE),UNIT=3390,
//           SPACE=(CYL,(5,1)),DCB=(LRECL=22)
//SYSIN      DD *
  SORT FIELDS=(5,4,CH,A)
  SUM FIELDS=NONE,XSUM
/*

SORTXSUM will contain the duplicate records present in the start position 5 for 4 bytes for record having value of ‘XYZ’ in position 45. Unique record having value of ‘XYZ’ in position 45 will be written to SORTOUT file.

//SYSIN      DD *
  SORT FIELDS=(5,4,CH,A)
  INCLUDE COND=(45,3,CH,EQ,C'XYZ')
  SUM FIELDS=NONE,XSUM
/*

With SUM FIELDS=NONE and EQUALS in effect, DFSORT eliminates “duplicate records” by writing the first record with each key to the SORTOUT data set and deleting subsequent records with each key. A competitive product offers an XSUM operand that allows the deleted duplicate records to be written to a SORTXSUM dataset. While DFSORT does not support the XSUM operand, DFSORT does provide the equivalent function and a lot more with the SELECT operator of ICETOOL. SELECT lets you put the records that are selected in the TO data set and the records that are not selected in the DISCARD data set. So an ICETOOL SELECT job to do the XSUM function might look like this:

//XSUM JOB ...
//DOIT EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=... input data set
//OUT DD DSN=... first record with each key
//SORTXSUM DD DSN=... subsequent records with each key
//TOOLIN DD * 
  SELECT FROM(IN) TO(OUT) ON(1,3,CH) FIRST DISCARD(SORTXSUM)
/*

This will put the first occurrence of each ON field (sort key) in the OUT data set and the rest of the records in the SORTXSUM data set.

If IN contained the following records:

If IN contained the following records:
 J03 RECORD 1
 M72 RECORD 1
 M72 RECORD 2
 J03 RECORD 2
 A52 RECORD 1
 M72 RECORD 3
 
OUT would contain the following records:
 A52 RECORD 1
 J03 RECORD 1
 M72 RECORD 1
 
SORTXSUM would contain the following records:
 J03 RECORD 2
 M72 RECORD 2
 M72 RECORD 3

SELECT also allows you to use multiple ON fields (that is, multiple keys), and DFSORT control statements (for example, INCLUDE or OMIT), such as in this SELECT statement:

//TOOLIN DD *
  SELECT FROM(IN) TO(OUT1) ON(1,3,CH) ON(25,3,PD) FIRST -
  DISCARD(XSUM1) USING(CTL1)
//CTL1CNTL DD *
  INCLUDE COND=(11,7,CH,EQ,C'PET-RAT')
/*

And SELECT can do much more than that. Besides FIRST, it also lets you use FIRST(n), FIRSTDUP, FIRSTDUP(n), LAST, LASTDUP, ALLDUPS, NODUPS, HIGHER(x), LOWER(y) and EQUAL(v). You can use TO(outdd) alone, DISCARD(savedd) alone, or TO(outdd) and DISCARD(savedd) together, for any of these operands. So you can create data sets with just the selected records, just non-selected records, or with both the selected records and non-selected records, for all of these cases. Here’s a few more SELECT statements to show some of its capabilities:

* Put duplicates in DUPS and non-duplicates in NODUPS

SELECT FROM(DATA) TO(DUPS) ON(5,8,CH) ALLDUPS DISCARD(NODUPS)

* Put records with 5 occurrences (of the key) in EQ5

SELECT FROM(DATA) TO(EQ5) ON(5,8,CH) EQUAL(5)

* Put records with more than 3 occurrences (of the key) in GT3, and

* records with 3 or less occurrences in LE3.

SELECT FROM(DATA) TO(GT3) ON(5,8,CH) HIGHER(3) DISCARD(LE3)

* Put records with 9 or more occurrences in OUT2.

SELECT FROM(DATA) ON(5,8,CH) LOWER(9) DISCARD(OUT2)

* Put last of each set of duplicates in DUP1

SELECT FROM(DATA) TO(DUP1) ON(5,8,CH) LASTDUP
Admin

Share
Published by
Admin

Recent Posts

Increase Transparency and Collaboration Product Backlog

A well-maintained product backlog is crucial for successful product development. It serves as a single…

3 weeks ago

Product Backlog – Incremental value to the customer

Incremental value to the customer refers to the gradual delivery of small, functional parts of…

4 weeks ago

Product Market, Customer’s Desire, Need, and Challenges

A Product Market refers to the group of potential customers who might be interested in…

4 weeks ago

PAL-EBM Professional Agile Leadership – EBM Certification

The Professional Agile Leadership - Evidence-Based Management (PAL-EBM) certification offered by Scrum.org is designed for…

4 months ago

PAL I Professional Agile Leadership Certification

The Professional Agile Leadership (PAL I) certification, offered by Scrum.org, is designed to equip leaders…

4 months ago

Scrum Master Certification: CSM, PSM, SSM

Choosing the right Scrum Master Certification depends on your current experience and career goals. If…

6 months ago