DB2

SQL Query Optimization: Tips & Tricks

SQL Query optimization is a process of writing optimized SQL queries to improve database performance. It is one of the factors that affect application performance. When working with large-scale data, even the most minor change can have a dramatic impact on performance.

Top 10 ways to improve performance

Indexing: Proper indexing ensures optimal access to the database.
Column Selection: Specify the columns in SELECT query instead of SELECT* to avoid extra fetching load on the database.
Fetching data/Rows return: Always fetch limited data by coding predicate with WHERE clause.
Avoid multiple SQL requests: Minimize multiple SQL requests, try to accomodate in few requests.
Matching and Sorting records: Use EXITS() for matching if the record exists. Try to avoid sorting records in query.
Subqueries: Avoid correlated sub queries as it searches row by row, impacting the speed of SQL query processing.
Wildcards: Use wildcards (e.g. %xx%) wisely as they search the entire database for matching results.
Scalar & Arithmetic Functions: Avoid using scalar and arithmatic functions.
Loading: Use a temporary table to handle bulk data. Avoid tablespace scans.
Selecting Rows: Use the clause WHERE instead of HAVING for primary filters, ‘Fetch First xx Rows’, ‘Update where Current of Cursor’, ‘Delete Where Current of Cursor’, and ‘RIDS’, OPTIMIZE for n ROWS etc.

Let’s review this section for details about SQL query optimization considerations that can help you to maximize the performance of database applications.

SQL Query Optimization – Column Selection

Minimize the number of columns retrieved and/or updated. Extra columns increase the row size of the result set & the result will increase disk I/O and degrade performance. In addition, using SELECT * may prevent the use of covering indexes, further potentially hurting query performance. It can impact

The optimizer choosing ‘Index only’ access i.e. retrieving very few columns can encourage index-only access.
Expensiveness of any sorts.
Optimizer’s choice of join methods.

SQL Query Optimization – Rows Returned

Minimize the number of rows searched and/or returned. Code predicates to limit the result to only the rows needed. Avoid generic queries that do not have a WHERE clause.

SQL Query Optimization – Unnecessary SQL & multiple SQL requests

Consider accomplishing as much as possible with a single call, rather than multiple calls. Avoid unnecessary execution of SQL.

This is huge in the performance tuning of programs, especially batch programs because they tend to process more data. Every time an SQL call is sent to the database manager, there is overhead in sending the SQL statement to DB2, going from one address space in the operating system to the DB2 address space for SQL execution.

In general, developers need to minimize:

The number of time cursors are Opened/Closed
The number of random SQL requests (noted as synchronized reads in DB2 monitors).

SQL Query Optimization – Singleton SELECT vs Cursor

If a single row is returned – Singleton SELECT .. INTO

Outperforms a Cursor
Error when more than 1 row is returned
The row can be updated by another program after the singleton SELECT but before the subsequent UPDATE, causing a possible data integrity issue.
When a SELECT statement is used only for data retrieval – use FOR FETCH ONLY. FOR READ ONLY clause provides the same function.

If multiple rows are returned – Cursor

Requires overhead of OPEN, FETCH, and CLOSE
For Row Update: When the selected row must be retrieved first– Use FOR UPDATE OF clause with a CURSOR.

SQL Query Optimization – Sorting

Sorts can be expensive. At times an SQL query may execute multiple sorts in order to get the result set back as needed. Take a look at the DB2 to explain the tool to see if any sorting is taking place, then take a look at the SQL statement and determine if anything can be done to eliminate sorts.

DISTINCT – always results in a sort. If duplicates are to be eliminated from the result set, try: ‘Group By’ which looks to take advantage of any associated indexes to eliminate a sort for uniqueness. Rewriting the query using an ‘In’ or ‘Exists’ subquery. This will work if the table causing the duplicates (due to a one to many relationship) does not have data being returned as part of the result set.

SELECT DISTINCT E.EMPNO, E.LASTNAME 
FROM EMP E, EMPPROJACT EPA 
WHERE E.EMNO = EPA.EMPNO

Can be written as
 
SELECT E.EMPNO, E.LASTNAME
FROM EMP E, EMPPROJACT EPA 
WHERE E.EMPNO = EPA.EMPNO
GROUP BY E.EMPNO, E.LASTNAME

SELECT E.EMPNO, E.LASTNAME
FROM EMP E 
WHERE E.EMNPO IN
(SELECT EPA.EMPNO 
FROM EMPPROJACT EPA) 

SELECT E.EMPNO, E.LASTNAME
FROM EMP E 
WHERE EXISTS
(SELECT 1
FROM EMPPROJACT EPA
WHERE EPA.EMPNO = E.EMPNO)

UNION – always results in a sort.
UNION ALL – does not sort, but retains any duplicates.
ORDER BY – may be faster if columns are indexed & use it to guarantee the sequence of the data.
GROUP BY – Specify only columns that need to be grouped, may be faster if the columns are indexed & do not include extra columns in SELECT list or GROUP BY because DB2 must sort the rows.
Use very few conditions in CASE statement & avoid to use more WHEN conditions with in CASE statement.

SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, 
CASE 
 WHEN EDLEVEL < 15 THEN 'SECONDARY' 
 WHEN EDLEVEL < 19 THEN 'COLLEGE' 
 ELSE 'POST GRADUATE' 
END 
FROM EMPLOYEE

SQL Query Optimization – Sub-selects

DB2 processes the sub-select (inner select) first before the outer select
You may be able to improve performance of complex queries by coding a complex predicate in a sub-select
Applying the predicate in the sub-select may reduce the number of rows returned

SQL Query Optimization – Inline Views

Inline views allow the FROM clause of a SELECT statement to contain another SELECT statement
May enhance performance of the outer select by applying predicates in the inner select
Useful when detail and aggregated data must be returned in a single query

SQL Query Optimization – Indexes

Create indexes for columns you frequently:

ORDER BY
GROUP BY (better than a DISTINCT)
SELECT DISTINCT
JOIN

SQL Query Optimization – Data Conversion

When comparing column values to host variables – use the same
Data Type
Length
When DB2 must convert data, available indexes are sometimes not used.

SQL Query Optimization -JOIN Predicates

Response time -> determined mostly by the number of rows participating in the join
Provide accurate join predicates
Never use a JOIN without a predicate
Join ON indexed columns
Use Joins over subqueries
When the results of a join must be sorted –
limiting the ORDER BY to columns of a single table can avoid a sort
specifying columns from multiple tables causes a sort
Favor coding explicit INNER and LEFT
OUT joins over RIGHT OUTER joins
EXPLAIN converts RIGHT to LEFT join

SELECT emp.empno, emp.lastname, dept.deptname
FROM emp LEFT OUTER JOIN dept
ON emp.workdept = dept.deptno
WHERE emp.salary > 50000.00
WITH UR;

Works correctly but the outer join is performed first before any rows are filtered out.

SELECT emp.empno, emp.lastname, dept.deptname
FROM (SELECT empno, lastname
FROM emp WHERE salary > 50000.00) as e
LEFT OUTER JOIN dept
ON emp.workdept = dept.deptno
WITH UR;

Works better, applies the inner join predicates first, reducing the number of rows to be joined.

SQL Query Optimization – OR vs. UNION

OR requires Stage 2 processing.
Consider rewriting the query as the union of 2 SELECTs, making index access possible.
UNION ALL avoids the sort, but duplicates are included.
Monitor and EXPLAIN the query to decide which is best.

SQL Query Optimization – BETWEEN Clause

BETWEEN is usually more efficient than <= predicate and the >= predicate except when comparing a host variable to 2 columns.

Stage 1: WHERE Col1 <= :hostvar AND col2 >= :hostvar 
Stage 2: WHERE :hostvar BETWEEN col1 and col2

SQL Query Optimization – IN Instead of Like

If you know that only a certain number of values exist and can be put in a list

Use IN or BETWEEN
IN (‘Value1’, ‘Value2’, ‘Value3’)
BETWEEN :valuelow AND :valuehigh
Rather than: LIKE ‘Value_’
Avoid the % or the _ at the beginning because it prevents DB2 from using a matching index and may cause a scan.
Use the % or the _ at the end to encourage index usage.

SQL Query Optimization – NOT

Predicates formed using NOT are Stage 1 but they are not indexable. So for Subquery when using negation logic: Use NOT Exists (DB2 tests non-existence) instead of NOT IN (DB2 must materialize the complete result set).

Preferable not to use NOT clause in WHERE clause, rather less than/greater than/less than equals/greater than equals. e.g. WHERE HIREDATE <= :WS-DATE

SQL Query Optimization – EXISTS

Use EXISTS to test for a condition and get a True or False returned by DB2 and not return any rows to the query:

SELECT col1 FROM table1 
WHERE EXISTS
(SELECT 1 FROM table2 
WHERE table2.col2 = table1.col1)

SQL Query Optimization – Arithmetic in Predicates

An index is not used for a column when the column is in an arithmetic expression. It can be Stage 1 but not indexable.

For example:

SELECT EMPNO, LASTNAME 
FROM EMPLOYEE 
WHERE SALARY * 1.1 > 50000.00

Should be coded as:

SELECT EMPNO, LASTNAME 
FROM EMPLOYEE 
WHERE SALARY > 50000.00 / 1.1

SQL Query Optimization – Scalar Function

Scalar functions are not indexable but you can use scalar functions to offload work from the application program.

Examples:

DATE functions
SUBSTR
CHAR

SELECT EMPNO, LASTNAME 
FROM EMPLOYEE 
WHERE YEAR(HIREDATE) = 2005

Should be coded as:

SELECT EMPNO, LASTNAME 
FROM EMPLOYEE 
WHERE HIREDATE BETWEEN ‘2005-01-01’ and ‘2005-12-31’

SQL Query Optimization – OPTIMIZE for n ROWS

For online applications, use ‘With OPTIMIZE for n ROWS’ to attempt to influence the access path DB2 chooses without this clause, DB2 chooses the best access path for batch processing With this clause, DB2 optimizes for a quicker response for online processing. Try Optimize for 1, for 10, for 100.

SQL Query Optimization – Stage 1 vs Stage 2 Predicates

Always try to code predicates as Stage 1 and indexable. In general, Stage 2 predicates do not perform as well and consume extra CPU.

Stage 1 (DB2 Data Manager) is responsible for translating the data stored on pages into a result set of rows and columns. Predicates that are written in a fairly straightforward way can usually be evaluated by the Data Manager with relatively little expense.
Stage 2 (Relational Data Services) handle more complex predicates, data transformations, and computations. These Stage 2 predicates are much more expensive for DB2 to resolve than Stage 1 due to additional processing and additional code path. Additionally, RDS cannot make effective use of indexes.

SQL Query Optimization – ROWSET positioning (multi-row fetch, multi-row update, and multi-row insert)

Db2 supports the manipulation of multiple rows on fetches, updates, and insert processing. Now having the ability to fetch, update, or insert more than 1 row at a time reduces network traffic and other related costs associated with each call to DB2. The recommendation is to start with 100-row fetches, inserts, or updates, and then test other numbers. It has been proven many times that this process reduces runtime on average by 35%. Consult the IBM DB2 manuals for further detail and coding examples.

SQL Query Optimization – Tablespace scans

The predicate(s) may be poorly coded in a non-indexable way.
The predicates in the query do not match any available indexes on the table. Should have indexes even on small tables.
The table could be small, and DB2 decides a tablespace scan may be faster than index processing.
The catalog statistics say the table is small, or maybe there are no statistics on the table.
The predicates are such that DB2 thinks the query is going to retrieve a large enough amount of data that would require a tablespace scan.
The predicates are such that DB2 picks a non-clustered index, and the number of pages to retrieve is high enough based on total number of pages in the table to require a tablespace scan.
The tablespace file or index files could physically be out of shape and need a REORG.

SQL Query Optimization – Non-Existence

When coding logic to determine what rows in a table do not exist in another table, there are a couple of common approaches. One approach is to code outer join logic and then check ‘Where D.MGRNO IS NULL’ from the other table, or coding ‘Not Exists’ logic. The following 2 examples both bring back employees that are not managers on the department table, yet the 2ndone is most often the more efficient. The DB2 Visual Explain tool shows by each predicate when filtering is accomplished.

Example 1:

SELECT E.EMPNO, E.LASTNAME
FROM EMPLOYEE E LEFT JOIN 
DEPARTMENT D ON D.MGRNO = E.EMPNO 
WHERE D.MGRNO IS NULL

Example 2:

SELECT E.EMPNO, E.LASTNAME 
FROM EMPLOYEE E 
WHERE NOT EXISTS 
(SELECT 1 
FROM DEPARTMENT D 
WHERE D.MGRNO = E.EMPNO)

SQL Query Optimization – ‘Update where Current of Cursor’, ‘Delete Where Current of Cursor’, and ‘RIDS’.

What to do when the cursor is ‘Read Only’:

Try defining a ‘Dynamic’ scrollable cursor. This at times allows for an ‘Order By’ in the cursor definition, while allowing a ‘Delete Where Current of Cursor’ statement within the processing.
Fetch the ROWID (RID as of V9) for each row being processed in the ‘Read Only’ cursor, and execute all update or delete statements using the ROWID/RID value in place of the key fields for better performance.
Try defining a Scrollable cursor. This will materialize the cursor’s data into a workfile but then allows for cursor updating. DECLARE SENSITIVE, FETTCH SENSITIVE
Take advantage of the V8 ‘NOFOR’ pre-compile option. In static SQL, eliminates the need for the FOR UPDATE or FOR UPDATE OF clause in DECLARE CURSOR statements. When you use NOFOR, your program can make positioned updates to any columns that the program has DB2 authority to update.

SQL Query Optimization – Left Outer Joins vs Right Outer Joins

When coding outer join logic, it does not matter whether the developer codes a ‘Left Outer Join’ or a ‘Right Outer Join’ in order to get the logic correct, as long as they have the starting ‘Driver’ table coded correctly. There is no difference between a Left and Right outer join other than where the starting ‘Driver’ is coded. This is not really a tuning tip, but rather a tip to help all developers understand that left outer joins are more readable.

Developers in DB2 should only code ‘Left Outer Joins’. It is more straightforward because the starting ‘Driver’ table is always coded first, and all subsequent tables being joined to have ‘Left Outer Join’ coded beside them, making it more understandable and readable.

Suboptimal SQL statement:

SELECT TAB_A.COL1, TAB_B.COL1 
FROM TAB_A 
LEFT OUTER JOIN TAB_B 
ON TAB_A.COL3 = TAB_B.COL3 
WHERE TAB_A.COL1=123 
  AND TAB_B.COL2=456;

Optimized SQL statement:

SELECT TAB_A.COL1,TAB_B.COL1 
FROM TAB_A 
LEFT OUTER JOIN TAB_B 
ON TAB_A.COL3 = TAB_B.COL3
AND TAB_B.COL2=456  
WHERE TAB_A.COL1=123;

Predicates for any INNER joins can stay in the WHERE clause. If tables TAB_A and TAB_B are defined as views, the optimizer can push these predicates into the views.

SQL Query Optimization – ‘Fetch First xx Rows’

SELECT EMPNO, LASTAME, SALARY, DEPTNO 
FROM EMP 
WHERE SALARY BETWEEN 50000.00 and 100000.00 
FETCH FIRST 25 ROWS ONLY

If the optimizer knows exactly what you intend to retrieve it can make decisions based on that fact, and often times optimization will be different based on this known fact than if it was not coded, and the program just quit processing after the first 25.

Use FETCH 1 ROW ONLY in SQL queries where you want to fetch the count of all rows i.e. COUNT (*). Also, add check for SQLCODE

SELECT COUNT(*)
FROM EMPLOYEE
WHERE EMPLOYEE_ID <= 30100
AND SQLCODE = 0
FETCH 1 ROW ONLY;

Read DB2 blogs : Click Here IBM DB2 Manual :Click Here

Admin

Next Agile Coach Interview Questions and Answers - I »

Previous « DB2 Predicates : Performance Tuning of DB2 Queries

Published by

Admin

Tags: DB2 OptimizationDB2 Query OptimizationQuery OptimizationSQL Query Best PracticesSQL Query OptimizationSQL Query Standards

5 years ago

OUTER JOIN – step by step walkthrough with examples
The JOIN operation allows you to combine rows from two or more tables based on…
OUTFIL control statement in SORT JCL
The OUTFIL control statement describes the output file(s) and the processing to be done on…
Conditional Expressions use in COBOL
Conditional expressions identify conditions that are tested to enable the object program to select between…