SQL Query optimization is a process of writing optimized SQL queries to improve database performance. It is one of the factors that affect application performance. When working with large-scale data, even the most minor change can have a dramatic impact on performance.
Let’s review this section for details about SQL query optimization considerations that can help you to maximize the performance of database applications.
Minimize the number of columns retrieved and/or updated. Extra columns increase the row size of the result set & the result will increase disk I/O and degrade performance. In addition, using SELECT * may prevent the use of covering indexes, further potentially hurting query performance. It can impact
Minimize the number of rows searched and/or returned. Code predicates to limit the result to only the rows needed. Avoid generic queries that do not have a WHERE clause.
Consider accomplishing as much as possible with a single call, rather than multiple calls. Avoid unnecessary execution of SQL.
This is huge in the performance tuning of programs, especially batch programs because they tend to process more data. Every time an SQL call is sent to the database manager, there is overhead in sending the SQL statement to DB2, going from one address space in the operating system to the DB2 address space for SQL execution.
In general, developers need to minimize:
If a single row is returned – Singleton SELECT .. INTO
If multiple rows are returned – Cursor
Sorts can be expensive. At times an SQL query may execute multiple sorts in order to get the result set back as needed. Take a look at the DB2 to explain the tool to see if any sorting is taking place, then take a look at the SQL statement and determine if anything can be done to eliminate sorts.
SELECT DISTINCT E.EMPNO, E.LASTNAME FROM EMP E, EMPPROJACT EPA WHERE E.EMNO = EPA.EMPNO Can be written as SELECT E.EMPNO, E.LASTNAME FROM EMP E, EMPPROJACT EPA WHERE E.EMPNO = EPA.EMPNO GROUP BY E.EMPNO, E.LASTNAME SELECT E.EMPNO, E.LASTNAME FROM EMP E WHERE E.EMNPO IN (SELECT EPA.EMPNO FROM EMPPROJACT EPA) SELECT E.EMPNO, E.LASTNAME FROM EMP E WHERE EXISTS (SELECT 1 FROM EMPPROJACT EPA WHERE EPA.EMPNO = E.EMPNO)
SELECT EMPNO, FIRSTNME, MIDINIT, LASTNAME, CASE WHEN EDLEVEL < 15 THEN 'SECONDARY' WHEN EDLEVEL < 19 THEN 'COLLEGE' ELSE 'POST GRADUATE' END FROM EMPLOYEE
SQL Query Optimization – Sub-selects
SQL Query Optimization – Inline Views
SQL Query Optimization – Indexes
Create indexes for columns you frequently:
SQL Query Optimization – Data Conversion
SQL Query Optimization -JOIN Predicates
SELECT emp.empno, emp.lastname, dept.deptname FROM emp LEFT OUTER JOIN dept ON emp.workdept = dept.deptno WHERE emp.salary > 50000.00 WITH UR;
Works correctly but the outer join is performed first before any rows are filtered out.
SELECT emp.empno, emp.lastname, dept.deptname FROM (SELECT empno, lastname FROM emp WHERE salary > 50000.00) as e LEFT OUTER JOIN dept ON emp.workdept = dept.deptno WITH UR;
Works better, applies the inner join predicates first, reducing the number of rows to be joined.
SQL Query Optimization – OR vs. UNION
SQL Query Optimization – BETWEEN Clause
BETWEEN is usually more efficient than <= predicate and the >= predicate except when comparing a host variable to 2 columns.
Stage 1: WHERE Col1 <= :hostvar AND col2 >= :hostvar Stage 2: WHERE :hostvar BETWEEN col1 and col2
SQL Query Optimization – IN Instead of Like
If you know that only a certain number of values exist and can be put in a list
SQL Query Optimization – NOT
Predicates formed using NOT are Stage 1 but they are not indexable. So for Subquery when using negation logic: Use NOT Exists (DB2 tests non-existence) instead of NOT IN (DB2 must materialize the complete result set).
Preferable not to use NOT clause in WHERE clause, rather less than/greater than/less than equals/greater than equals. e.g. WHERE HIREDATE <= :WS-DATE
SQL Query Optimization – EXISTS
Use EXISTS to test for a condition and get a True or False returned by DB2 and not return any rows to the query:
SELECT col1 FROM table1 WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.col2 = table1.col1)
SQL Query Optimization – Arithmetic in Predicates
An index is not used for a column when the column is in an arithmetic expression. It can be Stage 1 but not indexable.
For example:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE SALARY * 1.1 > 50000.00
Should be coded as:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE SALARY > 50000.00 / 1.1
SQL Query Optimization – Scalar Function
Scalar functions are not indexable but you can use scalar functions to offload work from the application program.
Examples:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE YEAR(HIREDATE) = 2005
Should be coded as:
SELECT EMPNO, LASTNAME FROM EMPLOYEE WHERE HIREDATE BETWEEN ‘2005-01-01’ and ‘2005-12-31’
SQL Query Optimization – OPTIMIZE for n ROWS
For online applications, use ‘With OPTIMIZE for n ROWS’ to attempt to influence the access path DB2 chooses without this clause, DB2 chooses the best access path for batch processing With this clause, DB2 optimizes for a quicker response for online processing. Try Optimize for 1, for 10, for 100.
SQL Query Optimization – Stage 1 vs Stage 2 Predicates
Always try to code predicates as Stage 1 and indexable. In general, Stage 2 predicates do not perform as well and consume extra CPU.
SQL Query Optimization – ROWSET positioning (multi-row fetch, multi-row update, and multi-row insert)
Db2 supports the manipulation of multiple rows on fetches, updates, and insert processing. Now having the ability to fetch, update, or insert more than 1 row at a time reduces network traffic and other related costs associated with each call to DB2. The recommendation is to start with 100-row fetches, inserts, or updates, and then test other numbers. It has been proven many times that this process reduces runtime on average by 35%. Consult the IBM DB2 manuals for further detail and coding examples.
SQL Query Optimization – Tablespace scans
SQL Query Optimization – Non-Existence
When coding logic to determine what rows in a table do not exist in another table, there are a couple of common approaches. One approach is to code outer join logic and then check ‘Where D.MGRNO IS NULL’ from the other table, or coding ‘Not Exists’ logic. The following 2 examples both bring back employees that are not managers on the department table, yet the 2ndone is most often the more efficient. The DB2 Visual Explain tool shows by each predicate when filtering is accomplished.
SELECT E.EMPNO, E.LASTNAME FROM EMPLOYEE E LEFT JOIN DEPARTMENT D ON D.MGRNO = E.EMPNO WHERE D.MGRNO IS NULL
SELECT E.EMPNO, E.LASTNAME FROM EMPLOYEE E WHERE NOT EXISTS (SELECT 1 FROM DEPARTMENT D WHERE D.MGRNO = E.EMPNO)
SQL Query Optimization – ‘Update where Current of Cursor’, ‘Delete Where Current of Cursor’, and ‘RIDS’.
What to do when the cursor is ‘Read Only’:
SQL Query Optimization – Left Outer Joins vs Right Outer Joins
When coding outer join logic, it does not matter whether the developer codes a ‘Left Outer Join’ or a ‘Right Outer Join’ in order to get the logic correct, as long as they have the starting ‘Driver’ table coded correctly. There is no difference between a Left and Right outer join other than where the starting ‘Driver’ is coded. This is not really a tuning tip, but rather a tip to help all developers understand that left outer joins are more readable.
Developers in DB2 should only code ‘Left Outer Joins’. It is more straightforward because the starting ‘Driver’ table is always coded first, and all subsequent tables being joined to have ‘Left Outer Join’ coded beside them, making it more understandable and readable.
SELECT TAB_A.COL1, TAB_B.COL1 FROM TAB_A LEFT OUTER JOIN TAB_B ON TAB_A.COL3 = TAB_B.COL3 WHERE TAB_A.COL1=123 AND TAB_B.COL2=456;
SELECT TAB_A.COL1,TAB_B.COL1 FROM TAB_A LEFT OUTER JOIN TAB_B ON TAB_A.COL3 = TAB_B.COL3 AND TAB_B.COL2=456 WHERE TAB_A.COL1=123;
Predicates for any INNER joins can stay in the WHERE clause. If tables TAB_A and TAB_B are defined as views, the optimizer can push these predicates into the views.
SQL Query Optimization – ‘Fetch First xx Rows’
SELECT EMPNO, LASTAME, SALARY, DEPTNO FROM EMP WHERE SALARY BETWEEN 50000.00 and 100000.00 FETCH FIRST 25 ROWS ONLY
If the optimizer knows exactly what you intend to retrieve it can make decisions based on that fact, and often times optimization will be different based on this known fact than if it was not coded, and the program just quit processing after the first 25.
Use FETCH 1 ROW ONLY in SQL queries where you want to fetch the count of all rows i.e. COUNT (*). Also, add check for SQLCODE
SELECT COUNT(*) FROM EMPLOYEE WHERE EMPLOYEE_ID <= 30100 AND SQLCODE = 0 FETCH 1 ROW ONLY;
A well-maintained product backlog is crucial for successful product development. It serves as a single…
Incremental value to the customer refers to the gradual delivery of small, functional parts of…
A Product Market refers to the group of potential customers who might be interested in…
The Professional Agile Leadership - Evidence-Based Management (PAL-EBM) certification offered by Scrum.org is designed for…
The Professional Agile Leadership (PAL I) certification, offered by Scrum.org, is designed to equip leaders…
Choosing the right Scrum Master Certification depends on your current experience and career goals. If…