FETCH FIRST n ROW ONLY and OPTIMIZE FOR n ROWS

The FETCH FIRST n ROW ONLY clause is used to limit the result set of a query to a specified number of rows. It is particularly useful when you are only interested in retrieving a subset of the total records. Retrieving the entire result table from the query can be inefficient. You can specify this clause in a SELECT statement to limit the number of rows in the result table of a query to n rows. If you want to fetch multiple rows at one go & multiple times till the end of the result set then you can use the Rowset feature using DB2 cursor.

Displaying a specific number of top or bottom records based on certain criteria.
Implementing pagination in applications.

The OPTIMIZE FOR n ROWS clause is used to inform the Db2 optimizer about the expected number of rows to be processed. It helps the optimizer make more accurate decisions when creating an access plan for the query. The OPTIMIZE FOR n ROWS clause lets an application declare its intent to do either of these things:

Providing a hint to the optimizer about the expected cardinality of the result set.
Improving query performance by influencing the choice of access paths.

DB2 uses the OPTIMIZE FOR n ROWS clause to choose access paths that minimize the response time for retrieving the first few rows.

FETCH FIRST n ROW ONLY

Syntax

SELECT select_list FROM table_name FETCH FIRST n ROWS ONLY;
'n' is the number of rows to be returned.

SELECT select_list FROM table_name LIMIT n [OFFSET m];
'm' is the number of rows to skip before returning the n rows.
'n' is the number of rows to be returned.

EXAMPLE – FETCH FIRST n ROWS ONLY

Scenario: Suppose that you write an application that requires information on only the 20 employees with the highest salaries. To return only the rows of the employee table for those 20 employees, you can write a query like this

 SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
 FROM EMP
 ORDER BY SALARY DESC
 FETCH FIRST 20 ROWS ONLY;

You can also use FETCH FIRST n ROWS ONLY within a subquery.

 SELECT * FROM EMP
 WHERE EMPNO IN (
 SELECT RESPEMP FROM PROJECT
 ORDER BY PROJNO
 FETCH FIRST 3 ROWS ONLY)

EXAMPLE – LIMIT Clause

 SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
 FROM EMP
 ORDER BY SALARY DESC
 LIMIT 20;

Scenario: Suppose that you write an application that requires information on only the 20 employees with the highest salaries but ignores the first 3 as they are executives. To return only the rows of the employee table for those 20 employees, you can write a query like this

 SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
 FROM EMP
 ORDER BY SALARY DESC
 LIMIT 20 OFFSET 3;

OPTIMIZE FOR n ROWS

When an application executes a SELECT statement, DB2 assumes that the application will retrieve all the qualifying rows. This assumption is most appropriate for batch environments. However, for interactive SQL applications, such as SPUFI, it is common for a query to define a very large potential result set but retrieve only the first few rows. The access path that DB2 chooses might not be optimal for those interactive applications.

OPTIMIZE FOR 1 ROW to avoid sorts: You can influence the access path most by using OPTIMIZE FOR 1 ROW. OPTIMIZE FOR 1 ROW tells DB2 to select an access path that returns the first qualifying row quickly. This means that whenever possible, DB2 avoids any access path that involves a sort. If you specify a value for n that is anything but 1, DB2 chooses an access path based on cost, and you won’t necessarily avoid sorts.

Number of rows can be retrieved with OPTIMIZE FOR n ROWS: The OPTIMIZE FOR n ROWS clause does not prevent you from retrieving all the qualifying rows. However, if you use OPTIMIZE FOR n ROWS, the total elapsed time to retrieve all the qualifying rows might be significantly greater than if DB2 had optimized for the entire result set.

OPTIMIZE FOR n ROWS is effective only on queries that can be performed incrementally. If the query causes DB2 to gather the whole result set before returning the first row, DB2 ignores the OPTIMIZE FOR n ROWS clause, as in the following situations:

The query uses SELECT DISTINCT or a set function distinct, such as COUNT(DISTINCT C1).
Either GROUP BY or ORDER BY is used, and no index can give the necessary ordering.
An aggregate function and no GROUP BY clause is used.
The query uses a UNION.

EXAMPLE – OPTIMIZE FOR n ROWS

Scenario: Suppose that you query the employee table regularly to determine the employees with the highest salaries. You might use a query like this

 SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
 FROM EMP
 ORDER BY SALARY DESC;

An index is defined on column EMPNO, so employee records are ordered by EMPNO. If you have also defined a descending index on column SALARY, that index is likely to be very poorly clustered. To avoid many random, synchronous I/O operations, DB2 would most likely use a tablespace scan, then sort the rows on SALARY. This technique can cause a delay before the first qualifying rows can be returned to the application.

If you add the OPTIMIZE FOR n ROWS clause to the statement, DB2 will probably use the SALARY index directly because you have indicated that you expect to retrieve the salaries of only the 20 most highly paid employees.

Scenario: The following statement uses that strategy to avoid a costly sort operation

 SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY
 FROM EMP
 ORDER BY SALARY DESC
 OPTIMIZE FOR 20 ROWS;

Effects of using OPTIMIZE FOR n ROWS:

The join method could change. Nested loop join is the most likely choice because it has low overhead cost and appears to be more efficient if you want to retrieve only one row.
An index that matches the ORDER BY clause is more likely to be picked. This is because no sort would be needed for the ORDER BY.
List prefetch is less likely to be picked.
Sequential prefetch is less likely to be requested by DB2 because it infers that you only want to see a small number of rows.
In a join query, the table with the columns in the ORDER BY clause is likely to be picked as the outer table if there is an index on that outer table that gives the ordering needed for the ORDER BY clause.

Note: FETCH FIRST n ROWS ONLY will limit the result set to ‘n’ rows. But OPTIMIZE FOR n ROWS will not limit the result set. It is just an OPTIMIZER Directive to choose the optimized path and the limit of the result set lies with the WHERE clause.

Restrictions FETCH FIRST n ROWS ONLY AND OPTIMIZE FOR n ROWS

OPTIMIZE FOR n ROWS and FETCH FIRST n ROWS ONLY have no impact on operations that require a sort, like ORDER BY, GROUP BY, DISTINCT, UNION, and merge join. For ORDER BY, however, it does make it more likely that an index will be used, even one with a low cluster ratio, to avoid the sort if n is small (1 or 12 for example). The clauses do not avoid processing a work file if required for a hybrid join, materialization of a view, materialization of a nested table expression, etc. The clauses have no effect on non-correlated sub-select processing where the inner select qualifies many rows since they must all be processed before the outer select can be processed. The clauses cannot be used within the inner table of a subselect (it can be used after the subselect), they cannot be used in a CREATE VIEW statement, a nested table expression, and they cannot be used with INSERT, UPDATE, and DELETE statements.

If OPTIMIZE FOR n ROWS is not specified, n in FETCH FIRST n ROWS ONLY is used as OPTIMIZE FOR n ROWS for access path selection. If OPTIMIZE FOR x ROWS is coded and x is not equal to n, the smaller value is used, for example:

If x > n, optimize for n rows is used (value for FETCH FIRST n ROWS ONLY)
If x < n, optimize for x rows is used (value for OPTIMIZE FOR n ROWS)

Both FETCH FIRST and OPTIMIZE FOR are estimates, not guarantees. The actual number of rows returned might be slightly higher or lower.

Key Differences between FETCH FIRST n ROWS ONLY and OPTIMIZE FOR n ROWS

Timing of Application:

FETCH FIRST is applied during the execution of the query and limits the rows in the result set.
OPTIMIZE FOR is considered during the optimization phase and influences the query execution plan.

Effect on Result Set:

FETCH FIRST directly impacts the number of rows returned in the result set.
OPTIMIZE FOR does not limit the result set but guides the optimizer’s decisions.

Flexibility:

FETCH FIRST provides more control over the number of rows in the final output.
OPTIMIZE FOR is a performance hint and might not guarantee the exact number of rows in the result set.

Applicability:

Use FETCH FIRST when you want to limit the number of rows in the result set.
Use OPTIMIZE FOR when you want to guide the optimizer’s choices based on expected cardinality.

Conclusion

Understanding and appropriately using FETCH FIRST n ROW ONLY and OPTIMIZE FOR n ROWS in Db2 SQL queries can significantly impact the efficiency of data retrieval. Whether you need to limit the size of your result set or guide the optimizer’s decisions, these clauses provide valuable tools for enhancing the performance of your database queries. By incorporating these clauses judiciously, developers and database administrators can strike a balance between precision and optimization in their SQL queries.