DB2 Join: An INNER JOIN finds and returns matching data from tables, while an outer join finds and returns matching data and some dissimilar data from tables. An inner join finds and returns matching data from tables, while an outer join finds and returns matching data and some dissimilar data from tables. For combining two sets of columns with/without removing duplicates refer to UNION vs UNION ALL use in SQL Queries. For returning the first non-null value in a list of the input expressions use COALESCE function in Join Queries.
DB2 Join : Inner join
Combines each row of the left table with each row of the right table, keeping only the rows in which the join condition is true.
DB2 Join : Outer join
Includes the rows that are produced by the inner join, plus the missing rows, depending on the type of outer join:
DB2 Join : Left outer join
Includes the rows from the left table that were missing from the inner join.
DB2 Join : Right outer join
Includes the rows from the right table that were missing from the inner join.
DB2 Join : Full outer join
Includes the rows from both tables that were missing from the inner join.
Cartesian product
When two or more tables are referenced in the FROM clause of a query, the database server joins the tables. If neither the FROM clause nor the WHERE clause specifies a predicate for the join, the server computes a Cartesian product that contains m * n rows, where m is the number of rows in the first table and n is the number of rows in the second table. This product is the set of all possible combinations formed by concatenating a row from the first table with a row from the second table.
Table-(T1) Table-(T2) ID NAME ID TITLE 10 Sandy 20 Sales Mgr 20 Sam 30 Clerk 30 Cindy 30 Manager 40 Sales Rep 50 Manager
Example of Standard Join SELECT * Result FROM T1, ID NAME ID TITLE WHERE T1.ID = T2.ID 20 Sam 20 Sales Mgr ORDER BY T1.ID 30 Cindy 30 Clerk , T2.TITLE; 30 Cindy 30 Manager Example of Inner Join SELECT * Result FROM T1 ID NAME ID TITLE INNER JOIN T2 20 Sam 20 Sales Mgr ON T1.ID = T2.ID 30 Cindy 30 Clerk ORDER BY T1.ID 30 Cindy 30 Manager , T2.TITLE;
ON and WHERE Usage
In an inner join only, an ON and a WHERE check work much the same way. Both define the nature of the join, and because in an inner join, only matching rows are returned, both act to exclude all rows that do not match the join.
Inner join, using ON check
Below is an inner join that uses an ON check to exclude “Manager”:
Inner join, using ON check Below is an inner join that uses an ON check to exclude “Manager”: SELECT * Result FROM T1 ID NAME ID TITLE INNER JOIN T2 20 Sam 20 Sales Mgr ON T1.ID = T2.ID 30 Cindy 30 Clerk AND T2.TITLE <> ’Manager’ ORDER BY T1.ID , T2.TITLE; Inner join, using WHERE check Here is the same query written using a WHERE clause SELECT * Result FROM T1 ID NAME ID TITLE INNER JOIN T2 20 Sam 20 Sales Mgr ON T1.ID = T2.ID 30 Cindy 30 Clerk WHERE T2.TITLE <> ’Manager’ ORDER BY T1.ID , T2.TITLE;
Left Outer Join
A left outer join is the same as saying that I want all of the rows in the first table listed, plus any matching rows in the second table:
Table-(T1) Table-(T2) Result ID NAM ID TITLE ID NAME ID TITLE 10 Sandy 20 Sales Mgr 10 Sandy -- --------- 20 Sam 30 Clerk 20 Sam 20 Sales Mgr 30 Cindy 30 Manager 30 Cindy 30 Clerk 40 Sales Rep 30 Cindy 30 Manager 50 Manager Example of Left Outer Join SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.ID = T2.ID ORDER BY T1.ID , T2.TITLE;
ON and WHERE Usage
In a partial outer join (i.e. left or right), an ON check works differently, depending on what table (field) it refers to:
- If it refers to a field in the table being joined to, it determines whether the related row matches the join or not.
- If it refers to a field in the table being joined from, it determines whether the related row finds a match or not. Regardless, the row will be returned.
In the next example, those rows in the table being joined to (i.e. the T2 view) that match on ID, and that is not for a manager are joined to:
ON check on table being joined to SELECT * Result FROM T1 ID NAME ID TITLE LEFT OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr AND T2.TITLE <> ’Manager’ 30 Cindy 30 Clerk ORDER BY T1.ID , T2.TITLE;
WHERE check on table being joined to
If we rewrite the above query using a WHERE check we will lose a row (of output) because the check is applied after the join is done, and a null JOB does not match:
SELECT * Result FROM T1 ID NAME ID TITLE LEFT OUTER JOIN T2 20 Sam 20 Sales Mgr ON T1.ID = T2.ID 30 Cindy 30 Clerk WHERE T2.TITLE <> ’Manager’ ORDER BY T1.ID , T2.TITLE; We could make the WHERE equivalent to the ON, if we also checked for nulls: SELECT * Result FROM T1 ID NAME ID TITLE LEFT OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr WHERE (T2.TITLE <> ’Manager’ 30 Cindy 30 Clerk OR T2.TITLE IS NULL) ORDER BY T1.ID , T2.TITLE;
In the next example, those rows in the table being joined from (i.e. the T1 view) that match on ID and have a NAME > ’D’ participate in the join. Note however that T1 rows that do not participate in the join (i.e. ID = 30) are still returned:
SELECT * Result FROM T1 ID NAME ID TITLE LEFT OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr AND T2.NAME > ‘D’ 30 Cindy -- --------- ORDER BY T1.ID , T2.TITLE;
If we rewrite the above query using a WHERE check (on NAME) we will lose a row because now the check excludes rows from the answer-set, rather than from participating in the join:
SELECT * Result FROM T1 ID NAME ID TITLE LEFT OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr WHERE T2.NAME > ‘D’ ORDER BY T1.ID , T2.TITLE;
Unlike in the previous example, there is no way to alter the above WHERE check to make it logically equivalent to the prior ON check. The ON and the WHERE are applied at different times and for different purposes and thus do completely different things.
Right Outer Join
A right outer join is the inverse of a left outer join. One gets every row in the second table listed, plus any matching rows in the first table:
Table-(T1) Table-(T2) Result ID NAME ID TITLE ID NAME ID TITLE 10 Sandy 20 Sales Mg 20 Sam 20 Sales Mgr 20 Sam 30 Clerk 30 Cindy 30 Clerk 30 Cindy 30 Manager 30 Cindy 30 Manager 40 Sales Rep -- ---- 40 Sales Rep 50 Manager -- ----- 50 Manager Example of Right Outer Join SELECT * Result FROM T1 ID NAME ID TITLE RIGHT OUTER JOIN T2 20 Sam 20 Sales Mgr ON T1.ID = T2.ID 30 Cindy 30 Clerk ORDER BY T1.ID 30 Cindy 30 Manager , T2.TITLE; -- ----- 40 Sales Rep -- ----- 50 Manager
ON and WHERE Usage
The rules for ON and WHERE usage are the same in a right outer join as they are for a left outer, except that the relevant tables are reversed.
Full Outer Joins
A full outer join occurs when all of the matching rows in two tables are joined, and there is also returned one copy of each non-matching row in both tables.
Table-(T1) Table-(T2) Result ID NAME ID TITLE ID NAME ID TITLE 10 Sandy 20 Sales Mgr 10 Sandy -- --------- 20 Sam 30 Clerk 20 Sam 20 Sales Mgr 30 Cindy 30 Manager 30 Cindy 30 Clerk 40 Sales Rep 30 Cindy 30 Manager 50 Manager -- ----- 40 Sales Rep -- ----- 50 Manager Example of Full Outer Join SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr ORDER BY T1.ID 30 Cindy 30 Clerk , T2.ID 30 Cindy 30 Manager , T2.TITLE; -- ----- 40 Sales Rep -- ----- 50 Manager
ON and WHERE Usage
In a full outer join, an ON check is quite unlike a WHERE check-in in that it never results in a row being excluded from the answer set. All it does is categorize the input row as being either matching or non-matching. For example, in the following full outer join, the ON check joins those rows with equal key values:
Full Outer Join, match on keys SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr ORDER BY T1.ID 30 Cindy 30 Clerk , T2.ID 30 Cindy 30 Manager , T2.TITLE; -- ----- 40 Sales Rep -- ----- 50 Manager
Full Outer Join, match on keys > 20
In the next example, we have deemed that only those IDs that match, and that also have a value greater than 20, are a true match:
SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam -- --------- AND T1.ID > 20 30 Cindy 30 Clerk ORDER BY T1.ID 30 Cindy 30 Manager , T2.ID -- ----- 20 Sales Mgr , T2.TITLE; -- ----- 40 Sales Rep -- ----- 50 Manager
Observe how in the above statement we added a predicate, and we got more rows! This is because in an outer join an ON predicate never removes rows. It simply categorizes them as being either matching or non-matching. If they match, it joins them. If they don’t, it passes them through. In the next example, nothing matches. Consequently, every row is returned individually. This query is logically similar to doing a UNION ALL on the two views:
Full Outer Join, match on keys (no rows match) SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam -- --------- AND +1 > -1 30 Cindy -- --------- ORDER BY T1.ID -- ----- 20 Sales Mgr , T2.ID -- ----- 30 Clerk , T2.TITLE; -- ----- 30 Manager -- ----- 40 Sales Rep -- ----- 50 Manager
ON checks are somewhat like WHERE checks in that they have two purposes. Within a table, they are used to categorize rows as being either matching or non-matching. Between tables, they are used to define the fields that are to be joined on.
In the prior example, the first ON check defined the fields to join on, while the second join identified those fields that matched the join. Because nothing matched (due to the second predicate), everything fell into the “outer join” category. This means that we can remove the first ON check without altering the answer set:
Full Outer Join, don’t match on keys (no rows match) SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam -- --------- AND +1 > -1 30 Cindy -- --------- ORDER BY T1.ID -- ----- 20 Sales Mgr , T2.ID -- ----- 30 Clerk , T2.TITLE; -- ----- 30 Manager -- ----- 40 Sales Rep -- ----- 50 Manager
What happens if everything matches and we don’t identify the join fields? The result in a Cartesian Product:
Full Outer Join, don’t match on keys (all rows match) SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy 20 Sales Mgr ON T1.ID = T2.ID 10 Sandy 30 Clerk AND +1 <> -1 10 Sandy 30 Manager ORDER BY T1.ID 10 Sandy 40 Sales Rep , T2.ID 10 Sandy 50 Manager , T2.TITLE; 20 Sam 20 Sales Mgr 20 Sam 30 Clerk 20 Sam 30 Manager 20 Sam 40 Sales Rep 20 Sam 50 Manager 30 Cindy 20 Sales Mgr 30 Cindy 30 Clerk 30 Cindy 30 Manager 30 Cindy 40 Sales Rep 30 Cindy 50 Manager
In an outer join, WHERE predicates behave as if they were written for an inner join. In particular, they always do the following:
- WHERE predicates defining join fields enforce an inner join on those fields.
- WHERE predicates on non-join fields are applied after the join, which means that when they are used on not-null fields, they negate the outer join.
Here is an example of a WHERE join predicate turning an outer join into an inner join:
Full Outer Join, turned into an inner join by WHERE SELECT * Result FROM T1 ID NAME ID TITLE FULL JOIN T2 20 Sam 20 Sales Mgr ON T1.ID = T2.ID 30 Cindy 30 Clerk WHERE T1.ID = T2.ID 30 Cindy 30 Manager ORDER BY T1.ID , T2.ID , T2.TITLE;
To illustrate some of the complications that WHERE checks can cause, imagine that we want to do a FULL OUTER JOIN on our two test views (see below), limiting the answer to those rows where the “T1 ID” field is less than 30. There are several ways to express this query, each giving a different answer:
In our first example, the “T1.ID < 30” predicate is applied after the join, which effectively eliminates all “T2” rows that don’t match (because their “T1.ID” value is null):
Outer join T1.ID < 30, check applied in WHERE (after join) SELECT * Result FROM T1 ID NAME ID TITLE FULL JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr WHERE T1.ID <30 ORDER BY T1.ID , T2.ID , T2.TITLE;
In the next example the “T1.ID < 30” check is done during the outer join where it does not any eliminate rows but rather limits those that match in the two views:
Outer join T1.ID < 30, check applied in ON (during join) SELECT * Result FROM T1 ID NAME ID TITLE FULL JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr AND T1.ID < 30 30 Cindy -- --------- ORDER BY T1.ID -- ----- 30 Clerk , T2.ID -- ----- 30 Manager , T2.TITLE; -- ----- 40 Sales Rep -- ----- 50 Manager
Imagine that what really wanted to have the “T1.ID < 30” check only apply to those rows in the “T1” table. Then one has to apply the check before the join, which requires the use of a nested-table expression:
Outer join T1.ID < 30, check applied in WHERE (before join) SELECT * Result FROM (SELECT * ID NAME ID TITLE FROM T1 10 Sandy -- --------- WHERE ID < 30) AS T1 20 Sam 20 Sales Mgr FULL OUTER JOIN T2 -- ----- 30 Clerk ON T1.ID = T2.ID -- ----- 30 Manager ORDER BY T1.ID -- ----- 40 Sales Rep , T2.ID -- ----- 50 Manager , T2.TITLE;
Observe how in the above query we still got a row back with an ID of 30, but it came from the “T2” table. This makes sense because the WHERE condition had been applied before we got to this table. There are several incorrect ways to answer the above question. In the first example, we shall keep all non-matching T2 rows by allowing to pass any null T1.ID values:
Outer join T1.ID < 30, (gives wrong answer - see text) SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sand -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr WHERE T1.ID < 30 -- ----- 40 Sales Rep OR T1.ID IS NULL -- ----- 50 Manager ORDER BY T1.ID , T2.ID , T2.TITLE;
There are two problems with the above query: First, it is only appropriate to use when the T1.ID field is defined as not null, which it is in this case. Second, we lost the row in the T2 table where the ID equaled 30. We can fix this latter problem, by adding another check, but the answer is still wrong:
Outer join T1.ID < 30, (gives wrong answer - see text) SELECT * Result FROM T1 ID NAME ID TITLE FULL OUTER JOIN T2 10 Sandy -- --------- ON T1.ID = T2.ID 20 Sam 20 Sales Mgr WHERE T1.ID < 30 30 Cindy 30 Clerk OR T1.ID = T2.ID 30 Cindy 30 Manager OR T1.ID IS NULL -- ----- 40 Sales Rep ORDER BY T1.ID -- ----- 50 Manager , T2.ID , T2.TITLE;
The last two checks in the above query ensure that every T2 row is returned. But they also have the effect of returning the NAME field from the T1 table whenever there is a match. Given our intentions, this should not happen.
SUMMARY: Query WHERE conditions are applied after the join. When used in an outer join, this means that they are applied to all rows from all tables. In effect, this means that any WHERE conditions in a full outer join will, in most cases, turn it into a form of inner join.