London skyline at sunset

Shane Driscoll

On joining tables and traversing relationships

Tag: outerjoin

  • Filtering OUTER JOINs via ON

    In my last post, we looked at how a WHERE clause can cause an OUTER JOIN to return unexpected results, and sometimes data you thought would come back may be missing altogether.

    Luckily, if you are looking for a way to apply a degree of filtering to an OUTER JOIN whilst still maintaining its ability to retain all rows from one or both sides of the join, there is a solution. It doesn’t make use of the WHERE clause though; instead the work is done in the ON clause of the join itself.

    This may feel unintuitive if you are used to using the ON clause purely to specify the source columns or expressions you wish to match on to join datasets. For OUTER JOINs however it can be utilised as a filter, one which will be applied DURING the processing of the join. This is different to a WHERE clause which as we’ve learnt comes into play AFTER the join.

    Some words about the ON clause

    Before diving into examples, here are key points about ON in SQL joins:

    The ON clause of a JOIN in a SQL statement is simply a conditional BOOLEAN check, just like in other languages.

    For a join to be successful, the overall result of the ON conditional check must evaluate to TRUE.

    An ON condition is not limited to comparisons of column values between two tables, it can include expressions too!

    Source data

    Time for some examples. We’ll go back to our test datasets:

    dbo.Customers
    customer_idcustomer_namecustomer_tier
    1Alicebronze
    2Bobsilver
    3Charliegold
    4Daniellegold
    5Edgold
    dbo.Purchases
    purchase_idcustomer_idstore_namepurchase_datepurchase_amount
    11AutoFix2025-01-02100.00
    22AutoFix2024-12-20200.00
    32BookWorld2024-06-11150.00
    44AutoFix2025-01-03250.00
    55BookWorld2024-09-30100.00
    65BookWorld2025-01-27300.00

    To better get a feel for the data, we’ll join these tables together with a LEFT OUTER JOIN and no filtering:

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id;

    Worked examples using ON filtering

    Let’s delve into the filtering we can achieve in an OUTER JOIN, using the ON clause instead of WHERE.


    Example 1: I want to see a list of ALL customers, but only the purchases from AutoFix.

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
      AND P.store_name = 'AutoFix';

    You can immediately see here how we’ve tweaked the JOIN. We still have our LEFT OUTER JOIN between the Customers and Purchases tables using the common customer_id solumn.

    What’s new is the second condition. Instead of another column match, it is an expression, one which will only be TRUE for Purchases rows where the store_name is ‘AutoFix’.

    Once again, remember that the ON clause is a conditional, and a join will only succeed if the combination of the clauses evaluates to TRUE.

    The workflow of this query can be read as:

    Fetch ALL rows from the Customers table.

    Perform a LEFT OUTER JOIN where the overall results of the following tests evaluates to TRUE:

    • A row from the Customers table has the same customer_id value as a row from the Purchases table
    • AND – that row from the Purchases table must have a store_name column value of ‘Autofix’

    The result is:

    This data still looks like it’s from an OUTER JOIN doesn’t it? Charlie has made no purchases, so we see NULLS for him. More interesting though is Ed’s row with NULLs. He has made purchases but NONE from AutoFix.

    If we were using the WHERE clause to filter the dataset for AutoFix purchases, neither Charlie or Ed would appear in the output. Instead, by moving that filter into the ON clause, we have taken finer-grained control of our filtering and produced a potentially more useful and insightful dataset.


    For the sake of completeness and before we move into some other examples, let’s see what this query pattern would return if we filter on a store which doesn’t have any recorded purchases.

    Example 2: I want to see a list of ALL customers, but only the purchases from CookieLand.

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
      AND P.store_name = 'CookieLand';

    We still see each and every customer, but CookieLand really needs to up its advertising spend.


    You can get some interesting results if you perform ON filtering on the left-hand table of a LEFT OUTER JOIN. All rows from the left-hand table will be returned, but the join to the right-hand table will only succeed where the join condition is matched.

    Example 3: I want to see a list of ALL customers, but only the purchases made by ‘gold’-tier customers.

    SQL
    SELECT
      C.customer_name,
      C.customer_tier,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
      and C.customer_tier = 'gold';

    We’ve used an expression to filter data from the left-side Customers table of the LEFT OUTER JOIN instead of the right. This is perfectly valid syntax. Again we have two expressions, the customer_id column match AND the customer_tier = ‘gold’ expression.

    In this query the Customers table is NOT actually being filtered at all. We see ALL rows from that table, with no NULLs in the customer_name or customer_tier columns. Rather the LEFT OUTER JOIN is only making successful matches to the Purchases table (and thereby only bringing back data from the Purchases table) for instances where the ON clause evaluates to TRUE. We will only ever see Purchases data where there is a matching customer_id column value, and the customer is gold-tier.

    To complete the picture, note that even though Charlie is a gold-tier customer, he has made no purchases. This means that his customer_id doesn’t appear in the Purchases table, so the ON conditional can only evaluate as FALSE for him.


    The conditional checks in an ON clause must in combination evaluate to TRUE, but we’re not just restricted to ANDs to combine them.

    Example 4: I want to see a list of ALL customers, but only the purchases made at AutoFix, OR by ‘gold’-tier customers.

    SQL
    SELECT C.customer_name
    	,C.customer_tier
    	,P.store_name
    	,P.purchase_amount
    	,P.purchase_date
    FROM Customers C
    LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
    	AND (
    		P.store_name = 'AutoFix'
    		OR C.customer_tier = 'gold'
    		);

    Again, we are simply adding to the set of expressions which must in combination evaluate to TRUE for the join to succeed. In this case we’re adding conditions to both the left AND right tables in the join.


    We can even use date range filters in our ON clauses.

    Example 5: I want to see a list of ALL customers, but only the purchases made at BookWorld prior to 2025-01-01.

    SQL
    SELECT C.customer_name
    	,P.store_name
    	,P.purchase_amount
    	,P.purchase_date
    FROM Customers C
    LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
    	AND P.store_name = 'BookWorld'
    	AND P.purchase_date < '2025-01-01';

    Key Takeaways

    1. Filtering using the ON clause allows you to control which rows successfully join, as opposed to which rows are filtered in the final output.
    2. In the case of a LEFT OUTER JOIN for example, ON filtering controls which rows from the right-hand table appear, without filtering rows from the left-hand table.
    3. You can use combinations of multiple conditions in your ON clause (similar to those in a WHERE Clause) to exercise fine-grained control over which rows from the right-hand table appear, whilst keeping all of the left-hand table data.

    By mastering the use of ON clause filtering in OUTER joins, you can unlock new ways of shaping your datasets with just a few lines of SQL! It’s a perfect solution for reporting-style queries where you need fine-grained filtering while still ensuring you see all rows from one side of your source datasets.

    The demo SQL code for this blog post is available below.

    SQL
    /*
    DISCLAIMER:
    The scripts provided in this blog post are for educational and demonstration purposes only. 
    They are provided "as-is" without warranty of any kind, either expressed or implied, 
    including but not limited to the implied warranties of merchantability or fitness for a 
    particular purpose. Use these scripts at your own risk.
    
    The author is not responsible for any data loss, corruption, or system issues that may 
    arise from running these scripts in any environment. Please ensure to test these scripts 
    thoroughly in a non-production environment before using them in production systems.
    
    The scripts are designed for SQL Server and may require adjustments for other database systems.
    Please review and modify as needed to suit your specific use case.
    */
    
    /* 
    https://shanedriscoll.com/sql/filtering-outer-joins-via-on/
    */
    
    DROP TABLE IF EXISTS
      dbo.Purchases;
    
    DROP TABLE IF EXISTS
      dbo.Customers;
    
    /* Customers Table */
    CREATE TABLE
      dbo.Customers (
        customer_id INT PRIMARY KEY,
        customer_name VARCHAR(100),
        customer_tier VARCHAR(100)
      );
    
    /* Purchases Table */
    CREATE TABLE
      dbo.Purchases (
        purchase_id INT PRIMARY KEY,
        customer_id INT,
        store_name VARCHAR(100),
        purchase_date DATE,
        purchase_amount DECIMAL(10, 2),
        FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
      );
    
    /* Insert Customers */
    INSERT INTO
      Customers (customer_id, customer_name, customer_tier)
    VALUES
      (1, 'Alice', 'bronze'),
      (2, 'Bob', 'silver'),
      (3, 'Charlie', 'gold'),
      (4, 'Danielle', 'gold'),
      (5, 'Ed', 'gold');
    
    /* Insert Purchases (Bob and Ed have purchases at multiple stores) */
    INSERT INTO
      Purchases (
        purchase_id,
        customer_id,
        store_name,
        purchase_date,
        purchase_amount
      )
    VALUES
      (1, 1, 'AutoFix', '2025-01-02', 100.00),
      (2, 2, 'AutoFix', '2024-12-20', 200.00),
      (3, 2, 'BookWorld', '2024-06-11', 150.00),
      (4, 4, 'AutoFix', '2025-01-03', 250.00),
      (5, 5, 'BookWorld', '2024-09-30', 100.00),
      (6, 5, 'BookWorld', '2025-01-27', 300.00);
    
    /* LEFT OUTER JOIN with no filtering */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id;
    
    /* Example 1: I want to see a list of ALL customers, but only the purchases from AutoFix. */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
      AND P.store_name = 'AutoFix';
      
    /* Example 2: I want to see a list of ALL customers, but only the purchases from CookieLand. */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
      AND P.store_name = 'CookieLand';
      
    /* Example 3: I want to see a list of ALL customers, but only the purchases made by ‘gold’-tier customers. */
    SELECT
      C.customer_name,
      C.customer_tier,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
      and C.customer_tier = 'gold';
    
    /* Example 4: I want to see a list of ALL customers, but only the purchases made at AutoFix, OR by ‘gold’-tier customers. */
    SELECT C.customer_name
    	,C.customer_tier
    	,P.store_name
    	,P.purchase_amount
    	,P.purchase_date
    FROM Customers C
    LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
    	AND (
    		P.store_name = 'AutoFix'
    		OR C.customer_tier = 'gold'
    		);
    
    /* Example 5: I want to see a list of ALL customers, but only the purchases made at BookWorld prior to 2025-01-01.  */
    SELECT C.customer_name
    	,P.store_name
    	,P.purchase_amount
    	,P.purchase_date
    FROM Customers C
    LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
    	AND P.store_name = 'BookWorld'
    	AND P.purchase_date < '2025-01-01';		
    
    /* Cleanup */
    DROP TABLE dbo.Purchases;
    
    DROP TABLE dbo.Customers;
  • OUTER JOIN plus WHERE filter equals not-so OUTER JOIN

    I often forget the finer details around the behaviours of SQL OUTER JOINs where filtering is applied in the ON or WHERE clauses. Over the years I’ve noticed it’s a common area of confusion for many developers too. It’s not surprising. The behaviour isn’t always the most intuitive, and a LEFT OUTER join can be harder to picture as clearly in your mind’s eye than the more simple INNER JOIN.

    Let’s get into the OUTER JOIN weeds a little (for a couple of posts) to get to grips with things, and along the way we may come across some easy ways to solve querying challenges which might be new to you. In any case, I’m looking forward to having some of my own documentation to refresh my memory with in future!

    Looking at LEFT OUTER JOINs, in plain language the workflow could be described as:

    Return ALL rows from the left-hand table. If there’s a match in the right-hand table based on the join condition, include the right-hand table data. If there’s no match, fill in NULLs for the right-hand table’s columns.

    Our source data

    Consider 2 source tables which contain this data:

    dbo.Customers
    customer_idcustomer_namecustomer_tier
    1Alicebronze
    2Bobsilver
    3Charliegold
    4Daniellegold
    5Edgold
    dbo.Purchases
    purchase_idcustomer_idstore_namepurchase_datepurchase_amount
    11AutoFix2025-01-02100.00
    22AutoFix2024-12-20200.00
    32BookWorld2024-06-11150.00
    44AutoFix2025-01-03250.00
    55BookWorld2024-09-30100.00
    65BookWorld2025-01-27300.00

    This data is quite straightforward. 5 customers and the purchases they’ve made across 2 different stores. Note that Charlie hasn’t actually made any purchases at all. Not much point demonstrating LEFT OUTER JOINs without NULLs is there?

    OUTER JOINs and WHERE example

    I commonly see confusion about what happens when you filter a LEFT OUTER JOIN with a WHERE clause. People start out with good intentions about using an OUTER join to bring back every row from the table on the left side of the LEFT OUTER JOIN, but things then go off the rails when a WHERE filter on the right side of the join is introduced.

    Let’s demonstrate. We’ll start by getting a look at all the data by LEFT OUTER JOINing dbo.Customers to dbo.Purchases, without any filtering.

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id;

    This looks good so far. All customers are there, and Charlie has NULL purchase values. Now let’s add a WHERE filter to this query to only return rows for purchases from BookWorld.

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
    WHERE
      P.store_name = 'BookWorld';

    Maybe the developer wanted the data to look like this, maybe not. It could be that with a LEFT OUTER JOIN they were expecting all of the dbo.Customers rows to be returned but only with purchase information from BookWorld. In any case, we don’t see all of the customers anymore.

    To understand what happened, let’s look at the general order in which database engines will process parts of a SQL query:

    1. FROM (including JOINs and ON)
    2. WHERE
    3. GROUP BY
    4. HAVING
    5. SELECT
    6. ORDER BY
    7. LIMIT / TOP
    8. OFFSET

    Each of these steps are logically performed in order as shown, with each step passing on the intermediate resultset to the next step upon completion.

    So in the query above:

    • The LEFT OUTER JOIN has been computed first, bringing back all of the customers as expected in an intermediate resultset.
    • The WHERE clause has come along next and done precisely what it was asked to do; it has filtered the resultset and dutifully returned ONLY rows where the P.store_name column has the value ‘BookWorld’.
    • Thus the NULLs from the Purchases table have been filtered out, and the OUTER JOIN goodness has gone!

    Therefore with the added WHERE clause above, the workflow has effectively changed to:

    Return ALL rows from the left-hand table. If there’s a match in the right-hand table based on the join condition, include the right-hand table data. If there’s no match, fill in NULLs for the right-hand table’s columns.

    After that, drop ALL rows where the P.store_name column isn’t ‘BookWorld’ (including NULLs).

    INNER JOINs and WHERE example

    What we now have is more like a plain old INNER JOIN. To illustrate, let’s perform the same steps with an INNER JOIN as those joins are generally simpler to follow. Let’s INNER JOIN dbo.Customers to dbo.Purchases, without any filtering.

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      INNER JOIN Purchases P ON C.customer_id = P.customer_id;

    We see Charlie missing here right from the initial JOIN step, which is to be expected with an INNER JOIN.

    SQL
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      INNER JOIN Purchases P ON C.customer_id = P.customer_id
    WHERE
      P.store_name = 'BookWorld';

    No surprises with the WHERE filter here. Any rows where the P.store_name is not ‘BookWorld’ have been removed. The final dataset is in fact the same as the LEFT OUTER JOIN query version after the WHERE filter was added.

    Key takeaways

    • WHERE filters results AFTER all joins have been made.
    • A non-NULL WHERE filter will remove NULLs, including those which come from an OUTER JOIN.
    • A WHERE filter which filters NULLs could undo what you may have been intending to achieve with a LEFT OUTER JOIN in the first place!

    In the next blog post in this series we’ll look into how we can safely and predictably perform resultset “filtering”, whilst still maintaining the OUTER JOIN functionality where all rows are returned from at least one side of the join.

    The demo SQL code for this blog post is available below.

    SQL
    /*
    DISCLAIMER:
    The scripts provided in this blog post are for educational and demonstration purposes only. 
    They are provided "as-is" without warranty of any kind, either expressed or implied, 
    including but not limited to the implied warranties of merchantability or fitness for a 
    particular purpose. Use these scripts at your own risk.
    
    The author is not responsible for any data loss, corruption, or system issues that may 
    arise from running these scripts in any environment. Please ensure to test these scripts 
    thoroughly in a non-production environment before using them in production systems.
    
    The scripts are designed for SQL Server and may require adjustments for other database systems.
    Please review and modify as needed to suit your specific use case.
    */
    
    /* 
    https://shanedriscoll.com/outer-join-plus-where-filter-equals-not-so-outer-join/
    */
    
    DROP TABLE IF EXISTS
      dbo.Purchases;
    
    DROP TABLE IF EXISTS
      dbo.Customers;
    
    /* Customers Table */
    CREATE TABLE
      dbo.Customers (
        customer_id INT PRIMARY KEY,
        customer_name VARCHAR(100),
        customer_tier VARCHAR(100)
      );
    
    /* Purchases Table */
    CREATE TABLE
      dbo.Purchases (
        purchase_id INT PRIMARY KEY,
        customer_id INT,
        store_name VARCHAR(100),
        purchase_date DATE,
        purchase_amount DECIMAL(10, 2),
        FOREIGN KEY (customer_id) REFERENCES Customers (customer_id)
      );
    
    /* Insert Customers */
    INSERT INTO
      Customers (customer_id, customer_name, customer_tier)
    VALUES
      (1, 'Alice', 'bronze'),
      (2, 'Bob', 'silver'),
      (3, 'Charlie', 'gold'),
      (4, 'Danielle', 'gold'),
      (5, 'Ed', 'gold');
    
    /* Insert Purchases (Bob and Ed have purchases at multiple stores) */
    INSERT INTO
      Purchases (
        purchase_id,
        customer_id,
        store_name,
        purchase_date,
        purchase_amount
      )
    VALUES
      (1, 1, 'AutoFix', '2025-01-02', 100.00),
      (2, 2, 'AutoFix', '2024-12-20', 200.00),
      (3, 2, 'BookWorld', '2024-06-11', 150.00),
      (4, 4, 'AutoFix', '2025-01-03', 250.00),
      (5, 5, 'BookWorld', '2024-09-30', 100.00),
      (6, 5, 'BookWorld', '2025-01-27', 300.00);
    
    /* LEFT OUTER JOIN with no filtering */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id;
    
    /* LEFT OUTER JOIN with WHERE filter */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      LEFT OUTER JOIN Purchases P ON C.customer_id = P.customer_id
    WHERE
      P.store_name = 'BookWorld';
    
    /* INNER JOIN with no filtering */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      INNER JOIN Purchases P ON C.customer_id = P.customer_id;
    
    /* INNER JOIN with WHERE filter */
    SELECT
      C.customer_name,
      P.store_name,
      P.purchase_amount,
      P.purchase_date
    FROM
      Customers C
      INNER JOIN Purchases P ON C.customer_id = P.customer_id
    WHERE
      P.store_name = 'BookWorld';
    
    /* Cleanup */
    DROP TABLE dbo.Purchases;
    
    DROP TABLE dbo.Customers;

I’m Shane, and I work with data. I’m passionate about solving problems and uncovering insights through data. Here, I share my insights in SQL Server development and administration, graph databases, and more—helping you unlock the untapped value from your tables and relationships.

Contact me

Post categories

Search posts