A JOIN
clause can be added to statements to combine results from multiple tables based on a relationship between them.
This example uses three tables: users
, donations
, and charities
. Notice the donations table has a user_id
column to indicate which user made that donation.
When you want to join multiple tables together, specify the names of all the columns to retrieve from both tables in your SELECT
. Prefixing the columns with table names is not required unless multiple tables have the same column name, but it's good practice to be able to easily see which columns come from which table.
An INNER JOIN
means that the query will only return rows that have a match in both tables. Notice how Priya has not made a donation yet so she is not included in the results.
There is no functional difference between a JOIN
and an INNER JOIN
; it's just syntactic sugar.
The ON
clause is where you specify the relationship between the tables that you want them to be joined on. The data type of these columns needs to match.
SELECT users.name_first,
users.email_address,
donations.amount
FROM users
INNER JOIN donations
ON users.id = donations.user_id;
You can alias the tables and column names to make your statements easier to read. Either use an AS
statement followed by the alias, or just the alias. Both versions are valid but choosing one and being consistent makes queries easier to read.
Aliasing is especially useful for long table names or when two tables have the same column name. A column name you will encounter very often is id
.
SELECT u.name_first name,
u.email_address email,
d.amount
FROM users u
INNER JOIN donations d
ON u.id = d.user_id;
INNER JOIN ... ON
can be repeated multiple times to join more than two tables together.SELECT users.email_address user,
donations.amount,
charities.name charity
FROM users
INNER JOIN donations
ON users.id = donations.user_id
INNER JOIN charities
ON donations.charity_id = charities.id;
A JOIN
can be combined with a WHERE
clause to further narrow down the set of rows returned.
Note that ON
and WHERE
seem similar but they serve different purposes.
ON
is part of the JOIN
operation and determines how rows are matched between tables. WHERE
is applied after the JOIN
and filters records out based on a condition.
ON
and WHERE
effectively behave the same during an INNER JOIN
but other types of joins seen below demonstrate how they are not interchangeable.
SELECT users.name_first,
users.email_address,
donations.amount
FROM users
INNER JOIN donations
ON users.id = donations.user_id
WHERE donations.amount >= 20.0;
A LEFT JOIN
(sometimes called LEFT OUTER JOIN
) can be used when you want to retrieve all of the rows from the first ("left") table combined with only rows that match from the second ("right") table. Rows from the left table that do not match anything in the right table will return NULL
in place of a donation amount.
This type of JOIN
is useful when you are not sure if there will be a match for all the rows in your first table but you want to include them in your result.
In this example we can now see in the results who has and who has not donated.
A RIGHT JOIN
is similar but it returns all of the rows from the second ("right") table and only rows that match from the first ("left") table. It is rarely used in practice but may be useful when experimenting with queries.
SELECT users.name_first,
users.email_address,
donations.amount
FROM users
LEFT JOIN donations
ON users.id = donations.user_id;
If we wanted to find only the people who have not donated, we can use a WHERE
clause to find the NULL
values from our LEFT JOIN
. This is sometimes also referred to as an "anti-JOIN".
Also recall how the ON
and WHERE
clauses serve different purposes. Try removing the WHERE
clause and make the condition part of the ON
clause using the AND
operator.
Notice how the result set is completely different. Now both conditions effectively cancel each other out because the join can't satisfy both constraints at the same time: a donation matching a user and also not having an associated user.
SELECT users.name_first,
users.email_address,
donations.amount
FROM users
LEFT JOIN donations
ON users.id = donations.user_id
WHERE donations.user_id IS NULL;
A FULL JOIN
(sometimes called FULL OUTER JOIN
) is used when you want to return all rows from both tables whether they match or not.
This query returns users that have donated, users that have not donated, and donations that are not currently associated with a user. The last case is commonly referred to as an "orphaned record", meaning the associated user was deleted at some point without also deleting their donation records.
A set of tables that allows orphaned records is generally considered bad practice in database design. Examples of how to prevent this can be found in CASCADE
.
SELECT users.name_first,
users.email_address,
donations.amount
FROM users
FULL JOIN donations
ON users.id = donations.user_id;
A table can be referred to itself with a JOIN
, also known as a "self" JOIN
.
They are useful when a table has data that has a relationship with other data in the same table. In this example, users of the donation platform can refer other users and get a free credit.
This technique requires you to alias the table with two different names.
SELECT
u1.email_address AS referrer_email,
u2.email_address AS referred_email
FROM users u1
INNER JOIN users u2
ON u1.id = u2.referrer_id;
A CROSS JOIN
returns the Cartesian product of rows from both tables. This means each row from the first table is combined with each row from the second table, returning all possible combinations of rows.
This can generate very large amounts of data depending on the size of the input tables so caution is advised when deciding to use a CROSS JOIN
.
This can be used to generate test data since it allows you to generate every possible scenario between two sets of data.
SELECT users.name_first,
users.email_address,
donations.amount
FROM users
CROSS JOIN donations;