A simple google search will yield all sorts of detailed guides on SQL join types. Many of these guides are overwhelming with detail or assume the reader has advanced familiarity with SQL. For those of you who are fairly new to SQL, here is a brief introductory guide on SQL basics and join types to get you oriented.
(Note: Please share this with anyone who might find this helpful!)
SQL Basics: Query Anatomy
Before defining the various join types, we need to understand the anatomy of a SQL query.
A SQL query refers to a block of code that defines what data you’d like to pull from your database. For data analysts, this is generally the start of any analysis. Here is a basic SQL query breakdown:
SELECT: Defines columns/fields to pull
FROM: Specifies table(s) to pull from
WHERE: Filters data on a condition (or conditions)
GROUP BY: Group rows based on one or more columns
ORDER BY: Define sort order
LIMIT: Limit the total number of records returned
Explanation: In this query, we are selecting all columns (SELECT *) from the customer_data table (FROM `customer_data`) for records that have a complete status (WHERE status = ‘complete’) and limiting records to 10,000 records (LIMIT 10000).
Joins
So, what is a join and where in the query does it belong?
Joins can be used in different portions of your query depending on your query structure and the SQL language you are using, but for now, we will focus on the basics.
Within a database, there are typically a collection of tables to pull from. For example, you may have a general customer table with customer_ids, customer_names, etc. You may have another table for orders that include customer_ids, order_ids, product_ids, quantities, order_dates, etc.
A join allows you to merge two or more tables’ data together based on join keys (a shared field that links the tables together).
The order it is written in within your query dictates which table is considered the ‘left’ table and the ‘right’ table. Joins are considered as part of the FROM portion of a SQL query and are executed from top to bottom.
— — — — — — — — — — — — — –
SELECT *
FROM
customers c ← LEFT TABLE
LEFT JOIN ← JOIN TYPE
orders o ← RIGHT TABLE
ON
c.customer_id = o.customer_id ← JOIN KEYS
— — — — — — — — — — — — — –
This query joins the orders table to the customers table based on the customer_id field that belongs to each table independently.
Here is a cheat sheet on join types:
INNER: Only includes records where join key overlaps
FULL OUTER: Includes all records from both tables
LEFT: Includes all records from LEFT table, and merges data from the right table where the right join key matches the left join key
RIGHT: Includes all records from RIGHT table, and merges data from the left table where the left join key matches the right join key
The diagram below is a visual representation of what data are being pulled based on the various join types with SQL query examples.