Have you ever wondered about the differences between a subquery and a common table expression (CTE) in SQL? The concepts seem very similar, but knowing the difference – and when to use each one – will help you write efficient and readable queries.
First, we’ll explain the concepts underlying subqueries and CTEs. Then we’ll look at a few examples and finally analyze the pros and cons of each technique.
What Is a Subquery?
A subquery is a query within a query. We can use it in multiple ways: in the FROM
clause, for filtering, or even as a column. To use a subquery, simply add parentheses and put the query inside them.
In our first examples, we’ll work with data on the results of a long jump competition. We’re given two tables:
participant
– Stores information about contest participants:
id
– A unique ID for each participant.first_name
– The participant’s first name.last_name
– The participant’s last name.
jump
– Stores information about the jumps made by the participants:
id
– The ID of each jump.participant_id
– The ID of the participant who made the jump.contest_id
– The ID of the contest in which the jump was made.length
– Jump length, in centimeters.
participant
id | first_name | last_name |
---|---|---|
1 | Amisha | Brown |
2 | Jamaal | Sanford |
3 | Hiba | Cameron |
jump
id | participant_id | contest_id | length |
---|---|---|---|
1 | 1 | 1 | 667 |
2 | 1 | 2 | 745 |
3 | 1 | 3 | 723 |
4 | 2 | 1 | 736 |
5 | 2 | 2 | 669 |
6 | 2 | 3 | 508 |
7 | 3 | 1 | 664 |
8 | 3 | 2 | 502 |
9 | 3 | 3 | 739 |
Since you know the data we’re using, take a look at the following subquery examples:
SELECT first_name, last_name, lengthFROM participantJOIN jump ON jump.participant_id = participant.idWHERE length > ( SELECT AVG(length) FROM jump);
This query shows participants with their jumps that were longer than the average jump length. In the WHERE condition, we’ve used a subquery to get the average jump length. Since there’s only one value returned by this subquery, we can easily compare column data to it.
Next, another example:
SELECT MAX(avg_length) AS max_lengthFROM ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name) AS average_lengths;
In this query, we’re showing the largest average jump length by any participant. To get the result, we first compute the average jump length for each participant. We’ve done this by using a subquery inside the FROM
clause. Then, we simply use MAX()
to return the largest average length.
These are only two examples of subqueries. It’s a broad topic – even though the use cases are rather simple – and there are too many concepts to be described in this article. A short overview of subqueries can be found in the article SQL Subqueries on the LearnSQL.com blog. If you’re interested in working with subqueries, check out part 6 of SQL Basics course (it's even called Subqueries).
You can also watch episodes of our We Learn SQL series on Youtube. Several of them have been dedicated to SQL Subqueries. Remember to subscribe to our channel.
What Is CTE?
A common table expression (called CTE for short) is a query which we create before writing the main query. Then, we can simply use it as a regular table inside our code.
Look at the following example. Once again, we’re using the data from the long jump contest:
WITH average_lengths AS ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name)SELECT MAX(avg_length) AS max_lengthFROM average_lengths;
This returns exactly the same result as the previous example with the subquery: the largest average jump length by any participant. However, instead of writing our query inside the FROM
clause, we’ve put it in the WITH
clause (which comes before the main query). We’ve called our CTE average_length
and used it in the FROM
clause in the main query.
Of course, CTEs can be far more complicated than this example. But we won’t discuss that here. If you’d like to learn about CTEs, check out the Recursive Queries course at LearnSQL.com.
No Difference at All...?
At first, you might think that there’s almost no difference between subqueries and CTEs. We’ve used both a subquery and a CTE in the FROM
clause and the syntax was only a little different. However, don’t forget the first example – we used a subquery in the WHERE
clause there. You couldn’t use a CTE there, and that’s not the only difference!
Subquery vs CTE: What’s the Difference?
Of course, this doesn’t mean that CTEs are inferior to subqueries. Let’s examine the differences between the two, starting with CTEs.
Difference #1: CTEs can be recursive
Let’s take a look at the first advantage of CTEs. CTEs allow you to use a powerful concept: recursion. Thanks to recursion, SQL is now Turing complete – every program which can be written in any programming language can also be written in SQL. (If you have doubts that SQL is a programming language, check out Is SQL a programming language? on the LearnSQL.com blog.)
How does the recursion work in SQL? It allows your CTE to call itself until a specified ending condition is met. In each step, the query expands itself and changes the data it has. Let’s see an example.
We’ll work with the following data for some fictional company employees. It’s stored in the employee
table, which contains the following columns:
id
– A unique ID for each employee.first_name
– The employee’s first name.last_name
– The employee’s last name.manager_id
– The ID of that employee’s manager.
employee
id | first_name | last_name | manager_id |
---|---|---|---|
1 | Maisy | Bloom | NULL |
2 | Caine | Farrow | 1 |
3 | Waqar | Jarvis | 2 |
4 | Lacey-Mai | Rahman | 2 |
5 | Merryn | French | 3 |
Now, we’d like to show the chain of management from the CEO (a person without a value in the manager_id
column) to each employee. The query below will solve this problem. Take a look:
WITH RECURSIVE employee_chain AS ( SELECT id, first_name, last_name, first_name || ' ' || last_name AS chain FROM employee WHERE manager_id IS NULL UNION ALL SELECT employee.id, employee.first_name, employee.last_name, chain || '->' || employee.first_name || ' ' || employee.last_name FROM employee_chain JOIN employee ON employee.manager_id = employee_chain.id)SELECT first_name, last_name, chainFROM employee_chain;
The result will look like this:
first_name | last_name | chain |
---|---|---|
Maisy | Bloom | Maisy Bloom |
Caine | Farrow | Maisy Bloom->Caine Farrow |
Waqar | Jarvis | Maisy Bloom->Caine Farrow->Waqar Jarvis |
Lacey-Mai | Rahman | Maisy Bloom->Caine Farrow->Lacey-Mai Rahman |
Merryn | French | Maisy Bloom->Caine Farrow->Waqar Jarvis->Merryn French |
We have written a query that can easily create a whole chain of relations. You might think that this could be achieved with subqueries, but as the management chain goes deeper and deeper, you’d have to write more and more code. The amount of code you’d have to write would depend on the depth of the chain – and that can only be checked with a recursive CTE.
How does this query work? It starts by running the first part (before the UNION ALL
) and selects an employee without a manager (i.e. Maisy Bloom). Then the part beneath the UNION ALL selects the employee(s) directly managed by Maisy (Caine Farrow). Since the query is calling itself, it then runs the same part again and selects all the employees managed by Caine (Waqar Jarvis and Lacey-Mai Rahman). It repeats this operation as long as it has rows to join. After traversing the whole management chain, the query halts.
If this is your first encounter with recursion in SQL, it might be a little hard to understand. And that’s totally normal. Check out Do it in SQL: Recursive SQL Tree Traversal for a more detailed explanation.
Difference #2: CTEs are reusable
A huge advantage of CTEs is that they can be used multiple times in a query. You don’t have to copy the whole CTE code – you simply put the CTE name.
Using the data from the previous section, we’d like to 1) filter out the employees who don’t have a manager and then 2) show each employee with their manager – but only if they have a manager. The result will look like this:
first_name | last_name | first_name | last_name |
---|---|---|---|
Waqar | Jarvis | Caine | Farrow |
Lacey-Mai | Rahman | Caine | Farrow |
Merryn | French | Waqar | Jarvis |
Now, let’s see how a CTE would solve this task:
WITH not_null_manager AS ( SELECT * FROM employee WHERE manager_id IS NOT NULL)SELECT nnm1.first_name, nnm1.last_name, nnm2.first_name, nnm2.last_nameFROM not_null_manager AS nnm1JOIN not_null_manager AS nnm2 ON nnm1.manager_id = nnm2.id;
Now let’s see how a subquery would achieve the same result:
SELECT nnm1.first_name, nnm1.last_name, nnm2.first_name, nnm2.last_nameFROM ( SELECT * FROM employee WHERE manager_id IS NOT NULL) AS nnm1JOIN ( SELECT * FROM employee WHERE manager_id IS NOT NULL) AS nnm2 ON nnm1.manager_id = nnm2.id;
As you can see, the CTE query has less code. It’s more readable, too: you simply repeat the CTE name (not_null_manager
) instead of a whole chunk of code.
There’s not really much difference in the performance efficiency between these two queries. Even though you only declare the CTE once, the execution time is almost the same.
Difference #3: CTEs can be more readable
So, you know that you can write less code using CTEs. What about code organization? Here’s another example that focuses on the FROM
clause.
Do you still remember the first examples? The ones where returning the largest average jump length? If not, here’s a quick review.
This one uses a subquery:
SELECT MAX(avg_length) AS max_lengthFROM ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name) AS average_lengths;
And this one uses a CTE:
WITH average_lengths AS ( SELECT first_name, last_name, AVG(length) AS avg_length FROM participant JOIN jump ON jump.participant_id = participant.id GROUP BY first_name, last_name)SELECT MAX(avg_length) AS max_lengthFROM average_lengths;
Which one do you think is more readable? I’d say the second one (CTE) is a bit better. The code is less nested and there’s a small layer of abstraction. You know at first glance that this query selects the maximal average length.
CTEs’ readability is shown even more in the examples from the previous section. And when your queries grow bigger and bigger, using CTEs will significantly increase the readability of your code. And readability is key to the efficient development of code.
If you want to learn more about using CTEs to improve the readability of your code, you should definitely check out the article How to Organize SQL Queries with CTEs.
But Subqueries Are Sometimes Irreplaceable
So far, you’ve learnt quite a few differences between subqueries and CTEs. To be honest, I was trying to convince you that CTEs are much better than subqueries. But in this section, you’ll learn why subqueries can be invaluable.
Filtering with a subquery
This article’s first example used a subquery in the WHERE clause. I didn’t show a similar example in the CTE section. That’s because only subqueries can be used in the WHERE clause!
What’s more, there are quite a few keywords you can use in the WHERE condition – e.g. ALL
, ANY
, EXISTS
, and a few more! Unfortunately, I can’t explain them here; it’d take too much time. Instead, I recommend checking out the Subqueries section of our SQL Basics course. You’ll not only learn about these keywords, you’ll solve a few problems using them! Or check out the article SQL Subqueries in our blog if you want a brief explanation.
Subqueries can act like columns
You can also use subqueries as you would a column. The only constraint is that the subquery must return only one value. Take a look:
SELECT DISTINCT contest_id, ( SELECT COUNT(length) FROM jump AS inside_jump WHERE inside_jump.contest_id = outside_jump.contest_id AND inside_jump.length > 600 ) AS longer_jumps, ( SELECT COUNT(length) FROM jump AS inside_jump WHERE inside_jump.contest_id = outside_jump.contest_id AND inside_jump.length <= 600 ) AS shorter_jumpsFROM jump AS outside_jump;
For each contest, this query returns the number of jumps longer than 600cm (computed in the first subquery) and the number of jumps shorter than or equal to 600cm (computed in the second subquery). The result will look like this:
contest_id | longer_jumps | shorter_jumps |
---|---|---|
1 | 3 | 0 |
2 | 2 | 1 |
3 | 2 | 1 |
Correlated subqueries
Look at the above example again. Did you notice that I used a reference to a table in the outer query inside the subquery? I even referred to the value of the current row from that table. This is called a “correlated subquery”. It allows you to use values from the outer query inside the subquery.
This is a very handy technique, but it’s also quite complicated; we won’t explain it in this article. However, feel free to check out Correlated Subquery in SQL: A Beginner’s Guide in our blog for an explanation.
One More Difference: CTEs Must Be Named
The last difference between CTEs and subqueries is in the naming. CTEs must always have a name. On the other hand, in most database engines, subqueries don’t require any name (the only exception is the FROM
clause in my favorite database engine, PostgreSQL).
It’s a good practice to name subqueries placed in the FROM
or SELECT
clauses, but it’s not a requirement. And, to be precise, you can’t name the subqueries you use in the WHERE
clause.
You might think that naming is not a big difference and won’t affect you much. However, you may have to quickly check something in the database. In this case, the easier syntax might be your choice. Even if the query is less readable, keep this in mind – such queries are rarely read after they’re used.
Subqueries vs CTEs – Which Is Better?
You’ve learnt a lot about the differences between CTEs and subqueries. So, then, which is better? The answer is neither, either, or it depends – both subqueries and CTEs have pros and cons. Each query should be analyzed and the choice between these two decided on a case-by-case basis. But to do this, you’ll have to thoroughly learn both concepts.
To learn more about subqueries, I can suggest the Subqueries part of LearnSQL’s SQL Basics course. If you’d like to learn more about CTEs, the Recursive Queries course is your best bet. These courses will help you quickly learn these concepts. Thus, you’ll be able to decide which queries benefit from CTEs and which call for subqueries.
However, if you’re already somewhat familiar with correlated subqueries and don’t want to learn the same thing once again, you might also want to sharpen your skills in our SQL Practice Set course.