10 SQL Performance Optimization Tricks Every Data Professional Should Know
We often blame the database engine for being slow, but the real problem usually lies in how we write our SQL queries. A few targeted rewrites can transform a query from taking minutes to completing in seconds.
Tip #1: Retrieve only the data that is necessary.
Do not use `SELECT *` without a `LIMIT` clause.
Filter before joining. If you're joining two million-row tables but only need records from the last month, filter each table first.
```sql
SELECT
    c.customer_id,
    o.order_id,
    o.order_date
FROM customers c
JOIN (
    SELECT *
    FROM orders
    WHERE order_date >= '2025-09-01'
) o
    ON c.customer_id = o.customer_id;
```

Aggregate before joining. If you need totals by customer, calculate those totals first, then join the smaller aggregated result.
```sql
-- Step 1: aggregate orders
WITH order_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
-- Step 2: join the smaller result
SELECT c.customer_id, c.name, ot.total_spent
FROM customers c
JOIN order_totals ot
    ON c.customer_id = ot.customer_id;
```

Tip #2: Use wildcards only at the end of a pattern.
Prefer the `LIKE 'text%'` form in a `WHERE` clause. A leading wildcard (`LIKE '%text'`) prevents the database from using an index and can force a slow table scan.
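You can see this directly with SQLite's `EXPLAIN QUERY PLAN`. A minimal sketch (the table, column, and index names are invented for illustration; `COLLATE NOCASE` lets SQLite's case-insensitive `LIKE` use the index for prefix matches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT COLLATE NOCASE);
    CREATE INDEX idx_customers_name ON customers(name);
""")

def plan(query):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

# Trailing wildcard: the index can be used (a SEARCH, i.e. an index seek)
print(plan("SELECT * FROM customers WHERE name LIKE 'smi%'"))
# Leading wildcard: every row must be examined (a SCAN)
print(plan("SELECT * FROM customers WHERE name LIKE '%smith'"))
```

The first plan reports a `SEARCH` on the index, the second a full `SCAN`, which is exactly the difference the tip is warning about.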
Tip #3: Anti-joins (`LEFT JOIN … IS NULL`) often outperform `NOT IN` and `NOT EXISTS`:

```sql
SELECT
    c.customer_id,
    c.name,
    c.date_joined
FROM customers c
LEFT JOIN orders o
    ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
```

Tip #4: `DISTINCT` often hides bad joins. Instead of removing duplicates, fix the query logic to avoid creating them.
Instead of

```sql
SELECT DISTINCT
    c.customer_id,
    c.customer_name,
    c.email
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';
```

write this:
```sql
SELECT
    c.customer_id,
    c.customer_name,
    c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2024-01-01'
);
```

Tip #5: Window functions are incredibly powerful but can be resource-intensive, so keep the `OVER` clause as simple as the result allows.
Instead of

```sql
SELECT
    order_id,
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
WHERE customer_id = 12345;
```

write this:
```sql
SELECT
    order_id,
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
WHERE customer_id = 12345;
```

Note that the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so rows sharing the same `order_date` get the same running total; add an explicit `ROWS` frame or a tiebreaker to `ORDER BY` if that matters.
Tip #6: Avoid functions on indexed columns in `WHERE` clauses.
Instead of

```sql
SELECT customer_id
FROM users
WHERE YEAR(created_at) = 2025;
```

write this:
```sql
SELECT customer_id
FROM users
WHERE created_at >= '2025-01-01'
  AND created_at < '2026-01-01';
```

Tip #7: Use `EXISTS` instead of `IN`.
Instead of

```sql
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date >= '2025-01-01'
);
```

write this:
```sql
SELECT customer_id, name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2025-01-01'
);
```

Tip #8: Rewrite correlated subqueries as a `JOIN`.
Instead of

```sql
SELECT
    c.customer_id,
    c.name,
    (SELECT SUM(o.amount)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;
```

use
```sql
SELECT
    c.customer_id,
    c.name,
    SUM(o.amount) AS total_spent
FROM customers c
LEFT JOIN orders o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
```

Tip #9: Indexing.
If your queries often filter on expressions, consider indexing the expression itself:
- Date extractions: `EXTRACT(YEAR FROM order_date)`
- String operations: `UPPER(lastname)`, `TRIM(phone)`
- JSON path expressions: `(data->>'status')`
- Calculated fields: `(price * quantity)`
Index smartly: add indexes on columns you filter or join on often, but don't overdo it. Indexes are shortcuts, great for read speed, but they take up space and slow down writes.
Include frequently accessed columns in the index (a covering index) to avoid table lookups.
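Expression indexes are easy to try out in SQLite. A minimal sketch (the table, columns, and index name are invented): after indexing `price * quantity`, a filter on that exact expression can use the index instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (price REAL, quantity INTEGER)")
# Index the computed expression itself, not a raw column
conn.execute("CREATE INDEX idx_line_total ON order_items (price * quantity)")

# A range filter on the same expression can now seek the index
plan = " ".join(
    row[3] for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM order_items WHERE price * quantity > 100"
    )
)
print(plan)  # a SEARCH using idx_line_total rather than a full table SCAN
```

The expression in the `WHERE` clause must match the indexed expression exactly for the optimizer to use it.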
Tip #10: Find resource-hungry operations in the execution plan.
To dive deeper into optimization, start by analyzing the execution plan.
Here’s how to view it in different RDBMS:
| **RDBMS** | **How to See Execution Plan** |
|---|---|
| **MySQL** | `EXPLAIN SELECT ...` or `EXPLAIN ANALYZE SELECT ...` |
| **PostgreSQL** | `EXPLAIN SELECT ...` or `EXPLAIN (ANALYZE, BUFFERS) SELECT ...` |
| **SQL Server** | SSMS: *Ctrl+L* (estimated), *Ctrl+M* (actual); or `SET SHOWPLAN_ALL ON` / `SET STATISTICS PROFILE ON` |
| **BigQuery** | Run query → *Execution Details* → *Query Plan* (UI) |
| **Oracle** | `EXPLAIN PLAN FOR SELECT ...` then `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());` |

What to look for:
- Resource-hungry nodes: table scans, index scans, nested loop joins, key lookups, and sorts often dominate cost.
- Bottlenecks: if one operation takes 70–80% of the plan's cost, optimizing that node yields the biggest impact.
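A quick way to build intuition is to watch a plan change as you add an index. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

def plan(sql):
    # Join the plan's detail strings into one line for easy reading
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT order_id FROM orders WHERE customer_id = 12345"
print(plan(q))  # without an index: a full table SCAN

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
print(plan(q))  # with the index: a SEARCH (index seek) on idx_orders_customer
```

The same before/after comparison works with `EXPLAIN` in MySQL and PostgreSQL, just with more verbose output.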
Manual tweaks (short-term fixes):
Sometimes you can guide the optimizer when it makes a poor choice:
- `FORCESEEK`: force an index seek instead of a table scan.
- Join hints: `HASH JOIN`, `MERGE JOIN`, `LOOP JOIN`, depending on table size and indexes.
- Query hints: `MAXDOP`, `RECOMPILE`, `OPTIMIZE FOR` to adjust execution behavior.
- `WITH (INDEX(...))`: explicitly choose an index when the optimizer picks the wrong one.

(These hints are SQL Server syntax; other engines have their own equivalents.)
The core issue is usually the optimizer and its statistics:
If the optimizer chooses poorly, it’s often because statistics are outdated or missing. Statistics tell the optimizer:
- How many rows are in a table or index (row count).
- Value distribution in a column (histogram).
- Distinct values, density, and min/max ranges.
When statistics are outdated (after heavy inserts, updates, deletes, or bulk loads), the optimizer misestimates row counts → wrong join types, wrong indexes, and bad plans.
Fixing the Root Cause
Keep statistics up to date:
- SQL Server → `UPDATE STATISTICS table_name;`
- PostgreSQL → `ANALYZE table_name;`
- MySQL → `ANALYZE TABLE table_name;`
- Oracle → `EXEC DBMS_STATS.GATHER_TABLE_STATS('schema','table');`
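SQLite's equivalent is a plain `ANALYZE`, which writes row counts and value-distribution stats into the `sqlite_stat1` table, where you can inspect them. A minimal sketch (table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
# 100 rows spread over 10 distinct customer_id values
conn.executemany("INSERT INTO orders VALUES (?)", [(i % 10,) for i in range(100)])

# Refresh the optimizer's statistics
conn.execute("ANALYZE")

# sqlite_stat1 rows are (tbl, idx, stat); stat begins with the row count
tbl, idx, stat = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchone()
print(tbl, idx, stat)
```

Whichever engine you use, the principle is the same: rerun the statistics command after large data changes so the optimizer's row estimates stay honest.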
Want more content like this? Hit like and subscribe for more stories and guides. Your database will thank you!

