10 SQL Performance Optimization Tricks Every Data Professional Should Know
We often blame the database engine for being slow, but the real problem usually lies in how we write our SQL queries. A few targeted rewrites can transform a query from taking minutes to completing in seconds.
Tip #1: Retrieve only the data that is necessary.
Do not use `SELECT *` without a `LIMIT` clause.
Filter before joining. If you're joining two million-row tables but only need records from the last month, filter each table first.
```sql
SELECT
    c.customer_id,
    o.order_id,
    o.order_date
FROM customers c
JOIN (
    SELECT *
    FROM orders
    WHERE order_date >= '2025-09-01'
) o
    ON c.customer_id = o.customer_id;
```

Aggregate before joining. If you need totals by customer, calculate those totals first, then join the smaller aggregated result.
```sql
-- Step 1: aggregate orders
WITH order_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
-- Step 2: join the smaller result
SELECT c.customer_id, c.name, ot.total_spent
FROM customers c
JOIN order_totals ot
    ON c.customer_id = ot.customer_id;
```

Tip #2: Use wildcards only at the end of a pattern.
Prefer the `LIKE 'text%'` form in a `WHERE` clause. A leading wildcard (`LIKE '%text'`) prevents the database from using an index and can force a slow table scan.
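You can see this directly with SQLite's `EXPLAIN QUERY PLAN`. A minimal sketch (the table, column, and index names are invented for illustration; `COLLATE NOCASE` lets SQLite's case-insensitive `LIKE` use the index for prefix matches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT COLLATE NOCASE);
    CREATE INDEX idx_customers_name ON customers(name);
""")

def plan(query):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

# Trailing wildcard: the index can be used (a SEARCH, i.e. an index seek)
print(plan("SELECT * FROM customers WHERE name LIKE 'smi%'"))
# Leading wildcard: every row must be examined (a SCAN)
print(plan("SELECT * FROM customers WHERE name LIKE '%smith'"))
```

The first plan reports a `SEARCH` on the index, the second a full `SCAN`, which is exactly the difference the tip is warning about.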
Tip #3: Anti-joins (`LEFT JOIN … IS NULL`) often outperform `NOT IN` and `NOT EXISTS`:

```sql
SELECT
    c.customer_id,
    c.name,
    c.date_joined
FROM customers c
LEFT JOIN orders o
    ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL;
```

Tip #4: `DISTINCT` often hides bad joins. Instead of removing duplicates, fix the query logic to avoid creating them.
Instead of

```sql
SELECT DISTINCT
    c.customer_id,
    c.customer_name,
    c.email
FROM customers c
JOIN orders o
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';
```

write this:
```sql
SELECT
    c.customer_id,
    c.customer_name,
    c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2024-01-01'
);
```

Tip #5: Window functions are incredibly powerful but can be resource-intensive, so keep the `OVER` clause as simple as the result allows.
Instead of

```sql
SELECT
    order_id,
    order_date,
    amount,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
WHERE customer_id = 12345;
```

write this:
```sql
SELECT
    order_id,
    order_date,
    amount,
    SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
WHERE customer_id = 12345;
```

Note that the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so rows sharing the same `order_date` get the same running total; add an explicit `ROWS` frame or a tiebreaker to `ORDER BY` if that matters.
Tip #6: Avoid functions on indexed columns in `WHERE` clauses.
Instead of

```sql
SELECT customer_id
FROM users
WHERE YEAR(created_at) = 2025;
```

write this:
```sql
SELECT customer_id
FROM users
WHERE created_at >= '2025-01-01'
  AND created_at < '2026-01-01';
```

Tip #7: Use `EXISTS` instead of `IN`.
Instead of

```sql
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE order_date >= '2025-01-01'
);
```

write this:
```sql
SELECT customer_id, name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.order_date >= '2025-01-01'
);
```

Tip #8: Rewrite correlated subqueries as a `JOIN`.
Instead of

```sql
SELECT
    c.customer_id,
    c.name,
    (SELECT SUM(o.amount)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;
```

use
```sql
SELECT
    c.customer_id,
    c.name,
    SUM(o.amount) AS total_spent
FROM customers c
LEFT JOIN orders o
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
```

Tip #9: Indexing.
If your queries often filter on expressions, consider indexing the expression itself:
- Date extractions: `EXTRACT(YEAR FROM order_date)`
- String operations: `UPPER(lastname)`, `TRIM(phone)`
- JSON path expressions: `(data->>'status')`
- Calculated fields: `(price * quantity)`
Index smartly: add indexes on columns you filter or join on often, but don't overdo it. Indexes are shortcuts, great for read speed, but they take up space and slow down writes.
Include frequently accessed columns in the index (a covering index) to avoid table lookups.
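Expression indexes are easy to try out in SQLite. A minimal sketch (the table, columns, and index name are invented): after indexing `price * quantity`, a filter on that exact expression can use the index instead of scanning the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (price REAL, quantity INTEGER)")
# Index the computed expression itself, not a raw column
conn.execute("CREATE INDEX idx_line_total ON order_items (price * quantity)")

# A range filter on the same expression can now seek the index
plan = " ".join(
    row[3] for row in conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM order_items WHERE price * quantity > 100"
    )
)
print(plan)  # a SEARCH using idx_line_total rather than a full table SCAN
```

The expression in the `WHERE` clause must match the indexed expression exactly for the optimizer to use it.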
Tip #10: Find resource-hungry operations in the execution plan.
To dive deeper into optimization, start by analyzing the execution plan.
Here’s how to view it in different RDBMS:
| **RDBMS** | **How to See Execution Plan** |
|---|---|
| **MySQL** | `EXPLAIN SELECT ...` or `EXPLAIN ANALYZE SELECT ...` |
| **PostgreSQL** | `EXPLAIN SELECT ...` or `EXPLAIN (ANALYZE, BUFFERS) SELECT ...` |
| **SQL Server** | SSMS: *Ctrl+L* (estimated), *Ctrl+M* (actual); or `SET SHOWPLAN_ALL ON` / `SET STATISTICS PROFILE ON` |
| **BigQuery** | Run query → *Execution Details* → *Query Plan* (UI) |
| **Oracle** | `EXPLAIN PLAN FOR SELECT ...` then `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY());` |

What to look for:
- Resource-hungry nodes: table scans, index scans, nested loop joins, key lookups, and sorts often dominate cost.
- Bottlenecks: if one operation takes 70–80% of the plan's cost, optimizing that node yields the biggest impact.
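A quick way to build intuition is to watch a plan change as you add an index. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` (table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

def plan(sql):
    # Join the plan's detail strings into one line for easy reading
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT order_id FROM orders WHERE customer_id = 12345"
print(plan(q))  # without an index: a full table SCAN

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
print(plan(q))  # with the index: a SEARCH (index seek) on idx_orders_customer
```

The same before/after comparison works with `EXPLAIN` in MySQL and PostgreSQL, just with more verbose output.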
Manual tweaks (short-term fixes):
Sometimes you can guide the optimizer when it makes a poor choice:
- `FORCESEEK`: force an index seek instead of a table scan.
- Join hints: `HASH JOIN`, `MERGE JOIN`, `LOOP JOIN`, depending on table size and indexes.
- Query hints: `MAXDOP`, `RECOMPILE`, `OPTIMIZE FOR` to adjust execution behavior.
- `WITH (INDEX(...))`: explicitly choose an index when the optimizer picks the wrong one.

(These hints are SQL Server syntax; other engines have their own equivalents.)
The core issue is usually the optimizer and its statistics:
If the optimizer chooses poorly, it’s often because statistics are outdated or missing. Statistics tell the optimizer:
- How many rows are in a table or index (row count).
- Value distribution in a column (histogram).
- Distinct values, density, and min/max ranges.
When statistics are outdated (after heavy inserts, updates, deletes, or bulk loads), the optimizer misestimates row counts → wrong join types, wrong indexes, and bad plans.
Fixing the Root Cause
Keep statistics up to date:
- SQL Server → `UPDATE STATISTICS table_name;`
- PostgreSQL → `ANALYZE table_name;`
- MySQL → `ANALYZE TABLE table_name;`
- Oracle → `EXEC DBMS_STATS.GATHER_TABLE_STATS('schema','table');`
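SQLite's equivalent is a plain `ANALYZE`, which writes row counts and value-distribution stats into the `sqlite_stat1` table, where you can inspect them. A minimal sketch (table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
# 100 rows spread over 10 distinct customer_id values
conn.executemany("INSERT INTO orders VALUES (?)", [(i % 10,) for i in range(100)])

# Refresh the optimizer's statistics
conn.execute("ANALYZE")

# sqlite_stat1 rows are (tbl, idx, stat); stat begins with the row count
tbl, idx, stat = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchone()
print(tbl, idx, stat)
```

Whichever engine you use, the principle is the same: rerun the statistics command after large data changes so the optimizer's row estimates stay honest.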
Want more content like this? Hit like and subscribe for more stories and guides. Your database will thank you!

