How to Query Billions of Rows in Milliseconds Using SQL
Find out when to use materialized views, how to refresh them, and which mistakes to avoid.
Many data systems slow down when processing billions of rows.
You write a count query with a group by clause, and you have to wait minutes for a result, and this delay occurs because the database scans every row to calculate a sum or count. High-performance hardware cannot solve the problem. But there is a better way to perform expensive calculations.
Use materialized views to store precomputed results on your disk.
Core Concepts:
There are two types of Views Storage in SQL:
A standard (virtual) view is a saved query. Every time you access a standard view, the database runs the underlying query for the source tables and it does not save data.
A materialized view saves both the query and the data and it occupies storage space. You trade disk space for query speed gains. If your source tables update, they make the data in your materialized view outdated.
Standard View:
You request data from the view.
Database identifies the query.
The database executes the query against the source tables.
Database returns results.
Materialized View:
You request data from the view.
Database reads the precomputed result from disk.
Database returns results.
Materialized Views SQL Dialect Support:
Oracle, PostgreSQL, BigQuery, Snowflake have a native support of materialized views.
SQL Server does not support materialized views. It uses Indexed Views instead.
MySQL: No native materialized views.
When to use Materialized Views?
Materialized views are not the best for every situation. Use them when query speed is your top priority. Make sure you are able to work with data that is not real-time.
Use cases:
BI Dashboards: If your tool runs the same aggregate queries over and over again, a materialized view will deliver the best results.
Complex Joins: For queries when joining five or more large tables, the materialized view performs the work once.
Data Warehousing: Materialized views do the heavy calculations in advance by pre-aggregating data for the final reporting layer.
Refresh Strategy:
Your refresh strategy choice depends on data freshness requirements and what your database supports. Not all databases offer all methods.
Different databases do that in different ways.
1. Complete Refresh:
The simplest method deletes all data in the view and re-runs the entire query. It takes the most time and resources. Use it for small datasets or daily updates.
In PostgreSQL, use this command:
-- refresh the data
REFRESH MATERIALIZED VIEW
mv_order_summary;If you need the view to remain available during the refresh, use the CONCURRENTLY keyword. You can query the old data while the database builds the new snapshot.
-- Refresh without locking the view for readers
CREATE UNIQUE INDEX
idx_mv_summary
ON
mv_order_summary (color, shape);
REFRESH MATERIALIZED VIEW CONCURRENTLY
mv_order_summary;2. Fast Refresh or Incremental Refresh:
The database uses materialized view logs to track inserts, updates, and deletes on the base table, then applies only the changed rows to the materialized view. It runs faster than a complete refresh.
Oracle syntax:
-- Create a log to track changes on the base table
CREATE MATERIALIZED VIEW LOG ON
sales_data
WITH
ROWID, SEQUENCE (color, shape, price)
INCLUDING NEW VALUES;-- Create the view with fast refresh enabled
CREATE MATERIALIZED VIEW
mv_fast_summary
REFRESH FAST ON DEMAND
AS
SELECT
color,
shape,
COUNT(*) AS cnt,
SUM(price) AS revenue
FROM
sales_data
GROUP BY
color, shape;3. Real-time Refresh:
Some systems provide a hybrid method. When you query a stale view, the database applies the latest changes from the log during execution and returns current data without performing a full refresh.
Query Rewrite:
Some databases like Oracle and BigQuery have feature called Query Rewrite. The optimizer can detect when a query against a large table matches the existing materialized view. The database redirects the query to the materialized view.
Example:
SELECT
color,
COUNT(*)
FROM
sales_data
GROUP BY
color;The database check that mv_order_summary already contains these counts. It rewrites the query to:
SELECT
color,
SUM(brick_count)
FROM
mv_order_summary
GROUP BY
color;And it takes milliseconds.
Implementation:
First, identify your most expensive queries: joins of multiple large tables or heavy calculations.
For example, a sales table contains seven billion rows, and you want to count orders by color and shape.
Create a materialized view. The database object stores the results of your query physically, like a snapshot of your data at a point in time.
Create a materialized view in PostgreSQL:
-- Create a materialized view for order summaries
CREATE MATERIALIZED VIEW
mv_order_summary AS
SELECT
color,
shape,
COUNT(*) AS brick_count,
SUM(price) AS total_revenue
FROM
sales_data
GROUP BY
color, shape;Once you create the view, the database executes the query once. It saves the result. When you query the view, the database avoids scanning seven billion rows and reads the few rows stored in the materialized view.
Step-by-Step Implementation Guide:
Audit: Run a query to find the slowest reports in your system.
Analyze: Check if these queries aggregate data.
Test: Create a materialized view in a development environment.
Refresh: Set up a manual refresh and time the execution.
Compare: Run the original query against the base table and then against the materialized view. Record the time difference.
Automate: Use a cron job or a database scheduler to run the refresh command.
Monitor: Check the refresh success logs daily.
Relationship and Index Optimization:
A materialized view is like a table. You can create indexes on it to speed up queries further.
-- Create an index on the materialized view
CREATE INDEX
idx_mv_color
ON
mv_order_summary (color);Performance Checklist:
Create indexes on columns used in WHERE clauses.
Use partitioned materialized views for large datasets.
Cluster the data based on your most frequent join keys.
Monitor the size of your materialized view logs.
Storage:
When you create a materialized view, the database allocates physical storage blocks for its result set. For a standard table, these blocks are modified by INSERT/UPDATE/DELETE operations.
For a materialized view, blocks are rewritten during refresh operations (complete or incremental) rather than on every DML to the base table.
During a complete refresh, the database creates a temporary table. It populates a table with the new results. Once the process finishes, the database replaces the old table with the new one.
In incremental refreshes, the database reads the “Materialized View Log”, a separate table that contains every change made to the source table since the last refresh. The database applies these changes to the materialized view.
Dependency Chains:
When you build views on top of other views, you create a dependency chain. If you have two views, X and Y, where Y depends on X, you must refresh X first.
Data engineers use Directed Acyclic Graphs, or DAGs, to track these dependencies.
Airflow or dbt can help you schedule these refreshes in the correct order and your final reports avoid showing out-of-date information.
Here are the steps to optimize your pipeline:
Identify slow queries with high row counts.
Check if the business requirements allow some latency.
Create the materialized view with a fast refresh log if supported.
Add indexes to the view for filter columns.
Schedule the refresh based on your data update frequency.
Example: BigQuery Materialized Views:
BigQuery works with materialized views differently. It provides automatic background refreshes, as you define the view and the system does the job.
-- BigQuery Materialized View syntax
CREATE MATERIALIZED VIEW project.dataset.mv_sales_summary
OPTIONS (
enable_refresh = true,
refresh_interval_minutes = 60
)
AS
SELECT
product_id,
SUM(amount) as total_amount,
COUNT(*) as transaction_count
FROM
project.dataset.sales
GROUP BY
product_id;BigQuery can use a materialized view even when the query is not an exact match, applying predicates on top of the precomputed data, but the degree of rewrite depends on the conditions and the view structure.
Oracle:
Oracle imposes strict rules for fast refresh.
Fast refresh for aggregate views requires inclusion of COUNT(*) and other aggregate components (like SUM) so that Oracle can recompute grouped aggregates from deltas. If you want to support fast refresh for the average, you should store both SUM and COUNT aggregations.
The database uses these to calculate the new average when rows arrive.
Example for Averages:
CREATE MATERIALIZED VIEW mv_avg_price
REFRESH FAST ON COMMIT
AS
SELECT
category_id,
SUM(price) as total_sum,
COUNT(price) as total_count,
COUNT(*) as total_rows
FROM
products
GROUP BY
category_id;If a new product arrives with a price of 10, the database adds 10 to the total_sum and increments the total_count.
This is valid only if the query and underlying objects meet Oracle’s fast‑refresh constraints. Otherwise it will not work or fall back to a complete refresh depending on ENGINE/FLAVOR.
Stale Data management:
Staleness means how fresh your data is, the gap between the source data and the materialized view.
If you refresh once an hour, your data is up to sixty minutes old.You should communicate this to your end users, as financial reports require real-time accuracy. In those cases, a materialized view might be a bad choice. Marketing dashboards sometimes accept data from the previous day.
Materialized views are perfect for those scenarios.
Performance Comparison:
You have a table with 100 million rows.
A standard query to find the sum of sales takes 15 seconds. The query uses 2 GB of RAM and 100% of one CPU core for the duration.
A materialized view contains 100 rows with categories that returns the same result in 50 milliseconds.
Advanced Indexing Strategies:
Since the materialized view is a physical table, you can apply advanced indexing techniques:
Bitmap Indexes:
In Oracle, use bitmap indexes on columns with low cardinality like “Region” or “Status.” These indexes are efficient for combining multiple filters.
Clustered Indexes:
In SQL Server, use clustered indexes to order the data in the view.
Partitioning:
If your materialized view grows large, partition it by date, and refresh only the latest partition instead of the whole view.
Common Mistakes:
Refreshing too often: If you refresh every minute on a table with high write volume, the overhead will slow down your source table.
Ignoring the logs: Materialized view logs grow forever if the refresh will not work. Monitor the size of your logs if you want to avoid running out of disk space.
Over-indexing: Too many indexes on a materialized view slow down the refresh process. Index only columns that users filter on.
Create views on real-time data: If the data changes every second, the refresh process will never finish.
Constraints and Limitations:
I thought materialized views would be a great way to get performance gains, but they bring a bunch of new challenges that I have to deal with.
1)Data Staleness and Inconsistency The biggest issue is that the data gets stale the second the base table changes. It creates the fast, or sometimes slow, an annoying loop. A query might run in seconds when it is fresh, but then it takes minutes or hours if the database decides the view is too old to use. It makes the whole experience feel out of control.
2) Refresh Overhead and Performance Impact Data update have a cost:
Write Latency: If I try to keep it real-time with “On Commit” refreshes, it adds extra processing to every transaction and slows down the write performance of the whole database.
Resource Consumption: A Full Refresh re-runs the entire query, which is a heavy load on the CPU and I/O for large datasets.
Lock: In systems like PostgreSQL, a standard refresh can lock the view so nobody can even read it until it is finished. It is another block in the process.
3) Restrictions for Fast Refresh
I want to use Incremental (Fast) Refreshes to avoid those full refreshes, but the limitations are too strict:
Query Complexity: If you use DISTINCT, certain joins, or functions like MIN/MAX, the view would not t fast-refresh at all.
Determinism: The query should be deterministic. Non‑deterministic expressions like RANDOM() or CURRENT_DATE/CURRENT_TIMESTAMP can prevent fast‑refresh eligibility and might run full refresh when the system attempts to incrementally update a view.
Extra Objects: You have to deal with Materialized View Logs to track changes and there is more stuff to maintain.
4) Maintenance and Storage:
Storage Space: These are physical copies, so they eat up disk space. This is a problem when you have massive datasets.
Schema Fragility: If I change a column or the structure of a base table, the view breaks. Then I have to go in and fix it on your own.
Dependency Chains: If one view is built on another, you get this chain where you have to manage the refresh order to avoid using stale data.
You might face challenges when implementing the method. Some SQL dialects have strict rules for fast refresh. You might not be able to use certain functions or joins in an incremental view.
In BigQuery, materialized views can reference standard native tables, and they cannot reference other materialized views or views as base sources.
The Cost of Fast Refresh:
Fast refresh is not free.
When you add a materialized view log on a table, every INSERT, UPDATE, or DELETE operation must write to two places: the source table and the log, which adds a small amount of latency to your write operations. If you have a high-speed ingestion pipeline, this overhead might become a problem.
Measure the impact on performance before deploying to production.
Materialized Lake Views in Microsoft Fabric:
In Microsoft Fabric, the concept is implemented as Materialized Lakehouse Views (or Materialized Lake Views), which store precomputed results in Delta/Parquet‑like storage and expose them using SQL.
They are useful for data engineers working in a lakehouse environment and provide a SQL-like interface on top of Spark tables.
Materialized Lakehouse Views simplify the process of building a silver or gold layer in your data lake.
The Lifecycle of a Materialized View:
A materialized view goes through several stages:
Definition: When you write the SQL and define the refresh properties.
Population: The database runs the initial query to fill the storage.
Staleness: Source data changes, and the view becomes out of sync.
Refresh: The database updates the view data.
Archival: You drop the view when the business requirements change.
Database speed depends on your strategy.
Materialized views are one of the most effective tools for a data engineer. They give you sub-second responses even when your data grows to billions of rows and the cost of storage is low.
Use materialized views to fill the gap between massive data volumes and fast user experience.
Subscribe for more SQL and Data guides.
What is your biggest challenge with slow SQL queries?
Share it in the comments below👇



Great breakdown of a complex topic. The implementation guide is especially helpful for understanding how to scale performance.