Обсуждение: GROUP BY ROLLUP queries on views trigger full table scans (index usage not optimized)
Hello everyone,
Recently, after upgrading my database from PostgreSQL 16 to 18, I encountered an issue: when performing a GROUP BY ROLLUP on a view, the query planner resorts to a full table scan instead of using the index on the underlying table. This severely impacts performance.
----------
:-( The task lasted for 20 seconds, and in the end, it ran for 3.6 hours, Under the same environment and query instructions, pg16 before the upgrade did not have this issue
Planning:
Buffers: shared hit=1829 read=67
I/O Timings: shared read=309.026
Planning Time: 344.548 ms
Execution Time: 12999763.259 ms
----------
Below is a step-by-step reproduction of the issue:
CREATE TABLE materials (
material_id VARCHAR PRIMARY KEY,
description TEXT NOT NULL,
unit_price NUMERIC(10, 2) NOT NULL
);
# 2. Create an explicit index on 'material_id'.
CREATE INDEX idx_material_id ON materials(material_id);
# 3. Create a simple view that mirrors the base table structure.
CREATE VIEW materials_view AS
SELECT material_id, description, unit_price FROM materials;
# 4. Insert test data:
INSERT INTO materials (material_id, description, unit_price)
VALUES ('PI', 'Example Product', 99.99);
INSERT INTO materials (material_id, description, unit_price)
SELECT
'MAT' || generate_series(1,49999)::TEXT,
'Description of product ' || generate_series(1,49999)::TEXT,
(random() * 1000 + 1)::NUMERIC(10,2);
# 5. Query the base table to compute the average unit price for material 'PI',using a simple GROUP BY (no rollup)
EXPLAIN ANALYZE
SELECT material_id, AVG(unit_price) as avg_price
FROM materials
WHERE material_id = 'PI'
GROUP BY material_id;
# 6. Query the base table to compute the average unit price for material 'PI',using a simple GROUP BY (rollup)
EXPLAIN ANALYZE
SELECT material_id, AVG(unit_price) as avg_price
FROM materials
WHERE material_id = 'PI'
GROUP BY ROLLUP(material_id);
# 7. Query the view table to compute the average unit price for material 'PI',using a simple GROUP BY (no rollup)
EXPLAIN ANALYZE
SELECT material_id, AVG(unit_price) as avg_price
FROM materials_view
WHERE material_id = 'PI'
GROUP BY
material_id;
# 8. Query the view table to compute the average unit price for material 'PI',using a simple GROUP BY (rollup)
EXPLAIN ANALYZE
SELECT material_id, AVG(unit_price) as avg_price
FROM materials_view
WHERE material_id = 'PI'
GROUP BY ROLLUP(material_id);
Log, please refer to the attachment for more logs:
GroupAggregate (cost=0.00..1097.39 rows=251 width=64) (actual time=3.983..3.985 rows=2.00 loops=1)
Group Key: materials.material_id
Group Key: ()
Buffers: shared hit=468
-> Seq Scan on materials (cost=0.00..1093.00 rows=250 width=48) (actual time=0.018..3.970 rows=1.00 loops=1)
Filter: ((material_id)::text = 'PI'::text)
Rows Removed by Filter: 49999
Buffers: shared hit=468
Planning Time: 0.117 ms
Execution Time: 4.029 ms
Thanks & Best Regards
_________________________________________________________________________________
Haowu Ge (BG5FRG) | Homepage: https://www.gehaowu.com | PGP:7A06 1F6E DF09 D8A8Вложения
On Mon, Nov 17, 2025 at 5:02 PM Haowu Ge <gehaowu@bitmoe.com> wrote: > # 8. Query the view table to compute the average unit price for material 'PI',using a simple GROUP BY (rollup) > EXPLAIN ANALYZE > SELECT material_id, AVG(unit_price) as avg_price > FROM materials_view > WHERE material_id = 'PI' > GROUP BY ROLLUP(material_id); > GroupAggregate (cost=0.00..1097.39 rows=251 width=64) (actual time=3.983..3.985 rows=2.00 loops=1) > Group Key: materials.material_id > Group Key: () > Buffers: shared hit=468 > -> Seq Scan on materials (cost=0.00..1093.00 rows=250 width=48) (actual time=0.018..3.970 rows=1.00 loops=1) > Filter: ((material_id)::text = 'PI'::text) > Rows Removed by Filter: 49999 > Buffers: shared hit=468 > Planning Time: 0.117 ms > Execution Time: 4.029 ms Thanks for the report. The reason why this query cannot use an index scan is that, when the parent query uses grouping sets, all subquery outputs are wrapped in PlaceHolderVars. This is to ensure that these expressions retain their separate identity so that they can match to the expected grouping set columns (cf. cc5d98525). However, it also prevents the WHERE clause from matching the index column, which is why the index cannot be used. Prior to v18, we wrapped only the subquery outputs that were non-Var expressions. That is why this query was able to use an index scan. However, that behavior could produce incorrect results when the subquery's targetlist contained two or more identical Var expressions. In addition, if the targetlist included any non-Var expressions, we could also fail to match the WHERE clause to the corresponding index expressions. After looking at this query more closely, it seems that there is no need to wrap the expressions in the WHERE clause in the first place. It should be sufficient to wrap only the expressions that are used logically above the aggregation step, ie, those in the targetlist and in the havingQual. However, pullup_replace_vars currently lacks the infrastructure to do that. Another possible fix is to detect whether the subquery's targetlist contains two or more identical Var expressions and, if not, fall back to wrapping only the non-Var expressions. This would restore the behavior that existed prior to v18 for this query. Alternatively, we could teach match_index_to_operand to unwrap PHV expressions when matching them to indexes. This would also allow the WHERE clause to match indexes for non-Var expressions, which is an improvement over the behavior prior to v18. But I haven't fully thought through whether this approach is safe in all cases, particularly when outer joins are involved. Any thoughts? - Richard
On Tue, Nov 18, 2025 at 5:27 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Alternatively, we could teach match_index_to_operand to unwrap PHV
> expressions when matching them to indexes. This would also allow the
> WHERE clause to match indexes for non-Var expressions, which is an
> improvement over the behavior prior to v18. But I haven't fully
> thought through whether this approach is safe in all cases,
> particularly when outer joins are involved.
I believe this approach is safe, because a PlaceHolderVar appearing in
a relation-scan-level expression is effectively a no-op. To play it
safe, we could verify that the PlaceHolderVar is not marked nullable
and that its syntactic scope matches this index, and only strip such
PlaceHolderVars (though it's unclear if this is really necessary).
Attached is a patch that implements this approach. With this patch,
the reported query can use an index scan again. Additionally, it
enables planner to explore index scans in other cases that were
previously not possible, even before v18. For example:
create table t (a int);
insert into t select i from generate_series(1,1000)i;
create index on t (coalesce(a, 1));
analyze t;
explain (costs off)
select * from t t1 left join
(select coalesce(a, 1) as c, * from t t2) s on true
where t1.a > s.c;
QUERY PLAN
-----------------------------------------------
Nested Loop
-> Seq Scan on t t1
-> Index Scan using t_coalesce_idx on t t2
Index Cond: (COALESCE(a, 1) < t1.a)
(4 rows)
The expression coalesce(a, 1) is wrapped in a PlaceHolderVar, and
previously it was not possible to match it to the index expression.
- Richard