Seeking reason behind performance gain in 12 with HashAggregate

Поиск
Список
Период
Сортировка
От Shira Bezalel
Тема Seeking reason behind performance gain in 12 with HashAggregate
Дата
Msg-id CAE0KEwGnDspRJjX=cdvGC69=F=76jbhiViDp=T5pPYR0xF2NAQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Seeking reason behind performance gain in 12 with HashAggregate  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Seeking reason behind performance gain in 12 with HashAggregate  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-performance
Hi All,

I'm testing an upgrade from Postgres 9.6.16 to 12.1 and seeing a significant performance gain in one specific query. This is really great, but I'm just looking to understand why. Reading through the release notes across all the new versions (10, 11, 12) hasn't yielded an obvious cause, but maybe I missed something. Also, I realize it could be related to other factors (config parameters, physical hosts, etc), but the systems are pretty similar so just wondering about Postgres changes.

The query is the following:

SELECT pvc.value, SUM(pvc.count) AS sum
FROM
(SELECT (ST_ValueCount(cv.rast, 1)).*
FROM calveg_whrtype_20m AS cv) AS pvc
GROUP BY pvc.value 

Here is the EXPLAIN (ANALYZE ON, BUFFERS ON) output from both systems:


In the 9.6 plan, the Seq Scan node produced 15,812 rows. 
In the 12 plan, the Seq Scan produced 2,502 rows, and then the ProjectSet node produced 15,812 rows. 

Note that the table (calveg_whrtype_20m) in the two databases have the same number of rows (2,502).

So it seems something about the introduction of the ProjectSet node between the Seq Scan and HashAggregate is optimizing things...? Is this the right conclusion to draw and if so, why might this be happening? Is there something that was changed/improved in either 10, 11 or 12 that this behavior can be attributed to? 

Two more notes -- 

1. If I run the inner subquery without the outer sum/group by, the plans between the two systems are identical.

2. As the calgeg_whrtype_20m table is a raster, I started my question on the PostGIS list, but there was no obvious answer that the gain is related to a change in the PostGIS code so I'm now turning to this list. 

Thank you,
Shira

В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Bad query plan when you add many OR conditions
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: Seeking reason behind performance gain in 12 with HashAggregate