Unique index prohibits partial aggregates

Поиск

Список

Период

Сортировка

От	Bos, Fred
Тема	Unique index prohibits partial aggregates
Дата	27 июня 2022 г. 14:49:31
Msg-id	DB4PR02MB8774E06D595D3088BE04ED92E7B99@DB4PR02MB8774.eurprd02.prod.outlook.com обсуждение исходный текст
Ответы	Re: Unique index prohibits partial aggregates (David Rowley <dgrowleyml@gmail.com>)
Список	pgsql-general

Дерево обсуждения

Dear pgsql-general,

I am currently working in PostgreSQL 13.7, compiled by Visual C++

build 1914, 64-bit and encountered the following:

I have a table with two columns, which is created as

follows:

CREATE TABLE IF NOT EXISTS bhload_nohyp_noin (

t BIGINT NOT NULL,

v REAL NULL

);

The table is 32 million rows long.

When I run the following query:

SELECT t/(1000*3600*24) as time,

avg(v)

FROM bhload_nohyp_noin

GROUP BY time

ORDER BY time

Postgres executes the following plan:

"QUERY PLAN"

"Finalize GroupAggregate (cost=295097.60..295226.53 rows=200 width=16)"

" Output: ((t / 86400000)), avg(v)"

" Group Key: ((bhload_nohyp_noin.t / 86400000))"

" -> Gather Merge (cost=295097.60..295218.53 rows=1000 width=40)"

" Output: ((t / 86400000)), (PARTIAL avg(v))"

" Workers Planned: 5"

" -> Sort (cost=295097.52..295098.02 rows=200 width=40)"

" Output: ((t / 86400000)), (PARTIAL avg(v))"

" Sort Key: ((bhload_nohyp_noin.t / 86400000))"

" -> Partial HashAggregate (cost=295087.38..295089.88 rows=200 width=40)"

" Output: ((t / 86400000)), PARTIAL avg(v)"

" Group Key: (bhload_nohyp_noin.t / 86400000)"

" -> Parallel Seq Scan on public.bhload_nohyp_noin (cost=0.00..259966.13 rows=7024250 width=12)"

" Output: (t / 86400000), v"

"Settings: effective_cache_size = '128GB', force_parallel_mode = 'on', max_parallel_workers = '40',

max_parallel_workers_per_gather = '10', parallel_setup_cost = '0', random_page_cost = '1.1',

temp_buffers = '200MB', work_mem = '1GB'"

"Planning:"

" Buffers: shared hit=14"

"Planning Time: 0.747 ms"

This plan takes about 2 seconds.

However, when postgres learns the table statistics, either after running the

query once, performing vacuum analyse on the table or assigning a unique index,

postgres refuses to perform partial aggregates. The reason is that the column

t is unique and ordered which causes the query plan to change.

This results in the following query plan:

"QUERY PLAN"

"GroupAggregate (cost=971690.95..5460155.25 rows=31850064 width=16)"

" Output: ((t / 86400000)), avg(v)"

" Group Key: ((bhload_nohyp_noin.t / 86400000))"

" -> Gather Merge (cost=971690.95..4823153.97 rows=31850064 width=12)"

" Output: ((t / 86400000)), v"

" Workers Planned: 5"

" -> Sort (cost=971690.88..987615.91 rows=6370013 width=12)"

" Output: ((t / 86400000)), v"

" Sort Key: ((bhload_nohyp_noin.t / 86400000))"

" -> Parallel Seq Scan on public.bhload_nohyp_noin (cost=0.00..251788.16 rows=6370013 width=12)"

" Output: (t / 86400000), v"

"Settings: effective_cache_size = '128GB', force_parallel_mode = 'on', max_parallel_workers = '40',

max_parallel_workers_per_gather = '10', parallel_setup_cost = '0', random_page_cost = '1.1',

temp_buffers = '200MB', work_mem = '1GB'"

"Planning:"

" Buffers: shared hit=6"

"Planning Time: 0.180 ms"

This takes about 18 seconds, which is much slower. When swapping the columns t

and v, so grouping by v, postgres decides to do partial aggregation again

which makes the operation 2 seconds.

Some extra notes:

- I noticed that the parallel and partial hash aggregation gives the speed up.

- I tried to trigger it by using the parallel costs, but I could not trigger it.

- The table statistics before vacuum analyze is empty.

- Here are the table statistics after vacuum analyze:

schemaname tablename attname inherited null_frac avg_width n_distinct correlation most_common_elem_freqs elem_count_histogram

public bhload_nohyp_noin t FALSE 0 8 -1 1 NULL NULL

public bhload_nohyp_noin v FALSE 0 4 3937 -0.20693 NULL NULL

I expected the query to become faster

with a unique index or column, so why does the query planner decide on group

aggregation instead of partial aggregation?

Regards,

Fred Bos

---------------------------------------------------------------------------
The contents of this e-mail (including any attachments) are for the intended recipients only. If you are not an intended recipient but have received this email in error, we kindly request you to inform the sender of such error and delete this email and any attachments. If you open any attachments of this email, please understand that you do so at your own risk. We have made all reasonable efforts to keep this email and its attachments free from any bugs, viruses or the like, but cannot accept any responsibility for it.
Huisman Equipment B.V. and its affiliated companies cannot take any responsibility with regard to the accuracy or completeness of the content of this email and any attachments. Please note that this email and any attachments may contain information that is considered confidential, privileged and subject to copyright or other intellectual property rights. We kindly request (and insofar legally possible, demand) you to keep the content of this email and any attachments confidential and abide to the restrictions following from such protection.
---------------------------------------------------------------------------

В списке pgsql-general по дате отправления:

Предыдущее

От: Magnus Hagander
Дата: 27 июня 2022 г., 14:33:01
Сообщение: Re: Different sort result between PostgreSQL 8.4 and 12.5

Следующее

От: Magnus Hagander
Дата: 27 июня 2022 г., 17:33:35
Сообщение: Re: Table space not returned to the OS ?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Unique index prohibits partial aggregates

Предыдущее

Следующее