Re: BUG #15869: Custom aggregation returns null when parallelized

Поиск
Список
Период
Сортировка
От Kassym Dorsel
Тема Re: BUG #15869: Custom aggregation returns null when parallelized
Дата
Msg-id CAKTpVaayY-rN3s1uFzgOvnmXUtS1ovm=Nsk0Br_Toq+NLBfNTg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #15869: Custom aggregation returns null when parallelized  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: BUG #15869: Custom aggregation returns null when parallelized  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-bugs
Right, adding the Gather node makes it use the combine func and this is where the problem is.

You're right on handling of null values in my combine function. Since this was being run on a table with 150k rows, I had assumed that the contents of my aggregate types would never be null/empty.

Thinking about it, it would make sense to receive an aggregate type with count = 0 or null iff there is 1 worker (1 result to combine the other being null/empty). When there are 2 or more workers I would assume that rows would be relatively evenly split and the return of my aggregate type would be filled given the 150k rows. I tried with 1,2,3,4 workers (ALTER TABLE temp SET (parallel_workers = 1,2,3,4);) and got the same null results before adding support for null values. 

Is this expected behavior when number of workers is >=2? An explicit paragraph in parallel aggregates documentation outlining null support in combine func might be helpful.

Regardless, adding support for null/empty values has fixed my problem and now the aggregate correctly works in parallel queries. Many thanks.

Best,
Kassym


On Sun, Jun 23, 2019 at 10:51 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Mon, 24 Jun 2019 at 03:32, PG Bug reporting form
<noreply@postgresql.org> wrote:
> Here's the setup code:
> --CREATE TABLE temp (val double precision);
> --insert into temp (val) select i from generate_series(0, 150000) as t(i);
> --set force_parallel_mode = on;
> select (stats_agg(val)).* from temp;

I don't think force_parallel_mode does what you think it does. It just
adds a Gather node to the top of the plan, if the plan is deemed
parallel safe. It's not going to force your aggregate to be
parallelised.

You might coax the planner into generating a parallel aggregate plan
by setting parallel_tuple_cost and parallel_setup_cost both to 0.

> Expected results:
> 150001, 37500, 75000, 112500
>
> Results when run in parallel:
> 150001, null, null, null

Are you actually getting a partial and finalize aggregate node with
that? Can you show the EXPLAIN output of each?

You might also want to double check your combine function. It does not
look like it's very well coded to handle NULL values for arrays that
have yet to receive their fill of 5 elements.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Sergei Kornilov
Дата:
Сообщение: Re: BUG #15870: You can't see the table after executing the table-building statement with \i
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #15865: ALTER TABLE statements causing "relation already exists" errors when some indexes exist