Re: BUG #15869: Custom aggregation returns null when parallelized

From: Kassym Dorsel
Subject: Re: BUG #15869: Custom aggregation returns null when parallelized
Date:
Msg-id: CAKTpVaYSK=2tav0+K3uZkGkLYokGW1Gwi7F=6wdEN4Mdytpo8A@mail.gmail.com
In reply to: Re: BUG #15869: Custom aggregation returns null when parallelized  (David Rowley <david.rowley@2ndquadrant.com>)
List: pgsql-bugs
Yes, the issue comes from the array in my state having zero elements. But why does an aggregate worker end up with a state to which zero rows have been passed?

In the case of one worker process, this makes sense to me: the combine func takes in two states, the one filled by the single worker process and another dummy empty state. With two or more worker processes, I would have expected every state to process some rows rather than be left initialized and never used. Is it a bug that, with two or more workers, one or more aggregation states never receives any data to process, or is this non-deterministic behavior to be expected?

My mistake was assuming that an initialized state would always receive data and would never be passed to a combine func without having processed any rows.

This is the part that would be nice to have in the documentation: in an aggregation under a Gather node, an initialized worker state may never be passed any data to process, and so it reaches the combine func still in its initialized state.
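
For illustration, here is a minimal sketch of what I mean (a simplified, hypothetical average aggregate with made-up names, not my actual one), written so that a partial state that never saw a row is harmless when combined:

CREATE FUNCTION my_avg_trans(state int8[], val int8) RETURNS int8[]
    LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS $$
        -- state is {sum, count}
        SELECT ARRAY[state[1] + val, state[2] + 1]
    $$;

CREATE FUNCTION my_avg_combine(s1 int8[], s2 int8[]) RETURNS int8[]
    LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS $$
        -- either input may still be the initcond {0,0} if its worker
        -- was never handed any rows; adding zeros is a no-op
        SELECT ARRAY[s1[1] + s2[1], s1[2] + s2[2]]
    $$;

CREATE FUNCTION my_avg_final(state int8[]) RETURNS numeric
    LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS $$
        SELECT CASE WHEN state[2] = 0 THEN NULL
                    ELSE state[1]::numeric / state[2] END
    $$;

CREATE AGGREGATE my_avg(int8) (
    SFUNC       = my_avg_trans,
    STYPE       = int8[],
    COMBINEFUNC = my_avg_combine,
    FINALFUNC   = my_avg_final,
    INITCOND    = '{0,0}',
    PARALLEL    = SAFE
);

With a zero-filled initcond like this, a state that never processed any rows combines as a no-op instead of introducing NULLs.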



Best,
Kassym


On Mon, Jun 24, 2019 at 6:59 PM David Rowley <david.rowley@2ndquadrant.com> wrote:
On Tue, 25 Jun 2019 at 04:07, Kassym Dorsel <k.dorsel@gmail.com> wrote:
> Right, adding the Gather node makes it use the combine func and this is where the problem is.

You're mixing up Gather and Parallel Aggregates. Setting
force_parallel_mode to on does not force the aggregate to be
parallelised. It just tries to inject a Gather node at the top of the
plan. I think it was really meant just to test the tuple queues for
parallel query back in 9.6. You're certainly not the only person to
have been confused by it.
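
For example (table name taken from up-thread; my_agg and col are just placeholders), with costs nudged so the planner favours a parallel plan, a genuinely parallelised aggregate shows Finalize/Partial Aggregate nodes rather than a bare Gather over an Aggregate:

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
ALTER TABLE temp SET (parallel_workers = 2);
EXPLAIN (COSTS OFF) SELECT my_agg(col) FROM temp;
-- roughly:
--   Finalize Aggregate
--     ->  Gather
--           Workers Planned: 2
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on temp
-- (the aggregate itself must be marked PARALLEL SAFE and have a combine func
--  for this plan shape to be chosen)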

> You're right on handling of null values in my combine function. Since this was being run on a table with 150k rows, I had assumed that the contents of my aggregate types would never be null/empty.
>
> Thinking about it, it would make sense to receive an aggregate type with count = 0 or null iff there is 1 worker (1 result to combine the other being null/empty). When there are 2 or more workers I would assume that rows would be relatively evenly split and the return of my aggregate type would be filled given the 150k rows. I tried with 1,2,3,4 workers (ALTER TABLE temp SET (parallel_workers = 1,2,3,4);) and got the same null results before adding support for null values.
>
> Is this expected behavior when number of workers is >=2? An explicit paragraph in parallel aggregates documentation outlining null support in combine func might be helpful.

I don't think anyone would be opposed to improving the documents, but
in this case, it's not the state that was NULL. You don't need to deal
with that since you made your combine function strict. It was your
array elements that were NULL and "<value> <op> NULL" yielding NULL is
fairly fundamental to SQL, not really specific to aggregation.  Your
initcond made the q[] array an empty array, so trying to fetch an
element that does not exist will yield NULL. You wouldn't have had the
issue if you'd set all those array elements to 0 in the initcond, but
I've not taken the time to understand your transfn to know if that's
valid. If you've added NULL handling in the combinefn, then that's
likely fine.
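
Concretely (throwaway examples, not the aggregate from the report):

SELECT ('{}'::int8[])[1];          -- element does not exist in an empty array -> NULL
SELECT 1 + NULL;                   -- <value> <op> NULL -> NULL
SELECT ('{0,0}'::int8[])[1] + 1;   -- zero-filled array element -> 1, no NULL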

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
