Re: pg16: XX000: could not find pathkey item to sort

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: pg16: XX000: could not find pathkey item to sort
Дата
Msg-id CAApHDvrAHkghCDNFG5+K2adWCDCC9cdZT2KZamVRkqYy31kL9w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg16: XX000: could not find pathkey item to sort  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: pg16: XX000: could not find pathkey item to sort
Re: pg16: XX000: could not find pathkey item to sort
Список pgsql-hackers
On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofenglinux@gmail.com> wrote:
> On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>> So in short, I propose the attached fix without any regression tests
>> because I feel that any regression test would just mark that there was
>> a big in create_agg_path() and not really help with ensuring we don't
>> end up with some similar problem in the future.
>
>
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them.  But it's also no harm if we trim them off.  So I
> think the patch is a pretty safe fix.  +1 to it.

hmm, I think one of us does not understand what is going on here. I
tried to explain in [1] why we *need* to strip off the pathkeys added
by adjust_group_pathkeys_for_groupagg().

Given the following example:

create table ab (a int,b int);
explain (costs off) select a,count(distinct b) from ab group by a;
         QUERY PLAN
----------------------------
 GroupAggregate
   Group Key: a
   ->  Sort
         Sort Key: a, b
         ->  Seq Scan on ab
(5 rows)

adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b"
column and that results in the Sort node sorting on {a,b}.  It's
simply not at all valid to have the GroupAggregate path claim that its
pathkeys are also (effectively) {a,b}" as "b" does not and cannot
legally exist after the aggregation takes place.  We cannot put a
resjunk "b" in the targetlist of the GroupAggregate either as there
could be any number "b" values aggregated.

Can you explain why you think we can put a resjunk "b" in the target
list of the GroupAggregate in the above case?

>>
>> I have some concerns that the assert_pathkeys_in_target() function
>> might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
>> not proposing to commit that without further discussion.
>
>
> Yeah, it looks like some heavy to call assert_pathkeys_in_target() for
> each path node.  Can we run some benchmarks to see how much overhead it
> would bring to USE_ASSERT_CHECKING build?

I think it'll be easy to show that there is an overhead to it.  It'll
be in the realm of the ~41ms patched vs ~24ms unpatched that I showed
in [2].  That's quite an extreme case.

Maybe it's worth checking the total planning time spent in a run of
the regression tests with and without the patch to see how much
overhead it adds to the "average case".

David

[1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=N-tB62jQ@mail.gmail.com
[2] https://postgr.es/m/CAApHDvo7RzcQYw-gnkZr6QCijCqf8vJLkJ4XFk-KawvyAw109Q@mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Problem, partition pruning for prepared statement with IS NULL clause.
Следующее
От: David Rowley
Дата:
Сообщение: Re: Draft LIMIT pushdown to Append and MergeAppend patch