Re: Odd Shortcut behaviour in PG14

Поиск
Список
Период
Сортировка
От Ron Johnson
Тема Re: Odd Shortcut behaviour in PG14
Дата
Msg-id CANzqJaATekkU4Nhh4u86CHB1X+oHxV7NbweOXExO4326wbFSTA@mail.gmail.com
обсуждение исходный текст
Ответ на RE: Odd Shortcut behaviour in PG14  (Zahir Lalani <ZahirLalani@oliver.agency>)
Список pgsql-general
On Fri, Nov 24, 2023 at 8:01 AM Zahir Lalani <ZahirLalani@oliver.agency> wrote:


> -----Original Message-----
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: Thursday, November 23, 2023 7:45 PM
> To: Ron Johnson <ronljohnsonjr@gmail.com>
> Cc: pgsql-generallists.postgresql.org <pgsql-general@lists.postgresql.org>
> Subject: Re: Odd Shortcut behaviour in PG14
>
> Ron Johnson <ronljohnsonjr@gmail.com> writes:
> > Out of curiosity, what is the point of adding the "true" predicate no
> > matter the position?  Maybe I've created an incorrect truth table, but
> > "true AND" (and "AND true") don't make any logical difference when
> > added to (ekey > 0)*.*
>
> Not only does it not make any logical difference, but it shouldn't make any
> practical difference either, because the useless "true"
> subclause will be thrown away very early in planning, before any decisions
> would be taken on the strength of what is in the ON clause.
> So I was skeptical that the details presented were even correct.
> I think whatever is causing the behavioral change is something else that the OP
> hasn't identified/controlled for.  Hard to tell with such fragmentary details.
>
>                         regards, tom lane
>

Apologies but I am struggling to make this happen in isolation. A few things:

The true was an error - we previously did not have the AND so needed the true. The devs added the AND of ekey but should have removed the true. We have done that now.

There is a difference between the PG versions. The Dev server which works is 14.9 and the QA server which fails is 14.10

Looking at the application logs this function is being called once per display row - it is running successfully around 10 times with the same input params. When it fails, it is with the same params! All I can say for sure is that the first 10 it ignores the lateral join as the (ekey > 0) evaluates to (0>0). However on the next invocation, for whatever reason, even though ekey is still 0, it decides to evaluate the lateral join and fails.

If we change the code from (ekey > 0) to (0 > 0) - it always works!! So the planner seems to be making different choices based on other functions being called higher up in the chain. We need a way of telling the planner not to run the lateral join if ekey=0.

Sorry I can't provide a testable case - have not been able to reproduce just running this one function multiple times in a loop.

A PL/PgSQL loop, or a bash loop?  Because if the same code, with the same parameters, runs the exact same 10 times in a row, then There's Some Other Problem You're Not Telling Us.
 
Put the SELECT statement in a file with EXPLAIN ANALYZE prefixed to it.

Then run this bash loop:
for x in `seq 0 9`; do psql --host=foo -U bar -af thequery.sql &> thequery_analyzed_$x.out; done

diff them.

I'm betting that what the developers wrote relies on something that SQL doesn't say is reliable, and which changed between 14.9 and 14.10.

Or did you also upgrade the OS when upgrading to .10?  That could cause a problem; always rebuild all indices with CHAR, VARCHAR & TEXT fields after upgrading glibc.

В списке pgsql-general по дате отправления:

Предыдущее
От: Les
Дата:
Сообщение: Re: replication primary writting infinite number of WAL files
Следующее
От: Laurenz Albe
Дата:
Сообщение: Re: replication primary writting infinite number of WAL files