Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.
Дата
Msg-id CAApHDvqaHfoK0fPU+yx7Bvu4mQ+cYbBwXS10rYj9gviOzutB2Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.  (James Inform <james.inform@pharmapp.de>)
Ответы Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.  (James Inform <james.inform@pharmapp.de>)
Список pgsql-bugs
On Mon, 21 Aug 2023 at 06:26, James Inform <james.inform@pharmapp.de> wrote:
>
> Just looked at the results again.
>
> They look strange. When I execute the query multiple times I get three
> kinds of results:
> - NO match on all columns
> - THE SAME match on all columns
> - THREE DIFFERENT columns that are repeated for all the 1000 rows.
>
> With your explanation there should be randomly assigned rows appearing,
> but the seems to be a pattern. At least more than those three different
> ones.
>
> Have you executed my example and looked at the results?
> If not, please give it a try. Nothing looks random there.

You might expect that the "link_id = 1 + (random()*500)::int8" is
evaluated as a join qual because you put it in the ON clause, but
PostgreSQL will distribute these quals to the lowest location that
they can be evaluated. Since the only column that's mentioned in your
join expression belongs to the "link" table, then the qual is
evaluated at the scan level for that relation. You'll notice this if
you look at the EXPLAIN output.

The reason you sometimes get no matches is simply that on that
execution of the query, the random number didn't happen to line up
with any of the particular link_ids on any of the scanned tuples.

The reason you sometimes get the same match is that 1 tuple happened
to match the random number during the scan, and that tuple was joined
to 1000 times on your effective clauseless join.

The reason you see three different columns being matches it is that 3
tuples happened to match your random expression during the scan and
the clauseless join joined all three, resulting in 3000 rows rather
than 1000 rows in the final output.

The link table is only scanned once due to the Material node in the
Nested Loop join.  If you did: SET enable_material = off; then the
scan would be performed once per row in the "source" table.  That
would mean the random() function would be executed 1 million times
instead of 1 thousand times.

It might take you a while, but if you tried enough times, all the
planets would align and "link_id = 1 + (random()*500)::int8" would
happen to match all tuples during the scan.  The query would then
return 1 million rows.

PostgreSQL wouldn't have pushed your ON qual down to the scan level if
you'd included some column from the "source" table in the expression.
I'm not sure what good it'd do you, but you'd see different results
using something like "link_id = source_id * 0 + 1 +
(random()*5)::int8" (note the multiplication by 0)

David



В списке pgsql-bugs по дате отправления:

Предыдущее
От: James Inform
Дата:
Сообщение: Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.
Следующее
От: Andres Freund
Дата:
Сообщение: Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()