Re: [PATCH] Introduce array_shuffle() and array_sample()

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: [PATCH] Introduce array_shuffle() and array_sample()
Дата
Msg-id CAEZATCUPcdxeRLmiFvuzZ=kKVpx8kOKTYDPY4oVv3hQBJYfsCQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Introduce array_shuffle() and array_sample()  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Ответы Re: [PATCH] Introduce array_shuffle() and array_sample()  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Re: [PATCH] Introduce array_shuffle() and array_sample()  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Список pgsql-hackers
On Tue, 19 Jul 2022 at 21:21, Martin Kalcher
<martin.kalcher@aboutsource.net> wrote:
>
> Here is a patch with dimension aware array_shuffle() and array_sample().
>

+1 for this feature, and this way of handling multi-dimensional arrays.

> If you think array_flatten() is desirable, i can take a look at it.

That's not something I've ever wanted -- personally, I rarely use
multi-dimensional arrays in Postgres.

A couple of quick comments on the current patch:

It's important to mark these new functions as VOLATILE, not IMMUTABLE,
otherwise they won't work as expected in queries. See
https://www.postgresql.org/docs/current/xfunc-volatility.html

It would be better to use pg_prng_uint64_range() rather than rand() to
pick elements. Partly, that's because it uses a higher quality PRNG,
with a larger internal state, and it ensures that the results are
unbiased across the range. But more importantly, it interoperates with
setseed(), allowing predictable sequences of "random" numbers to be
generated -- something that's useful in writing repeatable regression
tests.

Assuming these new functions are made to interoperate with setseed(),
which I think they should be, then they also need to be marked as
PARALLEL RESTRICTED, rather than PARALLEL SAFE. See
https://www.postgresql.org/docs/current/parallel-safety.html, which
explains why setseed() and random() are parallel restricted.

In my experience, the requirement for sampling with replacement is
about as common as the requirement for sampling without replacement,
so it seems odd to provide one but not the other. Of course, we can
always add a with-replacement function later, and give it a different
name. But maybe array_sample() could be used for both, with a
"with_replacement" boolean parameter?

Regards,
Dean



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Junwang Zhao
Дата:
Сообщение: add a missing space
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: i.e. and e.g.