Re: [PATCH] Introduce array_shuffle() and array_sample()

Поиск
Список
Период
Сортировка
От Martin Kalcher
Тема Re: [PATCH] Introduce array_shuffle() and array_sample()
Дата
Msg-id f6267e99-4448-b47c-9fe8-ec23690b931f@aboutsource.net
обсуждение исходный текст
Ответ на Re: [PATCH] Introduce array_shuffle() and array_sample()  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: [PATCH] Introduce array_shuffle() and array_sample()  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
Am 21.07.22 um 10:41 schrieb Dean Rasheed:
> 
> A couple of quick comments on the current patch:

Thank you for your feedback!

> It's important to mark these new functions as VOLATILE, not IMMUTABLE,
> otherwise they won't work as expected in queries. See
> https://www.postgresql.org/docs/current/xfunc-volatility.html

CREATE FUNCTION marks functions as VOLATILE by default if not explicitly 
marked otherwise. I assumed function definitions in pg_proc.dat have the 
same behavior. I will fix that.
https://www.postgresql.org/docs/current/sql-createfunction.html

> It would be better to use pg_prng_uint64_range() rather than rand() to
> pick elements. Partly, that's because it uses a higher quality PRNG,
> with a larger internal state, and it ensures that the results are
> unbiased across the range. But more importantly, it interoperates with
> setseed(), allowing predictable sequences of "random" numbers to be
> generated -- something that's useful in writing repeatable regression
> tests.

I agree that we should use pg_prng_uint64_range(). However, in order to 
achieve interoperability with setseed() we would have to use 
drandom_seed (rather than pg_global_prng_state) as rng state, which is 
declared statically in float.c and exclusively used by random(). Do we 
want to expose drandom_seed to other functions?

> Assuming these new functions are made to interoperate with setseed(),
> which I think they should be, then they also need to be marked as
> PARALLEL RESTRICTED, rather than PARALLEL SAFE. See
> https://www.postgresql.org/docs/current/parallel-safety.html, which
> explains why setseed() and random() are parallel restricted.

As mentioned above, i assumed the default here is PARALLEL UNSAFE. I'll 
fix that.

> In my experience, the requirement for sampling with replacement is
> about as common as the requirement for sampling without replacement,
> so it seems odd to provide one but not the other. Of course, we can
> always add a with-replacement function later, and give it a different
> name. But maybe array_sample() could be used for both, with a
> "with_replacement" boolean parameter?

We can also add a "with_replacement" boolean parameter which is false by 
default to array_sample() later. I do not have a strong opinion about 
that and will implement it, if desired. Any opinions about 
default/no-default?

Martin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: standby recovery fails (tablespace related) (tentative patch and discussion)