Re: Proposal to introduce a shuffle function to intarray extension

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: Proposal to introduce a shuffle function to intarray extension
Дата
Msg-id CA+hUKGLcjrJ9Q2OZAKHN-SQyq_m5QDv8d3WKHYq5HW8Lyozu4w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Proposal to introduce a shuffle function to intarray extension  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Proposal to introduce a shuffle function to intarray extension  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Re: Proposal to introduce a shuffle function to intarray extension  (Martin Kalcher <martin.kalcher@aboutsource.net>)
Список pgsql-hackers
On Sun, Jul 17, 2022 at 3:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > On the whole, I'd vote for calling it shuffle(), and expecting that
> > we'd also use that name for any future generic version.
>
> Actually ... is there a reason to bother with an intarray version
> at all, rather than going straight for an in-core anyarray function?
> It's not obvious to me that an int4-only version would have
> major performance advantages.

Yeah, that seems like a good direction.  If there is a performance
advantage to specialising, then perhaps we only have to specialise on
size, not type.  Perhaps there could be a general function that
internally looks out for typbyval && typlen == 4, and dispatches to a
specialised 4-byte, and likewise for 8, if it can, and that'd already
be enough to cover int, bigint, float etc, without needing
specialisations for each type.

I went to see what Professor Lemire would have to say about all this,
expecting to find a SIMD rabbit hole to fall down for some Sunday
evening reading, but the main thing that jumped out was this article
about the modulo operation required by textbook Fisher-Yates to get a
bounded random number, the random() % n that appears in the patch.  He
talks about shuffling twice as fast by using a no-division trick to
get bounded random numbers[1].  I guess you might need to use our
pg_prng_uint32() for that trick because random()'s 0..RAND_MAX might
introduce bias.  Anyway, file that under go-faster ideas for later.

[1] https://lemire.me/blog/2016/06/30/fast-random-shuffling/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: optimize lookups in snapshot [sub]xip arrays
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Allowing REINDEX to have an optional name