Re: Proposal to introduce a shuffle function to intarray extension

From	Thomas Munro
Subject	Re: Proposal to introduce a shuffle function to intarray extension
Date	18 July 2022, 01:37:04
Msg-id	CA+hUKG+TPcsR-OmioTdtTHBs9k6dS0fOcgkw4YSdp_=RJhCxoQ@mail.gmail.com
In response to	Re: Proposal to introduce a shuffle function to intarray extension (Martin Kalcher <martin.kalcher@aboutsource.net>)
Responses	Re: Proposal to introduce a shuffle function to intarray extension (Tom Lane <tgl@sss.pgh.pa.us>) Re: Proposal to introduce a shuffle function to intarray extension (Martin Kalcher <martin.kalcher@aboutsource.net>)
List	pgsql-hackers

On Mon, Jul 18, 2022 at 4:15 AM Martin Kalcher
<martin.kalcher@aboutsource.net> wrote:
> On 17.07.22 at 08:00, Thomas Munro wrote:
> >> Actually ... is there a reason to bother with an intarray version
> >> at all, rather than going straight for an in-core anyarray function?
> >> It's not obvious to me that an int4-only version would have
> >> major performance advantages.
> >
> > Yeah, that seems like a good direction.  If there is a performance
> > advantage to specialising, then perhaps we only have to specialise on
> > size, not type.  Perhaps there could be a general function that
> > internally looks out for typbyval && typlen == 4, and dispatches to a
> > specialised 4-byte, and likewise for 8, if it can, and that'd already
> > be enough to cover int, bigint, float etc, without needing
> > specialisations for each type.
>
> I played around with the idea of an anyarray shuffle(). The hard part
> was to deal with arrays with variable length elements, as they can not
> be swapped easily in place. I solved it by creating an intermediate
> array of references to the elements. I'll attach a patch with the proof
> of concept. Unfortunately it is already about 5 times slower than the
> specialised version and I am not sure if it is worth going down that road.

Seems OK for a worst case.  It must still be a lot faster than doing
it in SQL.  Now I wonder what the exact requirements would be to
dispatch to a faster version that would handle int4.  I haven't
studied this in detail but perhaps to dispatch to a fast shuffle for
objects of size X, the requirement would be something like typlen == X
&& align_bytes <= typlen && typlen % align_bytes == 0, where
align_bytes is typalign converted to ALIGNOF_{CHAR,SHORT,INT,DOUBLE}?
Or in English, 'the data consists of densely packed objects of fixed
size X, no padding'.  Or perhaps you can work out the padded size and
use that, to catch a few more types.  Then you call
array_shuffle_{2,4,8}() as appropriate, which should be as fast as
your original int[] proposal, but work also for float, date, ...?
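
To make the condition above concrete, here is a rough standalone sketch,
not taken from any patch.  align_bytes_for() and can_fast_shuffle() are
hypothetical names, the alignment values assume a typical 64-bit platform
(ALIGNOF_SHORT = 2, ALIGNOF_INT = 4, ALIGNOF_DOUBLE = 8), and rand() only
stands in for whatever random source the real function would use:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: map a pg_type.typalign code to an alignment in
 * bytes.  The numbers are an assumption for a typical 64-bit platform.
 */
static int
align_bytes_for(char typalign)
{
    switch (typalign)
    {
        case 'c': return 1;     /* char */
        case 's': return 2;     /* short */
        case 'i': return 4;     /* int */
        case 'd': return 8;     /* double */
        default:  return 0;     /* unknown code */
    }
}

/*
 * True if the elements are densely packed fixed-size objects of size x
 * with no padding.  Variable-length types have a negative typlen, so
 * they can never match a positive x.
 */
static bool
can_fast_shuffle(int typlen, char typalign, int x)
{
    int align_bytes = align_bytes_for(typalign);

    if (align_bytes == 0)
        return false;
    return typlen == x &&
           align_bytes <= typlen &&
           typlen % align_bytes == 0;
}

/*
 * Fisher-Yates shuffle over 4-byte elements, treated as opaque bits.
 * rand() % i has a small modulo bias; good enough for a sketch.
 */
static void
array_shuffle_4(uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    for (size_t i = n; i > 1; i--)
    {
        size_t   j = (size_t) rand() % i;
        uint32_t tmp = data[i - 1];

        data[i - 1] = data[j];
        data[j] = tmp;
    }
}
```

With these assumptions int4 (typlen 4, typalign 'i') and float8 (typlen 8,
typalign 'd') qualify, while a 6-byte type with int alignment does not,
because 6 % 4 != 0.  The 2- and 8-byte variants would look the same with
uint16_t and uint64_t.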

About your experimental patch, I haven't reviewed it properly or tried
it but I wonder if uint32 dat_offset, uint32 size (= half size
elements) would be enough due to limitations on varlenas.
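
For illustration only, and without having looked at how the patch lays
things out, the reference-array idea with 32-bit fields might look roughly
like this.  ElemRef, shuffle_refs() and rebuild_from_refs() are
hypothetical names; the field names just follow the ones I mentioned above.
Since a varlena can't exceed 1GB, every offset and size inside one array's
data area fits in a uint32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reference to one element inside the array's data area.
 * Two 32-bit fields make each reference 8 bytes.
 */
typedef struct ElemRef
{
    uint32_t dat_offset;    /* byte offset of the element in the source */
    uint32_t size;          /* size of the element in bytes */
} ElemRef;

/* Fisher-Yates shuffle over the references; the data itself stays put. */
static void
shuffle_refs(ElemRef *refs, size_t n)
{
    assert(refs != NULL || n == 0);

    for (size_t i = n; i > 1; i--)
    {
        size_t  j = (size_t) rand() % i;    /* slight modulo bias, fine for a sketch */
        ElemRef tmp = refs[i - 1];

        refs[i - 1] = refs[j];
        refs[j] = tmp;
    }
}

/*
 * Copy the elements from src into dst in the order given by refs.
 * dst must have room for the sum of all sizes.  Returns bytes written.
 */
static size_t
rebuild_from_refs(const char *src, const ElemRef *refs, size_t n, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++)
    {
        memcpy(dst + out, src + refs[i].dat_offset, refs[i].size);
        out += refs[i].size;
    }
    return out;
}
```

The sketch ignores alignment padding between elements and the null bitmap,
both of which a real array function would have to handle.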
