Re: optimize lookups in snapshot [sub]xip arrays

Поиск
Список
Период
Сортировка
От Nathan Bossart
Тема Re: optimize lookups in snapshot [sub]xip arrays
Дата
Msg-id 20220725190419.GA4095199@nathanxps13
обсуждение исходный текст
Ответ на Re: optimize lookups in snapshot [sub]xip arrays  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: optimize lookups in snapshot [sub]xip arrays  (Zhang Mingli <zmlpostgres@gmail.com>)
Re: optimize lookups in snapshot [sub]xip arrays  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Sat, Jul 16, 2022 at 08:59:57PM -0700, Nathan Bossart wrote:
> On Fri, Jul 15, 2022 at 01:08:57PM -0700, Andres Freund wrote:
>> I wonder if we additionally / alternatively could use a faster method of
>> searching the array linearly, e.g. using SIMD.
> 
> I looked into using SIMD.  The patch is attached, but it is only intended
> for benchmarking purposes and isn't anywhere close to being worth serious
> review.  There may be a simpler/better way to implement the linear search,
> but this seemed to work well.  Overall, SIMD provided a decent improvement.
> I had to increase the number of writers quite a bit in order to demonstrate
> where the hash tables began winning.  Here are the numbers:
> 
>     writers head simd hash
>     256     663  632  694
>     512     530  618  637
>     768     489  544  573
>     1024    364  508  562
>     2048    185  306  485
>     4096    146  197  441
> 
> While it is unsurprising that the hash tables perform the best, there are a
> couple of advantages to SIMD that might make that approach worth
> considering.  For one, there's really no overhead (i.e., you don't need to
> sort the array or build a hash table), so we can avoid picking an arbitrary
> threshold and just have one code path.  Also, a SIMD implementation for a
> linear search through an array of integers could likely be easily reused
> elsewhere.

From the discussion thus far, it seems there is interest in optimizing
[sub]xip lookups, so I'd like to spend some time moving it forward.  I
think the biggest open question is which approach to take.  Both the SIMD
and hash table approaches seem viable, but I think I prefer the SIMD
approach at the moment (see the last paragraph of quoted text for the
reasons).  What do folks think?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: very long record lines in expanded psql output
Следующее
От: Anthony Sotolongo
Дата:
Сообщение: Re: Expose Parallelism counters planned/execute in pg_stat_statements