Re: optimize lookups in snapshot [sub]xip arrays

Поиск

Список

Период

Сортировка

От	Bharath Rupireddy
Тема	Re: optimize lookups in snapshot [sub]xip arrays
Дата	14 июля 2022 г. 12:40:56
Msg-id	CALj2ACUUF0ZZGikWBsX88U1dFs72npfuprh2rp0zw_vxH-7O3Q@mail.gmail.com обсуждение исходный текст
Ответ на	optimize lookups in snapshot [sub]xip arrays (Nathan Bossart <nathandbossart@gmail.com>)
Ответы	Re: optimize lookups in snapshot [sub]xip arrays
Список	pgsql-hackers

Дерево обсуждения

On Wed, Jul 13, 2022 at 10:40 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> Hi hackers,
>
> A few years ago, there was a proposal to create hash tables for long
> [sub]xip arrays in snapshots [0], but the thread seems to have fizzled out.
> I was curious whether this idea still showed measurable benefits, so I
> revamped the patch and ran the same test as before [1].  Here are the
> results for 60₋second runs on an r5d.24xlarge with the data directory on
> the local NVMe storage:
>
>      writers  HEAD  patch  diff
>     ----------------------------
>      16       659   664    +1%
>      32       645   663    +3%
>      64       659   692    +5%
>      128      641   716    +12%
>      256      619   610    -1%
>      512      530   702    +32%
>      768      469   582    +24%
>      1000     367   577    +57%

Impressive.

> As before, the hash table approach seems to provide a decent benefit at
> higher client counts, so I felt it was worth reviving the idea.
>
> The attached patch has some key differences from the previous proposal.
> For example, the new patch uses simplehash instead of open-coding a new
> hash table.  Also, I've bumped up the threshold for creating hash tables to
> 128 based on the results of my testing.  The attached patch waits until a
> lookup of [sub]xip before generating the hash table, so we only need to
> allocate enough space for the current elements in the [sub]xip array, and
> we avoid allocating extra memory for workloads that do not need the hash
> tables.  I'm slightly worried about increasing the number of memory
> allocations in this code path, but the results above seemed encouraging on
> that front.
>
> Thoughts?
>
> [0] https://postgr.es/m/35960b8af917e9268881cd8df3f88320%40postgrespro.ru
> [1] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com

Aren't these snapshot arrays always sorted? I see the following code:

/* sort so we can bsearch() */
qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);

/* sort so we can bsearch() later */
qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);

If the ordering isn't an invariant of these snapshot arrays, can we
also use the hash table mechanism for all of the snapshot arrays
infrastructure rather than qsort+bsearch in a few places and hash
table for others?

+ * The current value worked well in testing, but it's still mostly a guessed-at
+ * number that might need updating in the future.
+ */
+#define XIP_HASH_MIN_ELEMENTS (128)
+

Do you see a regression with a hash table for all the cases? Why can't
we just build a hash table irrespective of these limits and use it for
all the purposes instead of making it complex with different
approaches if we don't have measurable differences in the performance
or throughput?

+static inline bool
+XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt,
+ xip_hash_hash **xiph)

+ /* Make sure the hash table is built. */
+ if (*xiph == NULL)
+ {
+ *xiph = xip_hash_create(TopTransactionContext, xcnt, NULL);
+
+ for (int i = 0; i < xcnt; i++)

Why create a hash table on the first search? Why can't it be built
while inserting or creating these snapshots? Basically, instead of the
array, can these snapshot structures be hash tables by themselves? I
know this requires a good amount of code refactoring, but worth
considering IMO as it removes bsearch thus might improve the
performance further.

Regards,
Bharath Rupireddy.

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Niyas Sait
Дата: 14 июля 2022 г., 11:36:13
Сообщение: Re: [PATCH] Add native windows on arm64 support

Следующее

От: Bharath Rupireddy
Дата: 14 июля 2022 г., 12:43:37
Сообщение: Re: Patch proposal: New hooks in the connection path

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: optimize lookups in snapshot [sub]xip arrays

Предыдущее

Следующее