Re: SIMD optimization for list_sort

Поиск

Список

Период

Сортировка

От	Heikki Linnakangas
Тема	Re: SIMD optimization for list_sort
Дата	22 ноября 2024 г. 19:01:22
Msg-id	a0a02eba-ff3a-48e8-bc18-c8dbd4521e0e@iki.fi обсуждение исходный текст
Ответ на	SIMD optimization for list_sort ("Giacchino, Luca" <luca.giacchino@intel.com>)
Ответы	Re: SIMD optimization for list_sort
Список	pgsql-hackers

Дерево обсуждения

On 22/11/2024 01:27, Giacchino, Luca wrote:
> The existing list_sort takes a comparator function to compare pairs of 
> ListCell. On the other hand, x86-simd-sort requires an array of numeric 
> values to sort, and it returns an array of sorted indices. To enable 
> x86-simd-sort, we add new list_sort_simd functions that take an 
> extractor function. The function extracts a value (float or uint32) from 
> a ListCell. Those values are then collected into an array for x86-simd- 
> sort to work on. A comparator function can still be passed to be used as 
> tie-breaker.
> 
> typedef float (*list_sort_extractor_float)(const ListCell *a);
> 
> typedef uint32 (*list_sort_extractor_uint32)(const ListCell *a);
> 
> void list_sort_simd_float(List *list, list_sort_extractor_float extract, 
> list_sort_comparator cmp);
> 
> void list_sort_simd_uint32(List *list, list_sort_extractor_uint32 
> extract, list_sort_comparator cmp);
> 
> These functions will exist alongside the current list_sort. Existing 
> list_sort use cases in Postgres or extensions will not be affected by 
> default, and they can be converted to list_sort_simd functions where it 
> makes sense in terms of performance.

I'd suggest targeting pg_qsort() directly, instead of list_sort(). 
list_sort() is not used in very performance critical parts.

> We identified a first use case for list_sort_simd_float in pgvector. As 
> part of HNSW index construction, pgvector uses list_sort to sort 
> candidate vectors by distance. Using list_sort_simd_float, we observed 
> reduction in index build time in some scenarios. For example, we 
> observed 7% reduction in index build time with the gist-960 dataset and 
> 10% with the dbpedia-openai-1000k dataset (ANN-Benchmarks, HNSW index 
> with halfvec, m=80). We are also looking into microbenchmarks to measure 
> list_sort performance independently.

That's interesting. I'd suggest proposing this to the pgvector project 
directly, since pgvector would immediately benefit.

> We’d appreciate feedback on this approach. In the meantime, we will 
> complete the patch to share. We also plan to extend SIMD-based sort to 
> tuple sort in the future.

If you could use this to speed up tuple sorting, that would be much more 
interesting for PostgreSQL itself.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: SIMD optimization for list_sort