Re: index prefetching

Поиск

Список

Период

Сортировка

От	Konstantin Knizhnik
Тема	Re: index prefetching
Дата	16 января 2024 г. 08:13:43
Msg-id	8969e2cf-6b23-4682-ab56-b3c0aca877ab@garret.ru обсуждение исходный текст
Ответ на	Re: index prefetching (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: index prefetching
Список	pgsql-hackers

Дерево обсуждения

Hi,

On 12/01/2024 6:42 pm, Tomas Vondra wrote:

Hi,

Here's an improved version of this patch, finishing a lot of the stuff
that I alluded to earlier - moving the code from indexam.c, renaming a
bunch of stuff, etc. I've also squashed it into a single patch, to make
it easier to review.

I am thinking about testing you patch with Neon (cloud Postgres). As far as Neon seaprates compute and storage, prefetch is much more critical for Neon
architecture than for vanilla Postgres.

I have few complaints:

1. It disables prefetch for sequential access pattern (i.e. INDEX MERGE), motivating it that in this case OS read-ahead will be more efficient than prefetch. It may be true for normal storage devices, bit not for Neon storage and may be also for Postgres on top of DFS (i.e. Amazon RDS). I wonder if we can delegate decision whether to perform prefetch in this case or not to some other level. I do not know precisely where is should be handled. The best candidate IMHO is storager manager. But it most likely requires extension of SMGR API. Not sure if you want to do it... Straightforward solution is to move this logic to some callback, which can be overwritten by user.

2. It disables prefetch for direct_io. It seems to be even more obvious than 1), because prefetching using `posix_fadvise` definitely not possible in case of using direct_io. But in theory if SMGR provides some alternative prefetch implementation (as in case of Neon), this also may be not true. Still unclear why we can want to use direct_io in Neon... But still I prefer to mo.ve this decision outside executor.

3. It doesn't perform prefetch of leave pages for IOS, only referenced heap pages which are not marked as all-visible. It seems to me that if optimized has chosen IOS (and not bitmap heap scan for example), then there should be large enough fraction for all-visible pages. Also index prefetch is most efficient for OLAp queries and them are used to be performance for historical data which is all-visible. But IOS can be really handled separately in some other PR. Frankly speaking combining prefetch of leave B-Tree pages and referenced heap pages seems to be very challenged task.

4. I think that performing prefetch at executor level is really great idea and so prefetch can be used by all indexes, including custom indexes. But prefetch will be efficient only if index can provide fast access to next TID (located at the same page). I am not sure that it is true for all builtin indexes (GIN, GIST, BRIN,...) and especially for custom AM. I wonder if we should extend AM API to make index make a decision weather to perform prefetch of TIDs or not.

5. Minor notice: there are few places where index_getnext_slot is called with last NULL parameter (disabled prefetch) with the following comment
"XXX Would be nice to also benefit from prefetching here." But all this places corresponds to "point loopkup", i.e. unique constraint check, find replication tuple by index... Prefetch seems to be unlikely useful here, unlkess there is index bloating and and we have to skip a lot of tuples before locating right one. But should we try to optimize case of bloated indexes?

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: index prefetching