PoC: prefetching index leaf pages (for inserts)
От | Tomas Vondra |
---|---|
Тема | PoC: prefetching index leaf pages (for inserts) |
Дата | |
Msg-id | 8081aa62-4032-927c-8411-548e1b87ed5b@enterprisedb.com обсуждение исходный текст |
Ответы |
Re: PoC: prefetching index leaf pages (for inserts)
|
Список | pgsql-hackers |
Hi, Some time ago I started a thread about prefetching heap pages during index scans [1]. That however only helps when reading rows, not when inserting them. Imagine inserting a row into a table with many indexes - we insert the row into the table, and then into the indexes one by one, synchronously. We walk the index, determine the appropriate leaf page, read it from disk into memory, insert the index tuple, and then do the same thing for the next index. If there are many indexes, and there's not much correlation with the table, this may easily results in I/O happening synchronously with queue depth 1. Hard to make it even slower ... This can be a problem even with a modest number of indexes - imagine bulk-loading data into a table using COPY (say, in 100-row batches). Inserting the rows into heap happens in a bulk, but the indexes are still modified in a loop, as if for single-row inserts. Not great. The with multiple connections the concurrent I/O may be generated that way, but for low-concurrency workloads (e.g. batch jobs) that may not really work. I had an idea what we might do about this - we can walk the index, almost as if we're inserting the index tuple, but only the "inner" non-leaf pages. And instead of descending to the leaf page, we just prefetch it. The non-leaf pages are typically <1% of the index, and hot, so likely already cached, so not worth prefetching those. The attached patch does a PoC of this. It adds a new AM function "amprefetch", with an implementation for btree indexes, mimicking the index lookup, except that it only prefetches the leaf page as explained a bit earlier. In the executor, this is wrapped in ExecInsertPrefetchIndexes() which gets called in various places right before ExecInsertPrefetchIndexes(). I thought about doing that in ExecInsertPrefetchIndexes() directly, but that would not work for COPY, where we want to issue the prefetches for the whole batch, not for individual tuples. This may need various improvements - the prefetch duplicates a couple steps that could be expensive (e.g. evaluation of index predicates, forming index tuples, and so on). Would be nice to improve this, but good enough for PoC I think. Another gap is lack of incremental prefetch (ramp-up). We just prefetch all the indexes, for all tuples. But I think that's OK. We know we'll need those pages, and the number is fairly limited. There's a GUC enable_insert_prefetch, that can be used to enable this insert prefetching. I did a simple test on two machines - one with SATA SSD RAID, one with NVMe SSD. In both cases the data (table+indexes) are an order of magnitude larger than RAM. The indexes are on UUID, so pretty random and there's no correlation. Then batches of 100, 1000 and 10000 rows are inserted, with/without the prefetching. With 5 indexes, the results look like this: SATA SSD RAID ------------- rows no prefetch prefetch 100 176.872 ms 70.910 ms 1000 1035.056 ms 590.495 ms 10000 8494.836 ms 3216.206 ms NVMe ---- rows no prefetch prefetch 100 133.365 ms 72.899 ms 1000 1572.379 ms 829.298 ms 10000 11889.143 ms 3621.981 ms Not bad, I guess. Cutting the time to ~30% is nice. The fewer the indexes, the smaller the difference (with 1 index there is almost no difference), of course. regards [1] https://commitfest.postgresql.org/45/4351/ -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: Tomas VondraДата:
Сообщение: Re: BRIN minmax multi - incorrect distance for infinite timestamp/date