[WIP PATCH] Index scan offset optimisation using visibility map

Поиск
Список
Период
Сортировка
От Michail Nikolaev
Тема [WIP PATCH] Index scan offset optimisation using visibility map
Дата
Msg-id CANtu0oi3a1Rf1PVsBufQbm+g9ytSv75+kp7kJYvK5C1qF_0Siw@mail.gmail.com
обсуждение исходный текст
Ответы Re: [WIP PATCH] Index scan offset optimisation using visibility map
Список pgsql-hackers
Hello.

WIP-Patch for optimisation of OFFSET + IndexScan using visibility map.
Patch based on idea of Maxim Boguk [1] with some inspiration from Douglas Doole [2].
---------
Everyone knows - using OFFSET (especially big) is not an good practice.
But in reality they widely used mostly for paging (because it is simple).

Typical situation is some table (for example tickets) with indexes used for paging\sorting:

VACUUM FULL;
VACUUM ANALYZE ticket;
SET work_mem = '512MB';
SET random_page_cost = 1.0;

CREATE TABLE ticket AS
SELECT
id,
TRUNC(RANDOM() * 100 + 1) AS project_id,
NOW() + (RANDOM() * (NOW()+'365 days' - NOW())) AS created_date,
repeat((TRUNC(RANDOM() * 100 + 1)::text), 1000) as payload
FROM GENERATE_SERIES(1, 1000000) AS g(id);

CREATE INDEX simple_index ON ticket using btree(project_id, created_date);

And some typical query to do offset on tickets of some project with paging, some filtration (based on index) and sorting:

SELECT * FROM ticket
WHERE project_id = ?
AND created_date > '20.06.2017'
ORDER BY created_date offset 500 limit 100;

At the current moment IndexScan node will be required to do 600 heap fetches to execute the query.
But first 500 of them are just dropped by the NodeLimit.

The idea of the patch is to push down offset information in ExecSetTupleBound (like it done for Top-sort) to IndexScan in case
of simple scan (without projection, reordering and qual). In such situation we could use some kind of index only scan
(even better because we dont need index data) to avoid fetching tuples while they are just thrown away by nodeLimit.

Patch is also availble on Github:
https://github.com/michail-nikolaev/postgres/commit/a368c3483250e4c02046d418a27091678cb963f4?diff=split
And some test here:
https://gist.github.com/michail-nikolaev/b7cbe1d6f463788407ebcaec8917d1e0

So, at the moment everything seems to work (check-world is ok too) and I got next result for test ticket table:
| offset | master | patch
| 100 | ~1.3ms | ~0.7ms
| 1000 | ~5.6ms | ~1.1ms
| 10000 | ~46.7ms | ~3.6ms

To continue development I have following questions:
0) Maybe I missed something huge...
1) Is it required to support non-mvvc (dirty) snapshots? They are not supported for IndexOnlyScan - not sure about IndexScan.
2) Should I try to pass informaiton about such optimisation to planner/optimizer? It is not too easy with current desigh but seems possible.
3) If so, should I add something to EXPLAIN?
4) If so, should I add some counters to EXPLAIN ANALYZE? (for example number of heap fetch avoided).
5) Should I add description of optimisation to https://www.postgresql.org/docs/10/static/queries-limit.html ?
6) Maybe you have some ideas for additional tests I need to add.

Thanks a lot.

[1] https://www.postgresql.org/message-id/CAK-MWwQpZobHfuTtHj9%2B9G%2B5%3Dck%2BaX-ANWHtBK_0_D_qHYxWuw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CADE5jYLuugnEEUsyW6Q_4mZFYTxHxaVCQmGAsF0yiY8ZDggi-w%40mail.gmail.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: pgsql: doc: clearify trigger behavior for inheritance
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: proposal: alternative psql commands quit and exit