[PATCH] Add sortsupport for range types and btree_gist

Поиск
Список
Период
Сортировка
От Christoph Heiss
Тема [PATCH] Add sortsupport for range types and btree_gist
Дата
Msg-id 437ccbcf-8f80-2919-411d-a3af88becf6c@cybertec.at
обсуждение исходный текст
Ответы Re: [PATCH] Add sortsupport for range types and btree_gist  (Andrey Borodin <x4mmm@yandex-team.ru>)
Re: [PATCH] Add sortsupport for range types and btree_gist  (Jacob Champion <jchampion@timescale.com>)
Re: [PATCH] Add sortsupport for range types and btree_gist  (Bernd Helmle <mailings@oopsware.de>)
Список pgsql-hackers
Hi all!

The motivation behind this is that incrementally building up a GiST 
index for certain input data can create a terrible tree structure.
Furthermore, exclusion constraints are commonly implemented using GiST 
indices and for that use case, data is mostly orderable.

By sorting the data before inserting it into the index results in a much 
better index structure, leading to significant performance improvements.

Testing was done using following setup, with about 50 million rows:

    CREATE EXTENSION btree_gist;
    CREATE TABLE t (id uuid, block_range int4range);
    CREATE INDEX ON before USING GIST (id, block_range);
    COPY t FROM '..' DELIMITER ',' CSV HEADER;

using

    SELECT * FROM t WHERE id = '..' AND block_range && '..'

as test query, using a unpatched instance and one with the patch applied.

Some stats for fetching 10,000 random rows using the query above,
100 iterations to get good averages.

The benchmarking was done on a unpatched instance compiled using the 
exact same options as with the patch applied.
[ Results are noted in a unpatched -> patched fashion. ]

First set of results are after the initial CREATE TABLE, CREATE INDEX 
and a COPY to the table, thereby incrementally building the index.

Shared Hit Blocks (average): 110.97 -> 78.58
Shared Read Blocks (average): 58.90 -> 47.42
Execution Time (average): 1.10 -> 0.83 ms
I/O Read Time (average): 0.19 -> 0.15 ms

After a REINDEX on the table, the results improve even more:

Shared Hit Blocks (average): 84.24 -> 8.54
Shared Read Blocks (average): 49.89 -> 0.74
Execution Time (average): 0.84 -> 0.065 ms
I/O Read Time (average): 0.16 -> 0.004 ms

Additionally, the time a REINDEX takes also improves significantly:

     672407.584 ms (11:12.408) -> 130670.232 ms (02:10.670)

Most of the sortsupport for btree_gist was implemented by re-using 
already existing infrastructure. For the few remaining types (bit, bool, 
cash, enum, interval, macaddress8 and time) I manually implemented them 
directly in btree_gist.
It might make sense to move them into the backend for uniformity, but I 
wanted to get other opinions on that first.

`make check-world` reports no regressions.

Attached below, besides the patch, are also two scripts for benchmarking.

`bench-gist.py` to benchmark the actual patch, example usage of this 
would be e.g. `./bench-gist.py -o results.csv public.table`. This 
expects a local instance with no authentication and default `postgres` 
user. The port can be set using the `--port` option.

`plot.py` prints average values (as used above) and creates boxplots for 
each statistic from the result files produced with `bench-gist.py`. 
Depends on matplotlib and pandas.

Additionally, if needed, the sample dataset used to benchmark this is 
available to independently verify the results [1].

Thanks,
Christoph Heiss

---

[1] https://drive.google.com/file/d/1SKRiUYd78_zl7CeD8pLDoggzCCh0wj39
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: replacing role-level NOINHERIT with a grant-level option
Следующее
От: Przemysław Sztoch
Дата:
Сообщение: Re: [PATCH] Completed unaccent dictionary with many missing characters