Re: Questions about btree_gin vs btree_gist for low cardinality columns

Поиск

Список

Период

Сортировка

От	Jeremy Finzel
Тема	Re: Questions about btree_gin vs btree_gist for low cardinality columns
Дата	4 июня 2019 г. 16:37:22
Msg-id	CAMa1XUg_1xLyKshjB-oNa1L41QukrTj3hf226hYHaP3kgpj2qA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Questions about btree_gin vs btree_gist for low cardinality columns (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-general

Дерево обсуждения

On Sun, Jun 2, 2019 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, the larger point here is that right now btree_gin is just a quick
hack, and it seems like it might be worth putting some more effort into
it, because the addition of duplicate-compression changes the calculus
for whether it's useful.

Thank you to all for the thoughtful and thorough replies!

To clarify, I am definitely more interested in "low cardinality" in the sense NOT of a boolean or very few values, but rather enough values to where:

It's selective enough that having *some kind of index* actually will significantly speed up queries vs. a sequential scan
There are too many values to use partial indexes easily without it becoming a kind of maintenance nightmare

In our environment, we happen to have this kind of situation a lot. For example, account codes, or other foreign key ids to lookup tables that categorize data in some way that is frequently queried, exist on tables with upwards of 100 million rows. Typically it may be something like 50 to 500 unique values.

Historically, we have just had a bunch of regular btree indexes on these, and there are quite a lot of them indeed.

Here is my specific example:

Table has 127 million rows, including a toast field. The table is 270GB
The filter is on a field with only 16 unique values.
The actual filter condition is filtering a join to 4 of the 16 unique values

As I said, performance was nearly identical for btree vs. gin. But I was much happier with the memory usage of GIN given its tiny size:

Memory for join using btree: Buffers: shared hit=12 read=328991
Memory for join using gin: Buffers: shared hit=12 read=13961

The btree index here is 3.8GB, whereas the gin index is only 200MB. But I have not tested how either index handles bloat.

Morris, I think the reason your gin index was clearly slower was precisely because your example uses an index-only scan - which gin does not support.

Thanks,

Jeremy

В списке pgsql-general по дате отправления:

Предыдущее

От: Steve Atkins
Дата: 04 июня 2019 г., 15:44:34
Сообщение: Re: Research on ?? operators

Следующее

От: Matthias Apitz
Дата: 04 июня 2019 г., 18:03:18
Сообщение: Re: Two small questions re/ COPY CSV data into table

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Questions about btree_gin vs btree_gist for low cardinality columns

Предыдущее

Следующее