Re: Questions about btree_gin vs btree_gist for low cardinality columns

Поиск
Список
Период
Сортировка
От Jeremy Finzel
Тема Re: Questions about btree_gin vs btree_gist for low cardinality columns
Дата
Msg-id CAMa1XUg_1xLyKshjB-oNa1L41QukrTj3hf226hYHaP3kgpj2qA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Questions about btree_gin vs btree_gist for low cardinality columns  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
On Sun, Jun 2, 2019 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Anyway, the larger point here is that right now btree_gin is just a quick
hack, and it seems like it might be worth putting some more effort into
it, because the addition of duplicate-compression changes the calculus
for whether it's useful.

Thank you to all for the thoughtful and thorough replies!

To clarify, I am definitely more interested in "low cardinality" in the sense NOT of a boolean or very few values, but rather enough values to where:
  1. It's selective enough that having *some kind of index* actually will significantly speed up queries vs. a sequential scan
  2. There are too many values to use partial indexes easily without it becoming a kind of maintenance nightmare
In our environment, we happen to have this kind of situation a lot.  For example, account codes, or other foreign key ids to lookup tables that categorize data in some way that is frequently queried, exist on tables with upwards of 100 million rows.  Typically it may be something like 50 to 500 unique values.

Historically, we have just had a bunch of regular btree indexes on these, and there are quite a lot of them indeed.

Here is my specific example:
  • Table has 127 million rows, including a toast field.  The table is 270GB
  • The filter is on a field with only 16 unique values.
  • The actual filter condition is filtering a join to 4 of the 16 unique values
As I said, performance was nearly identical for btree vs. gin.  But I was much happier with the memory usage of GIN given its tiny size:
  • Memory for join using btree: Buffers: shared hit=12 read=328991
  • Memory for join using gin: Buffers: shared hit=12 read=13961
The btree index here is 3.8GB, whereas the gin index is only 200MB.  But I have not tested how either index handles bloat.

Morris, I think the reason your gin index was clearly slower was precisely because your example uses an index-only scan - which gin does not support.

Thanks,
Jeremy

В списке pgsql-general по дате отправления:

Предыдущее
От: Steve Atkins
Дата:
Сообщение: Re: Research on ?? operators
Следующее
От: Matthias Apitz
Дата:
Сообщение: Re: Two small questions re/ COPY CSV data into table