Re: GIN index always doing Re-check condition, postgres 9.1

Поиск

Список

Период

Сортировка

От	Andrey Osenenko
Тема	Re: GIN index always doing Re-check condition, postgres 9.1
Дата	2 ноября 2015 г. 08:19:27
Msg-id	CAEB3-FhUJQDrv9oSH=f7ygXPLL7zPMK+TQiaKj0iM1zbfqdPOA@mail.gmail.com обсуждение исходный текст
Ответ на	GIN index always doing Re-check condition, postgres 9.1 (Andrey Osenenko <andrey.osenenko@gmail.com>)
Ответы	Re: GIN index always doing Re-check condition, postgres 9.1 Re: GIN index always doing Re-check condition, postgres 9.1
Список	pgsql-performance

Дерево обсуждения

Thank you.

That's really sad news. This means that even though there is an index that lets you find rows you want almost immediately, to retrieve primary keys, you still have to do a lot of disk io.

I created a new table that contains only primary key and tsvector value, and (at least that's how I'm interpreting it) since there is less data to read per row, it returns same results about 2 times as quickly (I restarted computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata (cost=266.79..39162.57 rows=23069 width=8) (actual time=113.472..18368.769 rows=23132 loops=1)
Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
Buffers: shared hit=1 read=13114
-> Bitmap Index Scan on fulldata_plaintext_idx (cost=0.00..261.02 rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts (cost=273.67..27903.61 rows=23441 width=8) (actual time=219.896..10095.159 rows=23132 loops=1)
Recheck Cond: (vector @@ '''москв'''::tsquery)
Buffers: shared hit=1 read=10877
-> Bitmap Index Scan on fts_vector_idx (cost=0.00..267.81 rows=23441 width=0) (actual time=204.631..204.631 rows=23132 loops=1)
        Index Cond: (vector @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It also looks like if there was a way to create a table with just primary key and add an index to it that indexes data from another table, it would work much, much faster since there would be very little to read from disk after index lookup. But looks like there isn't.

So am I correct in assumption that as the amount of rows grows, query times for rows that are not in memory (and considering how many of them there are, most won't be) will grow linearly?

On Mon, Nov 2, 2015 at 11:14 AM, Andrey Osenenko <andrey.osenenko@gmail.com> wrote:

Thank you.

That's really sad news. This means that even though there is an index that lets you find rows you want almost immediately, to retrieve primary keys, you still have to do a lot of disk io.

I created a new table that contains only primary key and tsvector value, and (at least that's how I'm interpreting it) since there is less data to read per row, it returns same results about 2 times as quickly (I restarted computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata (cost=266.79..39162.57 rows=23069 width=8) (actual time=113.472..18368.769 rows=23132 loops=1)
Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
Buffers: shared hit=1 read=13114
-> Bitmap Index Scan on fulldata_plaintext_idx (cost=0.00..261.02 rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts (cost=273.67..27903.61 rows=23441 width=8) (actual time=219.896..10095.159 rows=23132 loops=1)
Recheck Cond: (vector @@ '''москв'''::tsquery)
Buffers: shared hit=1 read=10877
-> Bitmap Index Scan on fts_vector_idx (cost=0.00..267.81 rows=23441 width=0) (actual time=204.631..204.631 rows=23132 loops=1)
        Index Cond: (vector @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It also looks like if there was a way to create a table with just primary key and add an index to it that indexes data from another table, it would work much, much faster since there would be very little to read from disk after index lookup. But looks like there isn't.

So am I correct in assumption that as the amount of rows grows, query times for rows that are not in memory (and considering how many of them there are, most won't be) will grow linearly?

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: GIN index always doing Re-check condition, postgres 9.1