Re: GIN index always doing Re-check condition, postgres 9.1

Поиск
Список
Период
Сортировка
От Andrey Osenenko
Тема Re: GIN index always doing Re-check condition, postgres 9.1
Дата
Msg-id CAEB3-FhUJQDrv9oSH=f7ygXPLL7zPMK+TQiaKj0iM1zbfqdPOA@mail.gmail.com
обсуждение исходный текст
Ответ на GIN index always doing Re-check condition, postgres 9.1  (Andrey Osenenko <andrey.osenenko@gmail.com>)
Ответы Re: GIN index always doing Re-check condition, postgres 9.1
Re: GIN index always doing Re-check condition, postgres 9.1
Список pgsql-performance
Thank you.

That's really sad news. This means that even though there is an index that lets you find rows you want almost immediately, to retrieve primary keys, you still have to do a lot of disk io.

I created a new table that contains only primary key and tsvector value, and (at least that's how I'm interpreting it) since there is less data to read per row, it returns same results about 2 times as quickly (I restarted computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8) (actual time=113.472..18368.769 rows=23132 loops=1)
  Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=13114
  ->  Bitmap Index Scan on fulldata_plaintext_idx  (cost=0.00..261.02 rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts  (cost=273.67..27903.61 rows=23441 width=8) (actual time=219.896..10095.159 rows=23132 loops=1)
  Recheck Cond: (vector @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=10877
  ->  Bitmap Index Scan on fts_vector_idx  (cost=0.00..267.81 rows=23441 width=0) (actual time=204.631..204.631 rows=23132 loops=1)
        Index Cond: (vector @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It also looks like if there was a way to create a table with just primary key and add an index to it that indexes data from another table, it would work much, much faster since there would be very little to read from disk after index lookup. But looks like there isn't.

So am I correct in assumption that as the amount of rows grows, query times for rows that are not in memory (and considering how many of them there are, most won't be) will grow linearly?

On Mon, Nov 2, 2015 at 11:14 AM, Andrey Osenenko <andrey.osenenko@gmail.com> wrote:
Thank you.

That's really sad news. This means that even though there is an index that lets you find rows you want almost immediately, to retrieve primary keys, you still have to do a lot of disk io.

I created a new table that contains only primary key and tsvector value, and (at least that's how I'm interpreting it) since there is less data to read per row, it returns same results about 2 times as quickly (I restarted computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8) (actual time=113.472..18368.769 rows=23132 loops=1)
  Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=13114
  ->  Bitmap Index Scan on fulldata_plaintext_idx  (cost=0.00..261.02 rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts  (cost=273.67..27903.61 rows=23441 width=8) (actual time=219.896..10095.159 rows=23132 loops=1)
  Recheck Cond: (vector @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=10877
  ->  Bitmap Index Scan on fts_vector_idx  (cost=0.00..267.81 rows=23441 width=0) (actual time=204.631..204.631 rows=23132 loops=1)
        Index Cond: (vector @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It also looks like if there was a way to create a table with just primary key and add an index to it that indexes data from another table, it would work much, much faster since there would be very little to read from disk after index lookup. But looks like there isn't.

So am I correct in assumption that as the amount of rows grows, query times for rows that are not in memory (and considering how many of them there are, most won't be) will grow linearly?

В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: GIN index always doing Re-check condition, postgres 9.1
Следующее
От: Guido Niewerth
Дата:
Сообщение: Slow query in trigger function