Re: Improve Seq scan performance

From Craig Ringer
Subject Re: Improve Seq scan performance
Date
Msg-id 4917E9F1.5080902@postnewspapers.com.au
In response to Re: Improve Seq scan performance  ("Vladimir Sitnikov" <sitnikov.vladimir@gmail.com>)
Responses Re: Improve Seq scan performance  ("Vladimir Sitnikov" <sitnikov.vladimir@gmail.com>)
List pgsql-performance
Vladimir Sitnikov wrote:
>> Lutischán Ferenc wrote:
>>
>>> Is it possible to make an index on the table, and make a seq index scan on
>>> these values?
>>>
>> My understanding is that this isn't possible in PostgreSQL, because indexes
>> do not contain information about tuple visibility. Data read from the index
>> might refer to tuples that've been deleted as far as your transaction is
>> concerned, or to tuples that were created after your snapshot was taken.
>>
> My understanding is that even though indices do not contain information on
> tuple visibility, an index could be used to filter out records that are
> known not to match.

Yes, that's what an index is for. As far as I know, if it's worth doing
that, it's worth doing an index scan instead of a seq scan.

Maybe there's some hybrid scan type possible where you scan the index to
find large table regions that are known /not/ to contain tuples of
interest and skip over them during the scan. I wouldn't know, really, but
it sounds like it'd probably cost more I/O than a pure seq scan (given
that the index has to be read too) unless the values of interest were
rather neatly clustered in the table. It'd also surely use more memory and
CPU time, since the whole index has to be processed to find the table
regions without values of interest.

Is that what you meant, though?
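
(Rough aside, and only a sketch: the planner already keeps a measure of how
well a column's values line up with the physical row order,
pg_stats.correlation, which is more or less what would decide whether that
kind of skip-ahead scan could pay off. The table and column names below are
the ones from the test case quoted further down.)

-- How well is column i physically clustered? A correlation near 1 or -1
-- means the rows are laid out nearly in index order; near 0 means they're
-- scattered all over the table.
select tablename, attname, correlation
  from pg_stats
 where tablename = 'seq_test' and attname = 'i';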

> create table seq_test
>  as select cast(i as text) i, repeat('*', 500) padding from
> generate_series(1,10000) as s(i);
>
> create index i_ix on seq_test(i);
>
> vacuum analyze verbose seq_test;
> -- index "i_ix" now contains 10000 row versions in *30 *pages
> -- "seq_test": found 0 removable, 10000 nonremovable row versions in *667 *
> pages
>
> explain analyze select * from seq_test where i like '%123%';

A b-tree index cannot be used for a LIKE query with a leading wildcard;
see the FAQ.

In addition, if your database is not in the C locale you can't use an
ordinary b-tree index for LIKE queries at all (see the FAQ). You need to
create a text_pattern_ops index instead:

create index i_ix_txt on seq_test(i text_pattern_ops);
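
(If you're not sure which locale the database was initialised with, a
quick sanity check:)

show lc_collate;
-- "C" (or "POSIX") means a plain b-tree index can already handle LIKE
-- prefix matches; anything else needs the text_pattern_ops index above.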

> set enable_seqscan=off
> -- Index Scan reads 2529 pages for some reason. I would expect 30 (index
> size) + 20 (number of matching entries) = 50 pages maximum, that is 10
> times better than with seq scan.
> Index Scan using i_ix on seq_test  (cost=0.00..1643.74 rows=356 width=508)
> (actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529)
> read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
>   Filter: (i ~~ '%123%'::text)
> Total runtime: 16.863 ms

I think it's reading the whole index, because it can't do a prefix
search when there's a leading wildcard. I'm a bit confused, though, since
I thought that in this case it couldn't actually execute the query without
a sequential scan, and would just use one irrespective of the
enable_seqscan param (which only adds a huge cost penalty to seq scans
rather than forbidding them outright, as the cost estimate below shows).
That's what happens here.

Here's what I get:

test=# create table seq_test as select cast(i as text) AS i, repeat('*',
500) AS padding from generate_series(1,10000) as s(i);
SELECT
test=# create index i_ix on seq_test(i);
CREATE INDEX
test=# create index i_ix_txt on seq_test(i text_pattern_ops);
CREATE INDEX
test=# vacuum analyze verbose seq_test;
-- blah blah
INFO:  "seq_test": scanned 667 of 667 pages, containing 10000 live rows
and 0 dead rows; 3000 rows in sample, 10000 estimated total rows

test=# explain analyze select * from seq_test where i like '%123%';
                                                QUERY PLAN

---------------------------------------------------------------------------------------------------------
  Seq Scan on seq_test  (cost=0.00..792.00 rows=400 width=508) (actual
time=0.081..5.239 rows=20 loops=1)
    Filter: (i ~~ '%123%'::text)
  Total runtime: 5.281 ms
(3 rows)

-- Now, note lack of leading wildcard:
test=# explain analyze select * from seq_test where i like '123%';
                                                     QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on seq_test  (cost=4.35..40.81 rows=10 width=508)
(actual time=0.062..0.081 rows=11 loops=1)
    Filter: (i ~~ '123%'::text)
    ->  Bitmap Index Scan on i_ix_txt  (cost=0.00..4.35 rows=10 width=0)
(actual time=0.048..0.048 rows=11 loops=1)
          Index Cond: ((i ~>=~ '123'::text) AND (i ~<~ '124'::text))
  Total runtime: 0.121 ms
(5 rows)

test=# set enable_seqscan=off;
SET
test=# explain analyze select * from seq_test where i like '%123%';
                                                       QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
  Seq Scan on seq_test  (cost=100000000.00..100000792.00 rows=400
width=508) (actual time=0.088..5.666 rows=20 loops=1)
    Filter: (i ~~ '%123%'::text)
  Total runtime: 5.702 ms
(3 rows)
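
As an aside, since a b-tree can't help with the leading wildcard at all: on
newer PostgreSQL releases (9.1 and later) the contrib pg_trgm module can
index leading-wildcard LIKE queries with a trigram GIN index. Only a
sketch, and the index name i_ix_trgm is made up here:

create extension if not exists pg_trgm;
create index i_ix_trgm on seq_test using gin (i gin_trgm_ops);
explain analyze select * from seq_test where i like '%123%';
-- should show a Bitmap Index Scan on i_ix_trgm instead of a Seq Scan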

--
Craig Ringer
