text search: tablescan cost for a tsvector

Поиск
Список
Период
Сортировка
От Marc Mamin
Тема text search: tablescan cost for a tsvector
Дата
Msg-id C4DAC901169B624F933534A26ED7DF310861B363@JENMAIL01.ad.intershop.net
обсуждение исходный текст
Ответы Re: text search: tablescan cost for a tsvector  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-performance
Hello,

I have quite systematically better performance with the text search when
I disable the statistics collection for the tsvector column.
So I wonder if such statistics ever make sense.

Here a testcase:

The table contains 200'000 tsvector, whereas the lexeme 'fooblablabla'
exists in all tsvector:
Without statistics, the planner decide as expected for the gin index.
After analyze, it switch to a table scan which is also expected, but the
query is 3 times slower.

My first thought was that the Bitmap Heap Scan was really fast as the
searched term is always at the first position.
So I repeated the test with an additional search term at the last
position, but without significant change:

(result from the 6. test below)

without analyze: http://explain.depesz.com/s/6At
with analyze:    http://explain.depesz.com/s/r3B


best regards,

Marc Mamin




Here all my results, always one of the fastest from a few runs.


CREATE TABLE tsv_test
(
  id bigserial NOT NULL,
  v tsvector
);


<The code to fill the table with test data can be found below>


The test query:

explain analyze
select id from tsv_test where v @@ 'lexeme3179'::tsquery
UNION ALL
select id from tsv_test where v @@ 'lexeme5'::tsquery
UNION ALL
select id from tsv_test where v @@ 'fooblablabla'::tsquery

The results

A) on first lexeme

1) without indexes without analyze:
   http://explain.depesz.com/s/bOv

2) alter table tsv_test add constraint tsv_test_pk primary key(id);
   http://explain.depesz.com/s/9QQ (same as previous);

3) create index tsv_gin on tsv_test using gin(v);
   http://explain.depesz.com/s/r4M <= fastest

4) ANALYZE tsv_test (id);
   http://explain.depesz.com/s/MyC (same as previous);

5) ANALYZE tsv_test;
   http://explain.depesz.com/s/qu3S


B) on lastlexeme

6) create table  tsv_test2 as select id,
   v||'zzthisisalongerlexemethisisalongerlexeme'::tsvector
   from tsv_test;

   explain analyze
   select id from tsv_test2 where v @@
'zzthisisalongerlexemethisisalongerlexeme'::tsquery

   http://explain.depesz.com/s/6At

   ANALYZE tsv_test2;

   http://explain.depesz.com/s/r3B



test data:

insert into tsv_test (v)
select
cast('fooblablabla' ||
' lexeme'||s%2|| ' lexeme'||s%3|| ' lexeme'||s%4||
' lexeme'||s%4|| ' lexeme'||s%5|| ' lexeme'||s%6||
' lexeme'||s%7|| ' lexeme'||s%8|| ' lexeme'||s%9||
' lexeme'||s%10 || ' lexeme2'||s%11 || ' lexeme3'||s%12 ||
' lexeme'||s%11 || ' lexeme2'||s%12 || ' lexeme3'||s%22 ||
' lexeme'||s%12 || ' lexeme2'||s%13 || ' lexeme3'||s%32 ||
' lexeme'||s%13 || ' lexeme2'||s%14 || ' lexeme3'||s%42 ||
' lexeme'||s%14 || ' lexeme2'||s%15 || ' lexeme3'||s%52 ||
' lexeme'||s%15 || ' lexeme2'||s%16 || ' lexeme3'||s%62 ||
' lexeme'||s%16 || ' lexeme2'||s%17 || ' lexeme3'||s%72 ||
' lexeme'||s%17 || ' lexeme2'||s%18 || ' lexeme3'||s%82 ||
' lexeme'||s%18 || ' lexeme2'||s%19 || ' lexeme3'||s%92 ||
' lexeme'||s%19 || ' lexeme2'||s%10 || ' lexeme3'||s%15 ||
' lexeme'||s%12 || ' lexeme2'||s%71 || ' lexeme3'||s%16 ||
' lexeme'||s%20 || ' lexeme2'||s%81 || ' lexeme3'||s%17 ||
' lexeme'||s%35 || ' lexeme2'||s%91 || ' lexeme3'||s%18 ||
' lexeme'||s%100 || ' lexeme2'||s%110 || ' lexeme3'||s%120 ||
' lexeme'||s%110 || ' lexeme2'||s%120 || ' lexeme3'||s%220 ||
' lexeme'||s%120 || ' lexeme2'||s%130 || ' lexeme3'||s%320 ||
' lexeme'||s%130 || ' lexeme2'||s%140 || ' lexeme3'||s%420 ||
' lexeme'||s%140 || ' lexeme2'||s%150 || ' lexeme3'||s%520 ||
' lexeme'||s%150 || ' lexeme2'||s%160 || ' lexeme3'||s%620 ||
' lexeme'||s%160 || ' lexeme2'||s%170 || ' lexeme3'||s%720 ||
' lexeme'||s%170 || ' lexeme2'||s%180 || ' lexeme3'||s%820 ||
' lexeme'||s%180 || ' lexeme2'||s%190 || ' lexeme3'||s%920 ||
' lexeme'||s%190 || ' lexeme2'||s%100 || ' lexeme3'||s%150 ||
' lexeme'||s%120 || ' lexeme2'||s%710 || ' lexeme3'||s%160 ||
' lexeme'||s%200 || ' lexeme2'||s%810 || ' lexeme3'||s%170 ||
' lexeme'||s%350 || ' lexeme2'||s%910 || ' lexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s
UNION ALL
select
cast('fooblablabla' ||
' thisisalongerlexemethisisalongerlexeme'||s%2|| '
thisisalongerlexemethisisalongerlexeme'||s%3|| '
thisisalongerlexemethisisalongerlexeme'||s%4||
' thisisalongerlexemethisisalongerlexeme'||s%4|| '
thisisalongerlexemethisisalongerlexeme'||s%5|| '
thisisalongerlexemethisisalongerlexeme'||s%6||
' thisisalongerlexemethisisalongerlexeme'||s%7|| '
thisisalongerlexemethisisalongerlexeme'||s%8|| '
thisisalongerlexemethisisalongerlexeme'||s%9||
' thisisalongerlexemethisisalongerlexeme'||s%10 || '
thisisalongerlexemethisisalongerlexeme2'||s%11 || '
thisisalongerlexemethisisalongerlexeme3'||s%12 ||
' thisisalongerlexemethisisalongerlexeme'||s%11 || '
thisisalongerlexemethisisalongerlexeme2'||s%12 || '
thisisalongerlexemethisisalongerlexeme3'||s%22 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%13 || '
thisisalongerlexemethisisalongerlexeme3'||s%32 ||
' thisisalongerlexemethisisalongerlexeme'||s%13 || '
thisisalongerlexemethisisalongerlexeme2'||s%14 || '
thisisalongerlexemethisisalongerlexeme3'||s%42 ||
' thisisalongerlexemethisisalongerlexeme'||s%14 || '
thisisalongerlexemethisisalongerlexeme2'||s%15 || '
thisisalongerlexemethisisalongerlexeme3'||s%52 ||
' thisisalongerlexemethisisalongerlexeme'||s%15 || '
thisisalongerlexemethisisalongerlexeme2'||s%16 || '
thisisalongerlexemethisisalongerlexeme3'||s%62 ||
' thisisalongerlexemethisisalongerlexeme'||s%16 || '
thisisalongerlexemethisisalongerlexeme2'||s%17 || '
thisisalongerlexemethisisalongerlexeme3'||s%72 ||
' thisisalongerlexemethisisalongerlexeme'||s%17 || '
thisisalongerlexemethisisalongerlexeme2'||s%18 || '
thisisalongerlexemethisisalongerlexeme3'||s%82 ||
' thisisalongerlexemethisisalongerlexeme'||s%18 || '
thisisalongerlexemethisisalongerlexeme2'||s%19 || '
thisisalongerlexemethisisalongerlexeme3'||s%92 ||
' thisisalongerlexemethisisalongerlexeme'||s%19 || '
thisisalongerlexemethisisalongerlexeme2'||s%10 || '
thisisalongerlexemethisisalongerlexeme3'||s%15 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%71 || '
thisisalongerlexemethisisalongerlexeme3'||s%16 ||
' thisisalongerlexemethisisalongerlexeme'||s%20 || '
thisisalongerlexemethisisalongerlexeme2'||s%81 || '
thisisalongerlexemethisisalongerlexeme3'||s%17 ||
' thisisalongerlexemethisisalongerlexeme'||s%35 || '
thisisalongerlexemethisisalongerlexeme2'||s%91 || '
thisisalongerlexemethisisalongerlexeme3'||s%18 ||
' thisisalongerlexemethisisalongerlexeme'||s%100 || '
thisisalongerlexemethisisalongerlexeme2'||s%110 || '
thisisalongerlexemethisisalongerlexeme3'||s%120 ||
' thisisalongerlexemethisisalongerlexeme'||s%110 || '
thisisalongerlexemethisisalongerlexeme2'||s%120 || '
thisisalongerlexemethisisalongerlexeme3'||s%220 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%130 || '
thisisalongerlexemethisisalongerlexeme3'||s%320 ||
' thisisalongerlexemethisisalongerlexeme'||s%130 || '
thisisalongerlexemethisisalongerlexeme2'||s%140 || '
thisisalongerlexemethisisalongerlexeme3'||s%420 ||
' thisisalongerlexemethisisalongerlexeme'||s%140 || '
thisisalongerlexemethisisalongerlexeme2'||s%150 || '
thisisalongerlexemethisisalongerlexeme3'||s%520 ||
' thisisalongerlexemethisisalongerlexeme'||s%150 || '
thisisalongerlexemethisisalongerlexeme2'||s%160 || '
thisisalongerlexemethisisalongerlexeme3'||s%620 ||
' thisisalongerlexemethisisalongerlexeme'||s%160 || '
thisisalongerlexemethisisalongerlexeme2'||s%170 || '
thisisalongerlexemethisisalongerlexeme3'||s%720 ||
' thisisalongerlexemethisisalongerlexeme'||s%170 || '
thisisalongerlexemethisisalongerlexeme2'||s%180 || '
thisisalongerlexemethisisalongerlexeme3'||s%820 ||
' thisisalongerlexemethisisalongerlexeme'||s%180 || '
thisisalongerlexemethisisalongerlexeme2'||s%190 || '
thisisalongerlexemethisisalongerlexeme3'||s%920 ||
' thisisalongerlexemethisisalongerlexeme'||s%190 || '
thisisalongerlexemethisisalongerlexeme2'||s%100 || '
thisisalongerlexemethisisalongerlexeme3'||s%150 ||
' thisisalongerlexemethisisalongerlexeme'||s%120 || '
thisisalongerlexemethisisalongerlexeme2'||s%710 || '
thisisalongerlexemethisisalongerlexeme3'||s%160 ||
' thisisalongerlexemethisisalongerlexeme'||s%200 || '
thisisalongerlexemethisisalongerlexeme2'||s%810 || '
thisisalongerlexemethisisalongerlexeme3'||s%170 ||
' thisisalongerlexemethisisalongerlexeme'||s%350 || '
thisisalongerlexemethisisalongerlexeme2'||s%910 || '
thisisalongerlexemethisisalongerlexeme3'||s%180
as tsvector)
FROM generate_series(1,100000) s



В списке pgsql-performance по дате отправления:

Предыдущее
От: Gudmundur Johannesson
Дата:
Сообщение: Re: Index with all necessary columns - Postgres vs MSSQL
Следующее
От: Saurabh
Дата:
Сообщение: Re: How to improve insert speed with index on text column