Re: FTS performance issue - planner problem identified (but only partially resolved)

Поиск
Список
Период
Сортировка
От Marc Mamin
Тема Re: FTS performance issue - planner problem identified (but only partially resolved)
Дата
Msg-id B6F6FD62F2624C4C9916AC0175D56D880CE0DDEE@jenmbs01.ad.intershop.net
обсуждение исходный текст
Ответ на Re: FTS performance issue - planner problem identified (but only partially resolved)  (Stefan Keller <sfkeller@gmail.com>)
Список pgsql-performance

________________________________________
Von: Stefan Keller [sfkeller@gmail.com]
>Gesendet: Samstag, 20. Juli 2013 01:55
>
>Hi Marc
>
>Thanks a lot for your hint!
>
>You mean doing a "SET track_counts (true);" for the whole session?

No,
I mean

ALTER TABLE <table> ALTER  <ts_vector_column> SET STATISTICS 0;

And remove existing statistics

DELETE FROM pg_catalog.pg_statistic
where starelid='<table>':: regclass
AND staattnum = (SELECT attnum FROM      pg_attribute
                WHERE attrelid = '<table>':: regclass
                AND  attname  =  '<ts_vector_column>'::name
                )

But you should first try to find out which proportion of your ts queries are faster
when using a table scan as they will probably not happen anymore afterwards !
(Except if further columns on your table 'FullTextSearch' are considered by the planner)




>That would be ok if it would be possible just for the gin index.
>
>It's obviously an issue of the planner estimation costs.
>The data I'm speaking about ("movies") has a text attribute which has
>a length of more than 8K so it's obviously having to do with
>detoasting.
>But the thoughts about @@ operators together with this GIN index seem
>also to be valid.
>
>I hope this issue is being tracked in preparation for 9.3.
>
>Regards, Stefan
>
>
>2013/7/19 Marc Mamin <M.Mamin@intershop.de>:
>>
>>> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
>>> plainto_tsquery('english', 'good');
>>>
>>> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
>>> The planner obviously always chooses table scan
>>
>>
>> Hello,
>>
>> A probable reason for the time difference is the cost for decompressing toasted content.
>> At least in 8.3, the planner was not good at estimating it.
>>
>> I'm getting better overall performances since I've stopped collect statistic on tsvectors.
>> An alternative would have been to disallow compression on them.
>>
>> I'm aware this is a drastic way and would not recommend it without testing. The benefit may depend on the type of
datayou are indexing. 
>> In our use case these are error logs with many java stack traces, hence with many lexemes poorly discriminative.
>>
>> see: http://www.postgresql.org/message-id/27953.1329434125@sss.pgh.pa.us
>> as a comment on
>> http://www.postgresql.org/message-id/C4DAC901169B624F933534A26ED7DF310861B363@JENMAIL01.ad.intershop.net
>>
>> regards,
>>
>> Marc Mamin


В списке pgsql-performance по дате отправления:

Предыдущее
От: Stefan Keller
Дата:
Сообщение: Re: FTS performance issue - planner problem identified (but only partially resolved)
Следующее
От: amul sul
Дата:
Сообщение: Fw: [osdldbt-general] Running DBT5 on remote database server