Re: Forcing the right queryplan

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: Forcing the right queryplan
Дата
Msg-id 8DCA2553-B84E-424B-8806-DA7A2FE6D8C0@solfertje.student.utwente.nl
обсуждение исходный текст
Ответ на Forcing the right queryplan  (Henk van Lingen <H.G.K.vanLingen@uu.nl>)
Ответы Re: Forcing the right queryplan  (Henk van Lingen <H.G.K.vanLingen@uu.nl>)
Список pgsql-general
Sorry for not replying earlier, I've been quite busy.

On 31 Aug 2010, at 16:50, Henk van Lingen wrote:

> syslog=# \d systemevents
>                                         Table "public.systemevents"
>       Column       |            Type             |                         Modi
> fiers
> --------------------+-----------------------------+-----------------------------
> ------------------------------
> id                 | integer                     | not null default nextval('sy

(...)

> message            | text                        |

(...)

> Indexes:
>    "systemevents_pkey" PRIMARY KEY, btree (id)
>    "fromhost_idx" btree (fromhost)
>    "msgs_idx" gin (to_tsvector('english'::regconfig, message))
>
> The GIN index is to do text searching (via LogAnalyzer).
>
> Now there are two types of query plans:

Do you have output of explain analyse for these queries as well? It's hard to see what is actually going on with just
theexplain - we can't see which part of the query is more expensive than the planner expected, for starters. 

> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM
systemeventsWHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 100;
                                                 QUERY PLAN                    
>
> --------------------------------------------------------------------------------
> ---------------------------------
> Limit  (cost=0.00..10177.22 rows=100 width=159)
>   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..
> 1052934.86 rows=10346 width=159)
>         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
> 211.112.9'::text))
> (3 rows)
>
> This one is useless (takes very long). However this one:
>
> syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM
systemeventsWHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 500; 
>                                                    QUERY PLAN
>
> --------------------------------------------------------------------------------
> -----------------------------------
> Limit  (cost=40928.89..40930.14 rows=500 width=159)
>   ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
>         Sort Key: id
>         ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=1034
> 6 width=159)
>               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
> squery('131.211.112.9'::text))
>               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346
> width=0)
>                     Index Cond: (to_tsvector('english'::regconfig, message) @@
> to_tsquery('131.211.112.9'::text))
> (7 rows)
>
> works acceptable.

Odd that more records and a more complicated plan gives faster results... That's why I think we'd really want to see
explainanalyse output. 
I'm guessing that there are a lot of records matching your search string and that you've found the cut-off point where
theplanner thinks you're throwing away enough rows that it's not very useful to first select all the matching records
beforesorting the results. 

I think it decided to just start searching backwards along the id and returning the rows that match that IP (and are
visibleto your transaction) would be faster than trying to work with all the rows that match that IP. 

This probably means it misjudged the costs of sorting your index backwards, which indicates that your planning
statisticsare off, or that your cost parameters aren't appropriate for your system. 



One thing I do notice is that the first plan uses the index on id instead of the ts_vector one. For queries like those
youcould try to use a combined index like this: 

CREATE INDEX msgs_idx (to_tsvector('english'::regconfig, message), id) ON systemevents USING (gin);

Whether to use gist or gin there I really don't know. I'd probably go for gist, I seem to recall that gin is fairly
heavyto use. 

> How to use the right plan regardless of the 'LIMIT-size'?


You could try turning off planner-options, but that's probably a fairly bad idea.

Other options are to use a prepared statement or a stored procedure with the IP as a parameter, which force the planner
touse a more general plan because it doesn't know which values you're going to search for before it plans the query. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c8675d010409863511634!



В списке pgsql-general по дате отправления:

Предыдущее
От: Craig Ringer
Дата:
Сообщение: Re: Configure default for sorting of null-values?
Следующее
От: Scott Marlowe
Дата:
Сообщение: Re: Forcing the right queryplan