Forcing the right queryplan

Поиск
Список
Период
Сортировка
От Henk van Lingen
Тема Forcing the right queryplan
Дата
Msg-id 20100831145009.GD22680@uu.nl
обсуждение исходный текст
Ответы Re: Forcing the right queryplan  (Henk van Lingen <H.G.K.vanLingen@uu.nl>)
Re: Forcing the right queryplan  (Yeb Havinga <yebhavinga@gmail.com>)
Re: Forcing the right queryplan  (Alban Hertroys <dalroi@solfertje.student.utwente.nl>)
Список pgsql-general
Hi,

I've the problem my database is not using the 'right' queryplan in all
cases. Is there a way I can force that and/or how should I tuned the
table statistics?

I'm doing a rsyslog database in PostgreSQL with millions of records
(firewall logging). The db scheme is the so called 'MonitorWare' scheme,
to wich I added two extra indexes.

syslog=# select version();
                                                     version

--------------------------------------------------------------------------------
----------------------------------
 PostgreSQL 8.4.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20
080704 (Red Hat 4.1.2-48), 64-bit


syslog=# \d systemevents
                                         Table "public.systemevents"
       Column       |            Type             |                         Modi
fiers
--------------------+-----------------------------+-----------------------------
------------------------------
 id                 | integer                     | not null default nextval('sy
stemevents_id_seq'::regclass)
 customerid         | bigint                      |
 receivedat         | timestamp without time zone |
 devicereportedtime | timestamp without time zone |
 facility           | smallint                    |
 priority           | smallint                    |
 fromhost           | character varying(60)       |
 message            | text                        |
 ntseverity         | integer                     |
 importance         | integer                     |
 eventsource        | character varying(60)       |
 eventuser          | character varying(60)       |
 eventcategory      | integer                     |
 eventid            | integer                     |
 eventbinarydata    | text                        |
 maxavailable       | integer                     |
 currusage          | integer                     |
 minusage           | integer                     |
 maxusage           | integer                     |
 infounitid         | integer                     |
 syslogtag          | character varying(60)       |
 eventlogtype       | character varying(60)       |
 genericfilename    | character varying(60)       |
 systemid           | integer                     |
Indexes:
    "systemevents_pkey" PRIMARY KEY, btree (id)
    "fromhost_idx" btree (fromhost)
    "msgs_idx" gin (to_tsvector('english'::regconfig, message))

The GIN index is to do text searching (via LogAnalyzer).

Now there are two types of query plans:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM
systemeventsWHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 100;
                                                 QUERY PLAN                    

--------------------------------------------------------------------------------
---------------------------------
 Limit  (cost=0.00..10177.22 rows=100 width=159)
   ->  Index Scan Backward using systemevents_pkey on systemevents  (cost=0.00..
1052934.86 rows=10346 width=159)
         Filter: (to_tsvector('english'::regconfig, message) @@ to_tsquery('131.
211.112.9'::text))
(3 rows)

This one is useless (takes very long). However this one:

syslog=# explain SELECT id, devicereportedtime, facility, priority, fromhost, syslogtag, infounitid, message FROM
systemeventsWHERE (  ( to_tsvector('english', message) @@ to_tsquery ( '131.211.112.9')) )  ORDER BY id DESC LIMIT 500; 
                                                    QUERY PLAN

--------------------------------------------------------------------------------
-----------------------------------
 Limit  (cost=40928.89..40930.14 rows=500 width=159)
   ->  Sort  (cost=40928.89..40954.76 rows=10346 width=159)
         Sort Key: id
         ->  Bitmap Heap Scan on systemevents  (cost=2898.06..40413.36 rows=1034
6 width=159)
               Recheck Cond: (to_tsvector('english'::regconfig, message) @@ to_t
squery('131.211.112.9'::text))
               ->  Bitmap Index Scan on msgs_idx  (cost=0.00..2895.47 rows=10346
 width=0)
                     Index Cond: (to_tsvector('english'::regconfig, message) @@
to_tsquery('131.211.112.9'::text))
(7 rows)

works acceptable.

Stats:

syslog=# SELECT relname, relkind, reltuples, relpages
FROM pg_class                                                      WHERE relname LIKE 'systemevents%';
                                      relname            | relkind |  reltuples  | relpages  
-------------------------------+---------+-------------+----------
 systemevents_pkey             | i       | 2.06915e+06 |    71985
 systemeventsproperties        | r       |           0 |        0
 systemeventsproperties_pkey   | i       |           0 |        1
 systemevents_id_seq           | S       |           1 |        1
 systemeventsproperties_id_seq | S       |           1 |        1
 systemevents                  | r       | 2.06915e+06 |   694826
(6 rows)

syslog=# SELECT relname, relkind, reltuples, relpages
FROM pg_class
WHERE relname LIKE 'msg%';
 relname  | relkind |  reltuples  | relpages
----------+---------+-------------+----------
 msgs_idx | i       | 2.06915e+06 |   128069
(1 row)

How to use the right plan regardless of the 'LIMIT-size'?

Cheers,
--
Henk van Lingen, ICT-SC Netwerk & Telefonie,                  (o-      -+
Universiteit Utrecht, Jenalaan 18a, room 0.12                 /\        |
phone: +31-30-2538453                                         v_/_      |
http://henk.vanlingen.net/             http://www.tuxtown.net/netiquette/

В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg_dump --compress error
Следующее
От: Allan Kamau
Дата:
Сообщение: Finding intercept of two documents (two tsvector fields)