How to speed up pg_trgm / gin index scan

Поиск
Список
Период
Сортировка
От Christian Ramseyer
Тема How to speed up pg_trgm / gin index scan
Дата
Msg-id 5587F6DD.6000307@networkz.ch
обсуждение исходный текст
Ответы Re: How to speed up pg_trgm / gin index scan  (Jeff Janes <jeff.janes@gmail.com>)
Re: How to speed up pg_trgm / gin index scan  (Jaime Casanova <jaime@2ndquadrant.com>)
Re: How to speed up pg_trgm / gin index scan  (Oleg Bartunov <obartunov@gmail.com>)
Re: How to speed up pg_trgm / gin index scan  (Christian Ramseyer <rc@networkz.ch>)
Список pgsql-general
Hi

I have a pretty large table with syslog messages.

It is already partitioned by month, and for a single month I have e.g.


DM=# \d+ logs_01

    Column    |            Type             |
--------------+-----------------------------+
 host         | character varying(255)      |
 facility     | character varying(10)       |
 priority     | character varying(10)       |
 tag          | character varying(255)      |
 log_date     | timestamp without time zone |
 program      | character varying(255)      |
 msg          | text                        |
 seq          | bigint                      |

Indexes:
    "logs_01_pkey" PRIMARY KEY, btree (seq)
    "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
    "logs_01_date_index" btree (log_date)
    "tridx_logs_01_msg" gin (msg gin_trgm_ops)


DM=# select count(*) from logs_01;
  count
----------
 83052864


I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is
around60 GB on the 35 GB table: 

DM=# select count(*) from logs_01;
  count
----------
 83052864


DM=# \dt+ logs_01
                     List of relations
 Schema |  Name   | Type  |  Owner   | Size  | Description
--------+---------+-------+----------+-------+-------------
 public | logs_01 | table | postgres | 35 GB |

DM=# \di+ tridx_logs_01_msg
                               List of relations
 Schema |       Name        | Type  |  Owner   |  Table  | Size  | Description
--------+-------------------+-------+----------+---------+-------+-------------
 public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |


A typical query on this table looks like this:

explain analyze
select log_date, host, msg
from logs_01 as log   where  log.msg like '%192.23.33.177%'
    and log.log_date >= '2015-1-18 1:45:24'
    and log.log_date <= '2015-1-19 1:45:24'
    order by log_date asc offset 200 limit 50;


It yields a promising explain that shows that the index is used, but unfortunately the bitmap index scan on the GIN
indextakes quite long (40 seconds) 

                                                                                                QUERY PLAN
                                                                                  

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=34510.06..34510.06 rows=1 width=195) (actual time=42971.002..42971.015 rows=50 loops=1)
   ->  Sort  (cost=34509.75..34510.06 rows=124 width=195) (actual time=42970.960..42970.990 rows=250 loops=1)
         Sort Key: log_date
         Sort Method: top-N heapsort  Memory: 152kB
         ->  Bitmap Heap Scan on logs_01 log  (cost=34009.21..34505.44 rows=124 width=195) (actual
time=42963.969..42969.725rows=2472 loops=1) 
               Recheck Cond: ((msg ~~ '%192.23.33.177%'::text) AND (log_date >= '2015-01-18 01:45:24'::timestamp
withouttime zone) AND (log_date <= '2015-01-19 01:45:24'::timestamp without time zone)) 
               ->  BitmapAnd  (cost=34009.21..34009.21 rows=124 width=0) (actual time=42962.562..42962.562 rows=0
loops=1)
                     ->  Bitmap Index Scan on tridx_logs_01_msg  (cost=0.00..6992.15 rows=8020 width=0) (actual
time=42731.145..42731.145rows=168489 loops=1) 
                           Index Cond: (msg ~~ '%192.23.33.177%'::text)
                     ->  Bitmap Index Scan on logs_01_date_index  (cost=0.00..27016.75 rows=1287939 width=0) (actual
time=180.055..180.055rows=1173048 loops=1) 
                           Index Cond: ((log_date >= '2015-01-18 01:45:24'::timestamp without time zone) AND (log_date
<='2015-01-19 01:45:24'::timestamp without time zone)) 
 Total runtime: 42971.137 ms

(also on http://explain.depesz.com/s/KpaB)


Any good ideas on how I could speed this up a bit?

I have already tried to throw quite a bunch of memory at the problem:

shared_buffers = 64GB
work_mem = 16GB

but it didn't improve between this and the 32GB shared/ 2GB work GB I had before.

This is on Postgres 9.1.15 on Linux.

Thanks
Christian


В списке pgsql-general по дате отправления:

Предыдущее
От: Albe Laurenz
Дата:
Сообщение: Re: Re: pg_dump 8.4.9 failing after upgrade to openssl-1.0.1e-30.el6_6.11.x86_64 on redhat linux
Следующее
От: Paul Ramsey
Дата:
Сообщение: Less is More