Re: How to speed up pg_trgm / gin index scan

Поиск
Список
Период
Сортировка
От Jaime Casanova
Тема Re: How to speed up pg_trgm / gin index scan
Дата
Msg-id CAJKUy5hOcKZe6dtZPuyt8+wCBUOyRu-68uLTZxUuUs_1wnk09Q@mail.gmail.com
обсуждение исходный текст
Ответ на How to speed up pg_trgm / gin index scan  (Christian Ramseyer <rc@networkz.ch>)
Ответы Re: How to speed up pg_trgm / gin index scan  (Christian Ramseyer <rc@networkz.ch>)
Список pgsql-general
On Mon, Jun 22, 2015 at 6:51 AM, Christian Ramseyer <rc@networkz.ch> wrote:
>
> DM=# \d+ logs_01
>
>     Column    |            Type             |
> --------------+-----------------------------+
>  host         | character varying(255)      |
>  facility     | character varying(10)       |
>  priority     | character varying(10)       |
>  tag          | character varying(255)      |
>  log_date     | timestamp without time zone |
>  program      | character varying(255)      |
>  msg          | text                        |
>  seq          | bigint                      |
>
> Indexes:
>     "logs_01_pkey" PRIMARY KEY, btree (seq)
>     "idx_logs_01_lower_host" btree (lower(host::text) varchar_pattern_ops)
>     "logs_01_date_index" btree (log_date)
>     "tridx_logs_01_msg" gin (msg gin_trgm_ops)
>
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> I'd like to provide a fast "like %x%" search on the msg column, hence I added a trigram based gin index on it. It is
around60 GB on the 35 GB table: 
>
> DM=# select count(*) from logs_01;
>   count
> ----------
>  83052864
>
>
> DM=# \dt+ logs_01
>                      List of relations
>  Schema |  Name   | Type  |  Owner   | Size  | Description
> --------+---------+-------+----------+-------+-------------
>  public | logs_01 | table | postgres | 35 GB |
>
> DM=# \di+ tridx_logs_01_msg
>                                List of relations
>  Schema |       Name        | Type  |  Owner   |  Table  | Size  | Description
> --------+-------------------+-------+----------+---------+-------+-------------
>  public | tridx_logs_01_msg | index | postgres | logs_01 | 58 GB |
>
>

What version of postgres is this? GIN indexes improved a lot in 9.4,
they use less than half the space and have doubled the speed (on
average).

Now, whatever version you have; a GIN index has two data structures,
the main one in which the index entries are stored as key-value pairs
(please someone correct my description of the situation) and a pending
list, which is a temporary unsorted list of pending entries in which
all the newly inserted tuples arrive until a VACUUM (or until the
pending list grows upto work_mem) moves that list into the main
structure.

That happens to avoid the penalty of inserting new rows in the main
structure which could be expensive.
But while the pending list grows the speed of the index decreases. And
because you have work_mem in 16Gb your pending list is possibly
growing without control.

if you have 9.3 or superior you can know how big is that pending list
installing pgstattuple.

CREATE EXTENSION pgstattuple;
SELECT * FROM pgstatginindex('tridx_logs_01_msg');

NOTE: remember that pending_pages is expressed in 8kb-pages

if that is the problem or if you are in <= 9.2 then try VACUUM the table

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


В списке pgsql-general по дате отправления:

Предыдущее
От: Christian Ramseyer
Дата:
Сообщение: Re: How to speed up pg_trgm / gin index scan
Следующее
От: Suresh Raja
Дата:
Сообщение: Re: extracting PII data and transforming it across table.