Re: poor pefrormance with regexp searches on large tables

Поиск
Список
Период
Сортировка
От pasman pasmański
Тема Re: poor pefrormance with regexp searches on large tables
Дата
Msg-id CAOWY8=ZrGt06MNVEEhKRR86=a_bLpurjS_cdzFB0MjydP2T9JQ@mail.gmail.com
обсуждение исходный текст
Ответ на poor pefrormance with regexp searches on large tables  (Grzegorz Blinowski <g.blinowski@gmail.com>)
Список pgsql-performance
Try to use single regular expression.

2011/8/10, Grzegorz Blinowski <g.blinowski@gmail.com>:
> Dear All,
>
> I have some problems with regexp queries performance - common sense tells me
> that my queries should run faster than they do.
>
> The database - table in question has 590 K records, table's size is 3.5GB. I
> am effectively querying a single attribute "subject" which has an average
> size of 2KB, so we are doing a query on ~1GB of data. The query looks more
> or less like this:
>
> SELECT T.tender_id FROM archive_tender T WHERE
> (( T.subject !~* '\\mpattern1.*\\M' ) AND ( T.subject ~* '\\mpattern2\\M' OR
> [4-5 more similar terms]  ) AND T.erased = 0 AND T.rejected = 0
> ORDER BY
>    tender_id DESC
> LIMIT
>     10000;
>
> The planner shows seq scan on subject which is OK with regexp match.
>
> Now, the query above takes about 60sec to execute; exactly: 70s for the
> first run and 60s for the next runs. In my opinion this is too long: It
> should take 35 s to read the whole table into RAM (assuming 100 MB/s
> transfers - half the HDD  benchmarked speed). With 12 GB of RAM the whole
> table should be easily buffered on the operating system level. The regexp
> match on 1 GB of data takes 1-2 s (I benchmarked it with a simple pcre
> test). The system is not in the production mode, so there is no additional
> database activity (no reads, no updates, effectively db is  read-only)
>
> To summarize: any idea how to speed up this query? (please, don't suggest
> regexp indexing - in this application it would be too time consuming to
> implement them, and besides - as above - I think that Postgres should do
> better here even with seq-scan).
>
> Server parameters:
> RAM: 12 GB
> Cores: 8
> HDD: SATA; shows 200 MB/s transfer speed
> OS: Linux 64bit; Postgres 8.4
>
>
> Some performance params from postgresql.conf:
> max_connections = 16
> shared_buffers = 24MB
> temp_buffers = 128MB
> max_prepared_transactions = 50
> work_mem = 128MB
> maintenance_work_mem = 1GB
> effective_cache_size = 8GB
>
> Database is vacuumed.
>
>
> Regards,
>
> Greg
>


--
------------
pasman

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Tomas Vondra"
Дата:
Сообщение: Re: poor pefrormance with regexp searches on large tables
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: poor pefrormance with regexp searches on large tables