Performance for seq. scans

Поиск
Список
Период
Сортировка
От Jules Bean
Тема Performance for seq. scans
Дата
Msg-id 20000726115132.C29809@grommit.office.vi.net
обсуждение исходный текст
Ответы Re: Performance for seq. scans  (Steve Heaven <steve@thornet.co.uk>)
Список pgsql-general
Hi all,

I've had a look over the docs and the FAQ and I can't see anything
answering this, so here goes:

I'm in the (slightly unusual, in a relational world) situation that
the dominant query on my database is a wildcard search, so that no
indexes can be used.  (E.g. select * from table_a where foo like
'%bar%').

Without some /very/ clever (and disk-space intensive) subword
indexing, this query is doomed to be a sequential scan, which I'm
resigned to.  It's a question of making that as fast as possible.

My dataset is around 500M as a text file on disk, and around 1500M as
postgres data. The machine I'm working on at the moment does the
search in around 90 seconds. (For comparision, MS SQL 7, the other
solution being considered here, takes around 75 seconds on identical
hardware).

Interestingly, using 'vmstat' shows that the CPU is maxxed out at 50%
(this being a dual CPU machine), while the disk access is a mere
4M/sec --- bonnie claims this machine is capable of around 25M/sec to
this particular disk. So it would seem that the bottleneck is the
CPU. [I understand why both CPUs aren't used]

My previous feeling had been that the bottleneck was going to be the
disk, in which case I was going to recommend installing enough memory
in the machine that the kernel disk cache could cache the whole file,
and thus speeding up the search. In the current situtation, it seems
like the only improvement would be to install a faster CPU (and since
we're currently using a PIII 600, I couldn't expect much more than a
60% improvement or so that way).

It seems slightly surprising that postgres can only "service" a 4M/sec
stream of data from the disk with a LIKE query -- not such a complex
query. Is there some unnecessary data copying in the critical path for
the search?

I almost forgot -- this is debian package 7.0.2-2.

Any pointers to whether or not this performance can be improved upon,
welcomed.  Currently I'm feeling like the right solution may be to
dump the 500M text file periodically and run 'grep' on a machine with
enough memory to cache the text file ;-)

Jules Bean



В списке pgsql-general по дате отправления:

Предыдущее
От: Bob Parkinson
Дата:
Сообщение: planner switch from index scan to seq scan?
Следующее
От: "Andrea Aime"
Дата:
Сообщение: Re: Connecting to PostgreSQL databases