Re: Sequential scan on FK join

From: Martin Nickel
Subject: Re: Sequential scan on FK join
Date:
Msg-id: pan.2005.10.21.07.59.13.750857@portant.com
In reply to: Sequential scan on FK join  (Martin Nickel <martin@portant.com>)
List: pgsql-performance
On Tue, 18 Oct 2005 08:52:15 +0100, Richard Huxton wrote:

> Martin Nickel wrote:
>> When I turn off seqscan it does use the index - and it runs 20 to 30%
>> longer.  Based on that, the planner is correctly choosing a sequential
>> scan - but that's just hard for me to comprehend.  I'm joining on an
>> int4 key, 2048 per index page - I guess that's a lot of reads - then the
>> data-page reads.  Still, the 8-minute query time seems excessive.
>
> You'll be getting (many) fewer than 2048 index entries per page. There's a
> page header and various pointers involved too, and index pages aren't
> going to be full. So - it needs to search the table on dates, fetch the
> ids and then assemble them for the hash join. Of course, if you have too
> many to join then all this will spill to disk slowing you further.
>
> Now, you'd rather get down below 8 minutes. There are a number of options:
>   1. Make sure your disk i/o is being pushed to its limit
We are completely peaked out on disk I/O.  iostat frequently shows 60%
iowait time.  This is quite an issue for us and I don't have any
great ideas.  Data is on a 3ware SATA RAID 10 array across 4 disks.  I
can barely even run vacuums on our largest table (lead) since a vacuum
runs for a day and a half and kills our online performance while it runs.
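One thing we may try is the cost-based vacuum delay that was added in
8.0, so vacuum stops saturating the array.  A sketch (the values are
examples, not tuned numbers):

    -- Throttle vacuum's I/O; vacuum_cost_delay is 0 (off) by default.
    SET vacuum_cost_delay = 20;   -- sleep 20 ms each time the cost budget is spent
    SET vacuum_cost_limit = 200;  -- the default budget, shown for completeness
    VACUUM ANALYZE lead;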

> 2. Look into increasing the sort memory for this one query "set
> work_mem..." (see the runtime configuration section of the manual)
I haven't tried this, and I will.  Thanks for the idea.
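If I understand it, something like this - the query below is a stand-in
for our real join ("mailing" and its columns are my approximation; only
lead.mailing_id is real), and on 8.x work_mem is specified in KB:

    BEGIN;
    SET LOCAL work_mem = 262144;   -- 256 MB, for this transaction only
    EXPLAIN ANALYZE
    SELECT m.mailing_id, count(*)
    FROM   mailing m
    JOIN   lead    l ON l.mailing_id = m.mailing_id
    GROUP  BY m.mailing_id;
    COMMIT;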

>   3. Actually - are you happy that your general configuration is OK?
I'm not at all.  Most of the configuration changes I've tried have made
almost no discernible difference.  I'll post the relevant numbers in a
different post - possibly you'll have some suggestions.
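For reference, these are the settings I expect matter most for a join
like this (I'll report our actual values):

    SHOW shared_buffers;
    SHOW work_mem;
    SHOW effective_cache_size;
    SHOW random_page_cost;
    SHOW checkpoint_segments;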

> 4. Perhaps use a cursor - I'm guessing you want to process these
> mailings in some way and only want them one at a time in any case.
Where this first came up was in trying to get aggregate totals per
mailing.  I gave up on that and created a nightly job to create a summary
table since Postgres wasn't up to the job in real time.  Still, I
frequently need to do the join and limit it by other criteria - and it is
incredibly slow - even when the result set is smallish.
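If I follow the cursor idea, it would look roughly like this - again,
"mailing" and the selected columns are placeholders for our schema:

    BEGIN;
    DECLARE mailing_cur CURSOR FOR
        SELECT m.mailing_id, l.id
        FROM   mailing m
        JOIN   lead    l ON l.mailing_id = m.mailing_id
        ORDER  BY m.mailing_id;
    FETCH 1000 FROM mailing_cur;   -- repeat until no rows come back
    CLOSE mailing_cur;
    COMMIT;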

>   5. Try the query one day at a time and see if the balance tips the
> other way - you'll be dealing with substantially less data per query
> which might match your system better. Of course, this may not be
> practical for your application.
Unfortunately that isn't practical for us.
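For clarity, I read the suggestion as something like the following (the
date column is a placeholder for whatever we'd actually filter on) - our
queries rarely break down by single days, so per-day totals aren't what
we need:

    SELECT m.mailing_id, count(*)
    FROM   mailing m
    JOIN   lead    l ON l.mailing_id = m.mailing_id
    WHERE  m.sent_date >= date '2005-10-17'
    AND    m.sent_date <  date '2005-10-18'
    GROUP  BY m.mailing_id;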

>   6. If your lead table is updated only rarely, you could try a CLUSTER
> on the table by mailing_id - that should speed the scan. Read the manual
> for the cluster command first though.
The lead table is one of the most volatile in our system.  Each day we
insert tens or hundreds of thousands of rows, update almost that many, and
delete a few.  It is growing, and could reach 100 million rows in 8 or 9
months.  We're redesigning the data structure a little so lead is not
updated (updates are just too slow), but it will continue to have inserts
and deletes, and we'll have to join it with the associated table being
updated, which already promises to be a slow operation.
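For reference, the 8.x form of the CLUSTER suggestion would be the
following (the index name is a placeholder); since CLUSTER takes an
exclusive lock and rewrites the whole table, it's hard to schedule on a
table this hot:

    CLUSTER lead_mailing_id_idx ON lead;
    ANALYZE lead;   -- re-collect statistics after the rewrite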

We're looking at 15K RPM SCSI drives for a replacement RAID array.  We are
getting to the place where it may be cheaper to convert to Oracle or DB2
than to try to make Postgres work.

