Re: query produces 1 GB temp file

From: John A Meinel
Subject: Re: query produces 1 GB temp file
Date:
Msg-id: 42051DD1.7020802@arbash-meinel.com
In response to: query produces 1 GB temp file  (Dirk Lutzebaeck <lutzeb@aeccom.com>)
Responses: Re: query produces 1 GB temp file
List: pgsql-performance
Dirk Lutzebaeck wrote:

>Hi,
>
>here is a query which produces a temp file of over 1 GB in pgsql_tmp. This
>is on pgsql 7.4.2, RHEL 3.0, XEON MP machine with 32GB RAM, 300MB
>sort_mem and 320MB shared_mem.
>
>Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
>tables have been analyzed before.
>
>Can someone please explain why the temp file is so huge? I understand
>there are a lot of rows.
>
>Thanks in advance,
>
>Dirk
>
>
...

>               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361) (actual time=34.104..18016.005 rows=703677 loops=1)
>
>
Well, there is this particular step where the planner thinks there will
be only 3 rows, but in fact there are 703,677 of them. And the previous line:

>         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual time=75357.448..75499.263 rows=22439 loops=1)
>
>
Seems to indicate that after sorting you still have 22,439 rows, which
then get pared down again to 1,000.
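
The rows=3 estimate is so far off (3 vs. 703,677) that the planner's
statistics are probably inadequate. As a sketch of one thing to try,
assuming bi.en is the badly estimated column (I'm guessing at the table
and column here), 7.4 syntax:

-- assumption: bi.en is the column whose selectivity the planner gets wrong
ALTER TABLE bi ALTER COLUMN en SET STATISTICS 100;
ANALYZE bi;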

I'm assuming that the sort is what's so expensive. You are sorting
roughly 700k rows at width=1361; 703,677 rows x 1,361 bytes comes to
roughly 0.9 GB, which lines up with the temp file you're seeing. That is
far more than your 300MB sort_mem, so the sort has to spill to a
temporary file on disk.
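
For what it's worth, sort_mem can be raised per session in 7.4 (the
value is in KB), though with roughly 1 GB of sort data that is probably
not a practical fix here; this is just a sketch:

-- sort_mem is per-sort and measured in KB in 7.4
SET sort_mem = 1300000;   -- about 1.3 GB, enough to hold the sort in memory
-- run the query here
RESET sort_mem;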

I didn't analyze it in depth, but you might get much better performance
by using a subselect rather than the query you wrote.

You are joining four tables (bi, en, df AS ft, es), and I don't know
what size each table is. In the end, though, you don't really care about
the en or es tables (they aren't in your output).

So maybe one of your subselects could be:

where bi.en = (select en from es where es.co = bi.co and es.spec=122293729);
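
Roughly like this, as a sketch; the select list is a placeholder since I
don't have your full query in front of me, and it assumes the subselect
returns a single value:

-- hypothetical sketch: replace bi.* with your real output columns
SELECT bi.*
FROM bi
WHERE bi.en = (SELECT es.en
               FROM es
               WHERE es.co = bi.co
                 AND es.spec = 122293729);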

I'm pretty sure the reason you need 1GB of temp space is that at one
point you have 700k rows. Is it possible to rewrite the query so that it
does more filtering earlier? Your DISTINCT criteria seem to filter it
down to about 20k rows. So maybe it's possible to do some sort of
DISTINCT in part of the subselect, before you start joining against the
other tables.
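
Something along these lines; the column names and join condition are
guesses, since I don't know your schema:

-- hypothetical: collapse bi down to distinct keys before joining
SELECT b.en, b.co
FROM (SELECT DISTINCT en, co FROM bi) AS b
JOIN df AS ft ON ft.co = b.co;   -- join condition is a guess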

If you have that much redundancy, you might also want to think about
normalizing the data differently.

Just some thoughts.

Also, I thought using the "oid" column wasn't really recommended, since
in *high* volume databases oids aren't even guaranteed to be unique. (I
think it is a 32-bit counter that wraps around.) And on a database dump
and restore, they don't stay the same unless you take extra care to
include them in both the dump and the restore. I believe it is better to
create your own "id" column per table (say SERIAL or BIGSERIAL).
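
A minimal sketch of what I mean (the table and columns are made up):

-- a SERIAL column gives you a stable id that survives dump and restore
CREATE TABLE example (
    id   SERIAL PRIMARY KEY,
    data TEXT NOT NULL
);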

John
=:->

