Re: query produces 1 GB temp file

From: Dirk.Lutzebaeck@t-online.de (Dirk Lutzebaeck)
Subject: Re: query produces 1 GB temp file
Date:
Msg-id: 4205228C.3060601@aeccom.com
In reply to: Re: query produces 1 GB temp file  (John A Meinel <john@arbash-meinel.com>)
List: pgsql-performance
Hi John,

thanks very much for your analysis. I'll probably need to reorganize
some things.

Regards,

Dirk

John A Meinel wrote:

> Dirk Lutzebaeck wrote:
>
>> Hi,
>>
>> here is a query which produces a temp file of over 1GB in
>> pgsql_tmp. This is on pgsql 7.4.2, RHEL 3.0, on a Xeon MP machine
>> with 32GB RAM, 300MB sort_mem, and 320MB shared_mem.
>>
>> Below is the query and results for EXPLAIN and EXPLAIN ANALYZE. All
>> tables have been analyzed before.
>>
>> Can someone please explain why the temp file is so huge? I
>> understand there are a lot of rows.
>>
>> Thanks in advance,
>>
>> Dirk
>>
>>
> ...
>
>>               ->  Nested Loop  (cost=0.00..8346.73 rows=3 width=1361)
>> (actual time=34.104..18016.005 rows=703677 loops=1)
>>
>>
> Well, there is this particular plan step where the planner thinks
> there will be only 3 rows, but in fact there are 703,677 of them.
> And the previous line:
>
>>         ->  Sort  (cost=8346.75..8346.76 rows=3 width=1361) (actual
>> time=75357.448..75499.263 rows=22439 loops=1)
>>
>>
> Seems to indicate that after sorting you still have 22,439 rows,
> which then get pared down again to 1000.
>
> I'm assuming that the sort you are trying to do is extremely
> expensive. You are sorting 700k rows, which take up too much memory
> (about 1GB), forcing PostgreSQL to spill the sort to a temporary
> file on disk.
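>
> A rough back-of-the-envelope check, using the row count and tuple
> width from the plan above, shows why it spills even with 300MB of
> sort_mem:
>
>     703,677 rows x 1,361 bytes/row = ~958 MB
>
> which is right around the 1GB temp file you are seeing, and far more
> than sort_mem allows to stay in memory.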
>
> I didn't analyze it in depth, but you might get much better
> performance from doing a subselect rather than the query you wrote.
>
> You are joining 4 tables (bi, en, df AS ft, es); I don't know what
> size each of them is. In the end, though, you don't really care
> about the en or es tables (they aren't in your output).
>
> So maybe one of your subselects could be:
>
> where bi.en = (select en from es where es.co = bi.co and
> es.spec=122293729);
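>
> For illustration only (the full query and its SELECT list are elided
> above, so the exact shape here is a guess), the rewrite might look
> something like:
>
>     select ...
>       from bi, df as ft
>      where bi.en = (select en
>                       from es
>                      where es.co = bi.co
>                        and es.spec = 122293729)
>        and ...
>
> so that es is only consulted to look up the matching en value
> instead of being joined in full.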
>
> I'm pretty sure the reason you need 1GB of temp space is that at one
> point you have 700k rows. Is it possible to rewrite the query so
> that it does more filtering earlier? Your distinct criteria seem to
> filter it down to about 20k rows, so maybe it's possible to do the
> distinct in part of the subselect, before you start joining against
> the other tables (see the sketch below).
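>
> Purely as a sketch (I'm guessing at which columns the distinct runs
> over, since the full query isn't shown here), that could look like:
>
>     select ...
>       from (select distinct co, en from bi) as bi
>       join df as ft on ...
>
> so that the later joins only ever see the ~20k distinct rows instead
> of all 700k.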
>
> If you have that much redundancy, you might also want to rethink how
> the data is normalized.
>
> Just some thoughts.
>
> Also, I thought using the "oid" column wasn't really recommended,
> since in *high*-volume databases oids aren't even guaranteed to be
> unique (an oid is a 32-bit counter that rolls over). Also, on a
> database dump and restore they don't stay the same unless you take a
> lot of extra care to include them in both the dump and the restore.
> I believe it is better to create your own "id" column per table (say
> SERIAL or BIGSERIAL).
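>
> A minimal example (the table and column layout here are made up):
>
>     create table bi (
>         id bigserial primary key,  -- 64-bit, backed by a sequence
>         ...
>     );
>
> A bigserial won't wrap around the way the 32-bit oid counter can,
> and it survives dump/restore without any special handling.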
>
> John
> =:->
>

