Re: [ADMIN] question on hash joins

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [ADMIN] question on hash joins
Дата
Msg-id 22675.1508361426@sss.pgh.pa.us
обсуждение исходный текст
Ответ на [ADMIN] question on hash joins  ("Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"<robert.m.hartranft@nasa.gov>)
Ответы Re: [ADMIN] question on hash joins  ("Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"<robert.m.hartranft@nasa.gov>)
Список pgsql-admin
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@nasa.gov> writes:
> explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;
>                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=18200480.80..18200480.81 rows=1 width=8)
>    ->  Hash Join  (cost=103206.82..18190602.43 rows=3951347 width=0)
>          Hash Cond: (x.granuleid = g.granuleid)
>          ->  Seq Scan on table2 x  (cost=0.00..10596253.01 rows=644241901 width=8)
>          ->  Hash  (cost=92363.72..92363.72 rows=660888 width=8)
>                ->  Index Only Scan using idx_table1 on table1 g  (cost=0.57..92363.72 rows=660888 width=8)
>                      Index Cond: (collectionid = '22467'::bigint)
> (7 rows)

> My question is, what gets put into the Hash?
> I assume the with "width=8" must refer to the size of the key.

It's the data payload width, but there's some per-row overhead too, which
is more than 8 bytes ... don't recall how much more at the moment.

> The reason I ask is because, when I try to run the query it fails due to
> temp file use over 10GB.

What have you got work_mem set to?  If the hashed table exceeds that,
you're going to end up with a multi-batch join, in which case most of
the outer table is going to get written into temp files as well.

I don't think it's a good idea to have such a small temp_file_limit
when you're working with such huge tables.  Either a merge or hash
join is likely to require temp file space similar to the table size,
and you do *not* want a nestloop join, at least not if you want your
answers in reasonable time.
        regards, tom lane


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

В списке pgsql-admin по дате отправления:

Предыдущее
От: "Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"
Дата:
Сообщение: [ADMIN] question on hash joins
Следующее
От: "Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"
Дата:
Сообщение: Re: [ADMIN] question on hash joins