Re: [ADMIN] question on hash joins

From: Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]
Subject: Re: [ADMIN] question on hash joins
Date:
Msg-id: B4DE522C-7B7B-47DC-9143-B3087A7487D5@nasa.gov
In reply to: Re: [ADMIN] question on hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [ADMIN] question on hash joins  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-admin
Thanks for the help!

We set:  work_mem = 16MB
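
In case it matters, here is a minimal psql sketch of checking that value and
(hypothetically) overriding it per session, just to illustrate the knob:

    -- Check the effective value, then override it for this session only.
    SHOW work_mem;            -- reports 16MB for us
    SET work_mem = '256MB';   -- hypothetical per-session bump, for illustration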

Also, table1 has 100 million rows, not 1 million; that is why the planner uses
the index in my query.

Sorry if I am being dense, but I still have a question…
Is it possible for me to estimate the size of the hash and a value for 
the temp_file_limit setting using information in the explain plan?

For example, one possibility is that the hash contains the entire tuple for each
matching row.  In that case, I could multiply the estimated row count from the
explain output by an average row size to get an “order of magnitude”
estimate for temp_file_limit.
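
As a concrete (hypothetical) version of that arithmetic, using the rows=660888
and width=8 from the Hash node in the plan below, plus a guessed per-tuple
overhead of 24 bytes (an assumption on my part, not a documented figure):

    -- Back-of-the-envelope sizing; the 24-byte overhead is an assumption.
    SELECT pg_size_pretty(660888::bigint * (8 + 24))    AS hashed_side,
           pg_size_pretty(644241901::bigint * (8 + 24)) AS outer_side;
    -- hashed_side ~20 MB (already over our 16MB work_mem);
    -- outer_side ~19 GB, which would explain blowing past a 10GB
    -- temp_file_limit if most of the outer table spills, as described below.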

Another possibility is that the hash contains only the columns needed to satisfy
the remaining query clauses (in my example, just the granuleid column), and is
thus much smaller.
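
If estimating proves too rough, one thing I could try (assuming I can afford to
run the query, or a scaled-down version of it) is EXPLAIN ANALYZE, since the
Hash node then reports its actual memory use and batch count:

    -- The Hash node's "Buckets: ... Batches: ... Memory Usage: ...kB" line
    -- shows how big the hash really was and whether the join went multi-batch.
    EXPLAIN ANALYZE
    SELECT count(1)
    FROM table1 g
    JOIN table2 x ON x.granuleid = g.granuleid
    WHERE g.collectionid = 22467;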

Thanks in advance,
Bob



On 10/18/17, 5:17 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
   "Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@nasa.gov> writes:   > explain select count(1)
fromtable1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;   >
                           QUERY PLAN   >
-------------------------------------------------------------------------------------------------------------------------------
 > Aggregate  (cost=18200480.80..18200480.81 rows=1 width=8)   >    ->  Hash Join  (cost=103206.82..18190602.43
rows=3951347width=0)   >          Hash Cond: (x.granuleid = g.granuleid)   >          ->  Seq Scan on table2 x
(cost=0.00..10596253.01rows=644241901 width=8)   >          ->  Hash  (cost=92363.72..92363.72 rows=660888 width=8)   >
              ->  Index Only Scan using idx_table1 on table1 g  (cost=0.57..92363.72 rows=660888 width=8)   >
          Index Cond: (collectionid = '22467'::bigint)   > (7 rows)      > My question is, what gets put into the Hash?
 > I assume the with "width=8" must refer to the size of the key.      It's the data payload width, but there's some
per-rowoverhead too, which   is more than 8 bytes ... don't recall how much more at the moment.      > The reason I ask
isbecause, when I try to run the query it fails due to   > temp file use over 10GB.      What have you got work_mem set
to? If the hashed table exceeds that,   you're going to end up with a multi-batch join, in which case most of   the
outertable is going to get written into temp files as well.      I don't think it's a good idea to have such a small
temp_file_limit  when you're working with such huge tables.  Either a merge or hash   join is likely to require temp
filespace similar to the table size,   and you do *not* want a nestloop join, at least not if you want your   answers
inreasonable time.                  regards, tom lane   
 


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
