Thread: [ADMIN] question on hash joins


[ADMIN] question on hash joins

From:
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"
Date:

Hi All,

 

I have a question about hash joins and the meaning of the "width" value in the EXPLAIN output for a query.

 

I have two large tables: table1 has 1 million rows and table2 has 600 million rows.

 

When I try to join these two tables using one constraint on table1 (which reduces the
candidate rows to 660,888), the optimizer seems to correctly choose to hash the values
from table1.

 

 

explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18200480.80..18200480.81 rows=1 width=8)
   ->  Hash Join  (cost=103206.82..18190602.43 rows=3951347 width=0)
         Hash Cond: (x.granuleid = g.granuleid)
         ->  Seq Scan on table2 x  (cost=0.00..10596253.01 rows=644241901 width=8)
         ->  Hash  (cost=92363.72..92363.72 rows=660888 width=8)
               ->  Index Only Scan using idx_table1 on table1 g  (cost=0.57..92363.72 rows=660888 width=8)
                     Index Cond: (collectionid = '22467'::bigint)
(7 rows)

 

 

 

My question is, what gets put into the Hash?

I assume the "width=8" must refer to the size of the key.

Does the entire row get copied to the hash as the corresponding value?

 

 

The reason I ask is that, when I try to run the query, it fails due to
temp file use over 10 GB.  How do I accurately determine the amount of memory
that will be used?

 

select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;

ERROR:  temporary file size exceeds temp_file_limit (10485760 kB)

 

 

Thanks in advance,

Bob

Re: [ADMIN] question on hash joins

From:
Tom Lane
Date:
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@nasa.gov> writes:
> explain select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;
>                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=18200480.80..18200480.81 rows=1 width=8)
>    ->  Hash Join  (cost=103206.82..18190602.43 rows=3951347 width=0)
>          Hash Cond: (x.granuleid = g.granuleid)
>          ->  Seq Scan on table2 x  (cost=0.00..10596253.01 rows=644241901 width=8)
>          ->  Hash  (cost=92363.72..92363.72 rows=660888 width=8)
>                ->  Index Only Scan using idx_table1 on table1 g  (cost=0.57..92363.72 rows=660888 width=8)
>                      Index Cond: (collectionid = '22467'::bigint)
> (7 rows)

> My question is, what gets put into the Hash?
> I assume the with "width=8" must refer to the size of the key.

It's the data payload width, but there's some per-row overhead too, which
is more than 8 bytes ... don't recall how much more at the moment.

> The reason I ask is because, when I try to run the query it fails due to
> temp file use over 10GB.

What have you got work_mem set to?  If the hashed table exceeds that,
you're going to end up with a multi-batch join, in which case most of
the outer table is going to get written into temp files as well.
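
For example, running the join under EXPLAIN ANALYZE shows how many batches
the hash needed (a sketch; exact numbers will differ, and note that ANALYZE
actually executes the query, so it will hit the same temp_file_limit unless
work_mem is raised first or a smaller collectionid is used):

explain (analyze, buffers)
select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;

The Hash node then reports a line like "Buckets: ...  Batches: ...  Memory Usage: ...";
Batches greater than one means the hashed table did not fit in work_mem and both
sides are being written out in batches.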

I don't think it's a good idea to have such a small temp_file_limit
when you're working with such huge tables.  Either a merge or hash
join is likely to require temp file space similar to the table size,
and you do *not* want a nestloop join, at least not if you want your
answers in reasonable time.
        regards, tom lane



Re: [ADMIN] question on hash joins

From:
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"
Date:
Thanks for the help!

We set:  work_mem = 16MB

Also, table1 has 100 million rows, not 1 million; that is why it uses the index
in my query.

Sorry if I am being dense, but I still have a question…
Is it possible for me to estimate the size of the hash and a value for 
the temp_file_limit setting using information in the explain plan?

For example, one possibility is that the hash contains the entire tuple for each
matching row.  In that case, I could multiply the number of rows estimated in
the explain output by an average row size to get an “order of magnitude”
estimate for temp_file_limit.

Another possibility would be that the hash only contains the columns
needed to satisfy the remaining query clauses (in my example that would just be the
granuleid column) and would thus be much smaller.

Thanks in advance,
Bob






Re: [ADMIN] question on hash joins

От
Tom Lane
Дата:
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@nasa.gov> writes:
> Sorry if I am being dense, but I still have a question…
> Is it possible for me to estimate the size of the hash and a value for
> the temp_file_limit setting using information in the explain plan?

Well, it'd be (row_overhead + data_width) * number_of_rows.

Poking around in the source code, it looks like the row_overhead in
a tuplestore temp file is 10 bytes (can be more if you have nulls in
the data).  Your example seemed to be storing one bigint column,
so data_width is 8 bytes.  data_width can be a fairly squishy thing
to estimate if the data being passed through the join involves variable-
width columns, but the planner's number is usually an OK place to start.
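
Plugging your plan's numbers into that formula gives a ballpark for the
hashed side (an illustration only; the row count is the planner's estimate):

select pg_size_pretty(((10 + 8) * 660888)::bigint);
-- about 11 MB of temp file for the 660888 hashed rows at 8 bytes of payload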

> For example, one possibility is that the hash contains the entire tuple for each
> matching row.

No, it's just the columns that need to be used in or passed through the
join.  If you want to be clear about this you can use EXPLAIN VERBOSE
and check what columns are emitted by the plan node just below the hash.
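
For instance (a sketch; what matters is the "Output:" line on the node that
feeds the Hash):

explain (verbose)
select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;

If that node shows only "Output: g.granuleid", then just the join key is
carried into the hash, not the whole table1 row.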
        regards, tom lane



Re: [ADMIN] question on hash joins

From:
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]"
Date:
Thanks Tom,

Given that the hash would only contain the keys and values needed to support
the query, I am having a hard time understanding why I am exceeding the
10 GB temp_file_limit.


In my case the hash estimates 660888 rows from table1
and I verified that is accurate.

In theory, the hash only contains a bigint key, a bigint value, and some overhead.
But even if I assume 50 bytes per row, that gives a total size of
33,044,400 bytes for the hash, which is way below the 10 GB limit.

What could be causing the temp_file growth?


select granuleid
from table2
where granuleid in (select granuleid from table1 where collectionid = 22467);
ERROR:  temporary file size exceeds temp_file_limit (10485760kB)

explain select granuleid
from table2
where granuleid in (select granuleid from table1 where collectionid = 22467);
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=597264.56..18687640.08 rows=644348762 width=8)
   Hash Cond: (table2.granuleid = table1.granuleid)
   ->  Seq Scan on table2  (cost=0.00..10598010.62 rows=644348762 width=8)
   ->  Hash  (cost=586509.04..586509.04 rows=655562 width=8)
         ->  Index Only Scan using idx_tabl1 on table1  (cost=0.57..586509.04 rows=655562 width=8)
               Index Cond: (collectionid = '22467'::bigint)
(6 rows)


Bob





Re: [ADMIN] question on hash joins

From:
Tom Lane
Date:
"Hartranft, Robert M. (GSFC-423.0)[RAYTHEON CO]" <robert.m.hartranft@nasa.gov> writes:
> Given that the hash would only contain keys and values needed for supporting
> the query I am having a hard time understanding why I am exceeding the
> 10 GB temp_file_limit.

Because *both sides* of the join are getting dumped to temp files.
This is necessary when the hash requires multiple batches.  All
but the first batch of hash keys get written out to files, and then
we reload each batch of the inner relation into the in-memory table
and scan the corresponding batch file from the outer relation.
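
To put numbers on it (using the row estimate from your latest plan, ~10 bytes
of per-row temp-file overhead plus the 8-byte granuleid):

select pg_size_pretty((10 + 8) * 644348762::bigint);
-- about 11 GB for the outer side alone, already past the 10485760 kB
-- temp_file_limit, before counting the batch files for the hashed side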

If you can make work_mem large enough to hold the inner relation
then the problem should go away.  Note though that the per-row
overhead is significantly larger in the in-memory representation;
don't have an estimate for that offhand.
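
A sketch of that approach (256MB is a guess with plenty of headroom for
~660k hashed rows plus the in-memory overhead; adjust to taste):

set work_mem = '256MB';
explain (analyze)
select count(1) from table1 g join table2 x on x.granuleid = g.granuleid where g.collectionid = 22467;
-- the Hash node should now report "Batches: 1", i.e. no temp files for the hash
reset work_mem;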
        regards, tom lane

