Re: BUG #19030: Hash join leads to extremely high memory usage
От | Tom Lane |
---|---|
Тема | Re: BUG #19030: Hash join leads to extremely high memory usage |
Дата | |
Msg-id | 51461.1755988143@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | BUG #19030: Hash join leads to extremely high memory usage (PG Bug reporting form <noreply@postgresql.org>) |
Список | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes: > Description > - Two tables, left join > - The left table has significantly fewer rows than the right table > - The left table has very large rows (many columns with high memory usage) > - The left table has many null values in the join column > - A hash join is used > - The hash node is built from the left table > - The query results in extremely high memory usage (100x work_mem in the > example, > 1000x in real case) If this is specific to the case of many null join values, it's a known problem that I have a patch in the queue for [1]. On your example, I get this on HEAD: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Hash Right Join (cost=459884.00..1367112.00 rows=10000000 width=65) (actual time=2557.730..11727.429 rows=10000000.00 loops=1) Hash Cond: (right_.c = left_.c) Buffers: shared read=201216, temp read=131098 written=131098 -> Seq Scan on right_ (cost=0.00..283520.00 rows=20000000 width=8) (actual time=0.060..806.467 rows=20000000.00 loops=1) Buffers: shared read=83520 -> Hash (cost=217696.00..217696.00 rows=10000000 width=65) (actual time=2494.680..2494.680 rows=10000000.00 loops=1) Buckets: 131072 (originally 131072) Batches: 16384 (originally 256) Memory Usage: 440509kB Buffers: shared read=117696, temp written=46737 -> Seq Scan on left_ (cost=0.00..217696.00 rows=10000000 width=65) (actual time=0.096..481.139 rows=10000000.00loops=1) Buffers: shared read=117696 Planning: Buffers: shared hit=136 read=35 Memory: used=21kB allocated=32kB Planning Time: 0.703 ms Execution Time: 11946.100 ms (15 rows) and this with the aforesaid patch: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Hash Right Join (cost=459884.00..1367112.00 rows=10000000 width=65) (actual time=1553.342..6589.352 rows=10000000.00 loops=1) Hash Cond: (right_.c = left_.c) Buffers: shared hit=188 read=201028, temp read=128626 written=128626 -> Seq Scan on right_ (cost=0.00..283520.00 rows=20000000 width=8) (actual time=0.099..750.684 rows=20000000.00 loops=1) Buffers: shared hit=94 read=83426 -> Hash (cost=217696.00..217696.00 rows=10000000 width=65) (actual time=1551.739..1551.740 rows=5000000.00 loops=1) Buckets: 131072 Batches: 256 Memory Usage: 2906kB Buffers: shared hit=94 read=117602, temp written=93662 -> Seq Scan on left_ (cost=0.00..217696.00 rows=10000000 width=65) (actual time=0.106..470.671 rows=10000000.00loops=1) Buffers: shared hit=94 read=117602 Planning: Memory: used=20kB allocated=32kB Planning Time: 0.122 ms Execution Time: 6827.197 ms (14 rows) (Hmm, looking at this, it's apparent that the patch causes the Hash node not to count the null-keyed rows in its EXPLAIN output. On the one hand, that's an accurate reflection of how much went into the hash table, but on the other hand it's pretty confusing.) I doubt we'd consider back-patching such a change, but if you want to see it happen for v19, you could help by reviewing/testing. regards, tom lane [1] https://www.postgresql.org/message-id/flat/3061845.1746486714%40sss.pgh.pa.us
В списке pgsql-bugs по дате отправления: