Re: Should HashSetOp go away
| From | David Rowley |
|---|---|
| Subject | Re: Should HashSetOp go away |
| Date | |
| Msg-id | CAApHDvoUVx5q=6gNPJ4Ve7bK3=73OKS57VTKHxQEJp_JDurEEg@mail.gmail.com |
| In reply to | Re: Should HashSetOp go away (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Should HashSetOp go away |
| List | pgsql-hackers |
On Mon, 27 Oct 2025 at 07:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jeff Janes <jeff.janes@gmail.com> writes:
> > I was thinking of ways to improve the memory usage (or at least its
> > estimation) but decided maybe it would be better if HashSetOp went away
> > entirely. As far as I can tell HashSetOp has nothing to recommend it other
> > than the fact that it already exists. If we instead used an elaboration on
> > Hash Anti Join, then it would automatically get spilling to disk, parallel
> > operations, better estimation, and the benefits of whatever micro
> > optimizations people lavish on the highly used HashJoin machinery but not
> > the obscure, little-used HashSetOp.
>
> This seems like a pretty bad solution. It would imply exporting the
> complexities of duplicate-counting for EXCEPT ALL and INTERSECT ALL
> modes into the hash-join logic. We don't need that extra complexity
> there (it's more than enough of a mess already), and we don't need
> whatever performance hit ordinary hash joins would take.

If Hash Joins did support IS NOT DISTINCT FROM clauses, then at least
the non-ALL cases could be done with Hash Semi Join and Hash Anti Join
for INTERSECT and EXCEPT, respectively, followed by a HashAgg. I doubt
it would be any faster for the general case, but at least it would
allow those setop queries to run when the inputs don't fit in memory.
It's not ideal though, as when the planner underestimates, Hashed
Setops could still blow up.

David
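The set-operation semantics discussed above can be modelled in a few lines of Python. This is a conceptual sketch only, not PostgreSQL executor code: rows are assumed to be tuples with `None` standing in for SQL NULL, and every function name here (`hash_semi_join`, `hash_anti_join`, `hash_agg_distinct`, `intersect`, `except_`, `intersect_all`, `except_all`) is hypothetical. The `null_safe` flag mimics the difference between joining on `=` (a key containing NULL never matches) and joining on IS NOT DISTINCT FROM (NULLs match each other, as INTERSECT and EXCEPT require). The `*_all` functions show the per-row duplicate counting that the ALL variants need, which a plain match/no-match join test does not provide.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical model: a row is a tuple of column values; None stands in for SQL NULL.
Row = tuple


def _joinable(row: Row, null_safe: bool) -> bool:
    """With plain "=", a key containing NULL never compares equal to anything,
    so such a row can never find a join partner. IS NOT DISTINCT FROM lifts that."""
    return null_safe or None not in row


def hash_semi_join(outer: Iterable[Row], inner: Iterable[Row], null_safe: bool) -> Iterator[Row]:
    """Emit each outer row that has at least one match on the inner side."""
    table = {row for row in inner if _joinable(row, null_safe)}  # build phase
    for row in outer:  # probe phase
        if _joinable(row, null_safe) and row in table:
            yield row


def hash_anti_join(outer: Iterable[Row], inner: Iterable[Row], null_safe: bool) -> Iterator[Row]:
    """Emit each outer row that has no match on the inner side."""
    table = {row for row in inner if _joinable(row, null_safe)}  # build phase
    for row in outer:  # probe phase
        if not (_joinable(row, null_safe) and row in table):
            yield row


def hash_agg_distinct(rows: Iterable[Row]) -> list[Row]:
    """Drop duplicate rows, keeping first-seen order (stands in for the HashAgg step)."""
    return list(dict.fromkeys(rows))


def intersect(left: Iterable[Row], right: Iterable[Row]) -> list[Row]:
    """INTERSECT (non-ALL): semi join on IS NOT DISTINCT FROM, then de-duplicate."""
    return hash_agg_distinct(hash_semi_join(left, right, null_safe=True))


def except_(left: Iterable[Row], right: Iterable[Row]) -> list[Row]:
    """EXCEPT (non-ALL): anti join on IS NOT DISTINCT FROM, then de-duplicate."""
    return hash_agg_distinct(hash_anti_join(left, right, null_safe=True))


def intersect_all(left: Iterable[Row], right: Iterable[Row]) -> list[Row]:
    """INTERSECT ALL needs per-row counts: keep min(m, n) copies of each row."""
    return list((Counter(left) & Counter(right)).elements())


def except_all(left: Iterable[Row], right: Iterable[Row]) -> list[Row]:
    """EXCEPT ALL needs per-row counts: keep max(m - n, 0) copies of each row."""
    return list((Counter(left) - Counter(right)).elements())
```

With `left = [(1, None), (1, None), (2, 'a'), (3, 'b')]` and `right = [(1, None), (3, 'b'), (3, 'b'), (4, 'c')]`, the null-safe `intersect` keeps `(1, None)` and `(3, 'b')`. Running the same semi join with `null_safe=False` loses `(1, None)`, and the strict anti join wrongly lets it through to the EXCEPT result. That is why the join would need IS NOT DISTINCT FROM support. The ALL variants, by contrast, depend on how many times each row occurs on each side, not just on whether a match exists.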