Re: accounting for memory used for BufFile during hash joins
From | Tomas Vondra
---|---
Subject | Re: accounting for memory used for BufFile during hash joins
Date |
Msg-id | 20190508133438.7nfzyioihjcvkjwi@development
In reply to | Re: accounting for memory used for BufFile during hash joins (Melanie Plageman <melanieplageman@gmail.com>)
List | pgsql-hackers
On Tue, May 07, 2019 at 05:43:56PM -0700, Melanie Plageman wrote:
> On Tue, May 7, 2019 at 6:59 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
>> On Tue, May 07, 2019 at 04:28:36PM +1200, Thomas Munro wrote:
>>> On Tue, May 7, 2019 at 3:15 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>> On Tue, May 07, 2019 at 01:48:40PM +1200, Thomas Munro wrote:
>>>> Switching to some other algorithm during execution moves the goal posts
>>>> to the next galaxy, I'm afraid.
>>>
>>> The main problem I'm aware of with sort-merge join is: not all that is
>>> hashable is sortable. So BNL is actually the only solution I'm aware
>>> of for problem B that doesn't involve changing a fundamental thing
>>> about PostgreSQL's data type requirements.
>>
>> Sure, each of those algorithms has limitations. But I think that's mostly
>> irrelevant to the main issue - switching between algorithms mid-execution.
>> At that point some of the tuples might have been already sent to the
>> other nodes, and I have no idea how to "resume" the tuple stream short of
>> buffering everything locally until the join completes. And that would be
>> rather terrible, I guess.
>
> What if you switched to NLJ on a batch-by-batch basis and did it before
> starting execution of the join but after building the inner side of the
> hash table. That way, no tuples will have been sent to other nodes yet.

Interesting idea! I think you're right that doing it on a per-batch basis
would solve that problem. Essentially, if all (or >95%) of the tuples have
the same hash value, we could switch to a special "degraded" mode doing
something like a NL. At that point the hash table benefits are lost anyway,
because all the tuples are in a single chain, so it's not going to be much
slower.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
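[A minimal standalone sketch of the per-batch fallback idea discussed above, not PostgreSQL source. It only illustrates the decision step: after building the inner side of a batch, check what fraction of its tuples share one hash value and, if it exceeds a threshold, process that batch as a nested loop instead of probing the hash table. The threshold and all function names are made up for the example.]

```c
/*
 * Illustration only (not PostgreSQL code): decide per batch whether the
 * inner side is so skewed that the hash table degenerates into one long
 * bucket chain, in which case a nested loop over that batch costs about
 * the same and avoids splitting the batch over and over.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define FALLBACK_FRACTION 0.95		/* illustrative, mirrors the ">95%" above */

/* Return the number of tuples sharing the most common hash value. */
static size_t
max_duplicate_count(const uint32_t *hashes, size_t ntuples)
{
	size_t		best = 0;

	for (size_t i = 0; i < ntuples; i++)
	{
		size_t		count = 0;

		for (size_t j = 0; j < ntuples; j++)
			if (hashes[j] == hashes[i])
				count++;
		if (count > best)
			best = count;
	}
	return best;
}

/*
 * Decide, per batch and before any join output is produced, whether to
 * run this batch as a nested loop instead of a hash lookup.
 */
static bool
batch_should_fall_back(const uint32_t *hashes, size_t ntuples)
{
	if (ntuples == 0)
		return false;
	return (double) max_duplicate_count(hashes, ntuples) >=
		FALLBACK_FRACTION * (double) ntuples;
}

int
main(void)
{
	uint32_t	skewed[20];
	uint32_t	uniform[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

	/* A heavily skewed batch: 19 of 20 inner tuples hash to the same value. */
	for (size_t i = 0; i < 20; i++)
		skewed[i] = 42;
	skewed[0] = 7;

	printf("skewed batch  -> nested loop: %s\n",
		   batch_should_fall_back(skewed, 20) ? "yes" : "no");
	printf("uniform batch -> nested loop: %s\n",
		   batch_should_fall_back(uniform, 10) ? "yes" : "no");
	return 0;
}
```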