Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
From | Joshua Tolley |
---|---|
Subject | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
Date | |
Msg-id | 20081223145146.GA5882@uber |
In reply to | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Robert Haas" <robertmhaas@gmail.com>) |
Replies | Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets |
List | pgsql-hackers |
On Tue, Dec 23, 2008 at 09:22:27AM -0500, Robert Haas wrote:
> On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt <pandasuit@gmail.com> wrote:
> > Because there is no nice way in PostgreSQL (that I know of) to derive
> > a histogram after a join (on an intermediate result) currently
> > usingMostCommonValues is only enabled on a join when the outer (probe)
> > side is a table scan (seq scan only actually). See
> > getMostCommonValues (soon to be called
> > ExecHashJoinGetMostCommonValues) for the logic that determines this.

So my test case of "do a whole bunch of hash joins in a test query" isn't
really valid. Makes sense. I did another, more haphazard test on a query with
fewer joins, and saw noticeable speedups.

> It's starting to seem to me that the case where this patch provides a
> benefit is so narrow that I'm not sure it's worth the extra code.

Not that anyone asked, but I don't consider myself qualified to render
judgement on that point. Code size is, I guess, a maintainability issue, and
I'm not terribly experienced maintaining PostgreSQL :)

> Is it realistic to think that the MCVs of the base relation might
> still be applicable to the joinrel? It's certainly easy to think of
> counterexamples, but it might be a good approximation more often than
> not.

It's equivalent to our assumption that distributions of values in columns in
the same table are independent. Making that assumption in this case would
probably result in occasional dramatic speed improvements similar to the ones
we've seen in less complex joins, offset by just-as-occasional dramatic
slowdowns of similar magnitude. In other words, it will increase the variance
of our results.

- Josh
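
For readers following the thread, here is a rough Python sketch of why knowing the probe side's most common values can matter in a multi-batch hash join. It is not the patch's code: the function name `spilled_probe_tuples`, the use of `key % num_batches` as a stand-in hash, and the data are all made up for illustration. It only counts how many probe tuples would be deferred to a later batch, assuming that build tuples matching an MCV are kept in memory.

```python
from collections import Counter


def spilled_probe_tuples(probe_keys, num_batches, mcv_keys=frozenset()):
    """Count probe tuples that would be deferred to a later batch.

    Hypothetical model: batch 0 is joined in memory, every other batch is
    spilled. Keys listed in mcv_keys are assumed to have their build tuples
    held in memory, so probe tuples with those keys are never spilled.
    """
    spilled = 0
    for key in probe_keys:
        if key in mcv_keys:
            continue  # joined immediately against the in-memory MCV tuples
        if key % num_batches != 0:  # stand-in for hash(key) -> batch number
            spilled += 1
    return spilled


# Skewed probe side: key 7 accounts for 900 of the 1000 tuples.
probe = [7] * 900 + list(range(100, 200))

# Pretend the statistics report the single most common probe value.
mcvs = {key for key, _ in Counter(probe).most_common(1)}

baseline = spilled_probe_tuples(probe, num_batches=4)
with_mcv = spilled_probe_tuples(probe, num_batches=4, mcv_keys=mcvs)
print(baseline, with_mcv)
```

In this toy setup the skewed key happens to land in a spilled batch, so keeping it in memory removes most of the deferred work. If the MCV list were taken from a base relation and did not reflect the actual probe input, the reserved memory would buy nothing, which is the variance concern raised above.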