Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

From Joshua Tolley
Subject Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Msg-id 20081223145146.GA5882@uber
In reply to Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Robert Haas" <robertmhaas@gmail.com>)
Responses Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Robert Haas" <robertmhaas@gmail.com>)
List pgsql-hackers

On Tue, Dec 23, 2008 at 09:22:27AM -0500, Robert Haas wrote:
> On Tue, Dec 23, 2008 at 2:21 AM, Bryce Cutt <pandasuit@gmail.com> wrote:
> > Because there is no nice way in PostgreSQL (that I know of) to derive
> > a histogram after a join (on an intermediate result) currently
> > usingMostCommonValues is only enabled on a join when the outer (probe)
> > side is a table scan (seq scan only actually).  See
> > getMostCommonValues (soon to be called
> > ExecHashJoinGetMostCommonValues) for the logic that determines this.

So my test case of "do a whole bunch of hash joins in a test query"
isn't really valid. Makes sense. I did another, more haphazard test on a
query with fewer joins, and saw noticeable speedups.

> It's starting to seem to me that the case where this patch provides a
> benefit is so narrow that I'm not sure it's worth the extra code.

Not that anyone asked, but I don't consider myself qualified to render
judgement on that point. Code size is, I guess, a maintainability issue,
and I'm not terribly experienced maintaining PostgreSQL :)

> Is it realistic to think that the MCVs of the base relation might
> still be applicable to the joinrel?  It's certainly easy to think of
> counterexamples, but it might be a good approximation more often than
> not.

It's equivalent to our assumption that distributions of values in
columns in the same table are independent. Making that assumption in
this case would probably result in occasional dramatic speed
improvements similar to the ones we've seen in less complex joins,
offset by just-as-occasional dramatic slowdowns of similar magnitude. In
other words, it will increase the variance of our results.
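
To make the variance point concrete, here is a toy sketch in plain Python
(not PostgreSQL code; the data and the helper names most_common_values and
join_output_keys are made up for illustration). It compares the MCVs of a
base relation with the MCVs of the rows that survive an intermediate join,
once where the join preserves the skew and once where it does not:

```python
from collections import Counter


def most_common_values(values, n):
    """Return the n most frequent values, most frequent first."""
    return [value for value, _ in Counter(values).most_common(n)]


def join_output_keys(outer_keys, inner_keys):
    """Join keys of an inner equijoin's output, one entry per output row."""
    inner_counts = Counter(inner_keys)
    output = []
    for key in outer_keys:
        # Each outer row is emitted once per matching inner row.
        output.extend([key] * inner_counts.get(key, 0))
    return output


# Hypothetical base relation: join-key column skewed toward 1 and 2.
base = [1] * 50 + [2] * 30 + [3] * 10 + [4] * 10

# Case A: the intermediate join matches every key exactly once,
# so the base relation's MCVs still describe the joinrel.
keeps_skew = [1, 2, 3, 4]

# Case B: the intermediate join drops key 1 and fans out key 4,
# so the base relation's MCVs are stale for the joinrel.
changes_skew = [2, 3] + [4] * 5

base_mcv = most_common_values(base, 2)
mcv_a = most_common_values(join_output_keys(base, keeps_skew), 2)
mcv_b = most_common_values(join_output_keys(base, changes_skew), 2)

print(base_mcv)  # [1, 2]
print(mcv_a)     # [1, 2]
print(mcv_b)     # [4, 2]
```

In case A, reusing the base MCVs would pick the right hot keys; in case B it
would favor key 1, which never appears in the joinrel, and miss key 4.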

- Josh
