Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From: Bryce Cutt
Subject: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Msg-id: 1924d1180903201745i7e3f6876w2e108571e3b1843d@mail.gmail.com
In reply to: Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Not necessarily true.  When the statistics are correct, we know that
each of these inner tuples will match the largest number of outer
tuples, so it is just as much of a win per inner tuple as when they
are unique.  There is just a chance that you will have to give up on
the optimization partway through, if too many inner tuples fall into
the new "skew buckets" (formerly IM buckets), and dump the tuples back
into the main buckets.  The potential win is still pretty high though.
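
To make the fallback concrete, here is a minimal sketch of the build
phase as described above.  It is not the code from the applied patch:
the function name, the skew_limit parameter (a tuple-count cap standing
in for a memory budget), and the "dump everything at once" policy are
all assumptions made for illustration, and batching is ignored.

```python
from collections import defaultdict


def build_hash_tables(inner_rows, mcv_keys, skew_limit):
    """Split (key, payload) inner rows into a main table and a skew table.

    mcv_keys   -- join keys assumed to be the outer relation's MCVs
    skew_limit -- hypothetical cap on inner tuples held in skew buckets
    Returns (main, skew, skew_enabled).
    """
    main = defaultdict(list)
    skew = {key: [] for key in mcv_keys}
    skew_count = 0
    skew_enabled = True

    for key, payload in inner_rows:
        if skew_enabled and key in skew:
            if skew_count < skew_limit:
                skew[key].append(payload)
                skew_count += 1
                continue
            # Too many inner tuples fell into the skew buckets: give up
            # and dump what was collected back into the main buckets.
            for k, payloads in skew.items():
                main[k].extend(payloads)
            skew = {}
            skew_enabled = False
        main[key].append(payload)

    return main, skew, skew_enabled
```

With a generous skew_limit, duplicate inner tuples for an MCV key all
stay in the skew table; with a small one, the sketch abandons the
optimization and every tuple ends up in the main table, so the join
result is unaffected either way.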

- Bryce Cutt


On Fri, Mar 20, 2009 at 5:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 20, 2009 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Bryce Cutt <pandasuit@gmail.com> writes:
>>> Here is the new patch.
>>
>> Applied with revisions.  I undid some of the "optimizations" that
>> cluttered the code in order to save a cycle or two per tuple --- as per
>> previous discussion, that's not what the performance questions were
>> about.  Also, I did not like the terminology "in-memory"/"IM"; it seemed
>> confusing since the main hash table is in-memory too.  I revised the
>> code to consistently refer to the additional hash table as a "skew"
>> hashtable and the optimization in general as skew optimization.  Hope
>> that seems reasonable to you --- we could search-and-replace it to
>> something else if you'd prefer.
>>
>> For the moment, I didn't really do anything about teaching the planner
>> to account for this optimization in its cost estimates.  The initial
>> estimate of the number of MCVs that will be specially treated seems to
>> me to be too high (it's only accurate if the inner relation is unique),
>> but getting a more accurate estimate seems pretty hard, and it's not
>> clear it's worth the trouble.  Without that, though, you can't tell
>> what fraction of outer tuples will get the short-circuit treatment.
>
> If the inner relation isn't fairly close to unique you shouldn't be
> using this optimization in the first place.
>
> ...Robert
>
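
For what it's worth, the estimation problem Tom describes can be
illustrated with a rough sketch.  Everything here is an assumption for
illustration, not planner code: the function name, the idea of a fixed
average number of inner tuples per join key, and the tuple-count
skew_limit are all hypothetical.

```python
def skew_outer_fraction(mcv_freqs, inner_dups_per_key, skew_limit):
    """Estimate the fraction of outer tuples served by the skew table.

    mcv_freqs          -- outer MCV frequencies, sorted descending
    inner_dups_per_key -- assumed average inner tuples per join key (>= 1)
    skew_limit         -- hypothetical cap on inner tuples in skew buckets
    """
    # Each MCV costs inner_dups_per_key slots, so fewer MCVs fit when
    # the inner relation is not unique on the join key.
    n_mcvs = min(len(mcv_freqs), int(skew_limit // inner_dups_per_key))
    return sum(mcv_freqs[:n_mcvs])
```

With frequencies [0.3, 0.2, 0.1] and room for three inner tuples, a
unique inner relation covers all three MCVs, while two inner tuples per
key leaves room for only the first MCV.  That is the sense in which an
estimate that assumes uniqueness comes out too high.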

