Re: bad estimation together with large work_mem generates terrible slow hash joins

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: bad estimation together with large work_mem generates terrible slow hash joins
Дата
Msg-id CA+TgmoZbF_oE5aK_x9DKP5U5ZzfU3+bc3C10brO4HauJCm4h-Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: bad estimation together with large work_mem generates terrible slow hash joins  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: bad estimation together with large work_mem generates terrible slow hash joins  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-hackers
On Fri, Sep 12, 2014 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 11, 2014 at 5:57 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Attached is the patch split as suggested:
>>
>> (a) hashjoin-nbuckets-v14a-size.patch
>>
>>     * NTUP_PER_BUCKET=1
>>     * counting buckets towards work_mem
>>     * changes in ExecChooseHashTableSize (due to the other changes)
>
> OK, I'm going to work on this one now.  I have some ideas about how to
> simplify this a bit and will post an update in a few hours once I see
> how that pans out.

Here's what I came up with.  I believe this has the same semantics as
your patch for less code, but please check, because I might be wrong.
The main simplifications I made were:

(1) In master, we decide whether or not to batch based on the size of
the data, without knowing the number of buckets.  If we want to
account for the bucket overhead, that doesn't work.  But in your
patch, we basically computed the number of buckets we'd need for the
single-batch case, then decide whether to do a single batch, and if
yes, computed the same values over again with the same formula phrased
differently.  I revised that so that the single-batch case is
calculated only once.  If everything fits, we're all set, and don't
need to recalculate anything.

(2) In your patch, once we decide we need to batch, you set nbatch as
now, and then check whether the computed value is still adequate after
subtracting buckets_byte from hash_table_bytes.  I revised that so
that we make the initial batch estimate of dbatch by dividing
inner_rel_bytes by hash_table_bytes - bucket_bytes rather than
hash_table_bytes.  I think that avoids the need to maybe increase
nbatch again afterwards.

I'm comfortable with this version if you are, but (maybe as a
follow-on commit) I think we could make this even a bit smarter.  If
inner_rel_bytes + bucket_bytes > hash_table_bytes but inner_rel_bytes
+ bucket_bytes / 4 < hash_table_bytes, then reduce the number of
buckets by 2x or 4x to make it fit.  That would provide a bit of
additional safety margin against cases where this patch might
unluckily lose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: expanded mode is still broken
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: jsonb format is pessimal for toast compression