Re: bad estimation together with large work_mem generates terrible slow hash joins

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: bad estimation together with large work_mem generates terrible slow hash joins
Дата	9 сентября 2014 г. 14:10:08
Msg-id	CA+TgmoaRB4r6norb1P-2xFXavwLjd4EUwi_ERnNfQ5PQq0WOYA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: bad estimation together with large work_mem generates terrible slow hash joins (Tomas Vondra <tv@fuzzy.cz>)
Ответы	Re: bad estimation together with large work_mem generates terrible slow hash joins
Список	pgsql-hackers

Дерево обсуждения

On Mon, Sep 8, 2014 at 5:53 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> So I only posted the separate patch for those who want to do a review,
> and then a "complete patch" with both parts combined. But it sure may be
> a bit confusing.

Let's do this: post each new version of the patches only on this
thread, as a series of patches that can be applied one after another.

>> In ExecChooseHashTableSize(), if buckets_bytes is independent of
>> nbatch, could we just compute it before working out dbatch, and just
>> deduct it from hash_table_bytes? That seems like it'd do the same
>> thing for less work.
>
> Well, we can do that. But I really don't think that's going to make
> measurable difference. And I think the code is more clear this way.

Really?  It seems to me that you're doing more or less the same
calculation that's already been done a second time.  It took me 20
minutes of staring to figure out what it was really doing.  Now
admittedly I was a little low on caffeine, but...

>> Either way, what happens if buckets_bytes is already bigger than
>> hash_table_bytes?
>
> Hm, I don't see how that could happen.

How about an Assert() and a comment, then?

>> A few more stylistic issues that I see:
>>
>> +               if ((hashtable->nbatch == 1) &&
>> +                       if (hashtable->nbuckets_optimal <= (INT_MAX/2))
>> +       if (size > (HASH_CHUNK_SIZE/8))
>>
>> While I'm all in favor of using parentheses generously where it's
>> needed to add clarity, these and similar usages seem over the top to
>> me.
>
> It seemed more readable for me at the time of coding it, and it still
> seems better this way, but ...
>
> https://www.youtube.com/watch?v=CxK_nA2iVXw

Heh.  Well, at any rate, I think PostgreSQL style is to not use
parentheses in that situation.

>> +char * chunk_alloc(HashJoinTable hashtable, int size) {
>>
>> Eh... no.
>>
>> +       /* XXX maybe we should use MAXALIGN(size) here ... */
>>
>> +1.
>
> I'm not sure what's the 'no' pointing at? Maybe that the parenthesis
> should be on the next line? And is the +1 about doing MAXALING?

The "no" is about the fact that you have the return type, the function
name, and the opening brace on one line instead of three lines as is
project style.  The +1 is for applying MAXALIGN to the size.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: bad estimation together with large work_mem generates terrible slow hash joins