Re: [HACKERS] WIP: [[Parallel] Shared] Hash

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] WIP: [[Parallel] Shared] Hash
Дата
Msg-id CAFjFpReeHV+nyroU11M98yBRB5yzcsvs0ppr5mdJv6nbewTfsg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Rafia Sabih <rafia.sabih@enterprisedb.com>)
Список pgsql-hackers
>
>> However, ExecHashIncreaseNumBatches() may change the
>> number of buckets; the patch does not seem to account for spaceUsed changes
>> because of that.
>
> That's what this hunk is intended to do:
>
> @@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
>         TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,
>
>             hashtable->nbuckets_optimal);
>
> +       /* account for the increase in space that will be used by buckets */
> +       hashtable->spaceUsed += sizeof(HashJoinTuple) *
> +               (hashtable->nbuckets_optimal - hashtable->nbuckets);
> +       if (hashtable->spaceUsed > hashtable->spacePeak)
> +               hashtable->spacePeak = hashtable->spaceUsed;
> +

Sorry, I missed that hunk. You are right, it's getting accounted for.

>>
>> In ExecHashTableReset(), do we want to update spacePeak while setting
>> spaceUsed.
>
> I figured there was no way that the new spaceUsed value could be
> bigger than spacePeak, because we threw out all chunks and have just
> the bucket array, and we had that number of buckets before, so
> spacePeak must at least have been set to a number >= this number
> either when we expanded to this many buckets, or when we created the
> hashtable in the first place.  Perhaps I should
> Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help, better if you explain this with a comment before Assert.

>
>> While this patch tracks space usage more accurately, I am afraid we might be
>> overdoing it; a reason why we don't track space usage accurately now. But I
>> think I will leave it to be judged by someone who is more familiar with the
>> code and possibly has historical perspective.
>
> Well it's not doing more work; it doesn't make any practical
> difference whatsoever but it's technically doing less work than
> master, by doing memory accounting only when acquiring a new 32KB
> chunk.

This patch does more work while counting the space used by buckets, I
guess. AFAIU, right now, that happens only after the hash table is
built completely. But that's fine. I am not worried about whether the
it's less work or more.

> But if by overdoing it you mean that no one really cares about
> the tiny increase in accuracy so the patch on its own is a bit of a
> waste of time, you're probably right.

This is what I meant by overdoing; you have spelled it better.

> Depending on tuple size, you
> could imagine something like 64 bytes of header and unused space per
> 32KB chunk that we're not accounting for, and that's only 0.2%.  So I
> probably wouldn't propose this refactoring just on accuracy grounds
> alone.
>
> This refactoring is intended to pave the way for shared memory
> accounting in the later patches, which would otherwise generate ugly
> IPC if done for every time a tuple is allocated.  I considered using
> atomic add to count space per tuple, or maintaining per-backend local
> subtotals and periodically summing.  Then I realised that switching to
> per-chunk accounting would fix the IPC problem AND be justifiable on
> theoretical grounds.  When we allocate a new 32KB chunk, we really are
> using 32KB more of your memory.

+1.

Thanks for considering the comments.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Reporting planning time with EXPLAIN
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] postgres_fdw bug in 9.6