Re: bad estimation together with large work_mem generates terrible slow hash joins

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: bad estimation together with large work_mem generates terrible slow hash joins
Дата
Msg-id CA+TgmobAYGMuqvEPxANphEOLKjXzOKCZU79i1YfChmVmdtEEeg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: bad estimation together with large work_mem generates terrible slow hash joins  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: bad estimation together with large work_mem generates terrible slow hash joins  (Tomas Vondra <tv@fuzzy.cz>)
Список pgsql-hackers
On Fri, Sep 5, 2014 at 3:23 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
> as Heikki mentioned in his "commitfest status" message, this patch still
> has no reviewers :-( Is there anyone willing to pick up that, together
> with the 'dense allocation' patch (as those two are closely related)?
>
> I'm looking in Robert's direction, as he's the one who came up with the
> dense allocation idea, and also commented on the hasjoin bucket resizing
> patch ...
>
> I'd ask Pavel Stehule, but as he's sitting next to me in the office he's
> not really independent ;-) I'll do some reviews on the other patches
> over the weekend, to balance this of course.

Will any of those patches to be, heh heh, mine?

I am a bit confused by the relationship between the two patches you
posted.  The "combined" patch applies, but the other one does not.  If
this is a sequence of two patches, it seems like it would be more
helpful to post A and B rather than B and A+B.  Perhaps the other
patch is on a different thread but there's not an obvious reference to
said thread here.

In ExecChooseHashTableSize(), if buckets_bytes is independent of
nbatch, could we just compute it before working out dbatch, and just
deduct it from hash_table_bytes?  That seems like it'd do the same
thing for less work.  Either way, what happens if buckets_bytes is
already bigger than hash_table_bytes?

A few more stylistic issues that I see:

+               if ((hashtable->nbatch == 1) &&
+                       if (hashtable->nbuckets_optimal <= (INT_MAX/2))
+       if (size > (HASH_CHUNK_SIZE/8))

While I'm all in favor of using parentheses generously where it's
needed to add clarity, these and similar usages seem over the top to
me.

On a related note, the first of these reads like this if (stuff) { if
(more stuff) { do things } } which makes one wonder why we need two if
statements.

+
+       /* used for dense allocation of tuples (into linked chunks) */
+       HashChunk       chunks_batch;   /*  one list for the whole batch */
+}      HashJoinTableData;

If the original coding didn't have a blank line between the last
structure member and the closing brace, it's probably not right to
make it that way now.  There are similar issues at the end of some
functions you modified, and a few other places (like the new code in
ExecChooseHashTableSize and chunk_alloc) where there are extra blank
lines at the starts and ends of blocks.

+char * chunk_alloc(HashJoinTable hashtable, int size) {

Eh... no.

+       /* XXX maybe we should use MAXALIGN(size) here ... */

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Scaling shared buffer eviction
Следующее
От: Robert Haas
Дата:
Сообщение: Re: pg_background (and more parallelism infrastructure patches)