Re: Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)
Дата
Msg-id 6518.1127509969@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)  ("Gurpreet Aulakh" <gaulakh@ecmarket.com>)
Ответы Re: Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)
Список pgsql-performance
"Gurpreet Aulakh" <gaulakh@ecmarket.com> writes:
> After further investigation I have found that the reason why the query is
> slower on 8.0.3 is that the hash and hash joins are slower on the 8.0.3.
> So the question comes down to : Why are hash and hash joins slower?

I looked into this a bit and determined that the problem seems to have
been introduced here:

2002-12-30 10:21  tgl

    * src/: backend/executor/nodeHash.c,
    backend/executor/nodeHashjoin.c, backend/optimizer/path/costsize.c,
    include/executor/nodeHash.h: Better solution to integer overflow
    problem in hash batch-number computation: reduce the bucket number
    mod nbatch.  This changes the association between original bucket
    numbers and batches, but that doesn't matter.  Minor other cleanups
    in hashjoin code to help centralize decisions.

(which means it's present in 7.4 as well as 8.0).  The code now
groups tuples into hash batches according to
    (hashvalue % totalbuckets) % nbatch
When a tuple that is not in the first batch is reloaded, it is placed
into a bucket according to
    (hashvalue % nbuckets)
This means that if totalbuckets, nbatch, and nbuckets have a common
factor F, the buckets won't be evenly used; in fact, only one in every F
buckets will be used at all, the rest remaining empty.  The ones that
are used accordingly will contain about F times more tuples than
intended.  The slowdown comes from having to compare these extra tuples
against the outer-relation tuples.

7.3 uses a different algorithm for grouping tuples that avoids this
problem, but it has performance issues of its own (in particular, to
avoid integer overflow we have to limit the number of batches we can
have).  So just reverting this patch doesn't seem very attractive.

The problem no longer exists in 8.1 because of rewrites undertaken for
another purpose, so I'm sort of tempted to do nothing.  To fix this in
the back branches we'd have to develop new code that won't ever go into
CVS tip and thus will never get beta-tested.  The risk of breaking
things seems higher than I'd like.

If we did want to fix it, my first idea is to increment nbatch looking
for a value that has no common factor with nbuckets.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Ron Peacetree
Дата:
Сообщение: Re: Releasing memory during External sorting?
Следующее
От: "Cristian Prieto"
Дата:
Сообщение: Index use in BETWEEN statement...