Re: Question with hashed IN

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Question with hashed IN
Дата
Msg-id 18896.1061257437@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Question with hashed IN  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I said:
> Oh, I see what it is.  The initial sizing of the hash table (number of
> buckets) is done using the planner's estimate of the number of rows out
> of the subplan.  In your later examples, the hash table is woefully
> overloaded and so searching it takes longer (too many items on each
> hash chain).

> I'm not sure how important this is to work on.  We could try to make the
> executor's hash code more able to adapt when the hash table grows beyond
> what it was expecting (by rehashing, etc) but personally I'd rather spend
> the time on trying to improve the estimate to begin with.

After some further looking, I realized that the backend's standard hashtable
management code (dynahash.c) already has a perfectly good algorithm for
expanding hashtables when they start to get full.  But the code I'd
written for hashed aggregation and grouping didn't use dynahash.c,
because the latter didn't support anything more complex than memcmp()
for comparing keys.  This was obviously a dumb decision in hindsight,
so I've invested the couple hours' work needed to improve dynahash's API
and make the hashed-aggregation code use it.

As of CVS tip I get more reasonable behavior in your example:
                                                       QUERY PLAN
 
 

---------------------------------------------------------------------------------------------------------------------------Seq
Scanon fktest  (cost=12510.00..12532.50 rows=500 width=4) (actual time=7803.77..7803.77 rows=0 loops=1)  Filter: (NOT
(hashedsubplan))  SubPlan    ->  Seq Scan on pktest  (cost=0.00..10010.00 rows=1000000 width=4) (actual
time=248.34..5408.39rows=1000000 loops=1)Total runtime: 7979.45 msec
 
(5 rows)
                                                     QUERY PLAN                                                       

-----------------------------------------------------------------------------------------------------------------------Seq
Scanon fktest  (cost=1260.00..1282.50 rows=500 width=4) (actual time=4482.05..4482.05 rows=0 loops=1)  Filter: (NOT
(hashedsubplan))  SubPlan    ->  Seq Scan on pktest  (cost=0.00..1010.00 rows=100000 width=4) (actual
time=0.09..2052.05rows=1000000 loops=1)Total runtime: 4651.02 msec
 
(5 rows)
                                                    QUERY PLAN                                                      

---------------------------------------------------------------------------------------------------------------------Seq
Scanon fktest  (cost=135.00..157.50 rows=500 width=4) (actual time=5830.16..5830.16 rows=0 loops=1)  Filter: (NOT
(hashedsubplan))  SubPlan    ->  Seq Scan on pktest  (cost=0.00..110.00 rows=10000 width=4) (actual time=0.03..3413.05
rows=1000000loops=1)Total runtime: 5994.26 msec
 
(5 rows)
                                                   QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------Seq
Scanon fktest  (cost=22.50..45.00 rows=500 width=4) (actual time=4160.85..4160.85 rows=0 loops=1)  Filter: (NOT (hashed
subplan)) SubPlan    ->  Seq Scan on pktest  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.03..1755.32
rows=1000000loops=1)Total runtime: 4326.24 msec
 
(5 rows)

Before, the same four tests gave runtimes like these on this machine:Total runtime: 16590.97 msecTotal runtime:
13792.45msecTotal runtime: 22702.87 msecTotal runtime: 115465.43 msec
 
(I forget now whether those times were taken with a backend compiled for
profiling --- so they may not be directly comparable.  But at least I
can say that the increase in runtime with smaller estimates is gone.)
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Qualified tables in error messages
Следующее
От: "Christopher Kings-Lynne"
Дата:
Сообщение: backwards-compat problem?