Re: testing ProcArrayLock patches

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: testing ProcArrayLock patches
Дата	19 ноября 2011 г. 02:08:35
Msg-id	201111190408.16385.andres@anarazel.de обсуждение исходный текст
Ответ на	Re: testing ProcArrayLock patches ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список	pgsql-hackers

Дерево обсуждения

On Saturday, November 19, 2011 12:18:07 AM Kevin Grittner wrote:
> Andres Freund <andres@anarazel.de> wrote:
> > I think opannotate -a -s produces output with instructions/code
> > intermingled.
> 
> Thanks.  I'll check out perf later (thanks for the tips!), but for
> now, here's the function which was at the top of my oprofile
> results, annotated with those options.  I'm afraid it's a bit
> intimidating to me -- the last time I did much with X86 assembly
> language was in the mid-80s, on an 80286.  :-/ 
While my assembly knoweldge surely isn't from the 80s be assured that I find it 
intimidating as well ;)

> Hopefully, since
> this is at the top of the oprofile results when running with
> prepared statements, it will be of use to somebody.
I think in quite many situations hash_search_with_hash_value is rather 
noticeable in the profiles. Even without concurrency...

Looking at your annotation output the code seems to be almost entirely stalled 
waiting for memory.
The first stall is after the first reading memory access which is likely to be 
uncached (the first cacheline of the HTAB is accessed before but that will be 
in the cache). The interesting thing is that I would have expected a higher 
likelihood for this to stay in the cache. 2225  0.0165 :  70b543:    mov    (%rdi),%r15              :static inline
uint32             :calc_bucket(HASHHDR *hctl, uint32 hash_val)              :{              :    uint32        bucket;
            :              :    bucket = hash_val & hctl->high_mask; 4544  0.0337 :  70b546:    and    0x2c(%r15),%ebx
           :    if (bucket > hctl->max_bucket)53409  0.3958 :  70b54a:    cmp    0x28(%r15),%ebx              :
70b54e:   jbe    70b554 <hash_search_with_hash_value+0x34>
 

So a stall here is not that surprising.


Here we fetch data from memory which is unlikely to be prefetchable and then 
require the result from that fetch. Note how  segp = hashp->dir[segment_num]; 
is distributed over line 52, 64, 83.
              :    segp = hashp->dir[segment_num]; 2062  0.0153 :  70b562:    shr    %cl,%eax  309  0.0023 :  70b564:
mov    %eax,%eax  643  0.0048 :  70b566:    mov    (%rdx,%rax,8),%rbp              :              :    if (segp ==
NULL)43329 0.3211 :  70b56a:    test   %rbp,%rbp
 





The next cacheline is referenced here. Again a fetch from memory which is soon 
after needed to continue.
Unless I misunderstood the code-flow this disproves my theory that we might 
have many collisions as that test seems to be outside the test (              :    prevBucketPtr = &segp[segment_ndx];
           :    currBucket = *prevBucketPtr;  122 9.0e-04 :  70b586:    mov    0x0(%rbp),%rbx              :
 :    /*              :     * Follow collision chain looking for matching key              :     */              :
match= hashp->match;        /* save one fetch in inner 
 
loop */              :    keysize = hashp->keysize;    /* ditto */99903  0.7404 :  70b58a:    mov    %rax,0x18(%rsp)
         :              :    while (currBucket != NULL) 1066  0.0079 :  70b58f:    test   %rbx,%rbx
 




line 136 is the first time the contents of the current bucket is needed. Thats 
why the test is so noticeable.              :        currBucket = *prevBucketPtr;  655  0.0049 :  70b5a3:    mov
(%rbx),%rbx             :     * Follow collision chain looking for matching key              :     */              :
match= hashp->match;        /* save one fetch in inner 
 
loop */              :    keysize = hashp->keysize;    /* ditto */              :              :    while (currBucket
!=NULL)  608  0.0045 :  70b5a6:    test   %rbx,%rbx              :  70b5a9:    je     70b5d0
<hash_search_with_hash_value+0xb0>             :    {              :        if (currBucket->hashvalue == hashvalue &&
3504 0.0260 :  70b5ab:    cmp    %r12d,0x8(%rbx)98486  0.7299 :  70b5af:    nop 1233  0.0091 :  70b5b0:    jne
70b5a0<hash_search_with_hash_value+0x80>
 


That covers all the slow points in the function. And unless I am missing 
something those are all the fetched cachelines of that function... For 
HASH_FIND that is.


So I think that reinforces my belive that ordinary cachemisses are the culprit 
here. Which is to be excepted in a hashtable...


Andres


PS: No idea whether that rambling made sense to anyone... But I looked at that 
function fo  the first time ;)

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Robert Haas
Дата: 19 ноября 2011 г., 01:56:13
Сообщение: Re: Inlining comparators as a performance optimisation

Следующее

От: Simon Riggs
Дата: 19 ноября 2011 г., 08:22:05
Сообщение: Re: foreign key locks, 2nd attempt

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: testing ProcArrayLock patches

Предыдущее

Следующее