[HACKERS] [GSOC][weekly report 6] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions

Поиск
Список
Период
Сортировка
От Mengxing Liu
Тема [HACKERS] [GSOC][weekly report 6] Eliminate O(N^2) scaling from rw-conflicttracking in serializable transactions
Дата
Msg-id 50422608.2673.15d4c5c99c8.Coremail.liu-mx15@mails.tsinghua.edu.cn
обсуждение исходный текст
Список pgsql-hackers
In the last week:
   1) I tried to find out what's wrong with my last patch: reducing the contention on SerializableFinishedListLock. 
   2) I used a skip list to replace the linked list to speedup conflict tracking. But there are still some bugs resulting in segment fault. 
       Here is the code: https://github.com/liumx10/postgresql/commit/c469e14e27c31edba5caff001647440040f53c84
       I will figure it out in the next days.

Here is the details.
1) Reducing the contention on SerializableFinishedListLock
In the last email, I reported that reducing the contention on FinishedListLock didn't increase the performance. The system became even 
slower in some cases. I thought the key point was not the problem of my codes. The original Pseudo code looks like this:
     
     lockAcquire(SerializableFinishedListLock)
     for  predlock in this Sxact:
            lockAcquire(patitionlock);
           // do something;
            unlock(paritionlock);
     releaseAcquire(SerializableFinishedListLock)

SerializableFinishedListLock likes a coarse-grained lock, while partition locks like fine-grained lock. My patch essentially remove the outer lock.
But unfortunately, there are many other places requiring partition locks. We have only 16 partition locks in all. 
So even without SerializableFinishedListLock, contentions will happen on partition locks. The performance wouldn't be improved. 

Why the performance became worse in some cases? Context switch may be one of the reason. 
It was easier to fail on getting locks when using fine-grained locks,
which would hang the current process and result in a context switch.
For example, if it failed to get partition locks 5 times in a for loop, there would be 5 context switch at least.
I used vmstat to count the context switch. In the benchmark where my patch had a worse performance, there were more context switch happened than using the original code. 
I also modify the source code to count where requiring partition locks failed. In the original version, only a few (no more than 1%) locking failure came from the pseudo code above. 
But in my patch, more than 50% failure came from this part. 

2) Skip list
To make it simple, I didn't develop an independent data structure for skip list. Instead, I used two linked list to simulate skip list. 
Most transactions have no more than 50 conflicts, so two level s kip list is enough now. 
I'll make a clean patch after I figure out bugs. 

--
Sincerely

Mengxing Liu





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Something for the TODO list: deprecating abstime and friends
Следующее
От: Mark Cave-Ayland
Дата:
Сообщение: Re: [HACKERS] More flexible LDAP auth search filters?