Discussion: [HACKERS] [GSOC][weekly report 3] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

Hi all. Over the last week, I fixed some bugs and replaced the list of “PossibleUnsafeConflicts” with a hash table.
It passes the existing 178 tests. The patch is attached.

But in my benchmark, throughput decreased by 15% after the modification.
Could you do a quick review to see if anything is wrong?
I have also attached flame graphs from before and after the modification for reference.

Here are my questions:
1. Is there any function for HTAB, like a “clear”, that can empty it quickly?
In this patch, when releasing a transaction object, I have to scan the hash table and
use “hash_search” to remove entries one by one, which makes the release operation slower.

In a previous email, I reported that many backends wait for the lock “SerializableFinishedListLock”.
Unless functions like “ReleaseOneSerializableXact” are made fast, they will be the bottleneck.
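To illustrate the cost: each per-entry removal has to re-hash its key, whereas a bulk “clear” could simply sweep the buckets once. A minimal self-contained sketch of the difference, using a toy chained table (not PostgreSQL's dynahash; `hash_calls` is just an instrumentation counter I added for illustration):

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 64

typedef struct Entry { int key; struct Entry *next; } Entry;

typedef struct { Entry *buckets[NBUCKETS]; long hash_calls; } Table;

static unsigned hash_int(Table *t, int key)
{
    t->hash_calls++;                              /* count every hash computation */
    return ((unsigned) key * 2654435761u) % NBUCKETS;
}

static void table_insert(Table *t, int key)
{
    unsigned b = hash_int(t, key);
    Entry *e = malloc(sizeof(Entry));
    e->key = key;
    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/* Per-key removal: each delete re-hashes the key, like repeated
 * hash_search(..., HASH_REMOVE) calls in a loop. */
static void table_remove(Table *t, int key)
{
    unsigned b = hash_int(t, key);
    Entry **p = &t->buckets[b];
    while (*p)
    {
        if ((*p)->key == key)
        {
            Entry *dead = *p;
            *p = dead->next;
            free(dead);
            return;
        }
        p = &(*p)->next;
    }
}

/* Bulk clear: one sweep over the buckets, no hashing at all. */
static void table_clear(Table *t)
{
    for (int b = 0; b < NBUCKETS; b++)
    {
        Entry *e = t->buckets[b];
        while (e) { Entry *next = e->next; free(e); e = next; }
        t->buckets[b] = NULL;
    }
}
```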

2. Is HTAB thread-safe? I would like to focus on removing unnecessary locks if possible.

My plan for next week:
I will add a TPC-B benchmark as a representative classic OLTP workload.


--
Sincerely

Mengxing Liu





On 06/20/2017 06:51 AM, Mengxing Liu wrote:
> But in my benchmark, throughput decreased by 15% after the modification.
> Could you do a quick review to see if anything is wrong?
> I have also attached flame graphs from before and after the modification for reference.

Hmm. The hash table ought to speed up the RWConflictExists() function, 
right? Where in the flame graph is RWConflictExists()? If it only 
accounts for a small amount of the overall runtime, even a drastic speedup 
there won't make much difference to the total.
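This is Amdahl's-law arithmetic: if RWConflictExists() accounts for only a fraction f of the runtime, even an infinite speedup of it caps the overall gain at 1/(1 - f). A quick sketch (the 5% figure below is purely an illustrative assumption, not a measurement from the flame graph):

```c
#include <assert.h>

/* Overall speedup when a fraction f of the runtime is accelerated by
 * factor s (Amdahl's law). */
static double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, if the hot function were 5% of the runtime, even a billion-fold speedup of it would buy at most about a 5.3% overall improvement.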

> Here are my questions:
> 1. Is there any function for HTAB, like a “clear”, that can empty it quickly?
> In this patch, when releasing a transaction object, I have to scan the hash table and
> use “hash_search” to remove entries one by one, which makes the release operation slower.

Nope, there is no such function, I'm afraid.

> In a previous email, I reported that many backends wait for the lock “SerializableFinishedListLock”.
> Unless functions like “ReleaseOneSerializableXact” are made fast, they will be the bottleneck.

Yeah, that's probably what's causing the decrease in throughput you're 
seeing.

You might need to also keep the linked lists, and use the hash table to 
only look up particular items in the linked list faster.

I was surprised to see that you're creating a lot of smallish hash 
tables, three for each PredXact entry. I would've expected all the 
PredXact entries to share the same hash tables, i.e. have only three 
hash tables in total. That would be more flexible in allocating 
resources among entries. (It won't help with the quick-release, though)
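One way to read "keep the lists, use the hash only as an index": each transaction keeps its conflict list for cheap iteration and release, while a hash set answers the membership question (the RWConflictExists() case) in O(1). A self-contained toy sketch of that shape, not the actual SerializableXact code (single-entry deletion is omitted, since this probing scheme would need tombstones for it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SET_SLOTS 128            /* power of two, well above expected entries */

/* Intrusive singly-linked conflict list: cheap to walk and to release. */
typedef struct Conflict { int xact_id; struct Conflict *next; } Conflict;

typedef struct {
    Conflict *head;              /* the list: iteration + release */
    int  slots[SET_SLOTS];       /* open-addressed index: fast "exists?" */
    bool used[SET_SLOTS];
} ConflictTracker;

static unsigned slot_for(int id) { return ((unsigned) id * 2654435761u) & (SET_SLOTS - 1); }

static void conflict_add(ConflictTracker *t, int xact_id)
{
    Conflict *c = malloc(sizeof(Conflict));
    c->xact_id = xact_id;
    c->next = t->head;
    t->head = c;

    unsigned s = slot_for(xact_id);              /* linear probing */
    while (t->used[s]) s = (s + 1) & (SET_SLOTS - 1);
    t->slots[s] = xact_id;
    t->used[s] = true;
}

/* The RWConflictExists()-style check: O(1) via the index, no list walk. */
static bool conflict_exists(const ConflictTracker *t, int xact_id)
{
    unsigned s = slot_for(xact_id);
    while (t->used[s])
    {
        if (t->slots[s] == xact_id) return true;
        s = (s + 1) & (SET_SLOTS - 1);
    }
    return false;
}

/* Release: walk the list once; the index is wiped wholesale. */
static void conflict_release_all(ConflictTracker *t)
{
    Conflict *c = t->head;
    while (c) { Conflict *next = c->next; free(c); c = next; }
    t->head = NULL;
    for (int i = 0; i < SET_SLOTS; i++) t->used[i] = false;
}
```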

> 2. Is HTAB thread-safe? I would like to focus on removing unnecessary locks if possible.

Nope, you need to hold a lock while searching/manipulating a HTAB hash 
table. If the hash table gets big and you start to see lock contention, 
you can partition it so that each operation only needs to lock the one 
partition covering the search key.
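A self-contained toy sketch of the partitioning idea, using pthread mutexes rather than PostgreSQL's LWLocks (names and sizes here are my own, purely illustrative):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_PARTITIONS   16
#define BUCKETS_PER_PART 64

typedef struct {
    pthread_mutex_t lock;                 /* guards only this partition */
    int  keys[BUCKETS_PER_PART];
    bool used[BUCKETS_PER_PART];
} Partition;

typedef struct { Partition part[NUM_PARTITIONS]; } PartitionedSet;

static void pset_init(PartitionedSet *s)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)
    {
        pthread_mutex_init(&s->part[i].lock, NULL);
        for (int j = 0; j < BUCKETS_PER_PART; j++)
            s->part[i].used[j] = false;
    }
}

static unsigned pset_hash(int key) { return (unsigned) key * 2654435761u; }

/* Insert: only the partition covering the key is locked, so operations
 * on keys in other partitions never contend on this lock. */
static bool pset_insert(PartitionedSet *s, int key)
{
    unsigned h = pset_hash(key);
    Partition *p = &s->part[h % NUM_PARTITIONS];
    bool inserted = false;

    pthread_mutex_lock(&p->lock);
    unsigned b = (h / NUM_PARTITIONS) % BUCKETS_PER_PART;
    while (p->used[b] && p->keys[b] != key)
        b = (b + 1) % BUCKETS_PER_PART;   /* linear probing */
    if (!p->used[b])
    {
        p->keys[b] = key;
        p->used[b] = true;
        inserted = true;
    }
    pthread_mutex_unlock(&p->lock);
    return inserted;
}

static bool pset_contains(PartitionedSet *s, int key)
{
    unsigned h = pset_hash(key);
    Partition *p = &s->part[h % NUM_PARTITIONS];
    bool found = false;

    pthread_mutex_lock(&p->lock);
    unsigned b = (h / NUM_PARTITIONS) % BUCKETS_PER_PART;
    while (p->used[b])
    {
        if (p->keys[b] == key) { found = true; break; }
        b = (b + 1) % BUCKETS_PER_PART;
    }
    pthread_mutex_unlock(&p->lock);
    return found;
}
```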

- Heikki




> From: "Heikki Linnakangas" <hlinnaka@iki.fi>
>
> Hmm. The hash table ought to speed up the RWConflictExists() function
> right? Where in the flame graph is RWConflictExists()? If it only
> accounts for a small amount of the overall runtime, even drastic speedup
> there won't make much difference to the total.
>

Yes. It is much smaller than we had expected. I ran a small test comparing HTAB with a linked list:
when the set size is smaller than about 10, the linked list is as quick as HTAB. Here are the results
(microseconds to run 10 million searches):

setSize:  5, LIST:  423569, HTAB: 523681
setSize: 10, LIST:  540579, HTAB: 430111
setSize: 15, LIST:  752879, HTAB: 429519
setSize: 20, LIST:  940792, HTAB: 431388
setSize: 25, LIST: 1163760, HTAB: 446043
setSize: 30, LIST: 1352990, HTAB: 429057
setSize: 35, LIST: 1524097, HTAB: 431547
setSize: 40, LIST: 1714976, HTAB: 429023

As we can see, the hash table only wins once the set grows fairly large.
However, even with 100 concurrent connections, the average length of the conflict
list is only about 10, so the linked list is not the bottleneck.
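The microbenchmark above could be reproduced along these lines (my own sketch, not the original test program; names and sizes are illustrative, and absolute timings will vary by machine):

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define TABLE_SLOTS 256          /* power of two, comfortably above set size */

/* Linear scan, standing in for walking a conflict list. */
static bool list_contains(const int *set, int n, int key)
{
    for (int i = 0; i < n; i++)
        if (set[i] == key) return true;
    return false;
}

static unsigned slot_of(int key) { return ((unsigned) key * 2654435761u) & (TABLE_SLOTS - 1); }

/* Open-addressed hash set, standing in for HTAB lookups. */
static void set_build(const int *set, int n, int *slots, bool *used)
{
    for (int i = 0; i < TABLE_SLOTS; i++) used[i] = false;
    for (int i = 0; i < n; i++)
    {
        unsigned s = slot_of(set[i]);
        while (used[s]) s = (s + 1) & (TABLE_SLOTS - 1);   /* linear probing */
        slots[s] = set[i];
        used[s] = true;
    }
}

static bool hash_contains(const int *slots, const bool *used, int key)
{
    unsigned s = slot_of(key);
    while (used[s])
    {
        if (slots[s] == key) return true;
        s = (s + 1) & (TABLE_SLOTS - 1);
    }
    return false;
}

/* Milliseconds elapsed since 'start', for timing the search loops. */
static double elapsed_ms(clock_t start)
{
    return (clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Timing 10 million lookups of each kind with `elapsed_ms()` for set sizes from 5 to 40 should show a crossover similar to the numbers above: the linear scan degrades with set size while the hash lookup stays roughly flat.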

>
> Nope, there is no such function, I'm afraid.
>

Oh, that's really bad news.

> > In a previous email, I reported that many backends wait for the lock “SerializableFinishedListLock”.
> > Unless functions like “ReleaseOneSerializableXact” are made fast, they will be the bottleneck.
>
> Yeah, that's probably what's causing the decrease in throughput you're
> seeing.
>

New evidence: I ran "SELECT wait_event_type, wait_event FROM pg_stat_activity;" and summed the results by event type to
analyze the wait events.

The result for the original version:   SerializableXactHashLock 27   predicate_lock_manager 512   WALWriteLock 3
SerializableFinishedListLock 402

The result for the new version:   SerializableXactHashLock 38   predicate_lock_manager 473   WALWriteLock 3
SerializableFinishedListLock 1068

Clearly, more backends are blocked on SerializableFinishedListLock.

>
> You might need to also keep the linked lists, and use the hash table to
> only look up particular items in the linked list faster.
>
> I was surprised to see that you're creating a lot of smallish hash
> tables, three for each PredXact entry. I would've expected all the
> PredXact entries to share the same hash tables, i.e. have only three
> hash tables in total. That would be more flexible in allocating
> resources among entries. (It won't help with the quick-release, though)
>

Yes, that would look more elegant and require less memory
(because the hash table objects themselves also consume memory).
But purely in terms of performance, it would be less efficient than my patch,
because it has to maintain the linked lists in addition to the hash tables. In other words,
it does more work than my patch.

Another point is that removing the linked list may offer more opportunities to reduce
lock contention; otherwise, we need a global lock to protect the linked list.

--
Sincerely


Mengxing Liu