Обсуждение: Re: [RFC] Lock-free XLog Reservation from WAL
On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: > 17.01.2025 17:00, Zhou, Zhiguo пишет: >> On 1/16/2025 10:00 PM, Yura Sokolov wrote: >>> >>> Good day, Zhiguo. >>> >>> Excuse me, I feel sneaky a bit, but I've started another thread >>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure >>> its effect even on my working notebook (it is another one: Ryzen >>> 5825U limited to @2GHz). >>> >>> http://postgr.es/m/flat/3b11fdc2-9793-403d- >>> b3d4-67ff9a00d447%40postgrespro.ru >>> >>> ----- >>> >>> regards >>> Yura Sokolov aka funny-falcon >>> >>> >> Good day, Yura! >> Thank you for keeping me informed. I appreciate your proactive >> approach and understand the importance of exploring different angles >> for optimization. Your patch is indeed fundamental to our ongoing >> work on the lock-free xlog reservation, and I'm eager to see how it >> can further enhance our efforts. >> I will proceed to test the performance impact of your latest patch >> when combined with the lock-free xlog reservation patch. This will >> help us determine if there's potential for additional >> optimization. Concurrently, with your permission, I'll try to refine >> the hash-table- based implementation for your further review. WDYT? >> > > Good day, Zhiguo > > Here's version of "hash-table reservation" with both 32bit and 64bit > operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be > switched by hand). > > 64bit version uses other protocol with a bit lesser atomic > operations. I suppose it could be a bit faster. But I can't prove it > now. > > btw, you wrote: > >>> Major issue: >>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read > with on >>> platforms where MAXALIGN != 8 or without native 64 > load/store. Branch >>> with 'memcpy` is rather obvious, but even pointer de-referencing on >>> "lucky case" is not safe either. >>> >>> I have no idea how to fix it at the moment. >>> >> >> Indeed, non-atomic write/read operations can lead to safety issues in >> some situations. My initial thought is to define a bit near the >> prev-link to flag the completion of the update. In this way, we could >> allow non-atomic or even discontinuous write/read operations on the >> prev-link, while simultaneously guaranteeing its atomicity through >> atomic operations (as well as memory barriers) on the flag bit. What >> do you think of this as a viable solution? > > There is a way to order operations: > - since SetPrevRecPtr stores start of record as LSN, its lower 32bits > are certainly non-zero (record could not start at the beginning of a > page). > - so SetPrevRecPtr should write high 32bits, issue write barrier, and > then write lower 32bits, > - and then GetPrevRecPtr should first read lower 32bits, and if it is > not zero, then issue read barrier and read upper 32bits. > > This way you will always read correct prev-rec-ptr on platform without > 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte > atomicity for several years). > Hi, Yura Sokolov Thanks for updating the patch. I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC result: case | min | avg | max --------------------+------------+------------+-------------- master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 The patch provides a significant improvement. I just looked through the patch, here are some comments. 1. The v2 patch can't be applied cleanly. Applying: Lock-free XLog Reservation using lock-free hash-table .git/rebase-apply/patch:33: trailing whitespace. .git/rebase-apply/patch:37: space before tab in indent. { .git/rebase-apply/patch:38: space before tab in indent. int i; .git/rebase-apply/patch:39: trailing whitespace. .git/rebase-apply/patch:46: space before tab in indent. LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, warning: squelched 4 whitespace errors warning: 9 lines add whitespace errors. 2. And there is a typo: + * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is + * mostly 4 way: for every element computed two positions h1, h2, and s/algorith/algorithm/g -- Regrads, Japin Li
22.01.2025 09:09, Japin Li пишет: > On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.sokolov@postgrespro.ru> wrote: >> 17.01.2025 17:00, Zhou, Zhiguo пишет: >>> On 1/16/2025 10:00 PM, Yura Sokolov wrote: >>>> >>>> Good day, Zhiguo. >>>> >>>> Excuse me, I feel sneaky a bit, but I've started another thread >>>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure >>>> its effect even on my working notebook (it is another one: Ryzen >>>> 5825U limited to @2GHz). >>>> >>>> http://postgr.es/m/flat/3b11fdc2-9793-403d- >>>> b3d4-67ff9a00d447%40postgrespro.ru >>>> >>>> ----- >>>> >>>> regards >>>> Yura Sokolov aka funny-falcon >>>> >>>> >>> Good day, Yura! >>> Thank you for keeping me informed. I appreciate your proactive >>> approach and understand the importance of exploring different angles >>> for optimization. Your patch is indeed fundamental to our ongoing >>> work on the lock-free xlog reservation, and I'm eager to see how it >>> can further enhance our efforts. >>> I will proceed to test the performance impact of your latest patch >>> when combined with the lock-free xlog reservation patch. This will >>> help us determine if there's potential for additional >>> optimization. Concurrently, with your permission, I'll try to refine >>> the hash-table- based implementation for your further review. WDYT? >>> >> >> Good day, Zhiguo >> >> Here's version of "hash-table reservation" with both 32bit and 64bit >> operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be >> switched by hand). >> >> 64bit version uses other protocol with a bit lesser atomic >> operations. I suppose it could be a bit faster. But I can't prove it >> now. >> >> btw, you wrote: >> >>>> Major issue: >>>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read >> with on >>>> platforms where MAXALIGN != 8 or without native 64 >> load/store. Branch >>>> with 'memcpy` is rather obvious, but even pointer de-referencing on >>>> "lucky case" is not safe either. >>>> >>>> I have no idea how to fix it at the moment. >>>> >>> >>> Indeed, non-atomic write/read operations can lead to safety issues in >>> some situations. My initial thought is to define a bit near the >>> prev-link to flag the completion of the update. In this way, we could >>> allow non-atomic or even discontinuous write/read operations on the >>> prev-link, while simultaneously guaranteeing its atomicity through >>> atomic operations (as well as memory barriers) on the flag bit. What >>> do you think of this as a viable solution? >> >> There is a way to order operations: >> - since SetPrevRecPtr stores start of record as LSN, its lower 32bits >> are certainly non-zero (record could not start at the beginning of a >> page). >> - so SetPrevRecPtr should write high 32bits, issue write barrier, and >> then write lower 32bits, >> - and then GetPrevRecPtr should first read lower 32bits, and if it is >> not zero, then issue read barrier and read upper 32bits. >> >> This way you will always read correct prev-rec-ptr on platform without >> 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte >> atomicity for several years). >> > > Hi, Yura Sokolov > > Thanks for updating the patch. > I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC > result: > > case | min | avg | max > --------------------+------------+------------+-------------- > master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 > master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 > > The patch provides a significant improvement. > > I just looked through the patch, here are some comments. > > 1. > The v2 patch can't be applied cleanly. > > Applying: Lock-free XLog Reservation using lock-free hash-table > .git/rebase-apply/patch:33: trailing whitespace. > > .git/rebase-apply/patch:37: space before tab in indent. > { > .git/rebase-apply/patch:38: space before tab in indent. > int i; > .git/rebase-apply/patch:39: trailing whitespace. > > .git/rebase-apply/patch:46: space before tab in indent. > LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, > warning: squelched 4 whitespace errors > warning: 9 lines add whitespace errors. > > 2. > And there is a typo: > > + * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is > + * mostly 4 way: for every element computed two positions h1, h2, and > > s/algorith/algorithm/g Hi, Japin Thank you a lot for measuring and comments. May I ask you to compare not only against master, but against straight increase of NUM_XLOGINSERT_LOCKS to 128 as well? This way the profit from added complexity will be more obvious: does it pay for self or not. ------- regards Yura
On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 22.01.2025 09:09, Japin Li пишет:
>> Hi, Yura Sokolov
>> Thanks for updating the patch.
>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>> result:
>> case | min | avg | max
>> --------------------+------------+------------+--------------
>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40
>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>> The patch provides a significant improvement.
>> I just looked through the patch, here are some comments.
>> 1.
>> The v2 patch can't be applied cleanly.
>> Applying: Lock-free XLog Reservation using lock-free hash-table
>> .git/rebase-apply/patch:33: trailing whitespace.
>> .git/rebase-apply/patch:37: space before tab in indent.
>> {
>> .git/rebase-apply/patch:38: space before tab in indent.
>> int i;
>> .git/rebase-apply/patch:39: trailing whitespace.
>> .git/rebase-apply/patch:46: space before tab in indent.
>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>> warning: squelched 4 whitespace errors
>> warning: 9 lines add whitespace errors.
>> 2.
>> And there is a typo:
>> + * PrevLinksHash is a lock-free hash table based on Cuckoo
>> algorith. It is
>> + * mostly 4 way: for every element computed two positions h1, h2, and
>> s/algorith/algorithm/g
>
> Hi, Japin
>
> Thank you a lot for measuring and comments.
>
> May I ask you to compare not only against master, but against straight
> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
> This way the profit from added complexity will be more obvious: does
> it pay for self or not.
The above test already increases NUM_XLOGINSERT_LOCKS to 64; I will try 128
and update the result later.
--
Regrads,
Japin Li
22.01.2025 10:54, Japin Li пишет:
> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> 22.01.2025 09:09, Japin Li пишет:
>>> Hi, Yura Sokolov
>>> Thanks for updating the patch.
>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>>> result:
>>> case | min | avg | max
>>> --------------------+------------+------------+--------------
>>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40
>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>>> The patch provides a significant improvement.
>>> I just looked through the patch, here are some comments.
>>> 1.
>>> The v2 patch can't be applied cleanly.
>>> Applying: Lock-free XLog Reservation using lock-free hash-table
>>> .git/rebase-apply/patch:33: trailing whitespace.
>>> .git/rebase-apply/patch:37: space before tab in indent.
>>> {
>>> .git/rebase-apply/patch:38: space before tab in indent.
>>> int i;
>>> .git/rebase-apply/patch:39: trailing whitespace.
>>> .git/rebase-apply/patch:46: space before tab in indent.
>>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>>> warning: squelched 4 whitespace errors
>>> warning: 9 lines add whitespace errors.
>>> 2.
>>> And there is a typo:
>>> + * PrevLinksHash is a lock-free hash table based on Cuckoo
>>> algorith. It is
>>> + * mostly 4 way: for every element computed two positions h1, h2, and
>>> s/algorith/algorithm/g
>>
>> Hi, Japin
>>
>> Thank you a lot for measuring and comments.
>>
>> May I ask you to compare not only against master, but against straight
>> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
>> This way the profit from added complexity will be more obvious: does
>> it pay for self or not.
>
> The above test already increases NUM_XLOGINSERT_LOCKS to 64;
Ok, that is good.
Did you just increased number of locks, or applied
"several-attempts-to-lock"
from [1] as well? It will be interesting how it affects performance in this
case. And it is orthogonal to "lock-free reservation", so they could
applied simultaneously.
> I will try 128 and update the result later.
Thank you.
[1]
https://www.postgresql.org/message-id/3b11fdc2-9793-403d-b3d4-67ff9a00d447%40postgrespro.ru
------
regards
Yura
On Wed, 22 Jan 2025 at 11:22, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> 22.01.2025 10:54, Japin Li пишет:
>> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>>> 22.01.2025 09:09, Japin Li пишет:
>>>> Hi, Yura Sokolov
>>>> Thanks for updating the patch.
>>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>>>> result:
>>>> case | min | avg | max
>>>> --------------------+------------+------------+--------------
>>>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40
>>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>>>> The patch provides a significant improvement.
>>>> I just looked through the patch, here are some comments.
>>>> 1.
>>>> The v2 patch can't be applied cleanly.
>>>> Applying: Lock-free XLog Reservation using lock-free hash-table
>>>> .git/rebase-apply/patch:33: trailing whitespace.
>>>> .git/rebase-apply/patch:37: space before tab in indent.
>>>> {
>>>> .git/rebase-apply/patch:38: space before tab in indent.
>>>> int i;
>>>> .git/rebase-apply/patch:39: trailing whitespace.
>>>> .git/rebase-apply/patch:46: space before tab in indent.
>>>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>>>> warning: squelched 4 whitespace errors
>>>> warning: 9 lines add whitespace errors.
>>>> 2.
>>>> And there is a typo:
>>>> + * PrevLinksHash is a lock-free hash table based on Cuckoo
>>>> algorith. It is
>>>> + * mostly 4 way: for every element computed two positions h1, h2, and
>>>> s/algorith/algorithm/g
>>>
>>> Hi, Japin
>>>
>>> Thank you a lot for measuring and comments.
>>>
>>> May I ask you to compare not only against master, but against straight
>>> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
>>> This way the profit from added complexity will be more obvious: does
>>> it pay for self or not.
>> The above test already increases NUM_XLOGINSERT_LOCKS to 64;
>
> Ok, that is good.
> Did you just increased number of locks, or applied
> "several-attempts-to-lock"
> from [1] as well? It will be interesting how it affects performance in this
> case. And it is orthogonal to "lock-free reservation", so they could
> applied simultaneously.
I apply the following two patches:
1. Lock-free XLog Reservation using lock-free hash-table
2. Increase NUM_XLOGINSERT_LOCKS to 64
I noticed the patch from the [1]. However, I haven't tested it independently.
--
Regrads,
Japin Li
22.01.2025 10:54, Japin Li wrote:
> On Wed, 22 Jan 2025 at 10:25, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>> 22.01.2025 09:09, Japin Li пишет:
>>> Hi, Yura Sokolov
>>> Thanks for updating the patch.
>>> I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC
>>> result:
>>> case | min | avg | max
>>> --------------------+------------+------------+--------------
>>> master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40
>>> master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92
>>> The patch provides a significant improvement.
>>> I just looked through the patch, here are some comments.
>>> 1.
>>> The v2 patch can't be applied cleanly.
>>> Applying: Lock-free XLog Reservation using lock-free hash-table
>>> .git/rebase-apply/patch:33: trailing whitespace.
>>> .git/rebase-apply/patch:37: space before tab in indent.
>>> {
>>> .git/rebase-apply/patch:38: space before tab in indent.
>>> int i;
>>> .git/rebase-apply/patch:39: trailing whitespace.
>>> .git/rebase-apply/patch:46: space before tab in indent.
>>> LWLockReleaseClearVar(&WALInsertLocks[i].l.lock,
>>> warning: squelched 4 whitespace errors
>>> warning: 9 lines add whitespace errors.
>>> 2.
>>> And there is a typo:
>>> + * PrevLinksHash is a lock-free hash table based on Cuckoo
>>> algorith. It is
>>> + * mostly 4 way: for every element computed two positions h1, h2, and
>>> s/algorith/algorithm/g
>>
>> Hi, Japin
>>
>> Thank you a lot for measuring and comments.
>>
>> May I ask you to compare not only against master, but against straight
>> increase of NUM_XLOGINSERT_LOCKS to 128 as well?
>> This way the profit from added complexity will be more obvious: does
>> it pay for self or not.
>
> The above test already increases NUM_XLOGINSERT_LOCKS to 64; I will try 128
> and update the result later.
Oh, I see: I forgot that I removed increase of NUM_XLOGINSERT_LOCKS from
v2 patch.