Re: Slow standby snapshot

Поиск
Список
Период
Сортировка
От Andrey Borodin
Тема Re: Slow standby snapshot
Дата
Msg-id 5FDD25C2-7806-490A-8D28-483F1A70D8D5@yandex-team.ru
обсуждение исходный текст
Ответ на Re: Slow standby snapshot  (Andres Freund <andres@anarazel.de>)
Ответы Re: Slow standby snapshot  (Michail Nikolaev <michail.nikolaev@gmail.com>)
Список pgsql-hackers
Sorry for so late reply. I've been thinking about possible approaches.
KnownAssignedXids over hashtable in fact was implemented long before and rejected [0].

> 3 авг. 2021 г., в 22:35, Andres Freund <andres@anarazel.de> написал(а):
>
> On 2021-08-03 10:33:50 +0500, Andrey Borodin wrote:
>>> 3 авг. 2021 г., в 03:01, Andres Freund <andres@anarazel.de> написал(а):
>>> On 2021-08-03 00:07:23 +0300, Michail Nikolaev wrote:
>>>> The main idea is simple optimistic optimization - store offset to next
>>>> valid entry. So, in most cases, we could just skip all the gaps.
>>>> Of course, it adds some additional impact for workloads without long
>>>> (few seconds) transactions but it is almost not detectable (because of
>>>> CPU caches).
>>>
>>> I'm doubtful that that's really the right direction. For workloads that
>>> are replay heavy we already often can see the cost of maintaining the
>>> known xids datastructures show up significantly - not surprising, given
>>> the datastructure. And for standby workloads with active primaries the
>>> cost of searching through the array in all backends is noticeable as
>>> well.  I think this needs a bigger data structure redesign.
>>
>> KnownAssignedXids implements simple membership test idea. What kind of
>> redesign would you suggest? Proposed optimisation makes it close to optimal,
>> but needs eventual compression.
>
> Binary searches are very ineffecient on modern CPUs (unpredictable memory
> accesses, unpredictable branches). We constantly need to do binary searches
> during replay to remove xids from the array. I don't see how you can address
> that with just the current datastructure.
Current patch addresses another problem. In presence of old enough transaction enumeration of KnownAssignedXids with
sharedlock prevents adding new transactions with exclusive lock. And recovery effectively pauses. 

Binary searches can consume 10-15 cache misses, which is unreasonable amount of memory waits. But that's somewhat
differentproblem. 
Also binsearch is not that expensive when we compress KnownAssignedXids often.

>> Maybe use a hashtable of running transactions? It will be slightly faster
>> when adding\removing single transactions. But much worse when doing
>> KnownAssignedXidsRemove().
>
> Why would it be worse for KnownAssignedXidsRemove()? Were you intending to
> write KnownAssignedXidsGet[AndSetXmin]()?
I was thinking about inefficient KnownAssignedXidsRemovePreceding() in hashtable. But, probably, this is not so
frequentoperation. 

>> Maybe use a tree? (AVL\RB or something like that) It will be slightly better, because it does not need eventual
compressionlike exiting array. 
>
> I'm not entirely sure what datastructure would work best. I can see something
> like a radix tree work well, or a skiplist style approach. Or a hashtable:
>
> I'm not sure that we need to care as much about the cost of
> KnownAssignedXidsGetAndSetXmin() - for one, the caching we now have makes that
> less frequent. But more importantly, it'd not be hard to maintain an
> occasionally (or opportunistically) maintained denser version for
> GetSnapshotData() - there's only a single writer, making the concurrency
> issues a lot simpler.

I've been prototyping Radix tree for a while.
Here every 4 xids are summarized my minimum Xid and number of underlying Xids. Of cause 4 is arbitrary number,
summarizationarea must be of cacheline size. 
┌───────┐
│ 1 / 9 │
├───────┴────┐
│            └────┐
│                 └────┐
│                      └────┐
▼                           └───▶
┌───────────────────────────────┐
│ 1 / 3 | 5 / 0 | 9 / 3 | D / 3 │
├───────┬───────┬────────┬──────┴────┐
│       └─┐     └───┐    └────┐      └─────┐
│         └─┐       └──┐      └────┐       └─────┐
│           └─┐        └──┐        └────┐        └────┐
▼             └─┐         └──┐          └───┐         └────┐
┌───────────────▼────────────┴─▶────────────┴──▶───────────┴───▶
│ 1 | 2 |   | 4 |   |   |   |   | 9 |   | B | C | D | E | F |  │
└──────────────────────────────────────────────────────────────┘
Bottom layer is current array (TransactionId *KnownAssignedXids).
When we remove Xid we need theoretical minimum of cachelines touched. I'd say 5-7 instead of 10-15 of binsearch (in
caseof millions of entries in KnownAssignedXids). 
Enumerating running Xids is not that difficult too: we will need to scan O(xip) memory, not whole KnownAssignedXids
array.

But the overall complexity of this approach does not seem good to me.

All in all, I think using proposed "KnownAssignedXidsNext" patch solves real problem and the problem of binary searches
shouldbe addressed by compressing KnownAssignedXids more often. 

Currently we do not compress array
        if (nelements < 4 * PROCARRAY_MAXPROCS ||  // It's not that big yet OR
            nelements < 2 * pArray->numKnownAssignedXids) // It's contains less than a half of a bloat
            return;
From my POV arbitrary number 4 is just too high.

Summary: I think ("KnownAssignedXidsNext" patch + compressing KnownAssignedXids more often) is better than major
KnownAssignedXidsredesign. 


Best regards, Andrey Borodin.

[0] https://github.com/postgres/postgres/commit/2871b4618af1acc85665eec0912c48f8341504c4


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: Commitfest 2021-11 Patch Triage - Part 1
Следующее
От: Dinesh Chemuduru
Дата:
Сообщение: Re: [PROPOSAL] new diagnostic items for the dynamic sql