Re: Slow standby snapshot

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Slow standby snapshot
Дата
Msg-id 20210803173531.2ww6v35255wppsni@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Slow standby snapshot  (Andrey Borodin <x4mmm@yandex-team.ru>)
Ответы Re: Slow standby snapshot
Re: Slow standby snapshot
Список pgsql-hackers
Hi,

On 2021-08-03 10:33:50 +0500, Andrey Borodin wrote:
> > 3 авг. 2021 г., в 03:01, Andres Freund <andres@anarazel.de> написал(а):
> > On 2021-08-03 00:07:23 +0300, Michail Nikolaev wrote:
> >> The main idea is simple optimistic optimization - store offset to next
> >> valid entry. So, in most cases, we could just skip all the gaps.
> >> Of course, it adds some additional impact for workloads without long
> >> (few seconds) transactions but it is almost not detectable (because of
> >> CPU caches).
> > 
> > I'm doubtful that that's really the right direction. For workloads that
> > are replay heavy we already often can see the cost of maintaining the
> > known xids datastructures show up significantly - not surprising, given
> > the datastructure. And for standby workloads with active primaries the
> > cost of searching through the array in all backends is noticeable as
> > well.  I think this needs a bigger data structure redesign.
> 
> KnownAssignedXids implements simple membership test idea. What kind of
> redesign would you suggest? Proposed optimisation makes it close to optimal,
> but needs eventual compression.

Binary searches are very ineffecient on modern CPUs (unpredictable memory
accesses, unpredictable branches). We constantly need to do binary searches
during replay to remove xids from the array. I don't see how you can address
that with just the current datastructure.


> Maybe use a hashtable of running transactions? It will be slightly faster
> when adding\removing single transactions. But much worse when doing
> KnownAssignedXidsRemove().

Why would it be worse for KnownAssignedXidsRemove()? Were you intending to
write KnownAssignedXidsGet[AndSetXmin]()?


> Maybe use a tree? (AVL\RB or something like that) It will be slightly better, because it does not need eventual
compressionlike exiting array.
 

I'm not entirely sure what datastructure would work best. I can see something
like a radix tree work well, or a skiplist style approach. Or a hashtable:

I'm not sure that we need to care as much about the cost of
KnownAssignedXidsGetAndSetXmin() - for one, the caching we now have makes that
less frequent. But more importantly, it'd not be hard to maintain an
occasionally (or opportunistically) maintained denser version for
GetSnapshotData() - there's only a single writer, making the concurrency
issues a lot simpler.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Следующее
От: Andres Freund
Дата:
Сообщение: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)