Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

From: Ranier Vilela
Subject: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Date:
Msg-id: CAEudQAqnAgrKmcx+zhOgHJ8YULhHMQq2kmZfk3u+axaCdXcVpw@mail.gmail.com
In reply to: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)  (Andres Freund <andres@anarazel.de>)
Responses: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
On Fri, May 27, 2022 at 18:22, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> On Thu, May 26, 2022 at 22:30, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> > On 5/27/22 02:11, Ranier Vilela wrote:
> > >
> > > ...
> > >
> > > Here the results with -T 60:
> >
> > Might be a good idea to share your analysis / interpretation of the
> > results, not just the raw data. After all, the change is being proposed
> > by you, so do you think this shows the change is beneficial?
> >
> I think so, but the expectation has diminished.
> I expected that the more connections, the better the performance.
> And for both patch and head, this doesn't happen in tests.
> Performance degrades with a greater number of connections.

Your system has four CPUs. Once they're all busy, adding more connections
won't improve performance. It'll just add more and more context switching,
cache misses, and make the OS scheduler do more work.
conns   tps (head)
10      82365.634750
50      74593.714180
80      69219.756038
90      67419.574189
100     66613.771701
Yes, it is quite disappointing that tps at 100 connections falls below tps at 10 connections.

> GetSnapshotData() isn't a bottleneck?

I'd be surprised if it showed up in a profile on your machine with that
workload in any sort of meaningful way. The snapshot reuse logic will always
work - because there are no writes - and thus the only work that needs to be
done is to acquire the ProcArrayLock briefly.  And because there is only a
small number of cores, contention on the cacheline for that isn't a problem.
Thanks for sharing this.

> > These results look much saner, but IMHO it also does not show any clear
> > benefit of the patch. Or are you still claiming there is a benefit?
> >
> We agree that they are micro-optimizations. However, I think they should be
> considered micro-optimizations in inner loops, because everything in
> procarray.c is a hot path.

As explained earlier, I don't agree that they optimize anything - you're
making some of the scalability behaviour *worse*, if it's changed at all.


> The first objective, I believe, was achieved, with no performance
> regression.
> I agree, the gains are small, by the tests done.

There are no gains.
IMHO, I must disagree.

> But, IMHO, this is a good way, small gains turn into big gains in the end,
> when applied to all code.
>
> Consider GetSnapshotData():
> 1. Most of the time the snapshot is not null, so:
> if (snapshot == NULL) will fail most of the time.
>
> With the patch:
> if (snapshot->xip != NULL)
> {
>     if (GetSnapshotDataReuse(snapshot))
>      return snapshot;
> }
>
> Most of the time the test is true; GetSnapshotDataReuse() is skipped only for
> new snapshots, and count, subcount and suboverflowed do not have to be
> initialized for every snapshot.

But that's irrelevant. There's only a few "new" snapshots in the life of a
connection. You're optimizing something irrelevant.
IMHO, when GetSnapshotData() is the bottleneck, everything in it is relevant.
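To make the fast path being debated concrete, here is a simplified, self-contained model of it. The struct fields, the shared `ShmemXactCompletionCount` counter, and the reuse condition are stand-in assumptions, not the real procarray.c state; only the control flow mirrors the quoted patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy snapshot: xip stays NULL until the snapshot is first populated. */
typedef struct SnapshotData
{
    int        *xip;                     /* xid array; NULL for a new snapshot */
    unsigned    snapXactCompletionCount; /* counter value when snapshot was built */
} SnapshotData;

/* Stand-in for the shared transaction-completion counter. */
static unsigned ShmemXactCompletionCount = 42;

/* Reuse succeeds only if no transaction completed since the snapshot
 * was built; the real code performs this check under ProcArrayLock. */
static bool
GetSnapshotDataReuse(SnapshotData *snapshot)
{
    return snapshot->snapXactCompletionCount == ShmemXactCompletionCount;
}

static SnapshotData *
GetSnapshotData(SnapshotData *snapshot)
{
    /* Fast path: an already-populated snapshot may be reusable,
     * skipping the full rebuild entirely. */
    if (snapshot->xip != NULL)
    {
        if (GetSnapshotDataReuse(snapshot))
            return snapshot;
    }
    /* ... slow path: rebuild the snapshot (under ProcArrayLock in the
     * real code), which re-stamps the completion counter ... */
    snapshot->snapXactCompletionCount = ShmemXactCompletionCount;
    return snapshot;
}
```

The point of contention is not whether this shape is correct, but whether moving the `xip != NULL` test ahead of `GetSnapshotDataReuse()` saves anything measurable, since a connection builds a "new" snapshot only a handful of times in its life.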
 


> 2. If the snapshot is taken during recovery:
> The pgprocnos and ProcGlobal->subxidStates are not touched unnecessarily.

That code isn't reached when in recovery?
Currently it is reached *even* when not in recovery.
With the patch, it is reached *only* when in recovery.
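The claimed change has this shape: work needed only for recovery snapshots is put behind the recovery test, so the common non-recovery path skips it. This is an illustrative sketch, not the real code; the counter exists only so the branch taken can be observed, where procarray.c would instead touch pgprocnos and ProcGlobal->subxidStates:

```c
#include <assert.h>
#include <stdbool.h>

/* Observable stand-in for the recovery-only work. */
static int recovery_only_work_done = 0;

static void
BuildSnapshot(bool takenDuringRecovery)
{
    if (takenDuringRecovery)
    {
        /* touch recovery-only state (subxidStates etc.) here only */
        recovery_only_work_done++;
    }
    /* ... common snapshot-building work runs unconditionally ... */
}
```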



> 3. Calling GetSnapshotDataReuse() without first acquiring ProcArrayLock.
> There's an agreement that this would be fine, for now.

There's no such agreement at all. It's not correct.
Ok, but there is a chance it will work correctly.



> Consider ComputeXidHorizons()
> 1. ProcGlobal->statusFlags is touched before the lock.

Hard to believe that'd have a measurable effect.
IMHO, anything you move out of the lock is a benefit.
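The general pattern being argued over looks like this. Whether the hoisted read is actually safe for ProcGlobal->statusFlags is exactly what is in dispute above; this sketch only shows the shape, using a C11 atomic_flag spinlock as a stand-in for an LWLock and made-up shared variables:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;  /* spinlock standing in for an LWLock */
static int shared_flags = 7;                 /* stand-in for statusFlags */
static int shared_result;

static int
compute_with_short_critical_section(void)
{
    int flags = shared_flags;                /* read hoisted out of the lock */

    while (atomic_flag_test_and_set(&lock))
        ;                                    /* acquire */
    shared_result = flags * 2;               /* only the dependent write stays inside */
    atomic_flag_clear(&lock);                /* release */
    return shared_result;
}
```

The pattern shortens the critical section, but it is only correct when the hoisted read does not need to be consistent with the state protected by the lock.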
 


> 2. allStatusFlags[index] is not touched for all numProcs.

I'd be surprised if the compiler couldn't defer that load on its own.
Better to be sure of that, no?
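For reference, the manual form of the deferral in question looks like this. It is a self-contained sketch with invented array contents and a hypothetical filter condition, not the real ComputeXidHorizons() loop: the element is loaded only on the branch that actually uses it, which is precisely what a compiler is expected to arrange on its own:

```c
#include <assert.h>

/* Count procs whose flags are set, loading statusFlags[i] only when
 * the xid filter passes; the early "continue" never touches it. */
static int
count_flagged(const int *statusFlags, const int *xids, int nprocs, int min_xid)
{
    int count = 0;

    for (int i = 0; i < nprocs; i++)
    {
        if (xids[i] < min_xid)
            continue;                /* statusFlags[i] never loaded here */
        if (statusFlags[i] != 0)     /* load deferred to this branch */
            count++;
    }
    return count;
}
```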

regards,
Ranier Vilela
