Re: Improving connection scalability: GetSnapshotData()

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Improving connection scalability: GetSnapshotData()
Дата
Msg-id 20200302232421.2mlvu2nqoisqidpy@alap3.anarazel.de
обсуждение исходный текст
Ответ на Improving connection scalability: GetSnapshotData()  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

On 2020-03-01 00:36:01 -0800, Andres Freund wrote:
> Here are some numbers for the submitted patch series. I'd to cull some
> further improvements to make it more manageable, but I think the numbers
> still are quite convincing.
> 
> The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> -j $conns -S -n for each client count.  This is on a machine with 2
> Intel(R) Xeon(R) Platinum 8168, but virtualized.
> 
> conns   tps master        tps pgxact-split
> 
> 1       26842.492845            26524.194821
> 10      246923.158682           249224.782661
> 50      695956.539704           709833.746374
> 100     1054727.043139          1903616.306028
> 200     964795.282957           1949200.338012
> 300     906029.377539           1927881.231478
> 400     845696.690912           1911065.369776
> 500     812295.222497           1926237.255856
> 600     888030.104213           1903047.236273
> 700     866896.532490           1886537.202142
> 800     863407.341506           1883768.592610
> 900     871386.608563           1874638.012128
> 1000    887668.277133           1876402.391502
> 1500    860051.361395           1815103.564241
> 2000    890900.098657           1775435.271018
> 3000    874184.980039           1653953.817997
> 4000    845023.080703           1582582.316043
> 5000    817100.195728           1512260.802371
> 
> I think these are pretty nice results.

> One further cool recognition of the fact that GetSnapshotData()'s
> results can be made to only depend on the set of xids in progress, is
> that caching the results of GetSnapshotData() is almost trivial at that
> point: We only need to recompute snapshots when a toplevel transaction
> commits/aborts.
> 
> So we can avoid rebuilding snapshots when no commt has happened since it
> was last built. Which amounts to assigning a current 'commit sequence
> number' to the snapshot, and checking that against the current number
> at the time of the next GetSnapshotData() call. Well, turns out there's
> this "LSN" thing we assign to commits (there are some small issues with
> that though).  I've experimented with that, and it considerably further
> improves the numbers above. Both with a higher peak throughput, but more
> importantly it almost entirely removes the throughput regression from
> 2000 connections onwards.
> 
> I'm still working on cleaning that part of the patch up, I'll post it in
> a bit.

I triggered a longer run on the same hardware, that also includes
numbers for the caching patch.

nclients    master    pgxact-split    pgxact-split-cache
1    29742.805074    29086.874404    28120.709885
2    58653.005921    56610.432919    57343.937924
3    116580.383993    115102.94057    117512.656103
4    150821.023662    154130.354635    152053.714824
5    186679.754357    189585.156519    191095.841847
6    219013.756252    223053.409306    224480.026711
7    256861.673892    256709.57311    262427.179555
8    291495.547691    294311.524297    296245.219028
9    332835.641015    333223.666809    335460.280487
10    367883.74842    373562.206447    375682.894433
15    561008.204553    578601.577916    587542.061911
20    748000.911053    794048.140682    810964.700467
25    904581.660543    1037279.089703    1043615.577083
30    999231.007768    1251113.123461    1288276.726489
35    1001274.289847    1438640.653822    1438508.432425
40    991672.445199    1518100.079695    1573310.171868
45    994427.395069    1575758.31948    1649264.339117
50    1017561.371878    1654776.716703    1715762.303282
60    993943.210188    1720318.989894    1789698.632656
70    971379.995255    1729836.303817    1819477.25356
80    966276.137538    1744019.347399    1842248.57152
90    901175.211649    1768907.069263    1847823.970726
100    803175.74326    1784636.397822    1865795.782943
125    664438.039582    1806275.514545    1870983.64688
150    623562.201749    1796229.009658    1876529.428419
175    680683.150597    1809321.487338    1910694.40987
200    668413.988251    1833457.942035    1878391.674828
225    682786.299485    1816577.462613    1884587.77743
250    727308.562076    1825796.324814    1864692.025853
275    676295.999761    1843098.107926    1908698.584573
300    698831.398432    1832068.168744    1892735.290045
400    661534.639489    1859641.983234    1898606.247281
500    645149.788352    1851124.475202    1888589.134422
600    740636.323211    1875152.669115    1880653.747185
700    858645.363292    1833527.505826    1874627.969414
800    858287.957814    1841914.668668    1892106.319085
900    882204.933544    1850998.221969    1868260.041595
1000    910988.551206    1836336.091652    1862945.18557
1500    917727.92827    1808822.338465    1864150.00307
2000    982137.053108    1813070.209217    1877104.342864
3000    1013514.639108    1753026.733843    1870416.924248
4000    1025476.80688    1600598.543635    1859908.314496
5000    1019889.160511    1534501.389169    1870132.571895
7500    968558.864242    1352137.828569    1853825.376742
10000    887558.112017    1198321.352461    1867384.381886
15000    687766.593628    950788.434914    1710509.977169

The odd dip for master between 90 and 700 connections looks like it's
not directly related to GetSnapshotData(). It looks like it's related to
the linux scheduler and virtiualization. When a pgbench thread and
postgres backend need to swap who gets executed, and both are on
different CPUs, the wakeup is more expensive when the target CPU is idle
or isn't going to reschedule soon. In the expensive path a
inter-process-interrupt (IPI) gets triggered, which requires to exit out
of the VM (which is really expensive on azure, apparently).  I can
trigger similar behaviour for the other runs by renicing, albeit on a
slightly smaller scale.

I'll try to find a larger system that's not virtualized :/.

Greetings,

Andres Freund

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nikita Glukhov
Дата:
Сообщение: Re: SQL/JSON: functions
Следующее
От: Cary Huang
Дата:
Сообщение: Re: Internal key management system