Re: Speed up Clog Access by increasing CLOG buffers
From | Amit Kapila |
---|---|
Subject | Re: Speed up Clog Access by increasing CLOG buffers |
Date | |
Msg-id | CAA4eK1KoGTUTWH=X3yqWAEqfHt0mKrBCMynY_sEoE4fEzPAfgg@mail.gmail.com |
In response to | Re: Speed up Clog Access by increasing CLOG buffers (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Thu, Mar 24, 2016 at 8:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 24, 2016 at 5:40 AM, Andres Freund <andres@anarazel.de> wrote:
> >
> > Have you, in your evaluation of the performance of this patch, done
> > profiles over time? I.e. whether the performance benefits are there
> > immediately, or only after a significant amount of test time? Comparing
> > TPS over time, for both patched/unpatched looks relevant.
> >
>
> I have mainly done it with half-hour read-write tests. What do you want to observe via shorter tests? They sometimes give inconsistent data for read-write tests.
>
I have done some tests on both the Intel and POWER machines (their configurations are listed at the end of this mail) to see the results at different time intervals. The patch consistently shows more than 50% improvement on the POWER machine at a client count of 128 and more than 29% improvement on the Intel machine at a client count of 88.
Non-default parameters
------------------------------------
max_connections = 300
shared_buffers = 8GB
min_wal_size = 10GB
max_wal_size = 15GB
checkpoint_timeout = 35min
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 256MB
pgbench setup
------------------------
scale factor - 300
used *unlogged* tables : pgbench -i --unlogged-tables -s 300 ..
pgbench -M prepared tpc-b
Results on Intel m/c
--------------------------------
client-count - 88
Time (minutes) | Base (TPS) | Patch (TPS) | Improvement (%) |
---|---|---|---|
5 | 39978 | 51858 | 29.71 |
10 | 38169 | 52195 | 36.74 |
20 | 36992 | 52173 | 41.03 |
30 | 37042 | 52149 | 40.78 |
Results on power m/c
-----------------------------------
Client-count - 128
Time (minutes) | Base (TPS) | Patch (TPS) | Improvement (%) |
---|---|---|---|
5 | 42479 | 65655 | 54.55 |
10 | 41876 | 66050 | 57.72 |
20 | 38099 | 65200 | 71.13 |
30 | 37838 | 61908 | 63.61 |
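
For readers who want to recheck the percentage column, here is a minimal sketch. It assumes the column is the relative gain of the patched build over the base build, (Patch - Base) / Base * 100; the mail does not state the formula, and the names `RESULTS`, `improvement_pct` and `summarize` are made up for this illustration. Under that assumption the recomputed values agree with the tables to within 0.01 percentage points, with the last digit in the mail apparently truncated rather than rounded.

```python
# Figures copied from the two result tables above:
# (test duration in minutes, base value, patched value).
RESULTS = {
    "Intel, 88 clients": [
        (5, 39978, 51858),
        (10, 38169, 52195),
        (20, 36992, 52173),
        (30, 37042, 52149),
    ],
    "POWER, 128 clients": [
        (5, 42479, 65655),
        (10, 41876, 66050),
        (20, 38099, 65200),
        (30, 37838, 61908),
    ],
}


def improvement_pct(base, patch):
    """Relative gain of patch over base, in percent (assumed formula)."""
    return (patch - base) / base * 100.0


def summarize(results):
    """Return one formatted line per (machine, duration) pair."""
    lines = []
    for machine, rows in results.items():
        for minutes, base, patch in rows:
            pct = improvement_pct(base, patch)
            lines.append(f"{machine:<20} {minutes:>3} min  {pct:6.2f}%")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(RESULTS)))
```

On both machines the base figure drops as the run gets longer while the patched figure stays roughly flat, which is why the relative gain is larger at 20 and 30 minutes than at 5.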
>
> >
> > Even after changing to scale 500, the performance benefits on this,
> > older 2 socket, machine were minor; even though contention on the
> > ClogControlLock was the second most severe (after ProcArrayLock).
> >
>
> I have tried this patch mainly on an 8-socket machine with scale factors 300 and 1000. I am hoping that you ran this test on unlogged tables; by the way, at what client count did you see these results?
>
Do you think we don't see an increase in performance in your tests because of the machine difference (sockets/CPU cores) or the client count?
Intel m/c config (lscpu)
-------------------------------------
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
Stepping: 2
CPU MHz: 1064.000
BogoMIPS: 4266.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0,65-71,96-103
NUMA node1 CPU(s): 72-79,104-111
NUMA node2 CPU(s): 80-87,112-119
NUMA node3 CPU(s): 88-95,120-127
NUMA node4 CPU(s): 1-8,33-40
NUMA node5 CPU(s): 9-16,41-48
NUMA node6 CPU(s): 17-24,49-56
NUMA node7 CPU(s): 25-32,57-64
Power m/c config (lscpu)
-------------------------------------
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
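
Since the question above is whether socket and core counts explain the different results, a small sketch of the topology arithmetic may help. It only multiplies out the lscpu fields listed above and relates them to the client counts used in the tests; the dictionary `MACHINES` and the helper names are hypothetical, chosen for this illustration.

```python
# Values copied from the lscpu listings and the test descriptions above.
MACHINES = {
    "Intel": {"threads_per_core": 2, "cores_per_socket": 8,
              "sockets": 8, "reported_cpus": 128, "clients": 88},
    "POWER": {"threads_per_core": 8, "cores_per_socket": 1,
              "sockets": 24, "reported_cpus": 192, "clients": 128},
}


def physical_cores(m):
    """Cores across all sockets, ignoring hardware threads."""
    return m["cores_per_socket"] * m["sockets"]


def logical_cpus(m):
    """Hardware threads: threads per core times physical cores."""
    return m["threads_per_core"] * physical_cores(m)


if __name__ == "__main__":
    for name, m in MACHINES.items():
        # The product should match the "CPU(s)" line that lscpu reports.
        assert logical_cpus(m) == m["reported_cpus"]
        ratio = m["clients"] / logical_cpus(m)
        print(f"{name}: {physical_cores(m)} cores, "
              f"{logical_cpus(m)} logical CPUs, "
              f"{ratio:.2f} clients per logical CPU")
```

In both tests the client count is above the number of physical cores but below the number of hardware threads.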