Thread: [GENERAL] Questionaire: Common WAL write rates on busy servers.


[GENERAL] Questionaire: Common WAL write rates on busy servers.

From: Andres Freund
Hi,

I've lately seen more and more installations where the generation of
write-ahead-log (WAL) is one of the primary bottlenecks.  I'm curious
whether that's primarily a "sampling error" of mine, or whether that's
indeed more common.

The primary reason I'm curious is that I'm pondering a few potential
optimizations, and would like to have some guidance which are more and
which are less important.

Questions (answer as many as you can comfortably answer):
- How many MB/s, segments/s do you see on busier servers?
- What generates the bulk of WAL on your servers (9.5+ can use
  pg_xlogdump --stats to compute that)?
- Are you seeing WAL writes being a bottleneck?
- What kind of backup methods are you using and is the WAL volume a
  problem?
- What kind of replication are you using and is the WAL volume a
  problem?
- What are your settings for wal_compression, max_wal_size (9.5+) /
  checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
- Could you quickly describe your workload?
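
(For the first question, one rough way to sample the write rate from SQL; pg_current_xlog_location() is the 9.x name, renamed to pg_current_wal_lsn() in 10:)

-- Take two samples N seconds apart; (second - first) / N gives WAL bytes/s.
SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS wal_bytes;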

Feel free to add any information you think is pertinent ;)

Greetings,

Andres Freund


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Vladimir Borodin
Hi Andres.

On 25 Apr 2017, at 7:17, Andres Freund <andres@anarazel.de> wrote:

Hi,

I've lately seen more and more installations where the generation of
write-ahead-log (WAL) is one of the primary bottlenecks.  I'm curious
whether that's primarily a "sampling error" of mine, or whether that's
indeed more common.

The primary reason I'm curious is that I'm pondering a few potential
optimizations, and would like to have some guidance which are more and
which are less important.

Questions (answer as many as you can comfortably answer):
- How many MB/s, segments/s do you see on busier servers?

Nearly one WAL segment (16 MB) per second most of the time, and 3 WALs per second at the beginning of a checkpoint (due to full_page_writes).

- What generates the bulk of WAL on your servers (9.5+ can use
 pg_xlogdump --stats to compute that)?

Here is the output from a couple of our masters (and that is actually two hours before peak load):

$ pg_xlogdump --stats 0000000100012B2800000089 0000000100012B3000000088 | fgrep -v 0.00

Type                                           N      (%)          Record size      (%)             FPI size      (%)        Combined size      (%)
----                                           -      ---          -----------      ---             --------      ---        -------------      ---
Heap2                                   55820638 ( 21.31)           1730485085 ( 22.27)           1385795249 ( 13.28)           3116280334 ( 17.12)
Heap                                    74366993 ( 28.39)           2288644932 ( 29.46)           5880717650 ( 56.34)           8169362582 ( 44.87)
Btree                                   84655827 ( 32.32)           2243526276 ( 28.88)           3170518879 ( 30.38)           5414045155 ( 29.74)
                                        --------                      --------                      --------                      --------
Total                                  261933790                    7769663301 [42.67%]          10437031778 [57.33%]          18206695079 [100%]
$

$ pg_xlogdump --stats 000000010000D17F000000A5 000000010000D19100000004 | fgrep -v 0.00
Type                                           N      (%)          Record size      (%)             FPI size      (%)        Combined size      (%)
----                                           -      ---          -----------      ---             --------      ---        -------------      ---
Heap2                                   13676881 ( 18.95)            422289539 ( 19.97)          15319927851 ( 25.63)          15742217390 ( 25.44)
Heap                                    22284283 ( 30.88)            715293050 ( 33.83)          17119265188 ( 28.64)          17834558238 ( 28.82)
Btree                                   27640155 ( 38.30)            725674896 ( 34.32)          19244109632 ( 32.19)          19969784528 ( 32.27)
Gin                                      6580760 (  9.12)            172246586 (  8.15)           8091332009 ( 13.54)           8263578595 ( 13.35)
                                        --------                      --------                      --------                      --------
Total                                   72172983                    2114133847 [3.42%]           59774634680 [96.58%]          61888768527 [100%]
$

- Are you seeing WAL writes being a bottleneck?

We do sometimes see WALWriteLock in pg_stat_activity.wait_event, but not too often.
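
For instance, a spot-check along these lines (9.6+, where pg_stat_activity exposes wait_event) shows such waiters:

SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event = 'WALWriteLock';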

- What kind of backup methods are you using and is the WAL volume a
 problem?

We use a fork of the barman project. In most cases that’s not a problem.

- What kind of replication are you using and is the WAL volume a
 problem?

Physical streaming replication. We used to have problems with network bandwidth (1 Gbit/s was consumed by transferring WAL to two replicas and one archive), but that got better after (1) upgrading to 9.5 and turning wal_compression on, (2) changing the archive command to compress WALs in parallel before sending them to the archive, and (3) increasing checkpoint_timeout.
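
(Schematically, the parallel-compression step looks something like the following; pigz and the archive path are placeholders, not our exact setup, and a production command must also make sure the file is durably stored before reporting success:)

archive_command = 'pigz --fast -c %p > /wal_archive/%f.gz'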

- What are your settings for wal_compression, max_wal_size (9.5+) /
 checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

xdb301e/postgres M # SELECT name, current_setting(name) FROM pg_settings
WHERE name IN ('max_wal_size', 'checkpoint_timeout', 'wal_compression', 'wal_buffers');
        name        | current_setting
--------------------+-----------------
 checkpoint_timeout | 1h
 max_wal_size       | 128GB
 wal_buffers        | 16MB
 wal_compression    | on
(4 rows)

Time: 0.938 ms
xdb301e/postgres M #

- Could you quickly describe your workload?

OLTP workload with 80% reads and 20% writes.


Feel free to add any information you think is pertinent ;)

Well, we actually worked around the WAL write rate issues by increasing checkpoint_timeout to the maximum possible (in 9.6 it can be set even higher). The downside of this change is recovery time. Thanks to postgres for its stability, but sometimes you can waste ~10 minutes just restarting postgres to upgrade to a new minor version, and that’s not really cool.





Re: [GENERAL] Questionaire: Common WAL write rates on busy servers.

From: bricklen
On Mon, Apr 24, 2017 at 9:17 PM, Andres Freund <andres@anarazel.de> wrote:

Questions (answer as many as you can comfortably answer):
- How many MB/s, segments/s do you see on busier servers?
- What generates the bulk of WAL on your servers (9.5+ can use
  pg_xlogdump --stats to compute that)?
- Are you seeing WAL writes being a bottleneck?
- What kind of backup methods are you using and is the WAL volume a
  problem?
- What kind of replication are you using and is the WAL volume a
  problem?
- What are your settings for wal_compression, max_wal_size (9.5+) /
  checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
- Could you quickly describe your workload?

* Postgresql 9.3
* 1500+ db servers
* Daily peak for busy databases: 75 WALs switched per second, i.e. ~1.2 GB/s at 16 MB per segment (less than 10% of the servers experience this)
* Avg per db: 2 WALs/s
* Mainly generated by large batch sync processes that occur throughout the day, and by a legacy archiving process to purge older data (potentially many millions of cascading deletes).
* Half the servers have (encrypted) pg_dump backups; WAL volume hasn't proved to be a problem there, though dump size is a problem for a few of the larger databases (less than 1TB).
* Inter-data-centre replication is all streaming; across DCs (over the WAN), WAL shipping goes over compressed SSH tunnels (sketched after this list).
Occasionally the streaming replication falls behind, but more commonly it is the cross-DC log shipping that becomes a problem. Some of the servers will generate 50+ GB of WAL in a matter of minutes, and that backs up immediately on the masters. Occasionally this has a knock-on effect for other servers and slows down their log shipping due to network saturation.
* checkpoint_segments: 64, checkpoint_timeout: 5 mins, wal_buffers: 16MB
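
(Schematically, the compressed-tunnel shipping mentioned above is of this shape; the host and path are invented, and ssh -C provides the compression:)

archive_command = 'cat %p | ssh -C dc2-host "cat > /wal_archive/%f"'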

Workload:
70% of servers are generally quiet, with occasional bursty reads and writes.
20% are medium use, avg a few hundred transactions/second
10% average around 5k txns/s, with bursts up to 25k txns/s for several minutes.
All servers have about 80% reads / 20% writes, though those numbers flip during big sync jobs and when the purging maintenance kicks off.

Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Claudio Freire
On Tue, Apr 25, 2017 at 1:17 AM, Andres Freund <andres@anarazel.de> wrote:
> Questions (answer as many as you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?

~20MB/s with FPW compression, with peaks of ~35MB/s. Without
compression, writes become the bottleneck and top out at about
40-50MB/s; WAL archiving cannot keep up beyond that point.

> - What generates the bulk of WAL on your servers (9.5+ can use
>   pg_xlogdump --stats to compute that)?

Type                                   N      (%)      Record size      (%)         FPI size      (%)    Combined size      (%)
----                                   -      ---      -----------      ---         --------      ---    -------------      ---
XLOG                                   0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
Transaction                           30 (  0.00)              960 (  0.00)                0 (  0.00)              960 (  0.00)
Storage                                0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
CLOG                                   0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
Database                               0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
Tablespace                             0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
MultiXact                            110 (  0.01)             7456 (  0.02)                0 (  0.00)             7456 (  0.00)
RelMap                                 0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
Standby                                2 (  0.00)              368 (  0.00)                0 (  0.00)              368 (  0.00)
Heap2                               2521 (  0.22)            78752 (  0.24)          4656133 (  2.82)          4734885 (  2.39)
Heap                              539419 ( 46.52)         15646903 ( 47.14)         98720258 ( 59.87)        114367161 ( 57.73)
Btree                             606573 ( 52.31)         15872182 ( 47.82)         57514798 ( 34.88)         73386980 ( 37.05)
Hash                                   0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
Gin                                 2866 (  0.25)           134330 (  0.40)          4012251 (  2.43)          4146581 (  2.09)
Gist                                   0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
Sequence                            7970 (  0.69)          1450540 (  4.37)                0 (  0.00)          1450540 (  0.73)
SPGist                                 0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
BRIN                                   0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
CommitTs                               0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
ReplicationOrigin                      0 (  0.00)                0 (  0.00)                0 (  0.00)                0 (  0.00)
                                --------                  --------                  --------                  --------
Total                            1159491          33191491 [16.76%]        164903440 [83.24%]        198094931 [100%]


> - Are you seeing WAL writes being a bottleneck?

Sometimes, more so without FPW compression.

> - What kind of backup methods are you using and is the WAL volume a
>   problem?
> - What kind of replication are you using and is the WAL volume a
>   problem?

Streaming to hot standby + WAL archiving, delayed standby as backup
and PITR. Backups are regular filesystem-level snapshots of the
delayed standby (with postgres down to get consistent snapshots).

The WAL volume filling up during periods where the hot standby lags
behind (or when we have to stop it to create consistent snapshots) is
indeed an issue, and we've had to provision significant storage to be
able to absorb those peaks (1TB of WAL).

We bundle WAL segments into groups of 256 segments for archiving and
recovery to minimize the impact of TCP slow start. We further gzip
segments before transfer with pigz, and we use mostly rsync (with a
wrapper script that takes care of durability and error handling) to
move segments around. Getting the archive/recovery scripts to handle
the load hasn't been trivial.
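
(A loose sketch of that wrapper's core idea, with invented paths and host; the real script also handles durability, retries and cleanup, which as said is the non-trivial part:)

#!/bin/sh
# Ship WAL in 256-segment, pigz-compressed batches via rsync.
# archive_command is assumed to copy finished segments into $STAGING.
set -eu
STAGING=/var/lib/postgresql/wal_staging
cd "$STAGING"
# WAL segment file names are 24 hex characters.
ls | grep -E '^[0-9A-F]{24}$' | sort | head -n 256 > batch.list
if [ "$(wc -l < batch.list)" -eq 256 ]; then
    tar -cf - -T batch.list | pigz --fast > batch.tar.gz
    rsync batch.tar.gz "archive-host:/wal_archive/batch-$(head -n 1 batch.list).tar.gz"
    xargs rm -- < batch.list        # segments are safely archived; drop them
fi
rm -f batch.list batch.tar.gz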

> - What are your settings for wal_compression, max_wal_size (9.5+) /
>   checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

wal_compression = on
max_wal_size = 12GB
min_wal_size = 2GB
checkpoint_timeout = 30min
wal_buffers = -1  (16MB effective)

> - Could you quickly describe your workload?

Steady stream of (preaggregated) input events plus upserts into ~12
partitioned aggregate "matviews" (in quotes since they're manually
kept up to date).
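
(Schematically, the upsert shape for such manually maintained aggregates, with invented table and column names; ON CONFLICT needs 9.5+ and a unique index on the conflict target:)

-- assumes: UNIQUE (bucket, source_id) on agg_hourly
INSERT INTO agg_hourly AS a (bucket, source_id, events, amount)
SELECT date_trunc('hour', ts), source_id, count(*), sum(amount)
FROM incoming_events
GROUP BY 1, 2
ON CONFLICT (bucket, source_id) DO UPDATE
    SET events = a.events + EXCLUDED.events,
        amount = a.amount + EXCLUDED.amount;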

Input rate is approximately 9000 rows/s without counting the upserts
onto the aggregate matviews. Old information is regularly compressed
and archived into less detailed partitions for a steady size of about
5TB.


Re: [GENERAL] Questionaire: Common WAL write rates on busy servers.

From: Jerry Sievers
Andres Freund <andres@anarazel.de> writes:

> Hi,
>
> I've lately seen more and more installations where the generation of
> write-ahead-log (WAL) is one of the primary bottlenecks.  I'm curious
> whether that's primarily a "sampling error" of mine, or whether that's
> indeed more common.
>
> The primary reason I'm curious is that I'm pondering a few potential
> optimizations, and would like to have some guidance which are more and
> which are less important.

I have a busy warehouse spitting out about 400k
segments/week... ~10MB/second :-)

We have resorted to a rather complex batch/parallel compressor/shipper
to keep up with the volume.

>
> Questions (answer as many as you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?

Our busiest system averages 10MB/second but is very bursty.  Assume
it's many times that during high-churn periods.

> - What generates the bulk of WAL on your servers (9.5+ can use
>   pg_xlogdump --stats to compute that)?

Simply warehouse incremental loading and/or full-table delete/truncate
and reload, plus derived data being created.  Many of the transient
tables are on NVMe and unlogged.
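
(That trick in one line, with invented names; unlogged tables generate no WAL at all, at the price of being truncated on crash recovery and not being replicated:)

CREATE UNLOGGED TABLE transient_load (LIKE fact_table INCLUDING DEFAULTS);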

> - Are you seeing WAL writes being a bottleneck?
> - What kind of backup methods are you using and is the WAL volume a
>   problem?

I do not know if basic local WAL writing itself is a problem or not,
but as mentioned, we are scarcely able to keep up with the archiving
necessary to make backups and PITR possible.

> - What kind of replication are you using and is the WAL volume a
>   problem?

There are 2 streamers, both feeding directly from the master.  We use a
fairly large wal_keep_segments value of 30k to help avoid streamers
falling behind and then having to resort to remote archive fetching.
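
(In postgresql.conf terms; at 16 MB per segment this retains up to roughly 480 GB of WAL on the master:)

wal_keep_segments = 30000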

It does appear that since streaming WAL reception and application, as
well as of course remote fetching, are single-threaded, this is a
bottleneck as well.  That is, a totally unloaded and well-outfitted
(hardware-wise) streamer can barely keep up with the master.

> - What are your settings for wal_compression, max_wal_size (9.5+) /
>   checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

 checkpoint_timeout                  | 5min
 max_wal_size                        | 4GB
 wal_buffers                         | 16MB
 wal_compression                     | off

> - Could you quickly describe your workload?

Warehouse, with user self-service report creation/storage allowed in
the same system.

>
> Feel free to add any information you think is pertinent ;)

Great idea!!  Thanks

>
> Greetings,
>
> Andres Freund

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Andres Freund
Hi,

On 2017-04-24 21:17:43 -0700, Andres Freund wrote:
> I've lately seen more and more installations where the generation of
> write-ahead-log (WAL) is one of the primary bottlenecks.  I'm curious
> whether that's primarily a "sampling error" of mine, or whether that's
> indeed more common.
>
> The primary reason I'm curious is that I'm pondering a few potential
> optimizations, and would like to have some guidance which are more and
> which are less important.
>
> Questions (answer as many as you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?
> - What generates the bulk of WAL on your servers (9.5+ can use
>   pg_xlogdump --stats to compute that)?
> - Are you seeing WAL writes being a bottleneck?
> - What kind of backup methods are you using and is the WAL volume a
>   problem?
> - What kind of replication are you using and is the WAL volume a
>   problem?
> - What are your settings for wal_compression, max_wal_size (9.5+) /
>   checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
> - Could you quickly describe your workload?

Ok, based on the (few) answers I've got so far, my experience is indeed
skewed.  A number of the PG users I interacted with over the last couple
of years had WAL write rates somewhere in the range of 500MB/s to 2.2GB/s
(the max I've seen).  At that point WAL insertion became a major bottleneck,
even if storage was more than fast enough to keep up.  To address this
we'd need some changes, but the feedback so far suggests that it's not
yet a widespread issue...

- Andres


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: "Joshua D. Drake"
On 04/27/2017 08:59 AM, Andres Freund wrote:

>
> Ok, based on the (few) answers I've got so far, my experience is indeed
> skewed.  A number of the PG users I interacted with over the last couple
> of years had WAL write rates somewhere in the range of 500MB/s to 2.2GB/s
> (the max I've seen).  At that point WAL insertion became a major bottleneck,
> even if storage was more than fast enough to keep up.  To address this
> we'd need some changes, but the feedback so far suggests that it's not
> yet a widespread issue...

I would agree it isn't yet a widespread issue.

The only people that are likely going to see this are going to be on
bare metal. We should definitely plan on that issue for, say, 11. I do
have a question though: where you have seen this issue, is it with
synchronous_commit on or off?

Thanks,

JD



--
Command Prompt, Inc.                  http://the.postgres.company/
                         +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Andres Freund
On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
> On 04/27/2017 08:59 AM, Andres Freund wrote:
>
> >
> > Ok, based on the (few) answers I've got so far, my experience is indeed
> > skewed.  A number of the PG users I interacted with over the last couple
> > of years had WAL write rates somewhere in the range of 500MB/s to 2.2GB/s
> > (the max I've seen).  At that point WAL insertion became a major bottleneck,
> > even if storage was more than fast enough to keep up.  To address this
> > we'd need some changes, but the feedback so far suggests that it's not
> > yet a widespread issue...
>
> I would agree it isn't yet a widespread issue.

I'm not yet sure about that actually.  I suspect a large percentage of
people with such workloads aren't lingering lots on the lists.


> The only people that are likely going to see this are going to be on bare
> metal. We should definitely plan on that issue for, say, 11.

"plan on that issue" - heh. We're talking about major engineering
projects here ;)


> I do have a question though: where you have seen this issue, is it with
> synchronous_commit on or off?

Both. Whether that matters or not really depends on the workload. If you
have bulk writes, it doesn't really matter much.

- Andres


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: "Joshua D. Drake"
On 04/27/2017 09:34 AM, Andres Freund wrote:
> On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
>> On 04/27/2017 08:59 AM, Andres Freund wrote:
>>

>> I would agree it isn't yet a widespread issue.
>
> I'm not yet sure about that actually.  I suspect a large percentage of
> people with such workloads aren't lingering lots on the lists.

That would probably be true. I was thinking of it more as "most new
users are in the cloud", and in the cloud it is going to be rare that a
user is able to hit that level of writes (at least not without spending
LOTS of money).

>
>
>> The only people that are likely going to see this are going to be on bare
>> metal. We should definitely plan on that issue for, say, 11.
>
> "plan on that issue" - heh. We're talking about major engineering
> projects here ;)

Sorry, wasn't trying to make light of the effort. :D

>
>
>> I do have a question though: where you have seen this issue, is it with
>> synchronous_commit on or off?
>
> Both. Whether that matters or not really depends on the workload. If you
> have bulk writes, it doesn't really matter much.

Sure, o.k.

Thanks,

Andres

>
> - Andres
>


--
Command Prompt, Inc.                  http://the.postgres.company/
                         +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Andres Freund
On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote:
> On 04/27/2017 09:34 AM, Andres Freund wrote:
> > On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
> > > On 04/27/2017 08:59 AM, Andres Freund wrote:
> > >
>
> > > I would agree it isn't yet a widespread issue.
> >
> > I'm not yet sure about that actually.  I suspect a large percentage of
> > people with such workloads aren't lingering lots on the lists.
>
> That would probably be true. I was thinking of it more as "most new
> users are in the cloud", and in the cloud it is going to be rare that
> a user is able to hit that level of writes (at least not without
> spending LOTS of money).

You can get pretty decent NVMe SSD drives from several cloud providers
these days without immediately bankrupting you.  Sure, it's instance
storage, but with a decent replication and archival setup, that's not
necessarily an issue.

It's not that hard to get to the point where postgres can't keep up with
storage, at least for some workloads.

- Andres


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Tomas Vondra
On 04/27/2017 06:34 PM, Andres Freund wrote:
> On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
>> On 04/27/2017 08:59 AM, Andres Freund wrote:
>>
>>>
>>> Ok, based on the (few) answers I've got so far, my experience is
>>> indeed skewed. A number of the PG users I interacted with over
>>> the last couple of years had WAL write rates somewhere in the range
>>> of 500MB/s to 2.2GB/s (the max I've seen). At that point WAL insertion
>>> became a major bottleneck, even if storage was more than fast
>>> enough to keep up. To address this we'd need some changes, but
>>> the feedback so far suggests that it's not yet a widespread
>>> issue...
>>
>> I would agree it isn't yet a widespread issue.
>
> I'm not yet sure about that actually. I suspect a large percentage
> of people with such workloads aren't lingering lots on the lists.
>

To a certain extent, this is a self-fulfilling prophecy. If you know
you'll have such a busy system, you probably do some research and
testing first, before choosing the database. If we don't perform well
enough, you pick something else. Which removes the data point.

Obviously, there are systems that start small and get busier and busier
over time. And those are the ones we see.

cheers

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Tomas Vondra
On 04/27/2017 07:35 PM, Andres Freund wrote:
> On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote:
>> On 04/27/2017 09:34 AM, Andres Freund wrote:
>>> On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:
>>>> On 04/27/2017 08:59 AM, Andres Freund wrote:
>>>>
>>
>>>> I would agree it isn't yet a widespread issue.
>>>
>>> I'm not yet sure about that actually. I suspect a large
>>> percentage of people with such workloads aren't lingering lots on
>>> the lists.
>>
>> That would probably be true. I was thinking of it more as "most
>> new users are in the cloud", and in the cloud it is going to be
>> rare that a user is able to hit that level of writes (at least
>> not without spending LOTS of money).
>
> You can get pretty decent NVMe SSD drives from several cloud
> providers these days without immediately bankrupting you. Sure, it's
> instance storage, but with a decent replication and archival setup,
> that's not necessarily an issue.
>
> It's not that hard to get to the point where postgres can't keep up
> with storage, at least for some workloads.
>

I can confirm this observation. I bought the Intel 750 NVMe SSD last
year; the device has 1GB of DDR3 cache on it (power-loss protected) and
can do ~1GB/s of sustained O_DIRECT sequential writes. But when running
pgbench, I can't push more than ~300MB/s of WAL to it, no matter what I
do, because of WALWriteLock.
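
(For reference, a run of that sort would be invoked roughly like this; the scale and client counts here are invented, not the actual benchmark:)

$ pgbench -i -s 300 bench
$ pgbench -M prepared -c 32 -j 32 -T 600 bench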

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Andres Freund
On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote:
> I can confirm this observation. I bought the Intel 750 NVMe SSD last year;
> the device has 1GB of DDR3 cache on it (power-loss protected) and can do
> ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench,
> I can't push more than ~300MB/s of WAL to it, no matter what I do, because
> of WALWriteLock.

Hm, interesting.  Even if you up wal_buffers to 128MB, use
synchronous_commit = off, and play with wal_writer_delay/flush_after?
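
(Spelled out as postgresql.conf settings; the two wal_writer values below are placeholders for the "play with" part, since the thread doesn't name any, and wal_writer_flush_after requires 9.6+:)

wal_buffers = 128MB
synchronous_commit = off
wal_writer_delay = 10ms
wal_writer_flush_after = 1MB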

- Andres


Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

From: Tomas Vondra
On 04/28/2017 01:34 AM, Andres Freund wrote:
> On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote:
>> I can confirm this observation. I bought the Intel 750 NVMe SSD last year;
>> the device has 1GB of DDR3 cache on it (power-loss protected) and can do
>> ~1GB/s of sustained O_DIRECT sequential writes. But when running pgbench,
>> I can't push more than ~300MB/s of WAL to it, no matter what I do, because
>> of WALWriteLock.
>
> Hm, interesting.  Even if you up wal_buffers to 128MB, use
> synchronous_commit = off, and play with wal_writer_delay/flush_after?
>

I think I've tried things like that, but let me do some proper testing.
I'll report the numbers in a few days.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services