Re: Enable data checksums by default

From: Andres Freund
Subject: Re: Enable data checksums by default
Date:
Msg-id: 23278C72-65BD-4EAE-85D4-15CC51E77DC9@anarazel.de
In reply to: Re: Enable data checksums by default  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers

On March 30, 2019 3:25:43 PM EDT, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>On Fri, Mar 29, 2019 at 08:35:26PM +0100, Christoph Berg wrote:
>>Re: Bernd Helmle 2019-03-29
>><3586bb9345a59bfc8d13a50a7c729be1ee6759fd.camel@oopsware.de>
>>> On Friday, 2019-03-29 at 23:10 +0900, Michael Paquier wrote:
>>> >
>>> > I can't really believe that many people set up shared_buffers at
>>> > 128kB, which would cause such a large number of page evictions,
>>> > but I can believe that many users have shared_buffers set to its
>>> > default value and that we are going to get complaints about
>>> > "performance drop after upgrade to v12" if we switch data
>>> > checksums to on by default.
>>>
>>> Yeah, I think Christoph's benchmark is based on this thinking. I
>>> assume this very unrealistic scenario should emulate the worst case
>>> (many buffer reads, high checksum-calculation load).
>>
>>It's not unrealistic to have large seqscans that are all buffer
>>misses; the table just has to be big enough. The idea in my benchmark
>>was that if I make shared buffers really small, and the table still
>>fits into RAM, I should see only buffer misses, but without any delay
>>for actually reading from disk.
>>
>>Christoph
>>
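>
>For illustration, that setup amounts to roughly the following (the
>commands and object names are assumptions, not the exact script):
>
>    # shrink shared_buffers far below the table size, then restart
>    psql -c "ALTER SYSTEM SET shared_buffers = '128kB'"
>    pg_ctl restart -D "$PGDATA"
>    # one full pass warms the OS page cache, then measure
>    psql -c "SELECT count(*) FROM pgbench_accounts"
>    psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM pgbench_accounts"
>    # a large "Buffers: shared read=N" with a fast runtime means the
>    # misses were served from the OS cache, not from disk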
>
>FWIW I think it's a mistake to focus solely on CPU utilization, which
>all the benchmarks performed on this thread do, because they look at
>tps of in-memory read-only workloads. Checksums have other costs too,
>not just the additional CPU time. Most importantly, they require
>wal_log_hints to be set (which people may or may not want anyway).
>
>I've done a simple benchmark that does read-only (-S) and read-write
>(-N) pgbench runs at different scales, but also measures the duration
>of the pgbench init and the amount of WAL produced during the tests.
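>
>A minimal sketch of one such cell (scale, duration, and client counts
>below are illustrative assumptions, not the exact script used):
>
>    SCALE=200
>    createdb bench
>    /usr/bin/time -f 'init: %e s' pgbench -i -s "$SCALE" bench
>    START=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)
>    pgbench -S -T 300 -c 16 -j 4 bench   # -S read-only, -N read-write
>    END=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)
>    # WAL produced during the run
>    psql -Atc "SELECT pg_size_pretty(pg_wal_lsn_diff('$END','$START'))" bench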
>
>On a small machine (i5, 8GB RAM, SSD RAID) the results are these:
>
>         scale    config     |    init           tps        wal
>    =========================|==================================
>    ro      10    no-hints   |       2        117038        130
>                  hints      |       2        116378        146
>                  checksums  |       2        115619        147
>          -------------------|----------------------------------
>           200    no-hints   |      32         88340        2407
>                  hints      |      37         86154        2628
>                  checksums  |      36         83336        2624
>          -------------------|----------------------------------
>          2000    no-hints   |     365         38680        1967
>                  hints      |     423         38670        2123
>                  checksums  |     504         37510        2046
>    -------------------------|----------------------------------
>    rw      10    no-hints   |       2         19691        437
>                  hints      |       2         19712        437
>                  checksums  |       2         19654        437
>          -------------------|----------------------------------
>           200    no-hints   |      32         15839        2745
>                  hints      |      37         15735        2783
>                  checksums  |      36         15646        2775
>          -------------------|----------------------------------
>          2000    no-hints   |     365         5371         3721
>                  hints      |     423         5270         3671
>                  checksums  |     504         5094         3574
>
>The no-hints config is the default (wal_log_hints=off,
>data_checksums=off), hints sets wal_log_hints=on, and checksums
>enables data checksums. All the configs were somewhat tuned (1GB
>shared buffers, max_wal_size high enough not to hit checkpoints very
>often, etc.).
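>
>Roughly, in postgresql.conf terms (the max_wal_size value is an
>assumption; the rest is as described above):
>
>    cat >> "$PGDATA/postgresql.conf" <<'EOF'
>    shared_buffers = 1GB
>    max_wal_size = 32GB    # high enough to avoid frequent checkpoints
>    EOF
>    # "hints" additionally sets wal_log_hints = on; "checksums" uses a
>    # cluster created with initdb --data-checksums (hint bits are then
>    # WAL-logged regardless of wal_log_hints)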
>
>I've also done the tests on a larger machine (2x E5-2620v4, 32GB of
>RAM, NVMe SSD), and the general pattern is about the same: while the
>tps and the amount of WAL (not counting the init) do not change, the
>initialization time increases significantly (by 20-40%).
>
>This effect is even clearer on slower storage (SATA-based RAID). The
>results then look like this:
>
>         scale    config     |    init           tps        wal
>    =========================|==================================
>    ro     100    no-hints   |      49        229459        122
>                  hints      |     101        167983        190
>                  checksums  |     103        156307        190
>          -------------------|----------------------------------
>          1000    no-hints   |     580        152167        109
>                  hints      |    1047        122814        142
>                  checksums  |    1080        118586        141
>          -------------------|----------------------------------
>          6000    no-hints   |    4035           508          1
>                  hints      |   11193           502          1
>                  checksums  |   11376           506          1
>    -------------------------|----------------------------------
>    rw     100    no-hints   |      49           279        192
>                  hints      |     101           275        190
>                  checksums  |     103           275        190
>          -------------------|----------------------------------
>          1000    no-hints   |     580           237        210
>                  hints      |    1047           225        201
>                  checksums  |    1080           224        200
>          -------------------|----------------------------------
>          6000    no-hints   |    4035           135        123
>                  hints      |   11193           133        122
>                  checksums  |   11376           132        121
>
>and when expressed as percentages relative to no-hints (e.g. the
>ro/100 hints init is 101/49 ~ 206%):
>
>            scale       config  |     init       tps        wal
>    ============================|===============================
>    ro        100        hints  |     206%       73%        155%
>                     checksums  |     210%       68%        155%
>             -------------------|--------------------------------
>             1000        hints  |     181%       81%        131%
>                     checksums  |     186%       78%        129%
>             -------------------|--------------------------------
>             6000        hints  |     277%       99%        100%
>                     checksums  |     282%       100%       104%
>    ----------------------------|--------------------------------
>    rw        100        hints  |     206%       99%        99%
>                     checksums  |     210%       99%        99%
>             -------------------|--------------------------------
>             1000        hints  |     181%       95%        96%
>                     checksums  |     186%       95%        95%
>             -------------------|--------------------------------
>             6000        hints  |     277%       99%        99%
>                     checksums  |     282%       98%        98%
>
>I have not investigated the exact reasons, but my hypothesis is that
>it's about the amount of WAL generated during the initial CREATE INDEX
>(because it probably ends up setting the hint bits), which puts
>additional pressure on the storage.
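>
>One way to check that (an illustrative sketch, not something run as
>part of these tests) is to measure the WAL from an index build alone:
>
>    psql bench <<'EOF'
>    SELECT pg_current_wal_lsn() AS before \gset
>    CREATE INDEX tmp_idx ON pgbench_accounts (abalance);
>    SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before'));
>    DROP INDEX tmp_idx;
>    EOF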
>
>Unfortunately, this additional cost is unlikely to go away :-(
>
>Now, maybe we want to enable checksums by default anyway, but we
>should not pretend that the only cost related to checksums is CPU
>usage.
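>
>For reference, enabling them happens at initdb time or, as of v12,
>offline on a stopped cluster (sketch):
>
>    initdb --data-checksums -D "$PGDATA"   # at cluster creation
>    pg_checksums --enable -D "$PGDATA"     # offline, v12+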

Thanks for running these, very helpful.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


