Discussion: Using of --data-checksums
initdb --data-checksums "... help to detect corruption by the I/O system"
There is a (negligible?) impact on performance, ok.
Is there another reason NOT to use this feature?
Has anyone had good or bad experience with the use of --data-checksums?
Thanks in advance!
Bernhard
On Tue, Apr 07, 2020 at 08:10:13AM -0700, BGoebel wrote:
> initdb --data-checksums "... help to detect corruption by the I/O system"
> There is an (negligible?) impact on performance, ok.
>
> Is there another reason NOT to use this feature ?
> Has anyone had good or bad experience with the use of --data-checksums?
FWIW, I have had a good experience with it. Note that a performance impact of up to ~1% may be noticeable if you have a large number of buffer evictions from the PostgreSQL shared buffer pool, but IMO the insurance of knowing that Postgres is not the cause of an on-disk corruption is largely worth it. In the applications where I have it enabled, we did not notice any performance impact even in very heavy production-like workloads, and this even though we had a rather low shared buffer setting with a much larger set of hot pages, causing the OS cache to be filled with most of the hot data.
--
Michael
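As a rough illustration of what "detecting corruption by the I/O system" means, here is a toy sketch in Python. It is not PostgreSQL's algorithm: PostgreSQL stores a 16-bit checksum in the page header computed with its own hash, whereas this sketch uses a 32-bit CRC from `zlib` as a stand-in. The function names and the page layout are assumptions made up for the example. What it does share with the real feature is the idea of verifying the page when it is read back, and of mixing the block number into the checksum so that a page written to the wrong location is also caught.

```python
import zlib

PAGE_SIZE = 8192      # PostgreSQL's default block size
CHECKSUM_BYTES = 4    # toy choice; PostgreSQL's real checksum is 16 bits


def page_checksum(payload: bytes, block_no: int) -> int:
    """Toy checksum: CRC-32 of the payload, mixed with the block number."""
    return zlib.crc32(payload) ^ block_no


def write_page(payload: bytes, block_no: int) -> bytes:
    """Prefix the payload with its checksum (stand-in for a page header field)."""
    if len(payload) != PAGE_SIZE - CHECKSUM_BYTES:
        raise ValueError("payload must fill the page exactly")
    return page_checksum(payload, block_no).to_bytes(CHECKSUM_BYTES, "little") + payload


def verify_page(page: bytes, block_no: int) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return stored == page_checksum(page[CHECKSUM_BYTES:], block_no)


payload = (bytes(range(256)) * 32)[: PAGE_SIZE - CHECKSUM_BYTES]
good_page = write_page(payload, block_no=7)

# Simulate the storage layer flipping a single bit somewhere in the page.
damaged = bytearray(good_page)
damaged[1000] ^= 0x01
damaged_page = bytes(damaged)

print(verify_page(good_page, 7))     # intact page read from the right block
print(verify_page(damaged_page, 7))  # bit flip introduced below the database
print(verify_page(good_page, 8))     # intact page that landed on the wrong block
```

The verification cost is paid when a page moves between disk and shared buffers, which is consistent with the remark above that the overhead shows up mainly under heavy buffer eviction.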
Greetings,
* BGoebel (b.goebel@prisma-computer.de) wrote:
> initdb --data-checksums "... help to detect corruption by the I/O system"
> There is an (negligible?) impact on performance, ok.
>
> Is there another reason NOT to use this feature ?
Not in my view.
> Has anyone had good or bad experience with the use of --data-checksums?
Have had good experience with it. We should really make it the default already.
Thanks,
Stephen
On Wed, Apr 8, 2020 at 11:54:34AM -0400, Stephen Frost wrote:
> Greetings,
>
> * BGoebel (b.goebel@prisma-computer.de) wrote:
> > initdb --data-checksums "... help to detect corruption by the I/O system"
> > There is an (negligible?) impact on performance, ok.
> >
> > Is there another reason NOT to use this feature ?
>
> Not in my view.
>
> > Has anyone had good or bad experience with the use of --data-checksums?
>
> Have had good experience with it. We should really make it the default
> already.
Yeah, but I think we wanted more ability to change an existing cluster before doing that since it would affect pg_upgraded servers.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
> On Wed, Apr 8, 2020 at 11:54:34AM -0400, Stephen Frost wrote:
> > * BGoebel (b.goebel@prisma-computer.de) wrote:
> > > initdb --data-checksums "... help to detect corruption by the I/O system"
> > > There is an (negligible?) impact on performance, ok.
> > >
> > > Is there another reason NOT to use this feature ?
> >
> > Not in my view.
> >
> > > Has anyone had good or bad experience with the use of --data-checksums?
> >
> > Have had good experience with it. We should really make it the default
> > already.
>
> Yeah, but I think we wanted more ability to change an existing cluster
> before doing that since it would affect pg_upgraded servers.
There are definitely a lot of reasons to want to have the ability to change an existing cluster. Considering the complications around running pg_upgrade already, I don't really think that changing the default of initdb would be that big a hurdle for folks to deal with: they'd try the pg_upgrade, get a very quick error that the new cluster has checksums enabled and the old one didn't, and they'd re-initdb the new cluster and then re-run pg_upgrade to figure out what the next issue is.
Thanks,
Stephen
On Fri, Apr 10, 2020 at 04:37:46PM -0400, Stephen Frost wrote:
> There's definitely a lot of reasons to want to have the ability to
> change an existing cluster. Considering the complications around
> running pg_upgrade already, I don't really think that changing the
> default of initdb would be that big a hurdle for folks to deal with-
> they'd try the pg_upgrade, get a very quick error that the new cluster
> has checksums enabled and the old one didn't, and they'd re-initdb the
> new cluster and then re-run pg_upgrade to figure out what the next issue
> is..
We discussed that a couple of months ago, and we decided to keep that out of the upgrade story, no? Anyway, if you want to enable or disable data checksums on an existing cluster, you always have the possibility to use pg_checksums --enable or pg_checksums --disable. This has been in core since 12, and there is also an out-of-core version for older versions of Postgres: https://github.com/credativ/pg_checksums. On apt-based distributions like Debian, this stuff is under the package postgresql-12-pg-checksums.
--
Michael
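For readers who want to see what the offline route looks like in practice, here is a minimal sketch in Python that only builds and prints the command sequence rather than running it. It assumes PostgreSQL 12 or later with `pg_ctl` and `pg_checksums` on the PATH; the helper name `enable_checksums_plan` and the data directory path are hypothetical. `pg_checksums` requires the cluster to be cleanly shut down, and with `--enable` it rewrites every block, so the run time grows with the size of the cluster.

```python
import shlex
from typing import List


def enable_checksums_plan(pgdata: str) -> List[List[str]]:
    """Return, in order, the commands for enabling checksums offline.

    Hypothetical helper for illustration; it does not execute anything.
    """
    return [
        # pg_checksums refuses to run unless the cluster was shut down cleanly.
        ["pg_ctl", "stop", "-D", pgdata, "-m", "fast"],
        # Rewrites all blocks with checksums; --progress reports how far it is.
        ["pg_checksums", "--enable", "--progress", "-D", pgdata],
        ["pg_ctl", "start", "-D", pgdata],
    ]


# The data directory below is an assumed example path, not taken from the thread.
for command in enable_checksums_plan("/var/lib/postgresql/12/main"):
    print(shlex.join(command))
```

Swapping `--enable` for `--disable` gives the reverse operation, and `--check` verifies the checksums of a cluster that already has them enabled. Once the server is back up, `SHOW data_checksums;` reports the current state.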
On Sun, Apr 12, 2020 at 8:05 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Apr 10, 2020 at 04:37:46PM -0400, Stephen Frost wrote:
> > There's definitely a lot of reasons to want to have the ability to
> > change an existing cluster. Considering the complications around
> > running pg_upgrade already, I don't really think that changing the
> > default of initdb would be that big a hurdle for folks to deal with-
> > they'd try the pg_upgrade, get a very quick error that the new cluster
> > has checksums enabled and the old one didn't, and they'd re-initdb the
> > new cluster and then re-run pg_upgrade to figure out what the next issue
> > is..
> We discussed that a couple of months ago, and we decided to keep that
> out of the upgrade story, no? Anyway, if you want to enable or
> disable data checksums on an existing cluster, you always have the
> possibility to use pg_checksums --enable. This exists in core since
> 12, and there is also a version on out of core for older versions of
> Postgres: https://github.com/credativ/pg_checksums. On apt-based
> distributions like Debian, this stuff is under the package
> postgresql-12-pg-checksums.
The fact that this tool exists, and in particular that pg_checksums --disable exists, is I think what makes the argument for turning on checksums by default possible, because it's now very easy and fast to turn them off even if you've accumulated sizable data in your cluster. (Turning them on in that case is easy, but not fast.)
And FWIW, I do think we should change the default. And maybe spend some extra effort on the message coming out of pg_upgrade in this case to make it clear to people what their options are and exactly what to do.
Magnus Hagander <magnus@hagander.net> writes:
> And FWIW, I do think we should change the default. And maybe spend some
> extra effort on the message coming out of pg_upgrade in this case to make
> it clear to people what their options are and exactly what to do.
Is there any hard evidence of checksums catching problems at all?
Let alone in sufficient number to make them be on-by-default?
regards, tom lane
On Sun, Apr 12, 2020 at 10:23:24AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> And FWIW, I do think we should change the default. And maybe spend some
>> extra effort on the message coming out of pg_upgrade in this case to make
>> it clear to people what their options are and exactly what to do.
>
> Is there any hard evidence of checksums catching problems at all?
> Let alone in sufficient number to make them be on-by-default?
I don't know if that's a sufficient number, but I have dealt with corruption cases in virtual environments where checksums were really essential to prove that the origin of the problem was not Postgres, because those bugs created wild and incorrect block overwrites. With the software stack getting more complicated, making them the default would make sense IMO.
Now the case of upgrades is more tricky than it looks, no? In the default copy mode there is a copy of each file, so we may be able to do a block-to-block copy and update the checksum, but you cannot do that with the --link mode.
--
Michael
On Sun, Apr 12, 2020 at 4:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > And FWIW, I do think we should change the default. And maybe spend some
> > extra effort on the message coming out of pg_upgrade in this case to make
> > it clear to people what their options are and exactly what to do.
> Is there any hard evidence of checksums catching problems at all?
> Let alone in sufficient number to make them be on-by-default?
I would say yes. I've certainly had a fair number of cases where they've detected storage corruption, especially with larger SAN-type installations. And coupled with validating the checksums on backup (either with pg_basebackup or pgbackrest), it enables you to find the errors *early*, while you can still restore a previous backup and replay WAL to get to a point where you don't have to lose any data.
I believe both Stephen and David have some good stories they've heard from people catching such issues with backrest as well.
That, and as Michael also points out, knowing that the problem occurred outside of PostgreSQL is very important information when tracking down issues.
On 4/12/20 07:23, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> And FWIW, I do think we should change the default. And maybe spend some
>> extra effort on the message coming out of pg_upgrade in this case to make
>> it clear to people what their options are and exactly what to do.
>
> Is there any hard evidence of checksums catching problems at all?
> Let alone in sufficient number to make them be on-by-default?
Data checksums are a hard requirement across the entire RDS PostgreSQL fleet - we do not allow it to be disabled in RDS. I've definitely seen a lot of hard evidence (for example, customer cases I've personally been involved in) that it catches problems. I could not exaggerate how useful and important I think this feature is: being able to very quickly and easily know that a problem originated outside of the PostgreSQL code.
This was in part what led to that long blog article I wrote about checksums, and it's why enabling checksums was happiness hint #1 until I broke them into categories.
FWIW, I also strongly agree that checksums should be enabled by default in the git.postgresql.org code.
-Jeremy
--
Jeremy Schneider
Database Engineer
Amazon Web Services
On Thu, Apr 16, 2020 at 03:47:34PM -0700, Jeremy Schneider wrote:
> Data checksums are a hard requirement across the entire RDS PostgreSQL
> fleet - we do not allow it to be disabled in RDS. I've definitely seen a
> lot of hard evidence (for example, customer cases I've personally been
> involved in) that it catches problems.
Oh, that's good to know. Thanks for the information. I pushed hard as well to make this a requirement on what I work on.
> I could not exaggerate how useful
> and important I think this feature is: being able to very quickly and
> easily know that a problem originated outside of the PostgreSQL code.
The worst part with checksums disabled is having to tell a customer or support staff that you don't actually know where the problem comes from or what its actual origin is, and why you think that the error you are seeing in the Postgres logs is most likely linked to a lower-level corruption, as there can be many patterns: broken btree pages, toast errors, missing attributes in catalogs, failures with FK references, primary key duplicates, etc. And people like to complain a lot about the database being broken because that's a very sensitive piece and usually more things depend on it.
With checksums enabled, you still cannot say exactly where the problem comes from, but you can redirect the complaints more easily to the correct people to help find out what the actual problem is. Even better, if you don't see a checksum failure, you also know that the problem probably comes directly from Postgres and some backend logic (note that it could as well be a misdesigned HA workflow or a custom backup script, who knows, but at least you know that something you control directly went wrong). And the error message provided is clear.
> This was in part what led to that long blog article I wrote about
> checksums, and it's why enabling checksums was happiness hint #1 until I
> broke them into categories.
Reference? ;p
--
Michael