Discussion: Online enabling of checksums


Online enabling of checksums

From
Magnus Hagander
Date:

Once more, here is an attempt to solve the problem of on-line enabling of checksums that Daniel and I have been hacking on for a while. See for example https://www.postgresql.org/message-id/CABUevEx8KWhZE_XkZQpzEkZypZmBp3GbM9W90JLp%3D-7OJWBbcg%40mail.gmail.com and https://www.postgresql.org/message-id/flat/FF393672-5608-46D6-9224-6620EC532693%40endpoint.com#FF393672-5608-46D6-9224-6620EC532693@endpoint.com for some previous discussions.

Base design:

Change the checksum flag from a boolean (on/off) to an enum: off/inprogress/on. When checksums are off or on, they work as they do today. When checksums are in progress, checksums are *written* but not verified. The state can go from “off” to “inprogress”, from “inprogress” to either “on” or “off”, or from “on” to “off”.
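The allowed transitions form a small state machine; here is a minimal sketch (the names are illustrative, not the patch's actual identifiers):

```python
# Hypothetical sketch of the checksum state machine described above.
# State names match the design; the code itself is illustrative only.
ALLOWED = {
    ("off", "inprogress"),
    ("inprogress", "on"),
    ("inprogress", "off"),
    ("on", "off"),
}

def transition(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"cannot go from {current!r} to {target!r}")
    return target
```

Note in particular that “off” can never go directly to “on”: every enable has to pass through “inprogress” while existing pages are rewritten.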

Two new functions are added, pg_enable_data_checksums() and pg_disable_data_checksums(). The disable one is easy -- it just changes the state to off. The enable one will change the state to inprogress, and then start a background worker (the “checksumhelper launcher”). This worker in turn will start one sub-worker (“checksumhelper worker”) in each database (currently all done sequentially). This worker will enumerate all tables/indexes/etc in the database and validate their checksums. If there is no checksum, or the checksum is incorrect, it will compute a new checksum and write the page back out. When all databases have been processed, the checksum state changes to “on” and the launcher shuts down. At this point, the cluster has checksums enabled as if it had been initdb’d with checksums turned on.
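The enable flow described above can be sketched roughly as follows -- a toy model, with plain functions standing in for the background workers and the actual page I/O, not the patch's C code:

```python
# Illustrative sketch of the checksumhelper launcher/worker flow.
# `verify` and `rewrite` are stand-ins for real checksum validation
# and page rewriting; databases/relations/pages are plain dicts/lists.

def checksumhelper_worker(db, verify, rewrite):
    """Validate every page of every relation in one database,
    rewriting pages whose checksum is missing or wrong."""
    for rel in db["relations"]:
        for blkno, page in enumerate(rel["pages"]):
            if not verify(page):
                rel["pages"][blkno] = rewrite(page)

def checksumhelper_launcher(databases, set_state, verify, rewrite):
    """Walk all databases sequentially, then flip the state to on."""
    set_state("inprogress")
    for db in databases:  # currently all done sequentially
        checksumhelper_worker(db, verify, rewrite)
    set_state("on")
```

The key property the sketch captures: the “on” state is only reached after every page in every database has passed (or been given) a valid checksum.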

If the cluster shuts down while “inprogress”, the DBA will have to manually either restart the worker (by calling pg_enable_data_checksums()) or turn checksums off again. Checksums “in progress” carry a cost but no benefit.

The change of the checksum state is WAL-logged with a new xlog record. All buffers written by the background worker have full-page writes forcibly enabled, to make sure the checksum is fully updated on the standby even if the actual contents of the buffer did not change.

We’ve also included a small command-line tool, bin/pg_verify_checksums, that can be run against an offline cluster to validate all checksums. Future improvements include being able to use the background worker/launcher to perform an online check as well. Being able to run more parallel workers in the checksumhelper might also be of interest.
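Conceptually, an offline verifier like this walks each relation file in fixed-size blocks and checks each block's checksum. A hedged sketch, using an 8 kB block size (PostgreSQL's default BLCKSZ) and CRC32 as a stand-in for PostgreSQL's actual FNV-based page checksum algorithm:

```python
# Conceptual sketch of offline block-by-block checksum verification.
# CRC32 is a stand-in checksum, not PostgreSQL's real page checksum.
import zlib

BLCKSZ = 8192  # PostgreSQL's default block size

def find_bad_blocks(data, expected):
    """Scan `data` (the raw bytes of a relation file) in BLCKSZ-sized
    blocks; return the block numbers whose CRC32 does not match the
    expected checksum for that block."""
    bad = []
    for blkno in range(len(data) // BLCKSZ):
        block = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if zlib.crc32(block) != expected.get(blkno):
            bad.append(blkno)
    return bad
```

The real tool differs in detail (it reads segment files from disk, and the checksum lives inside the page header and is seeded with the block number), but the scan structure is the same.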

The patch includes two sets of tests: an isolation test that turns on checksums while one session is writing to the cluster and another is continuously reading, to simulate turning on checksums in a production database, and a TAP test which enables checksums with streaming replication turned on to test the new xlog record. The isolation test ran into the 1024-character limit of the isolation test lexer, with a separate patch and discussion at https://www.postgresql.org/message-id/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se


--
Attachments

Re: Online enabling of checksums

From
Magnus Hagander
Date:
Re-sending this one with proper formatting. Apologies for the horrible gmail-screws-up-the-text-part of the last one!


No change to patch or text, just the formatting.

//Magnus



Attachments

Re: Online enabling of checksums

From
Peter Eisentraut
Date:
On 2/21/18 15:53, Magnus Hagander wrote:
> *Two new functions are added, pg_enable_data_checksums() and
> pg_disable_data_checksums(). The disable one is easy -- it just changes
> to disable. The enable one will change the state to inprogress, and then
> start a background worker (the “checksumhelper launcher”). This worker
> in turn will start one sub-worker (“checksumhelper worker”) in each
> database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers.  I wonder whether we can do something to make this
easier and less repetitive.  Not in this patch, of course.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will help to fix some mistakes made during initdb a long time ago...

On 22 Feb 2018, at 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2/21/18 15:53, Magnus Hagander wrote:
*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers.  I wonder whether we can do something to make this
easier and less repetitive.  Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review this patch, and some code comparison could be handy....

So far I've found only this [0,1] (without code) and threads mentioned by Magnus [2,3]

Or do you mean extracting "worker+launcher" for reuse for other purposes?

Best regards, Andrey Borodin.


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 4:47 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hello, Magnus, Peter!

I'm excited that this feature emerged, thanks for the patch. Hope it will help to fix some mistakes made during initdb a long time ago...

On 22 Feb 2018, at 18:22, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 2/21/18 15:53, Magnus Hagander wrote:
*Two new functions are added, pg_enable_data_checksums() and
pg_disable_data_checksums(). The disable one is easy -- it just changes
to disable. The enable one will change the state to inprogress, and then
start a background worker (the “checksumhelper launcher”). This worker
in turn will start one sub-worker (“checksumhelper worker”) in each
database (currently all done sequentially).*

This is at least the fourth version of the pattern launcher plus worker
background workers.  I wonder whether we can do something to make this
easier and less repetitive.  Not in this patch, of course.

Peter, can I ask for some pointers in searching for previous versions?
I want to review this patch, and some code comparison could be handy....

So far I've found only this [0,1] (without code) and threads mentioned by Magnus [2,3]

Or do you mean extracting "worker+launcher" for reuse for other purposes?


I'm pretty sure Peter means the second. Which could be interesting, but as he says, not the topic for this patch. 

I'm not entirely sure which the other ones are. Autovacuum obviously is one, which doesn't use the worker infrastructure. But I'm not sure what the others are referring to? 

--

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:
> On 2/21/18 15:53, Magnus Hagander wrote:
> > *Two new functions are added, pg_enable_data_checksums() and
> > pg_disable_data_checksums(). The disable one is easy -- it just changes
> > to disable. The enable one will change the state to inprogress, and then
> > start a background worker (the “checksumhelper launcher”). This worker
> > in turn will start one sub-worker (“checksumhelper worker”) in each
> > database (currently all done sequentially).*
> 
> This is at least the fourth version of the pattern launcher plus worker
> background workers.  I wonder whether we can do something to make this
> easier and less repetitive.  Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-02-22 08:22:48 -0500, Peter Eisentraut wrote:
> On 2/21/18 15:53, Magnus Hagander wrote:
> > *Two new functions are added, pg_enable_data_checksums() and
> > pg_disable_data_checksums(). The disable one is easy -- it just changes
> > to disable. The enable one will change the state to inprogress, and then
> > start a background worker (the “checksumhelper launcher”). This worker
> > in turn will start one sub-worker (“checksumhelper worker”) in each
> > database (currently all done sequentially).*
>
> This is at least the fourth version of the pattern launcher plus worker
> background workers.  I wonder whether we can do something to make this
> easier and less repetitive.  Not in this patch, of course.

I suspect I'm going to get some grief for this, but I think the time has
come to bite the bullet and support changing databases in the same
process...

Hey, I can't even see the goalposts anymore :P

Are you saying this should be done *in general*, or specifically for background workers? I'm assuming you mean the general case? That would be very useful, but is probably a fairly non-trivial task (TM). 


--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:
> On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:
> > I suspect I'm going to get some grief for this, but I think the time has
> > come to bite the bullet and support changing databases in the same
> > process...
> >
> 
> Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P


> Are you saying this should be done *in general*, or specifically for
> background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.


> That would be very useful, but is probably a fairly non-trivial task
> (TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-02-22 20:30:52 +0100, Magnus Hagander wrote:
> On Thu, Feb 22, 2018 at 8:24 PM, Andres Freund <andres@anarazel.de> wrote:
> > I suspect I'm going to get some grief for this, but I think the time has
> > come to bite the bullet and support changing databases in the same
> > process...
> >
>
> Hey, I can't even see the goalposts anymore :P

Hah. I vote for making this a hard requirement :P

Hah! Are you handing out binoculars? :)

 
> Are you saying this should be done *in general*, or specifically for
> background workers? I'm assuming you mean the general case?

I'd say bgworkers first. It's a lot clearer how to exactly do it
there. Refactoring the mainloop handling in PostgresMain() would be a
bigger task.


Yeah, it'd probably be easier. I don't know exactly what it'd involve but clearly less.

In this particular case that would, at least in phase 1, simplify it because we'd only need one process instead of worker/launcher. However, if we'd ever want to parallelize it -- or any other process of this style, like autovacuum -- you'd still need a launcher+worker combo. So making that particular scenario simpler might be worthwhile on its own.


 
> That would be very useful, but is probably a fairly non-trivial task
> (TM).

I'm not actually that sure it is. We have nearly all the code, I
think. Syscache inval, ProcKill(), and then you're nearly ready to do
the normal connection dance again.

I'll take your word for it :) I haven't dug into that part.
 
--

Re: Online enabling of checksums

From
Andres Freund
Date:

On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net> wrote:
>On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
>wrote:
>In this particular case that would, at least in phase 1, simplify it
>because we'd only need one process instead of worker/launcher. However,
>if we'd ever want to parallelize it -- or any other process of this
>style, like autovacuum -- you'd still need a launcher+worker combo. So
>making that particular scenario simpler might be worthwhile on its own.

Why is that needed? You can just start two bgworkers and process a list of items stored in shared memory. Or even just check, I assume there'd be a catalog flag somewhere, whether a database / table / object of granularity has already been processed and use locking to prevent concurrent access.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Online enabling of checksums

From
Peter Eisentraut
Date:
On 2/22/18 12:38, Magnus Hagander wrote:
> I'm not entirely sure which the other ones are. Autovacuum obviously
> is one, which doesn't use the worker infrastructure. But I'm not sure
> what the others are referring to? 

autovacuum, subscription workers, auto prewarm

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 8:52 PM, Andres Freund <andres@anarazel.de> wrote:


On February 22, 2018 11:44:17 AM PST, Magnus Hagander <magnus@hagander.net> wrote:
>On Thu, Feb 22, 2018 at 8:41 PM, Andres Freund <andres@anarazel.de>
>wrote:
>In this particular case that would, at least in phase 1, simplify it
>because we'd only need one process instead of worker/launcher. However,
>if we'd ever want to parallelize it -- or any other process of this
>style, like autovacuum -- you'd still need a launcher+worker combo. So
>making that particular scenario simpler might be worthwhile on its own.

Why is that needed? You can just start two bgworkers and process a list of items stored in shared memory. Or even just check, I assume there'd be a catalog flag somewhere, whether a database / table / object of granularity has already been processed and use locking to prevent concurrent access.

You could do that, but then you're moving the complexity to managing that list in shared memory instead. I'm not sure that's any easier... And certainly adding a catalog flag for a use case like this one is not making it easier.

--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:
> You could do that, but then you're moving the complexity to managing that
> list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.
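The kind of coordination described above can be sketched with threads and a shared queue standing in for bgworkers and shared memory -- purely illustrative, not the patch's design:

```python
# Toy model of parallel per-relation processing: several workers pull
# relation identifiers from one shared work list. Threads and a Queue
# stand in for bgworkers and a shmem-resident list with locking.
import queue
import threading

def process_relations(relations, n_workers, process):
    """Distribute per-relation work across n_workers threads pulling
    from one shared queue; return the collected results."""
    work = queue.Queue()
    for rel in relations:
        work.put(rel)

    done, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                rel = work.get_nowait()
            except queue.Empty:
                return  # no work left; worker exits
            result = process(rel)
            with lock:
                done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Each relation is handed out exactly once regardless of how many workers run, which is the synchronization property the shared-memory piece has to provide.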


> I'm not sure that's any easier... And
> certainly adding a catalog flag for a usecase like this one is not making
> it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

Regards,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Feb 22, 2018 at 9:23 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-02-22 21:16:02 +0100, Magnus Hagander wrote:
> You could do that, but then you're moving the complexity to managing that
> list in shared memory instead.

Maybe I'm missing something, but how are you going to get quick parallel
processing if you don't have a shmem piece? You can't assign one
database per worker because commonly there's only one database. You
don't want to start/stop a worker for each relation because that'd be
extremely slow for databases with a lot of tables. Without shmem you
can't pass more than an oid to a bgworker. To me the combination of
these things imply that you need some other synchronization mechanism
*anyway*.

Yes, you probably need something like that if you want to be able to parallelize on things inside each database. If you are OK parallelizing things on a per-database level, you don't need it.

 
> I'm not sure that's any easier... And
> certainly adding a catalog flag for a usecase like this one is not making
> it easier.

Hm, I imagined you'd need that anyway. Imagine a 10TB database that's
online converted to checksums. I assume you'd not want to reread 9TB if
you crash after processing most of the cluster already?

I would prefer that, yes. But having to re-read 9TB is still significantly better than not being able to turn on checksums at all (the state today). And adding a catalog column for it will carry the cost of the migration *forever*, both for clusters that never enable checksums and those that have had them from the beginning.

Accepting that the process will start over (but only read, not re-write, the blocks that have already been processed) in case of a crash does significantly simplify the process, and reduces the long-term cost of it in the form of entries in the catalogs. Since this is a one-time operation (or for many people, a zero-time operation), paying that cost once is probably better than paying a much smaller cost constantly.

--

Re: Online enabling of checksums

From
Michael Paquier
Date:
On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
> I suspect I'm going to get some grief for this, but I think the time has
> come to bite the bullet and support changing databases in the same
> process...

I'd like to see that.  Last time this has been discussed, and Robert
complained to me immediately when I suggested it, is that this is not
worth it with the many complications around syscache handling and
resource cleanup.  It is in the long term more stable to use a model
where a parent process handles a set of children and decides to which
database each child should spawn, which is what autovacuum does.
--
Michael

Attachments

Re: Online enabling of checksums

From
Andres Freund
Date:
How does:

On 2018-02-23 11:48:16 +0900, Michael Paquier wrote:
> On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
> > I suspect I'm going to get some grief for this, but I think the time has
> > come to bite the bullet and support changing databases in the same
> > process...
> 
> I'd like to see that.  Last time this has been discussed, and Robert
> complained to me immediately when I suggested it, is that this is not
> worth it with the many complications around syscache handling and
> resource cleanup.

relate to:

> It is in the long term more stable to use a model
> where a parent process handles a set of children and decides to which
> database each child should spawn, which is what autovacuum does.

?


Re: Online enabling of checksums

From
Robert Haas
Date:
On Thu, Feb 22, 2018 at 9:48 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Feb 22, 2018 at 11:24:37AM -0800, Andres Freund wrote:
>> I suspect I'm going to get some grief for this, but I think the time has
>> come to bite the bullet and support changing databases in the same
>> process...
>
> I'd like to see that.  Last time this has been discussed, and Robert
> complained to me immediately when I suggested it, is that this is not
> worth it with the many complications around syscache handling and
> resource cleanup.  It is in the long term more stable to use a model
> where a parent process handles a set of children and decides to which
> database each child should spawn, which is what autovacuum does.

My position is that allowing processes to change databases is a good
idea but (1) it will probably take some work to get correct and (2) it
probably won't be super-fast due to the need to flush absolutely every
bit of state in sight that might've been influenced by the choice of
database.

I also agree with Andres that this email is not very easy to
understand, although my complaint is not so much that I don't see how
the parts relate as that you seem to be contradicting yourself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Feb 22, 2018 at 9:09 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2/22/18 12:38, Magnus Hagander wrote:
> I'm not entirely sure which the other ones are. Autovacuum obviously
> is one, which doesn't use the worker infrastructure. But I'm not sure
> what the others are referring to? 

autovacuum, subscription workers, auto prewarm


Oh, for some reason I thought you were thinking of pending patches. Yeah, for those it makes sense -- though autovacuum isn't (currently) using background workers for what it does, the rest certainly make sense to do something with.

But as you say, that's a separate patch :) 


--

Re: Online enabling of checksums

From
Robert Haas
Date:
On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I would prefer that yes. But having to re-read 9TB is still significantly
> better than not being able to turn on checksums at all (state today). And
> adding a catalog column for it will carry the cost of the migration
> *forever*, both for clusters that never have checksums and those that had it
> from the beginning.
>
> Accepting that the process will start over (but only read, not re-write, the
> blocks that have already been processed) in case of a crash does
> significantly simplify the process, and reduce the long-term cost of it in
> the form of entries in the catalogs. Since this is a one-time operation (or
> for many people, a zero-time operation), paying that cost that one time is
> probably better than paying a much smaller cost but constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way.  I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature.  The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation.  You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 01:34 AM, Robert Haas wrote:
> On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> I would prefer that yes. But having to re-read 9TB is still significantly
>> better than not being able to turn on checksums at all (state today). And
>> adding a catalog column for it will carry the cost of the migration
>> *forever*, both for clusters that never have checksums and those that had it
>> from the beginning.
>>
>> Accepting that the process will start over (but only read, not re-write, the
>> blocks that have already been processed) in case of a crash does
>> significantly simplify the process, and reduce the long-term cost of it in
>> the form of entries in the catalogs. Since this is a one-time operation (or
>> for many people, a zero-time operation), paying that cost that one time is
>> probably better than paying a much smaller cost but constantly.
> 
> That's not totally illogical, but to be honest I'm kinda surprised
> that you're approaching it that way.  I would have thought that
> relchecksums and datchecksums columns would have been a sort of
> automatic design choice for this feature.  The thing to keep in mind
> is that nobody's going to notice the overhead of adding those columns
> in practice, but someone will surely notice the pain that comes from
> having to restart the whole operation.  You're talking about trading
> an effectively invisible overhead for a very noticeable operational
> problem.
> 

I agree having to restart the whole operation after a crash is not
ideal, but I don't see how adding a flag actually solves it. The problem
is that large databases often store most of the data (>80%) in one or two
central tables (think fact tables in a star schema, etc.). So if you
crash, it's likely half-way through processing this table, so the whole
table would still have relchecksums=false and would have to be processed
from scratch.

But perhaps you meant something like "position" instead of just a simple
true/false flag?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> I agree having to restart the whole operation after a crash is not
> ideal, but I don't see how adding a flag actually solves it. The problem
> is that large databases often store most of the data (>80%) in one or two
> central tables (think fact tables in a star schema, etc.). So if you
> crash, it's likely half-way through processing this table, so the whole
> table would still have relchecksums=false and would have to be processed
> from scratch.

I don't think it's quite as large a problem as you make it out to
be. Even in those cases you'll usually have indexes, toast tables and so
forth.

> But perhaps you meant something like "position" instead of just a simple
> true/false flag?

I think that'd incur a much larger complexity cost.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 03:11 AM, Andres Freund wrote:
> Hi,
> 
> On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
>> I agree having to restart the whole operation after a crash is not
>> ideal, but I don't see how adding a flag actually solves it. The problem
>> is that large databases often store most of the data (>80%) in one or two
>> central tables (think fact tables in a star schema, etc.). So if you
>> crash, it's likely half-way through processing this table, so the whole
>> table would still have relchecksums=false and would have to be processed
>> from scratch.
> 
> I don't think it's quite as large a problem as you make it out to
> be. Even in those cases you'll usually have indexes, toast tables and so
> forth.
> 

Hmmm, right. I've been focused on tables and kinda forgot that the other
objects need to be transformed too ... :-/

>> But perhaps you meant something like "position" instead of just a simple
>> true/false flag?
> 
> I think that'd incur a much larger complexity cost.
> 

Yep, that was part of the point that I was getting to - that actually
addressing the issue would be more expensive than simple flags. But as
you pointed out, that was not quite ... well thought through.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 02/24/2018 03:11 AM, Andres Freund wrote:
> > On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
> >> I agree having to restart the whole operation after a crash is not
> >> ideal, but I don't see how adding a flag actually solves it. The problem
> >> is that large databases often store most of the data (>80%) in one or two
> >> central tables (think fact tables in a star schema, etc.). So if you
> >> crash, it's likely half-way through processing this table, so the whole
> >> table would still have relchecksums=false and would have to be processed
> >> from scratch.
> >
> > I don't think it's quite as large a problem as you make it out to
> > be. Even in those cases you'll usually have indexes, toast tables and so
> > forth.
>
> Hmmm, right. I've been focused on tables and kinda forgot that the other
> objects need to be transformed too ... :-/

There's also something of a difference between just scanning a table or
index, where you don't have to do much in the way of actual writes
because most of the table already has valid checksums, and having to
actually write out all the changes.

> >> But perhaps you meant something like "position" instead of just a simple
> >> true/false flag?
> >
> > I think that'd incur a much larger complexity cost.
>
> Yep, that was part of the point that I was getting to - that actually
> addressing the issue would be more expensive than simple flags. But as
> you pointed out, that was not quite ... well thought through.

No, but it's also not entirely wrong.  Huge tables aren't uncommon.

That said, I'm not entirely convinced that these new flags would be as
unnoticed as is being suggested here, but rather than focus on either
side of that, I'm thinking about what we want to have *next*- we know
that enabling/disabling checksums is an issue that needs to be solved,
and this patch is making progress towards that, but the next question is
what does one do when a page has been detected as corrupted?  Are there
flag fields which would be useful to have at a per-relation level to
support some kind of corrective action, or a setting that says "don't care
about checksums on this table, even though the entire database is
supposed to have valid checksums, but instead do X with failed pages", or
similar?

Beyond dealing with corruption-recovery cases, are there other use cases
for having a given table not have checksums?

Would it make sense to introduce a flag or field which indicates that an
entire table's pages have some set of attributes, of which 'checksums' is
just one attribute?  Perhaps a page version, which potentially allows us
to have a way to change page layouts in the future?

I'm happy to be told that we simply don't have enough information at
this point to make anything larger than a relchecksums field-level
decision, but perhaps these thoughts will spark an idea about how we
could define something a bit broader with clear downstream usefulness
that happens to also cover the "does this relation have checksums?"
question.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:
> We’ve also included a small commandline tool, bin/pg_verify_checksums,
> that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive across the whole relation but the tool starts
over from 0 in each segment:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned:  943
Blocks scanned: 155925
Bad checksums:  76
 

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander <magnus@hagander.net> wrote:
> I would prefer that yes. But having to re-read 9TB is still significantly
> better than not being able to turn on checksums at all (state today). And
> adding a catalog column for it will carry the cost of the migration
> *forever*, both for clusters that never have checksums and those that had it
> from the beginning.
>
> Accepting that the process will start over (but only read, not re-write, the
> blocks that have already been processed) in case of a crash does
> significantly simplify the process, and reduce the long-term cost of it in
> the form of entries in the catalogs. Since this is a one-time operation (or
> for many people, a zero-time operation), paying that cost once is
> probably better than paying a much smaller cost constantly.

That's not totally illogical, but to be honest I'm kinda surprised
that you're approaching it that way.  I would have thought that
relchecksums and datchecksums columns would have been a sort of
automatic design choice for this feature.  The thing to keep in mind
is that nobody's going to notice the overhead of adding those columns
in practice, but someone will surely notice the pain that comes from
having to restart the whole operation.  You're talking about trading
an effectively invisible overhead for a very noticeable operational
problem.

Is it really that invisible? Given how much we argue over adding single counters to the stats system, I'm not sure it's quite that low.

We did consider doing it on a per-table basis as well. But that is an overhead that has to be paid forever, whereas having to read the database files more than once (only read, not write, on the second pass) is a one-off cost. And all those that initialized with checksums in the first place don't have to pay any overhead at all in the current design.

I very strongly doubt it's a "very noticeable operational problem". People don't restart their databases very often... Let's say it takes 2-3 weeks to complete a run in a fairly large database. How many such large databases actually restart that frequently? I'm not sure I know of any. And the only effect of it is that you have to start the process over (but read-only for the part you have already done). It's certainly not ideal, but I don't agree it's in any form a "very noticeable problem".

The other point is that this keeps the code a lot simpler. That is good both for having a chance to finish it at all and get it into 11 (it can then be improved upon to add for example incremental support in 12, or something like that). And of course, simple code means less overhead in the form of maintenance and effects on other parts of the system down the road.

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.


Ah yes. I actually have it on my TODO to work on that, but I forgot to put that in the email I sent out. Apologies for that, and thanks for pointing it out!

Right now you have to set the limit in the configuration file. That's of course not the way we want to have it long term (but as long as it is that way it should at least be documented). My plan is either to pick it up from the current session that calls pg_enable_data_checksums(), or to simply pass it down as parameters to the function directly. I'm thinking the second one (passing a cost_delay and a cost_limit as optional parameters to the function) is the best, because as you say overloading it on the user-visible GUCs seems a bit ugly. Once there, I think the easiest is to just pass it down to the workers through the shared memory segment.

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 6:14 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

On Wed, Feb 21, 2018 at 09:53:31PM +0100, Magnus Hagander wrote:
> We’ve also included a small commandline tool, bin/pg_verify_checksums,
> that can be run against an offline cluster to validate all checksums.

The way it is coded in the patch will make pg_verify_checksums fail for
heap files with multiple segments, i.e. tables over 1 GB, because the
block number is consecutive across the whole relation but the tool starts
over from 0 in each segment:

$ pgbench -i -s 80 -h /tmp
[...]
$ pg_verify_checksums -D data1
pg_verify_checksums: data1/base/12364/16396.1, block 0, invalid checksum
in file 6D61, calculated 6D5F
pg_verify_checksums: data1/base/12364/16396.1, block 1, invalid checksum
in file 7BE5, calculated 7BE7
[...]
Checksum scan completed
Data checksum version: 1
Files scanned:  943
Blocks scanned: 155925
Bad checksums:  76

Yikes. I could've sworn I tested that, but it's pretty obvious I didn't, at least not in this version. Thanks for the note, will fix and post a new version! 

--

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?


> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And for all
> those that have initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?


> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-02-24 22:45:09 +0100, Magnus Hagander wrote:
> Is it really that invisible? Given how much we argue over adding single
> counters to the stats system, I'm not sure it's quite that low.

That appears to be entirely unrelated. The stats stuff is expensive
because we currently have to essentially write out the stats for *all*
tables in a database once a counter is updated. And those counters are
obviously constantly updated. Thus the overhead of adding one column is
essentially multiplied by the number of tables in the system. Whereas
here it's a single column that can be updated on a per-row basis, which
is barely ever going to be written to.

Am I missing something?

It's probably at least partially unrelated, you are right. I may have misread our reluctance to add more values there as a general reluctance to add more values to central columns.

 
> We did consider doing it at a per-table basis as well. But this is also an
> overhead that has to be paid forever, whereas the risk of having to read
> the database files more than once (because it'd only have to read them on
> the second pass, not write anything) is a one-off operation. And for all
> those that have initialized with checksums in the first place don't have to
> pay any overhead at all in the current design.

Why does it have to be paid forever?

The size of the pg_class row would be there forever. Granted, it's not that big an overhead given that there are already plenty of columns there. But the point is that you can never remove that column, and it will be there for users who never even considered running without checksums. It's certainly not a large overhead, but it's also not zero.


> I very strongly doubt it's a "very noticeable operational problem". People
> don't restart their databases very often... Let's say it takes 2-3 weeks to
> complete a run in a fairly large database. How many such large databases
> actually restart that frequently? I'm not sure I know of any. And the only
> effect of it is you have to start the process over (but read-only for the
> part you have already done). It's certainly not ideal, but I don't agree
> it's in any form a "very noticeable problem".

I definitely know large databases that fail over more frequently than
that.

I would argue that they have bigger issues than enabling checksums... By far. 

--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
> > > We did consider doing it at a per-table basis as well. But this is also
> > an
> > > overhead that has to be paid forever, whereas the risk of having to read
> > > the database files more than once (because it'd only have to read them on
> > > the second pass, not write anything) is a one-off operation. And for all
> > > those that have initialized with checksums in the first place don't have
> > to
> > > pay any overhead at all in the current design.
> >
> > Why does it have to be paid forever?
> >
> 
> The size of the pg_class row would be there forever. Granted, it's not that
> big an overhead given that there are already plenty of columns there. But
> the point being you can never remove that column, and it will be there for
> users who never even considered running without checksums. It's certainly
> not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound on compatibility for catalog layout.

FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns are a problem (they don't matter much in comparison
to the rest of the data, particularly relname), we can compress them
into a bitfield.

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog size doesn't seem
like a valid part of the argument.


> > I very strongly doubt it's a "very noticeable operational problem". People
> > > don't restart their databases very often... Let's say it takes 2-3 weeks
> > to
> > > complete a run in a fairly large database. How many such large databases
> > > actually restart that frequently? I'm not sure I know of any. And the
> > only
> > > effect of it is you have to start the process over (but read-only for the
> > > part you have already done). It's certainly not ideal, but I don't agree
> > > it's in any form a "very noticeable problem".
> >
> > I definitely know large databases that fail over more frequently than
> > that.
> >
> 
> I would argue that they have bigger issues than enabling checksums... By
> far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Greg Stark
Date:
> The change of the checksum state is WAL logged with a new xlog record. All
> the buffers written by the background worker are forcibly enabled full page
> writes to make sure the checksum is fully updated on the standby even if no
> actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sun, Feb 25, 2018 at 1:21 AM, Greg Stark <stark@mit.edu> wrote:
> The change of the checksum state is WAL logged with a new xlog record. All the buffers written by the background worker are forcibly enabled full page writes to make sure the checksum is fully updated on the standby even if no actual contents of the buffer changed.

Hm. That doesn't sound necessary to me. If you generate a checkpoint
(or just wait until a new checkpoint has started) then go through and
do a normal xlog record for every page (any xlog record, a noop record
even) then the normal logic for full page writes ought to be
sufficient. If the noop record doesn't need a full page write it's
because someone else has already come in and done one and that one
will set the checksum. In fact if any page has an lsn > the checkpoint
start lsn for the checkpoint after the flag was flipped then you
wouldn't need to issue any record at all.

What would be the actual benefit though? We'd have to invent a noop WAL record, and just have some other part of the system do the full page write? So why not just send the full page in the first place?

Also if that wasn't clear -- we only do the full page write if there isn't already a correct checksum on the page.

(We do trigger a checkpoint at the end, and wait for it to complete)

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Feb 24, 2018 at 11:06 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-02-24 22:56:57 +0100, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 10:49 PM, Andres Freund <andres@anarazel.de> wrote:
> > > We did consider doing it at a per-table basis as well. But this is also
> > an
> > > overhead that has to be paid forever, whereas the risk of having to read
> > > the database files more than once (because it'd only have to read them on
> > > the second pass, not write anything) is a one-off operation. And for all
> > > those that have initialized with checksums in the first place don't have
> > to
> > > pay any overhead at all in the current design.
> >
> > Why does it have to be paid forever?
> >
>
> The size of the pg_class row would be there forever. Granted, it's not that
> big an overhead given that there are already plenty of columns there. But
> the point being you can never remove that column, and it will be there for
> users who never even considered running without checksums. It's certainly
> not a large overhead, but it's also not zero.

But it can be removed in the next major version, if we decide it's a
good idea? We're not bound on compatibility for catalog layout.

Sure.

But we can also *add* it in the next major version, if we decide it's a good idea?


FWIW, there's some padding space available where we currently could
store two booleans without any space overhead. Also, if we decide that
the boolean columns are a problem (they don't matter much in comparison
to the rest of the data, particularly relname), we can compress them
into a bitfield.

I don't think this is a valid reason for not supporting
interruptibility. You can make fair arguments about adding incremental
support incrementally and whatnot, but the catalog size doesn't seem
like a valid part of the argument.

Fair enough.
 


> > I very strongly doubt it's a "very noticeable operational problem". People
> > > don't restart their databases very often... Let's say it takes 2-3 weeks
> > to
> > > complete a run in a fairly large database. How many such large databases
> > > actually restart that frequently? I'm not sure I know of any. And the
> > only
> > > effect of it is you have to start the process over (but read-only for the
> > > part you have already done). It's certainly not ideal, but I don't agree
> > > it's in any form a "very noticeable problem".
> >
> > I definitely know large databases that fail over more frequently than
> > that.
> >
>
> I would argue that they have bigger issues than enabling checksums... By
> far.

In one case it's intentional, to make sure the overall system copes. Not
that insane.

That I can understand. But in a scenario like that, you can also stop doing that for the period of time when you're rebuilding checksums, if re-reading the database over and over again is an issue.

Note, I'm not saying it wouldn't be nice to have the incremental functionality. I'm just saying it's not needed in a first version.

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Feb 24, 2018 at 10:48 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Feb 24, 2018 at 4:29 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I see the patch also does throttling by calling vacuum_delay_point().
Being able to throttle the checksum workers not to affect user activity
definitely seems like a useful feature, no complaints here.

But perhaps binding it to vacuum_cost_limit/vacuum_cost_delay is not the
best idea? I mean, enabling checksums seems rather unrelated to vacuum,
so it seems a bit strange to configure it by vacuum_* options.

Also, how am I supposed to set the cost limit? Perhaps I'm missing
something, but the vacuum_delay_point call happens in the bgworker, so
setting the cost limit before running pg_enable_data_checksums() will
not get there, right? I really don't want to be setting it in the config
file, because then it will suddenly affect all user VACUUM commands.

And if this patch gets improved to use multiple parallel workers, we'll
need a global limit (something like we have for autovacuum workers).

In any case, I suggest mentioning this in the docs.


Ah yes. I actually have it on my TODO to work on that, but I forgot to put that in the email I sent out. Apologies for that, and thanks for pointing it out!

Right now you have to set the limit in the configuration file. That's of course not the way we want to have it long term (but as long as it is that way it should at least be documented). My plan is either to pick it up from the current session that calls pg_enable_data_checksums(), or to simply pass it down as parameters to the function directly. I'm thinking the second one (passing a cost_delay and a cost_limit as optional parameters to the function) is the best, because as you say overloading it on the user-visible GUCs seems a bit ugly. Once there, I think the easiest is to just pass it down to the workers through the shared memory segment.


PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck. 

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 03:51 AM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 02/24/2018 03:11 AM, Andres Freund wrote:
>>> On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
>>>> I agree having to restart the whole operation after a crash is not
>>>> ideal, but I don't see how adding a flag actually solves it. The problem
>>>> is the large databases often store most of the data (>80%) in one or two
>>>> central tables (think fact tables in star schema, etc.). So if you
>>>> crash, it's likely half-way while processing this table, so the whole
>>>> table would still have relchecksums=false and would have to be processed
>>>> from scratch.
>>>
>>> I don't think it's quite as large a problem as you make it out to
>>> be. Even in those cases you'll usually have indexes, toast tables and so
>>> forth.
>>
>> Hmmm, right. I've been focused on tables and kinda forgot that the other
>> objects need to be transformed too ... :-/
> 
> There's also something of a difference between just scanning a table or
> index, where you don't have to do much in the way of actual writes
> because most of the table already has valid checksums, and having to
> actually write out all the changes.
> 
>>>> But perhaps you meant something like "position" instead of just a simple
>>>> true/false flag?
>>>
>>> I think that'd incur a much larger complexity cost.
>>
>> Yep, that was part of the point that I was getting to - that actually
>> addressing the issue would be more expensive than simple flags. But as
>> you pointed out, that was not quite ... well thought through.
> 
> No, but it's also not entirely wrong. Huge tables aren't uncommon.
> 
> That said, I'm not entirely convinced that these new flags would be
> as unnoticed as is being suggested here, but rather than focus on
> either side of that, I'm thinking about what we want to have *next*-
> we know that enabling/disabling checksums is an issue that needs to
> be solved, and this patch is making progress towards that, but the
> next question is what does one do when a page has been detected as
> corrupted? Are there flag fields which would be useful to have at a
> per-relation level to support some kind of corrective action or
> setting that says "don't care about checksums on this table, even
> though the entire database is supposed to have valid checksums, but
> instead do X with failed pages" or similar.
> 

Those questions are definitely worth asking, and I agree our ability to
respond to data corruption (incorrect checksums) needs improvement. But
I don't really see how a single per-relation flag would make any of that
easier.

Perhaps there are other flags/fields that might help, like for example
the maximum number of checksum errors per relation (although I don't
consider that very useful in practice), but that seems rather unrelated
to this patch.

> Beyond dealing with corruption-recovery cases, are there other use
> cases for having a given table not have checksums?
> 

Well, I see checksums as a way to detect data corruption caused by
storage, so if you have tablespaces backed by different storage systems,
you could disable checksums for objects on the storage you trust 100%.
That would limit the overhead of computing checksums.

But then again, this seems entirely unrelated to the patch discussed
here. That would obviously require flags in catalogs, and if the patch
eventually adds flags those would need to be separate.

> Would it make sense to introduce a flag or field which indicates that
> an entire table's pages have some set of attributes, of which
> 'checksums' is just one attribute? Perhaps a page version, which
> potentially allows us to have a way to change page layouts in the
> future?
> 
> I'm happy to be told that we simply don't have enough information at 
> this point to make anything larger than a relchecksums field-level 
> decision, but perhaps these thoughts will spark an idea about how we 
> could define something a bit broader with clear downstream
> usefulness that happens to also cover the "does this relation have
> checksums?" question.
> 

I don't know. But I think we need to stop moving the goalposts further
and further away, otherwise we won't get anything until PostgreSQL 73.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 02/25/2018 03:57 PM, Magnus Hagander wrote:
> 
> ...
> 
>> > I very strongly doubt it's a "very noticeable operational problem". People
>> > > don't restart their databases very often... Let's say it takes 2-3 weeks
>> > to
>> > > complete a run in a fairly large database. How many such large databases
>> > > actually restart that frequently? I'm not sure I know of any. And the
>> > only
>> > > effect of it is you have to start the process over (but read-only for the
>> > > part you have already done). It's certainly not ideal, but I don't agree
>> > > it's in any form a "very noticeable problem".
>> >
>> > I definitely know large databases that fail over more frequently than
>> > that.
>> >
>>
>> I would argue that they have bigger issues than enabling checksums... By
>> far.
> 
> In one case it's intentional, to make sure the overall system copes. Not
> that insane.
> 
> That I can understand. But in a scenario like that, you can also stop
> doing that for the period of time when you're rebuilding checksums, if
> re-reading the database over and over again is an issue.
> 
> Note, I'm not saying it wouldn't be nice to have the incremental
> functionality. I'm just saying it's not needed in a first version.
> 

I agree with this sentiment. I don't think we can make each patch
perfect for everyone - certainly not in v1 :-/

Sure, it would be great to allow resume after a restart, but if that
means we won't get anything in PG 11 then I think it's not a good
service to our users. OTOH if the patch without a resume addresses the
issue for 99% of users, and we can improve it in PG 12, why not? That
seems exactly like the incremental thing we do for many other features.

So +1 to not make the "incremental resume" mandatory. If we can support
it, great! But I think the patch may seem less complex than it actually
is, and figuring out how the resume should work will take some time.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 02/24/2018 10:45 PM, Magnus Hagander wrote:
> On Sat, Feb 24, 2018 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
>     On Thu, Feb 22, 2018 at 3:28 PM, Magnus Hagander
>     <magnus@hagander.net> wrote:
>     > I would prefer that yes. But having to re-read 9TB is still significantly
>     > better than not being able to turn on checksums at all (state today). And
>     > adding a catalog column for it will carry the cost of the migration
>     > *forever*, both for clusters that never have checksums and those that had it
>     > from the beginning.
>     >
>     > Accepting that the process will start over (but only read, not re-write, the
>     > blocks that have already been processed) in case of a crash does
>     > significantly simplify the process, and reduce the long-term cost of it in
>     > the form of entries in the catalogs. Since this is a one-time operation (or
>     > for many people, a zero-time operation), paying that cost once is
>     > probably better than paying a much smaller cost constantly.
> 
>     That's not totally illogical, but to be honest I'm kinda surprised
>     that you're approaching it that way.  I would have thought that
>     relchecksums and datchecksums columns would have been a sort of
>     automatic design choice for this feature.  The thing to keep in mind
>     is that nobody's going to notice the overhead of adding those columns
>     in practice, but someone will surely notice the pain that comes from
>     having to restart the whole operation.  You're talking about trading
>     an effectively invisible overhead for a very noticeable operational
>     problem.
> 
> 
> Is it really that invisible? Given how much we argue over adding
> single counters to the stats system, I'm not sure it's quite that
> low.

I'm a bit unsure where the flags would be stored - I initially assumed
pg_database/pg_class, but now I see mentions of the stats system.

But I wonder why this should be stored in a catalog at all? The info is
only needed by the bgworker(s), so they could easily flush the current
status to a file every now and then and fsync it. Then after restart, if
you find a valid file, use it to resume from the last OK position. If
not, start from scratch.

FWIW this is pretty much what the stats collector does.
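To make the idea concrete, here is a minimal sketch of that flush-and-fsync approach. Everything in it (the file name, the record layout, the magic number) is made up for illustration; the real bgworker would go through PostgreSQL's own file-access machinery rather than plain stdio.

```c
/* Sketch of "flush progress to a file and fsync it", not actual
 * PostgreSQL code.  Record layout and magic value are invented. */
#include <stdio.h>
#include <unistd.h>

#define PROGRESS_MAGIC 0xC4EC0001u

typedef struct ChecksumProgress
{
    unsigned int magic;     /* identifies a valid progress file */
    unsigned int db_oid;    /* last database fully processed */
    unsigned int filenode;  /* last relation fully processed */
    unsigned int blockno;   /* last block verified/rewritten */
} ChecksumProgress;

/* Persist the current position; fsync so a crash cannot leave us
 * believing in progress that never reached disk.  Returns 0 on ok. */
static int
save_progress(const char *path, const ChecksumProgress *p)
{
    FILE *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    if (fwrite(p, sizeof(*p), 1, f) != 1 ||
        fflush(f) != 0 ||
        fsync(fileno(f)) != 0)
    {
        fclose(f);
        return -1;
    }
    return fclose(f);
}

/* Returns 1 and fills *p if a valid progress file exists, else 0,
 * which means: start from scratch. */
static int
load_progress(const char *path, ChecksumProgress *p)
{
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return 0;
    if (fread(p, sizeof(*p), 1, f) != 1 || p->magic != PROGRESS_MAGIC)
    {
        fclose(f);
        return 0;
    }
    fclose(f);
    return 1;
}
```

On restart the launcher would call load_progress() and either resume from the stored position or, if the file is missing or invalid, fall back to a full pass, exactly as described above.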

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Daniel Gustafsson
Дата:
> On 26 Feb 2018, at 05:48, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 02/24/2018 10:45 PM, Magnus Hagander wrote:
>> Is it really that invisible? Given how much we argue over adding
>> single counters to the stats system, I'm not sure it's quite that
>> low.
>
> I'm a bit unsure where would the flags be stored - I initially assumed
> pg_database/pg_class, but now I see mentions of the stats system.
>
> But I wonder why should this be stored in a catalog at all? The info is
> only needed by the bgworker(s), so they could easily flush the current
> status to a file every now and then and fsync it. Then after restart, if
> you find a valid file, use it to resume from the last OK position. If
> not, start from scratch.

Since this allows checksums to be turned off as well, storing a flag in the
catalog would mean issuing a fairly wide update in that case to switch it to
false, which might not be ideal.  Not that I expect (hope) turning checksums
off on a cluster will be a common operation, but that's a fairly large side
effect of doing so.

cheers ./daniel

Re: Online enabling of checksums

От
Andrey Borodin
Дата:
Hi, Magnus!

> On 25 Feb 2018, at 21:17, Magnus Hagander <magnus@hagander.net> wrote:
>
> PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck.

I had to ALLOW_CONNECTIONS to template0 to make it work
postgres=# 2018-02-27 21:29:05.993 +05 [57259] ERROR:  Database template0 does not allow connections.
Is it a problem with my installation or some regression in the patch?

2018-02-27 21:40:47.132 +05 [57402] HINT:  either disable or enable checksums by calling the
pg_data_checksums_enable()/disable() functions
Function names are wrong in this hint: pg_enable_data_checksums()

The code is nice and clear. One minor spot in the comment
>This option can only _be_ enabled when data checksums are enabled.

Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implementing a utility for
offline checksum enabling, without WAL-logging, surely.


Thanks for the patch!

Best regards, Andrey Borodin.

Re: Online enabling of checksums

От
Daniel Gustafsson
Дата:
> On 28 Feb 2018, at 00:51, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>> On 25 Feb 2018, at 21:17, Magnus Hagander <magnus@hagander.net> wrote:
>>
>> PFA an updated patch that adds this, and also fixes the problem in pg_verify_checksums spotted by Michael Banck.
>
> I had to ALLOW_CONNECTIONS to template0 to make it work
> postgres=# 2018-02-27 21:29:05.993 +05 [57259] ERROR:  Database template0 does not allow connections.
> Is it a problem with my installation or some regression in the patch?

This is due to a limitation that applies to bgworkers: in order to add checksums,
the bgworker must connect to the database, and template0 does not allow
connections by default.  There is a discussion, and patch, for lifting this
restriction but until this makes it in (if it does), the user will have to
allow connections manually.  For reference, the thread for allowing bypassing
allowconn in bgworker is here:

https://www.postgresql.org/message-id/CABUevEwWT9ZmonBMRFF0owneoN3DAPgOVzwHAN0bUkxaqY3eNQ@mail.gmail.com

> 2018-02-27 21:40:47.132 +05 [57402] HINT:  either disable or enable checksums by calling the
> pg_data_checksums_enable()/disable() functions
> Function names are wrong in this hint: pg_enable_data_checksums()

Fixed.  The format of this hint (and errmsg) is actually also incorrect, which
I fixed while in there.

> The code is nice and clear. One minor spot in the comment
>> This option can only _be_ enabled when data checksums are enabled.

Fixed.

> Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implementing a utility for
> offline checksum enabling, without WAL-logging, surely.

While outside the scope of the patch in question (since it deals with enabling
checksums online), such a utility should be perfectly possible to write.

Thanks for reviewing!

cheers ./daniel


Вложения

Re: Online enabling of checksums

От
Robert Haas
Дата:
On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Also if that wasn't clear -- we only do the full page write if there isn't
> already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect.  Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

От
Alvaro Herrera
Дата:
I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name).  I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Tomas Vondra
Дата:
On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
> I noticed that pg_verify_checksum takes an "-o oid" argument to only
> check the relation with that OID; but that's wrong, because the number
> is a relfilenode, not an OID (since it's compared to the on-disk file
> name).  I would suggest changing everything to clarify that it's a
> pg_class.relfilenode value, otherwise it's going to be very confusing.
> Maybe use "-f filenode" if -f is available?
> 

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

  <varlistentry>
   <term><option>-o <replaceable>relfilenode</replaceable></option></term>
   <listitem>
    <para>
     Only validate checksums in the relation with specified relfilenode.
    </para>
   </listitem>
  </varlistentry>


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Daniel Gustafsson
Дата:
> On 01 Mar 2018, at 05:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
>> I noticed that pg_verify_checksum takes an "-o oid" argument to only
>> check the relation with that OID; but that's wrong, because the number
>> is a relfilenode, not an OID (since it's compared to the on-disk file
>> name).  I would suggest changing everything to clarify that it's a
>> pg_class.relfilenode value, otherwise it's going to be very confusing.
>> Maybe use "-f filenode" if -f is available?
>
> I'd argue this is merely a mistake in the --help text.

Agreed, the --help text isn’t really clear in this case and should be updated
to say something along the lines of:

printf(_("  -o relfilenode  check only relation with specified relfilenode\n"));

cheers ./daniel

Re: Online enabling of checksums

От
Craig Ringer
Дата:
On 1 March 2018 at 03:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I noticed that pg_verify_checksum takes an "-o oid" argument to only
check the relation with that OID; but that's wrong, because the number
is a relfilenode, not an OID (since it's compared to the on-disk file
name).  I would suggest changing everything to clarify that it's a
pg_class.relfilenode value, otherwise it's going to be very confusing.
Maybe use "-f filenode" if -f is available?


I see this mistake/misunderstanding enough that I'd quite like to change how we generate relfilenode IDs, making them totally independent of the oid space.

Unsure how practical it is, but it'd be so nice to get rid of that trap.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Online enabling of checksums

От
Andrey Borodin
Дата:

> On 28 Feb 2018, at 22:06, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> Also if that wasn't clear -- we only do the full page write if there isn't
>> already a checksum on the page and that checksum is correct.
>
> Hmm.
>
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, like as to hint bits, and there
> the checksum is incorrect.  Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
>
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.

This seems like a 100% valid concern. If pages can be binary-different (on master and standby), we have to log the
act of checksumming a page.
And WAL replay would have to verify that its checksum is OK.
What is unclear to me is what the standby should do if it sees an incorrect checksum? Change its page? Request the
page from the master? Shut down?

Or should we just WAL-log page whenever it is checksummed by worker? Even if the checksum was correct?

Best regards, Andrey Borodin.

Re: Online enabling of checksums

От
Andrey Borodin
Дата:

> On 28 Feb 2018, at 6:22, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> Is there any way we could provide this functionality for previous versions (9.6, 10)? Like implementing a utility for
>> offline checksum enabling, without WAL-logging, surely.
>
> While outside the scope of the patch in question (since it deals with enabling
> checksums online), such a utility should be perfectly possible to write.

I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), the
patch works fine.
Do we provide backporting for such features?

Best regards, Andrey Borodin.

Re: Online enabling of checksums

От
Michael Paquier
Дата:
On Thu, Mar 01, 2018 at 12:56:35PM +0500, Andrey Borodin wrote:
> I've tried to rebase this patch to 10 and, despite minor rebase issues
> (oids, bgw_type, changes to specscanner), patch works fine.   Do we
> provide backporting for such features?

New features are not backported upstream.  Project policy is to fix only
bugs in the already-released branches, to keep them as stable as possible.
--
Michael

Вложения

Re: Online enabling of checksums

От
Andres Freund
Дата:
On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner),
> the patch works fine.
 
> Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

- Andres


Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> I've tried to rebase this patch to 10 and, despite minor rebase issues (oids, bgw_type, changes to specscanner), patch works fine.
> Do we provide backporting for such features?

Definitely not. With very rare exceptions (OS compatibility and the
like), features aren't backported.

Yeah. And definitely not something that both changes the format of pg_control (by adding new possible values to the checksum field) *and* adds a new WAL record type... 

--

Re: Online enabling of checksums

От
Andres Freund
Дата:
On 2018-03-01 16:18:48 +0100, Magnus Hagander wrote:
> On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> > > I've tried to rebase this patch to 10 and, despite minor rebase issues
> > (oids, bgw_type, changes to specscanner), patch works fine.
> > > Do we provide backporting for such features?
> >
> > Definitely not. With very rare exceptions (OS compatibility and the
> > like), features aren't backported.
> >
> 
> Yeah. And definitely not something that both changes the format of
> pg_control (by adding new possible values to the checksum field) *and* adds
> a new WAL record type...

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11.  That's pretty late.  Now, Magnus is a committer, but
nevertheless...

Greetings,

Andres Freund


Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Fri, Mar 2, 2018 at 8:44 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-03-01 16:18:48 +0100, Magnus Hagander wrote:
> On Thu, Mar 1, 2018 at 9:04 AM, Andres Freund <andres@anarazel.de> wrote:
>
> > On 2018-03-01 12:56:35 +0500, Andrey Borodin wrote:
> > > I've tried to rebase this patch to 10 and, despite minor rebase issues
> > (oids, bgw_type, changes to specscanner), patch works fine.
> > > Do we provide backporting for such features?
> >
> > Definitely not. With very rare exceptions (OS compatibility and the
> > like), features aren't backported.
> >
>
> Yeah. And definitely not something that both changes the format of
> pg_control (by adding new possible values to the checksum field) *and* adds
> a new WAL record type...

And even more so, I'm not even sure it makes sense to try to get this
into v11. This is a medium-large complicated feature, submitted to the
last CF for v11.  That's pretty late.  Now, Magnus is a committer, but
nevertheless...

See, this is why I'm trying my hardest to avoid scope-creep in it at least :P  

--

Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 02/28/2018 08:42 PM, Alvaro Herrera wrote:
> I noticed that pg_verify_checksum takes an "-o oid" argument to only
> check the relation with that OID; but that's wrong, because the number
> is a relfilenode, not an OID (since it's compared to the on-disk file
> name).  I would suggest changing everything to clarify that it's a
> pg_class.relfilenode value, otherwise it's going to be very confusing.
> Maybe use "-f filenode" if -f is available?
>

I'd argue this is merely a mistake in the --help text. Firstly,
relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
the SGML docs actually say:

  <varlistentry>
   <term><option>-o <replaceable>relfilenode</replaceable></option></term>
   <listitem>
    <para>
     Only validate checksums in the relation with specified relfilenode.
    </para>
   </listitem>
  </varlistentry>


Yeah, that one is my fault. It used to say oid all over but I noticed and fixed it. Except I clearly missed the --help. 

--

Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Also if that wasn't clear -- we only do the full page write if there isn't
> already a checksum on the page and that checksum is correct.

Hmm.

Suppose that on the master there is a checksum on the page and that
checksum is correct, but on the standby the page contents differ in
some way that we don't always WAL-log, like as to hint bits, and there
the checksum is incorrect.  Then you'll enable checksums when the
standby still has some pages without valid checksums, and disaster
will ensue.

I think this could be hard to prevent if checksums are turned on and
off multiple times.


Do we ever make hintbit changes on the standby for example? If so, it would definitely cause problems. I didn't realize we did, actually...

I guess we could get there even if we don't by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL logged change on the master, which updates checksum but does *not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full page write.
* Worker completes and flips checksums on which replicates. At this point, if the replica reads the page, boom.

I guess we have to remove that optimisation. It's definitely a bummer, but I don't think it's an absolute dealbreaker.
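The sequence above can be modeled with a toy page and a made-up checksum; everything here (page layout, checksum function, helper names) is invented for illustration, not PostgreSQL's real code, but it reproduces the failure mechanically:

```c
/* Toy model of the failure sequence above.  Not PostgreSQL code. */
typedef struct Page
{
    unsigned char data[64];
    unsigned int  checksum;
} Page;

/* Stand-in checksum; the real one is an FNV-based page checksum. */
static unsigned int
toy_checksum(const Page *p)
{
    unsigned int sum = 17;

    for (int i = 0; i < 64; i++)
        sum = sum * 31 + p->data[i];
    return sum;
}

/* The optimisation under discussion: only emit a full-page image
 * (modeled as overwriting the standby's copy) when the master's
 * checksum is missing or wrong. */
static void
enable_worker_visit(Page *master, Page *standby, int skip_if_valid)
{
    if (skip_if_valid && master->checksum == toy_checksum(master))
        return;                 /* assume the standby is fine too -- wrong! */
    master->checksum = toy_checksum(master);
    *standby = *master;         /* full-page write replicates */
}

static int
standby_checksum_ok(const Page *p)
{
    return p->checksum == toy_checksum(p);
}

/* Replays: checksums on -> off -> changes -> re-enabled. */
static int
scenario(int skip_if_valid)
{
    Page m = {{0}, 0}, s;

    m.checksum = toy_checksum(&m);  /* checksums on: both copies valid */
    s = m;

    /* checksums off: a WAL-logged change replays on both sides, but
     * neither side recomputes the checksum, so both go stale */
    m.data[1] = 7;
    s.data[1] = 7;

    /* checksums inprogress again: a hint-bit change on the master is
     * written with a fresh checksum, but is not WAL-logged */
    m.data[0] |= 1;
    m.checksum = toy_checksum(&m);

    enable_worker_visit(&m, &s, skip_if_valid);
    return standby_checksum_ok(&s);
}
```

With the skip-if-valid optimisation, scenario() leaves the standby holding a page whose checksum no longer matches its contents; with the unconditional full-page write it stays consistent.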

We could say that we keep the optimisation if wal_level=minimal for example, because then we know there is no replica. But I doubt that's worth it?

-- 

Re: Online enabling of checksums

От
Alvaro Herrera
Дата:
Magnus Hagander wrote:
> On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> > I'd argue this is merely a mistake in the --help text. Firstly,
> > relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
> > the SGML docs actually say:

> Yeah, that one is my fault. It used to say oid all over but I noticed and
> fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented.  But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter.  There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Magnus Hagander
Дата:


On Fri, Mar 2, 2018 at 3:17 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Magnus Hagander wrote:
> On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> > I'd argue this is merely a mistake in the --help text. Firstly,
> > relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
> > the SGML docs actually say:

> Yeah, that one is my fault. It used to say oid all over but I noticed and
> fixed it. Except I clearly missed the --help.

Obviously option names are completely arbitrary -- you could say
"-P relfilenode" and it'd still be 'correct', since it works as
documented.  But we try to make these options mnemotechnic when we can,
and I don't see any relation between "-o" and "relfilenode", so I
suggest picking some other letter.  There's a whole alphabet out there.

Either "-r" or "-f" works better for me than "-o".

I have no problem with changing it to -r. -f seems a bit wrong to me, as it might read as a file. And in the future we might want to implement the ability to take full filename (with path), in which case it would make sense to use -f for that. 

--

Re: Online enabling of checksums

От
Tomas Vondra
Дата:

On 03/02/2018 03:22 PM, Magnus Hagander wrote:
> 
> 
> On Fri, Mar 2, 2018 at 3:17 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 
>     Magnus Hagander wrote:
>     > On Wed, Feb 28, 2018 at 10:07 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
> 
>     > > I'd argue this is merely a mistake in the --help text. Firstly,
>     > > relfilenodes are OIDs too, so I don't think "-o" is incorrect. Secondly,
>     > > the SGML docs actually say:
> 
>     > Yeah, that one is my fault. It used to say oid all over but I noticed and
>     > fixed it. Except I clearly missed the --help.
> 
>     Obviously option names are completely arbitrary -- you could say
>     "-P relfilenode" and it'd still be 'correct', since it works as
>     documented.  But we try to make these options mnemotechnic when we can,
>     and I don't see any relation between "-o" and "relfilenode", so I
>     suggest picking some other letter.  There's a whole alphabet out there.
> 
>     Either "-r" or "-f" works better for me than "-o".
> 
> 
> I have no problem with changing it to -r. -f seems a bit wrong to me, as
> it might read as a file. And in the future we might want to implement
> the ability to take full filename (with path), in which case it would
> make sense to use -f for that. 
> 

+1 to -r

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Robert Haas
Дата:
On Fri, Mar 2, 2018 at 8:35 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Do we ever make hintbit changes on the standby for example? If so, it would
> definitely cause problems. I didn't realize we did, actually...

We do not.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checksums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does *not*
> replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full page
> write.
> * Worker completes and flips checksums on which replicates. At this point,
> if the replica reads the page, boom.

Exactly.

> I guess we have to remove that optimisation. It's definitely a bummer, but I
> don't think it's an absolute dealbreaker.

I don't disagree.

> We could say that we keep the optimisation if wal_level=minimal for example,
> because then we know there is no replica. But I doubt that's worth it?

I don't have a strong feeling about this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

От
Robert Haas
Дата:
On Fri, Mar 2, 2018 at 2:44 AM, Andres Freund <andres@anarazel.de> wrote:
> And even more so, I'm not even sure it makes sense to try to get this
> into v11. This is a medium-large complicated feature, submitted to the
> last CF for v11.  That's pretty late.  Now, Magnus is a committer, but
> nevertheless...

Yeah, I would also favor bumping this one out to a later release.  I
think there is a significant risk either that the design is flawed --
and as evidence, I offer that I found a flaw in it which I noticed
only because of a passing remark in an email, not because I opened the
patch -- or that the design boxes us into a corner such that it will
be hard to improve this later.  I think there is a good
chance that there are other serious problems with this patch, and to be
honest I don't really want to go try to find them right this minute; I
want to work on other patches that were submitted earlier and have
been waiting for a long time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

От
Tomas Vondra
Дата:

On 03/02/2018 02:35 PM, Magnus Hagander wrote:
> 
> 
> On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
>     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     <magnus@hagander.net> wrote:
>     > Also if that wasn't clear -- we only do the full page write if there isn't
>     > already a checksum on the page and that checksum is correct.
> 
>     Hmm.
> 
>     Suppose that on the master there is a checksum on the page and that
>     checksum is correct, but on the standby the page contents differ in
>     some way that we don't always WAL-log, like as to hint bits, and there
>     the checksum is incorrect.  Then you'll enable checksums when the
>     standby still has some pages without valid checksums, and disaster
>     will ensue.
> 
>     I think this could be hard to prevent if checksums are turned on and
>     off multiple times.
> 
> 
> Do we ever make hintbit changes on the standby for example? If so, it
> would definitely cause problems. I didn't realize we did, actually...
> 

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checksums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does
> *not* replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full
> page write.
> * Worker completes and flips checksums on which replicates. At this
> point, if the replica reads the page, boom.
> 

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

> I guess we have to remove that optimisation. It's definitely a
> bummer, but I don't think it's an absolute dealbreaker.
> 

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

    RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
                      CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not simply wait for a regular spread checkpoint, the way
pg_basebackup does it?
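For what it's worth, the distinction being argued here is a single flag bit in the request. A stub illustration (the flag values and the helper below are mine, mirroring the general idea rather than the actual definitions in xlog.h):

```c
/* Stub of the checkpoint-request flags under discussion; values and
 * behaviour are illustrative, not PostgreSQL's implementation. */
#define CHECKPOINT_IMMEDIATE 0x0001 /* run at full speed, heavy I/O spike */
#define CHECKPOINT_FORCE     0x0002 /* force one even if no WAL since last */
#define CHECKPOINT_WAIT      0x0004 /* block the caller until it completes */

/* Returns 1 if a request with these flags would be throttled, i.e.
 * spread out over checkpoint_completion_target, 0 if it would run
 * flat-out and hammer production I/O. */
static int
is_spread_checkpoint(int flags)
{
    return (flags & CHECKPOINT_IMMEDIATE) == 0;
}
```

Dropping CHECKPOINT_IMMEDIATE from the launcher's request, while keeping FORCE and WAIT, is all it takes to get the spread behaviour being asked for.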

> We could say that we keep the optimisation if wal_level=minimal for 
> example, because then we know there is no replica. But I doubt
> that's worth it?
> 

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

От
Magnus Hagander
Дата:
On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 03/02/2018 02:35 PM, Magnus Hagander wrote:
>
>
> On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     <magnus@hagander.net> wrote:
>     > Also if that wasn't clear -- we only do the full page write if there isn't
>     > already a checksum on the page and that checksum is correct.
>
>     Hmm.
>
>     Suppose that on the master there is a checksum on the page and that
>     checksum is correct, but on the standby the page contents differ in
>     some way that we don't always WAL-log, like as to hint bits, and there
>     the checksum is incorrect.  Then you'll enable checksums when the
>     standby still has some pages without valid checksums, and disaster
>     will ensue.
>
>     I think this could be hard to prevent if checksums are turned on and
>     off multiple times.
>
>
> Do we ever make hintbit changes on the standby for example? If so, it
> would definitely cause problems. I didn't realize we did, actually...
>

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checksums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does
> *not* replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full
> page write.
> * Worker completes and flips checksums on which replicates. At this
> point, if the replica reads the page, boom.
>

Maybe.

My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

No, in that case the master will issue full page writes for *all* pages, since they didn't have a checksum. The current patch only avoids doing that if the checksum on the master is correct, which it isn't when you start from checksums=off.  So this particular problem only shows up if you iterate between off/on/off multiple times.


 
> I guess we have to remove that optimisation. It's definitely a
> bummer, but I don't think it's an absolute dealbreaker.
>

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

Yes.

 
The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

This is very close to what it does now, except it does not wait for a checkpoint in #2. Why does it need that?


BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does

    RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
                      CHECKPOINT_IMMEDIATE);

I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not simply wait for a regular spread checkpoint, the way
pg_basebackup does it?

Actually, that was my original idea. I changed it for testing, and should go change it back.



> We could say that we keep the optimisation if wal_level=minimal for
> example, because then we know there is no replica. But I doubt
> that's worth it?
>

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.


Yeah, I doubt there are a lot of people using "minimal" these days, not since we changed the default.

//Magnus 

Re: Online enabling of checksums

От
Tomas Vondra
Дата:
On 03/02/2018 11:01 PM, Magnus Hagander wrote:
> On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> 
> 
> 
>     On 03/02/2018 02:35 PM, Magnus Hagander wrote:
>     >
>     >
>     > On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>
>     > <mailto:robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>>> wrote:
>     >
>     >     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     >     <magnus@hagander.net <mailto:magnus@hagander.net>
>     <mailto:magnus@hagander.net <mailto:magnus@hagander.net>>> wrote:
>     >     > Also if that wasn't clear -- we only do the full page write if there isn't
>     >     > already a checksum on the page and that checksum is correct.
>     >
>     >     Hmm.
>     >
>     >     Suppose that on the master there is a checksum on the page and that
>     >     checksum is correct, but on the standby the page contents differ in
>     >     some way that we don't always WAL-log, like as to hint bits, and there
>     >     the checksum is incorrect.  Then you'll enable checksums when the
>     >     standby still has some pages without valid checksums, and disaster
>     >     will ensue.
>     >
>     >     I think this could be hard to prevent if checksums are turned on and
>     >     off multiple times.
>     >
>     >
>     > Do we ever make hintbit changes on the standby for example? If so, it
>     > would definitely cause problems. I didn't realize we did, actually...
>     >
> 
>     I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
>     AFAIK that can't be true on a standby.
> 
>     > I guess we could get there even if we don't by:
>     > * All checksums are correct
>     > * Checkums are disabled (which replicates)
>     > * Non-WAL logged change on the master, which updates checksum but does
>     > *not* replicate
>     > * Checksums re-enabled
>     > * Worker sees the checksum as correct, and thus does not force a full
>     > page write.
>     > * Worker completes and flips checksums on which replicates. At this
>     > point, if the replica reads the page, boom.
>     >
> 
>     Maybe.
> 
>     My understanding of Robert's example is that you can start with an
>     instance that has wal_log_hints=off, and so pages on master/standby may
>     not be 100% identical. Then we do the online checksum thing, and the
>     standby may get pages with incorrect checksums.
> 
> 
> No, in that case the master will issue full page writes for *all* pages,
> since they didn't have a checksum. The current patch only avoids doing
> that if the checksum on the master is correct, which it isn't when you
> start from checksums=off.  So this particular problem only shows up if
> you iterate between off/on/off multiple times.
> 

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.

> 
>     > I guess we have to remove that optimisation. It's definitely a
>     > bummer, but I don't think it's an absolute dealbreaker.
>     >
> 
>     I agree it's not a deal-breaker. Or at least I don't see why it should
>     be - any other maintenance activity on the database (freezing etc.) will
>     also generate full-page writes.
> 
> 
> Yes.
> 
>  
> 
>     The good thing is the throttling also limits the amount of WAL, so it's
>     possible to prevent generating too many checkpoints etc.
> 
>     I suggest we simply:
> 
>     1) set the checksums to in-progress
>     2) wait for a checkpoint
>     3) use the regular logic for full-pages (i.e. first change after
>     checkpoint does a FPW)
> 
> 
> This is very close to what it does now, except it does not wait for a
> checkpoint in #2. Why does it need that?
> 

To guarantee that the page has a FPW with all the hint bits, before we
start messing with the checksums (or that setting the checksum itself
triggers a FPW).

> 
>     BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does
> 
>         RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
>                           CHECKPOINT_IMMEDIATE);
> 
>     I'm rather unhappy about that - immediate checkpoints have massive
>     impact on production systems, so we try not doing them (That's one of
>     the reasons why CREATE DATABASE is somewhat painful). It usually
>     requires a bit of thinking about when to do such commands. But in this
>     case it's unpredictable when exactly the checksumming completes, so it
>     may easily be in the middle of peak activity.
> 
>     Why not simply wait for a regular spread checkpoint, the way
>     pg_basebackup does it?
> 
> 
> Actually, that was my original idea. I changed it for testing, and should
> go change it back.
> 

OK

> 
> 
>     > We could say that we keep the optimisation if wal_level=minimal for
>     > example, because then we know there is no replica. But I doubt
>     > that's worth it?
>     >
> 
>     If it doesn't require a lot of code, why not? But I don't really see
>     much point in doing that.
> 
> 
> Yeah, I doubt there are a lot of people using "minimal" these days, not
> since we changed the default.
> 

Yeah. Although as I said, it depends on how much code would be needed to
enable that optimization (I guess not much). If someone is running with
wal_level=minimal intentionally, why not help them.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Robert Haas
Date:
On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Hmmm, OK. So we need to have a valid checksum on a page, disable
> checksums, set some hint bits on the page (which won't be WAL-logged),
> enable checksums again and still get a valid checksum even with the new
> hint bits? That's possible, albeit unlikely.

No, the problem is if - as is much more likely - the checksum is not
still valid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Robert Haas
Date:
On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>> checksums, set some hint bits on the page (which won't be WAL-logged),
>> enable checksums again and still get a valid checksum even with the new
>> hint bits? That's possible, albeit unlikely.
>
> No, the problem is if - as is much more likely - the checksum is not
> still valid.

Hmm, on second thought ... maybe I didn't think this through carefully
enough.  If the checksum matches on the master by chance, and the page
is the same on the standby, then we're fine, right?  It's a weird
accident, but nothing is actually broken.  The failure scenario is
where the standby has a version of the page with a bad checksum, but
the master has a good checksum.  So for example: checksums disabled,
master modifies the page (which is replicated), master sets some hint
bits (coincidentally making the checksum match), now we try to turn
checksums on and don't re-replicate the page because the checksum
already looks correct.



Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 03/03/2018 01:38 PM, Robert Haas wrote:
> On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Hmmm, OK. So we need to have a valid checksum on a page, disable 
>>> checksums, set some hint bits on the page (which won't be
>>> WAL-logged), enable checksums again and still get a valid
>>> checksum even with the new hint bits? That's possible, albeit
>>> unlikely.
>>
>> No, the problem is if - as is much more likely - the checksum is
>> not still valid.
> 
> Hmm, on second thought ... maybe I didn't think this through
> carefully enough. If the checksum matches on the master by chance,
> and the page is the same on the standby, then we're fine, right? It's
> a weird accident, but nothing is actually broken. The failure
> scenario is where the standby has a version of the page with a bad
> checksum, but the master has a good checksum. So for example:
> checksums disabled, master modifies the page (which is replicated),
> master sets some hint bits (coincidentally making the checksum
> match), now we try to turn checksums on and don't re-replicate the
> page because the checksum already looks correct.
> 

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

  /*
   * If checksum was not set or was invalid, mark the buffer as dirty
   * and force a full page write. If the checksum was already valid, we
   * can leave it since we know that any other process writing the
   * buffer will update the checksum.
   */
  if (checksum != pagehdr->pd_checksum)
  {
      START_CRIT_SECTION();
      MarkBufferDirty(buf);
      log_newpage_buffer(buf, false);
      END_CRIT_SECTION();
  }

That would mean this optimization - only doing the write when the
checksum does not match - is broken.

If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.

regards



Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/03/2018 01:38 PM, Robert Haas wrote:
> On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>> <tomas.vondra@2ndquadrant.com> wrote:
>>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>>> checksums, set some hint bits on the page (which won't be
>>> WAL-logged), enable checksums again and still get a valid
>>> checksum even with the new hint bits? That's possible, albeit
>>> unlikely.
>>
>> No, the problem is if - as is much more likely - the checksum is
>> not still valid.
>
> Hmm, on second thought ... maybe I didn't think this through
> carefully enough. If the checksum matches on the master by chance,
> and the page is the same on the standby, then we're fine, right? It's
> a weird accident, but nothing is actually broken. The failure
> scenario is where the standby has a version of the page with a bad
> checksum, but the master has a good checksum. So for example:
> checksums disabled, master modifies the page (which is replicated),
> master sets some hint bits (coincidentally making the checksum
> match), now we try to turn checksums on and don't re-replicate the
> page because the checksum already looks correct.
>

Yeah. Doesn't that pretty much mean we can't skip any pages that have
correct checksum, because we can't rely on standby having the same page
data? That is, this block in ProcessSingleRelationFork:

  /*
   * If checksum was not set or was invalid, mark the buffer as dirty
   * and force a full page write. If the checksum was already valid, we
   * can leave it since we know that any other process writing the
   * buffer will update the checksum.
   */
  if (checksum != pagehdr->pd_checksum)
  {
      START_CRIT_SECTION();
      MarkBufferDirty(buf);
      log_newpage_buffer(buf, false);
      END_CRIT_SECTION();
  }

That would mean this optimization - only doing the write when the
checksum does not match - is broken.



If that's the case, it probably makes restarts/resume more expensive,
because this optimization was why after restart the already processed
data was only read (and the checksums verified) but not written.


Yes, it definitely does. It's not a dealbreaker, but it's certainly a bit painful not to be able to resume as cheaply.

 
--

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 03/03/2018 05:08 PM, Magnus Hagander wrote:
> 
> 
> On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     On 03/03/2018 01:38 PM, Robert Haas wrote:
>     > On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
>     >> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>     >> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
>     >>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>     >>> checksums, set some hint bits on the page (which won't be
>     >>> WAL-logged), enable checksums again and still get a valid
>     >>> checksum even with the new hint bits? That's possible, albeit
>     >>> unlikely.
>     >>
>     >> No, the problem is if - as is much more likely - the checksum is
>     >> not still valid.
>     >
>     > Hmm, on second thought ... maybe I didn't think this through
>     > carefully enough. If the checksum matches on the master by chance,
>     > and the page is the same on the standby, then we're fine, right? It's
>     > a weird accident, but nothing is actually broken. The failure
>     > scenario is where the standby has a version of the page with a bad
>     > checksum, but the master has a good checksum. So for example:
>     > checksums disabled, master modifies the page (which is replicated),
>     > master sets some hint bits (coincidentally making the checksum
>     > match), now we try to turn checksums on and don't re-replicate the
>     > page because the checksum already looks correct.
>     >
> 
>     Yeah. Doesn't that pretty much mean we can't skip any pages that have
>     correct checksum, because we can't rely on standby having the same page
>     data? That is, this block in ProcessSingleRelationFork:
> 
>       /*
>        * If checksum was not set or was invalid, mark the buffer as dirty
>        * and force a full page write. If the checksum was already valid, we
>        * can leave it since we know that any other process writing the
>        * buffer will update the checksum.
>        */
>       if (checksum != pagehdr->pd_checksum)
>       {
>           START_CRIT_SECTION();
>           MarkBufferDirty(buf);
>           log_newpage_buffer(buf, false);
>           END_CRIT_SECTION();
>       }
> 
>     That would mean this optimization - only doing the write when the
>     checksum does not match - is broken.
> 
> 
> Yes. I think that was the conclusion of this, as posted
> in https://www.postgresql.org/message-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu%3DaDpGBT8Sk0RQE_g%40mail.gmail.com
> :)
> 

Oh, right. I did have a "deja vu" feeling, when writing that. Good that
I came to the same conclusion, though.

> 
>     If that's the case, it probably makes restarts/resume more expensive,
>     because this optimization was why after restart the already processed
>     data was only read (and the checksums verified) but not written.
> 
> 
> Yes, it definitely does. It's not a dealbreaker, but it's certainly
> a bit painful not to be able to resume as cheap.
> 

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.

regards



Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Mar 3, 2018 at 5:17 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:


On 03/03/2018 05:08 PM, Magnus Hagander wrote:
>
>
> On Sat, Mar 3, 2018 at 5:06 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
>     On 03/03/2018 01:38 PM, Robert Haas wrote:
>     > On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
>     >> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra
>     >> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
>     >>> Hmmm, OK. So we need to have a valid checksum on a page, disable
>     >>> checksums, set some hint bits on the page (which won't be
>     >>> WAL-logged), enable checksums again and still get a valid
>     >>> checksum even with the new hint bits? That's possible, albeit
>     >>> unlikely.
>     >>
>     >> No, the problem is if - as is much more likely - the checksum is
>     >> not still valid.
>     >
>     > Hmm, on second thought ... maybe I didn't think this through
>     > carefully enough. If the checksum matches on the master by chance,
>     > and the page is the same on the standby, then we're fine, right? It's
>     > a weird accident, but nothing is actually broken. The failure
>     > scenario is where the standby has a version of the page with a bad
>     > checksum, but the master has a good checksum. So for example:
>     > checksums disabled, master modifies the page (which is replicated),
>     > master sets some hint bits (coincidentally making the checksum
>     > match), now we try to turn checksums on and don't re-replicate the
>     > page because the checksum already looks correct.
>     >
>
>     Yeah. Doesn't that pretty much mean we can't skip any pages that have
>     correct checksum, because we can't rely on standby having the same page
>     data? That is, this block in ProcessSingleRelationFork:
>
>       /*
>        * If checksum was not set or was invalid, mark the buffer as dirty
>        * and force a full page write. If the checksum was already valid, we
>        * can leave it since we know that any other process writing the
>        * buffer will update the checksum.
>        */
>       if (checksum != pagehdr->pd_checksum)
>       {
>           START_CRIT_SECTION();
>           MarkBufferDirty(buf);
>           log_newpage_buffer(buf, false);
>           END_CRIT_SECTION();
>       }
>
>     That would mean this optimization - only doing the write when the
>     checksum does not match - is broken.
>
>
> Yes. I think that was the conclusion of this, as posted
> in https://www.postgresql.org/message-id/CABUevExDZu__5KweT8fr3Ox45YcuvTDEEu%3DaDpGBT8Sk0RQE_g%40mail.gmail.com
> :)
>

Oh, right. I did have a "deja vu" feeling, when writing that. Good that
I came to the same conclusion, though.

>
>     If that's the case, it probably makes restarts/resume more expensive,
>     because this optimization was why after restart the already processed
>     data was only read (and the checksums verified) but not written.
>
>
> Yes, it definitely does. It's not a dealbreaker, but it's certainly
> a bit painful not to be able to resume as cheap.
>

Yeah. It probably makes the more elaborate resuming more valuable, but I
still think it's not a "must have" for PG11.


Attached is a rebased patch which removes this optimization, updates the pg_proc entry for the new format, and changes pg_verify_checksums to use -r instead of -o for relfilenode.

--
Attachments

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote:
> diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
> new file mode 100644
> index 0000000000..a4bfe7284d
> --- /dev/null
> +++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
> @@ -0,0 +1,308 @@
[...]

> +/*
> + * pg_verify_checksums
> + *
> + * Verifies page level checksums in an offline cluster
> + *
> + *    Copyright (c) 2010-2018, PostgreSQL Global Development Group

Weird copyright statement for a new file; did you base it off another
one, or just copy-paste the boilerplate?

[...]

> +        csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
> +        if (csum != header->pd_checksum)
> +        {
> +            if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
> +                fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
> +                        progname, fn, blockno, header->pd_checksum, csum);

The error message sounds a bit strange to me; I would expect the
filename after "in file [...]", but you print the expected checksum.
Also, 'invalid' sounds a bit like we found something malformed rather
than a mismatched checksum, so maybe "checksum mismatch in file,
expected %X, found %X" or something?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 04 Mar 2018, at 15:24, Michael Banck <michael.banck@credativ.de> wrote:

>> +        csum = pg_checksum_page(buf, blockno + segmentno*RELSEG_SIZE);
>> +        if (csum != header->pd_checksum)
>> +        {
>> +            if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
>> +                fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
>> +                        progname, fn, blockno, header->pd_checksum, csum);
>
> The error message sounds a bit strange to me, I would expect the
> filename after "in file [...]", but you print the expected checksum.
> Also, 'invalid' sounds  a bit like we found something which is malformed
> checksum (no hex), so maybe "checksum mismatch in file, expected %X,
> found %X" or something?

Agreed.  Looking at our current error messages, “in file” is conventionally
followed by the filename.  I do however think “calculated” is better than
“expected” since it conveys clearly that the compared checksum is calculated by
pg_verify_checksum and not read from somewhere.

How about something like this?

_("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
    progname, fn, blockno, csum, header->pd_checksum);

cheers ./daniel

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:
> Agreed.  Looking at our current error messages, “in file” is conventionally
> followed by the filename.  I do however think “calculated” is better than
> “expected” since it conveys clearly that the compared checksum is calculated by
> pg_verify_checksum and not read from somewhere.
> 
> How about something like this?
> 
> _("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
>     progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" one (from the page contents) against it.

I had the same conversation with an external tool author, by the way:

https://github.com/uptimejp/postgres-toolkit/issues/48



Michael



Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,

Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:
> Agreed.  Looking at our current error messages, “in file” is conventionally
> followed by the filename.  I do however think “calculated” is better than
> “expected” since it conveys clearly that the compared checksum is calculated by
> pg_verify_checksum and not read from somewhere.
>
> How about something like this?
>
> _("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
>       progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the pageheader is the "expected" checksum, and we compare the "found"
or "computed/calculated" (in the page itself) against it.

I had the same conversation with an external tool author, by the way:

Maybe we should just say "on disk" for the one that's on disk; would that clear up the confusion? So "calculated %X, found %X on disk"? 

--

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Mon, Mar 05, 2018 at 11:09:02AM +0100, Magnus Hagander wrote:
> On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck <michael.banck@credativ.de>
> wrote:
> > I still find that confusing, but maybe it's just me. I thought the one
> > in the pageheader is the "expected" checksum, and we compare the "found"
> > or "computed/calculated" (in the page itself) against it.
> >
> > I had the same conversation with an external tool author, by the way:
> 
> Maybe we should just say "on disk" for the one that's on disk, would that
> break the confusion? So "calculated %X, found %X on disk"?

I found that there is a precedent in bufpage.c:

|        ereport(WARNING,
|                (ERRCODE_DATA_CORRUPTED,
|                 errmsg("page verification failed, calculated checksum %u but expected %u",
|                        checksum, p->pd_checksum)));

apart from the fact that it doesn't print out the hex value (which I
find strange), it sounds like a sensible message to me. But "found %X on
disk" would work as well I guess.


Michael



Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

I had a closer look at v3 of the patch now.

On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote:
> Attached is a rebased patch which removes this optimization, updates the
> pg_proc entry for the new format, and changes pg_verify_checksums to use -r
> instead of -o for relfilenode.
 
The patch applies fine with minimal fuzz and compiles with no warnings;
make check and the added isolation tests, as well as the added checksum
tests pass.

If I blindly run "SELECT pg_enable_data_checksums();" on new cluster, I
get:

|postgres=# SELECT pg_enable_data_checksums();
| pg_enable_data_checksums 
|--------------------------
| t
|(1 row)
|
|postgres=# SHOW data_checksums ;
| data_checksums 
|----------------
| inprogress
|(1 row)


However, inspecting the log one sees:

|2018-03-10 14:15:57.702 CET [3313] ERROR:  Database template0 does not allow connections.
|2018-03-10 14:15:57.702 CET [3313] HINT:  Allow connections using ALTER DATABASE and try again.
|2018-03-10 14:15:57.702 CET [3152] LOG:  background worker "checksum helper launcher" (PID 3313) exited with exit code 1

and the background worker is no longer running without any obvious hint
to the client.

I am aware that this is discussed already, but as-is the user experience
is pretty bad; I think pg_enable_data_checksums() should either bail out
earlier if it cannot connect to all databases, or this behaviour should
be better documented.

Otherwise, if I allow connections to template0, the patch works as
expected, I have not managed to break it so far.

Some further review comments:

> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
> index f4bc2d4161..89afecb341 100644
> --- a/doc/src/sgml/wal.sgml
> +++ b/doc/src/sgml/wal.sgml
> @@ -230,6 +230,73 @@
[...]
> +   <para>
> +    Checksums can be enabled or disabled online, by calling the appropriate
> +    <link linkend="functions-admin-checksum">functions</link>.
> +    Disabling of checksums takes effect immediately when the function is called.
> +   </para>
> +
> +   <para>
> +    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
> +    During this time, checksums will be written but not verified. In addition to
> +    this, a background worker process is started that enables checksums on all
> +    existing data in the cluster. Once this worker has completed processing all
> +    databases in the cluster, the checksum mode will automatically switch to
> +    <literal>on</literal>.
> +   </para>
> +
> +   <para>
> +    If the cluster is stopped while in <literal>inprogress</literal> mode, for
> +    any reason, then this process must be restarted manually. To do this,
> +    re-execute the function <function>pg_enable_data_checksums()</function>
> +    once the cluster has been restarted.
> +   </para>

Maybe document the above issue here, unless it is clear that the
template0-needs-to-allow-connections issue will go away before the patch
is pushed.

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 47a6c4d895..56aaa88de1 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c

[...]

> +void
> +SetDataChecksumsOn(void)
> +{
> +    if (!DataChecksumsEnabledOrInProgress())
> +        elog(ERROR, "Checksums not enabled or in progress");
> +
> +    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
> +
> +    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
> +    {
> +        LWLockRelease(ControlFileLock);
> +        elog(ERROR, "Checksums not in in_progress mode");

The string used in "SHOW data_checksums" is "inprogress", not
"in_progress".

[...]

> @@ -7769,6 +7839,16 @@ StartupXLOG(void)
>      CompleteCommitTsInitialization();
>  
>      /*
> +     * If we reach this point with checksums in inprogress state, we notify
> +     * the user that they need to manually restart the process to enable
> +     * checksums.
> +     */
> +    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
> +        ereport(WARNING,
> +                (errmsg("checksum state is \"in progress\" with no worker"),
> +                 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or
pg_enable_data_checksums()functions.")));
 

Again, string is "inprogress", not "in progress", not sure if that
matters.

> diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
> new file mode 100644
> index 0000000000..6aa71bcf30
> --- /dev/null
> +++ b/src/backend/postmaster/checksumhelper.c
> @@ -0,0 +1,631 @@
> +/*-------------------------------------------------------------------------
> + *
> + * checksumhelper.c
> + *      Backend worker to walk the database and write checksums to pages

Backend or Background?

[...]

> +typedef struct ChecksumHelperShmemStruct
> +{
> +    pg_atomic_flag launcher_started;
> +    bool        success;
> +    bool        process_shared_catalogs;
> +    /* Parameter values  set on start */

double space.

[...]

> +static bool
> +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
> +{
> +    BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
> +    BlockNumber b;
> +
> +    for (b = 0; b < numblocks; b++)
> +    {
> +        Buffer        buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
> +        Page        page;
> +        PageHeader    pagehdr;
> +        uint16        checksum;
> +
> +        /* Need to get an exclusive lock before we can flag as dirty */
> +        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
> +
> +        /* Do we already have a valid checksum? */
> +        page = BufferGetPage(buf);
> +        pagehdr = (PageHeader) page;
> +        checksum = pg_checksum_page((char *) page, b);

This looks like it does not take the segment number into account;
however it is also unclear to me what the purpose of this is, as
checksum is never validated against the pagehdr, and nothing is done
with it. Indeed, I even get a compiler warning about pagehdr and
checksum:

git/postgresql/build/../src/backend/postmaster/checksumhelper.c:
In function ‘ProcessSingleRelationFork’:
git/postgresql/build/../src/backend/postmaster/checksumhelper.c:155:11:
warning: variable ‘checksum’ set but not used
[-Wunused-but-set-variable]
   uint16  checksum;
           ^~~~~~~~
git/postgresql/build/../src/backend/postmaster/checksumhelper.c:154:14:
warning: variable ‘pagehdr’ set but not used [-Wunused-but-set-variable]
   PageHeader pagehdr;
              ^~~~~~~

I guess the above block running pg_checksum_page() is a leftover from
previous versions of the patch and should be removed... 

> +        /*
> +         * Mark the buffer as dirty and force a full page write.
> +         * We have to re-write the page to wal even if the checksum hasn't
> +         * changed, because if there is a replica it might have a slightly
> +         * different version of the page with an invalid checksum, caused
> +         * by unlogged changes (e.g. hintbits) on the master happening while
> +         * checksums were off. This can happen if there was a valid checksum
> +         * on the page at one point in the past, so only when checksums
> +         * are first on, then off, and then turned on again.
> +         */
> +        START_CRIT_SECTION();
> +        MarkBufferDirty(buf);
> +        log_newpage_buffer(buf, false);
> +        END_CRIT_SECTION();

... seeing how MarkBufferDirty(buf) is now run unconditionally.

[...]

> +    /*
> +     * Initialize a connection to shared catalogs only.
> +     */
> +    BackgroundWorkerInitializeConnection(NULL, NULL);
> +
> +    /*
> +     * Set up so first run processes shared catalogs, but not once in every db
> +     */

Comment should have a full stop like the above and below ones?

> +    ChecksumHelperShmem->process_shared_catalogs = true;
> +
> +    /*
> +     * Create a database list.  We don't need to concern ourselves with
> +     * rebuilding this list during runtime since any new created database will
> +     * be running with checksums turned on from the start.
> +     */

[...]

> +    DatabaseList = BuildDatabaseList();
> +
> +    /*
> +     * If there are no databases at all to checksum, we can exit immediately
> +     * as there is no work to do.
> +     */
> +    if (DatabaseList == NIL || list_length(DatabaseList) == 0)
> +        return;
> +
> +    while (true)
> +    {
> +        List       *remaining = NIL;
> +        ListCell   *lc,
> +                   *lc2;
> +        List       *CurrentDatabases = NIL;
> +
> +        foreach(lc, DatabaseList)
> +        {
> +            ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
> +
> +            if (ProcessDatabase(db))
> +            {
> +                pfree(db->dbname);
> +                pfree(db);
> +
> +                if (ChecksumHelperShmem->process_shared_catalogs)
> +
> +                    /*
> +                     * Now that one database has completed shared catalogs, we
> +                     * don't have to process them again .

Stray space.

> +                     */
> +                    ChecksumHelperShmem->process_shared_catalogs = false;
> +            }
> +            else
> +            {
> +                /*
> +                 * Put failed databases on the remaining list.
> +                 */
> +                remaining = lappend(remaining, db);
> +            }
> +        }
> +        list_free(DatabaseList);
> +
> +        DatabaseList = remaining;
> +        remaining = NIL;
> +
> +        /*
> +         * DatabaseList now has all databases not yet processed. This can be
> +         * because they failed for some reason, or because the database was
> +         * DROPed between us getting the database list and trying to process

DROPed looks wrong, and there's no other occurence of it in the source
tree. DROPped looks even weirder, so maybe just "dropped"?

> +         * it. Get a fresh list of databases to detect the second case with.

That sentence looks unfinished or at least is unclear to me.

> +         * Any database that still exists but failed we retry for a limited

I'm not a native speaker, but this looks wrong to me as well, maybe "We
retry any database that still exists but failed for a limited [...]"?

> diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
> index 0fe98a550e..4bb2b7e6ec 100644
> --- a/src/bin/pg_upgrade/controldata.c
> +++ b/src/bin/pg_upgrade/controldata.c
> @@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
>       */
>  
>      /*
> +     * If checksums have been turned on in the old cluster, but the
> +     * checksumhelper have yet to finish, then disallow upgrading. The user
> +     * should either let the process finish, or turn off checksums, before
> +     * retrying.
> +     */
> +    if (oldctrl->data_checksum_version == 2)
> +        pg_fatal("transition to data checksums not completed in old cluster\n");
> +
> +    /*
>       * We might eventually allow upgrades from checksum to no-checksum
>       * clusters.
>       */

See below about src/bin/pg_upgrade/pg_upgrade.h having
data_checksum_version be a bool. 

I checked pg_upgrade (from master to master though), and could find no
off-hand issues, i.e. it reported all issues correctly.

> --- /dev/null
> +++ b/src/bin/pg_verify_checksums/Makefile

[...]

> +check:
> +       $(prove_check)
> +
> +installcheck:
> +       $(prove_installcheck)

If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
error:

|src/bin/pg_verify_checksums$ LANG=C make check
|rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install
|/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log
|make -C '../../..' DESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install install >'/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log/install.log 2>&1
|rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check
|/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check
|cd /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums && TESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums' PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//bin:$PATH" LD_LIBRARY_PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//lib" PGPORT='65432' PG_REGRESS='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/test/perl/ -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums t/*.pl
|Cannot detect source of 't/*.pl'! at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 261.
|    TAP::Parser::IteratorFactory::detect_source(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 211
|    TAP::Parser::IteratorFactory::make_iterator(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser.pm line 472
|    TAP::Parser::_initialize(TAP::Parser=HASH(0x55eed10df328), HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 55
|    TAP::Object::new("TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 130
|    TAP::Object::_construct(TAP::Harness=HASH(0x55eed09176b0), "TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Harness.pm line 852
|    TAP::Harness::make_parser(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Scheduler::Job=HASH(0x55eed0fdc708)) called at /usr/share/perl/5.24/TAP/Harness.pm line 651
|    TAP::Harness::_aggregate_single(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), TAP::Parser::Scheduler=HASH(0x55eed0fdc6a8)) called at /usr/share/perl/5.24/TAP/Harness.pm line 743
|    TAP::Harness::aggregate_tests(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), "t/*.pl") called at /usr/share/perl/5.24/TAP/Harness.pm line 558
|    TAP::Harness::__ANON__() called at /usr/share/perl/5.24/TAP/Harness.pm line 571
|    TAP::Harness::runtests(TAP::Harness=HASH(0x55eed09176b0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 546
|    App::Prove::_runtests(App::Prove=HASH(0x55eed090b0c8), HASH(0x55eed0d79cf0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 504
|    App::Prove::run(App::Prove=HASH(0x55eed090b0c8)) called at /usr/bin/prove line 13
|Makefile:39: recipe for target 'check' failed
|make: *** [check] Error 2


> diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
> new file mode 100644
> index 0000000000..a4bfe7284d
> --- /dev/null
> +++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c

[...]

> +    if (DataDir == NULL)
> +    {
> +        if (optind < argc)
> +            DataDir = argv[optind++];
> +        else
> +            DataDir = getenv("PGDATA");
> +    }
> +
> +    /* Complain if any arguments remain */
> +    if (optind < argc)
> +    {
> +        fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
> +                progname, argv[optind]);
> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
> +                progname);
> +        exit(1);
> +    }
> +
> +    if (DataDir == NULL)
> +    {
> +        fprintf(stderr, _("%s: no data directory specified\n"), progname);
> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
> +        exit(1);
> +    }

Those two if (DataDir == NULL) checks could maybe be put together into
one block.

> diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
> new file mode 100644
> index 0000000000..7f296264a9
> --- /dev/null
> +++ b/src/include/postmaster/checksumhelper.h
> @@ -0,0 +1,31 @@
> +/*-------------------------------------------------------------------------
> + *
> + * checksumhelper.h
> + *      header file for checksum helper deamon

"deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
worker" is better?

> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
> index 85dd10c45a..bd46bf2ce6 100644
> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>   */
>  #define PG_PAGE_LAYOUT_VERSION        4
>  #define PG_DATA_CHECKSUM_VERSION    1
> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2

I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
1, but I assumed it was a version, like, if we ever decide to use a
different checksumming algorithm, we'd bump it to 2.

Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
is convenient, but is there some strategy what to do about this in case
the PG_DATA_CHECKSUM_VERSION needs to be increased?

In any case, src/bin/pg_upgrade/pg_upgrade.h has

        bool            data_checksum_version;

in the ControlData struct, which might need updating?


That's all for now.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:

> I had a closer look at v3 of the patch now.

Thanks, much appreciated!  Sorry for the late response, just came back from a
conference and have had little time for hacking.

All whitespace, punctuation and capitalization comments have been addressed
with your recommendations, so I took the liberty to trim them from the
response.

> I am aware that this is discussed already, but as-is the user experience
> is pretty bad, I think pg_enable_data_checksums() should either bail
> earlier if it cannot connect to all databases, or it should be better
> documented.

Personally I think we should first attempt to solve the "allow-connections in
background workers” issue which has been raised on another thread.  For now I’m
documenting this better.

>> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
>> index f4bc2d4161..89afecb341 100644
>> --- a/doc/src/sgml/wal.sgml
>> +++ b/doc/src/sgml/wal.sgml
>> @@ -230,6 +230,73 @@
> [...]
>> +   <para>
>> +    Checksums can be enabled or disabled online, by calling the appropriate
>> +    <link linkend="functions-admin-checksum">functions</link>.
>> +    Disabling of checksums takes effect immediately when the function is called.
>> +   </para>
>> +
>> +   <para>
>> +    Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
>> +    During this time, checksums will be written but not verified. In addition to
>> +    this, a background worker process is started that enables checksums on all
>> +    existing data in the cluster. Once this worker has completed processing all
>> +    databases in the cluster, the checksum mode will automatically switch to
>> +    <literal>on</literal>.
>> +   </para>
>> +
>> +   <para>
>> +    If the cluster is stopped while in <literal>inprogress</literal> mode, for
>> +    any reason, then this process must be restarted manually. To do this,
>> +    re-execute the function <function>pg_enable_data_checksums()</function>
>> +    once the cluster has been restarted.
>> +   </para>
>
> Maybe document the above issue here, unless it is clear that the
> templat0-needs-to-allow-connections issue will go away before the patch
> is pushed.

I have added a paragraph on allowing connections here, as well as a note that
template0 will need to be handled.

>> +void
>> +SetDataChecksumsOn(void)
>> +{
>> +    if (!DataChecksumsEnabledOrInProgress())
>> +        elog(ERROR, "Checksums not enabled or in progress");
>> +
>> +    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>> +
>> +    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>> +    {
>> +        LWLockRelease(ControlFileLock);
>> +        elog(ERROR, "Checksums not in in_progress mode");
>
> The string used in "SHOW data_checksums" is "inprogress", not
> "in_progress”.

Fixed.

>> @@ -7769,6 +7839,16 @@ StartupXLOG(void)
>>     CompleteCommitTsInitialization();
>>
>>     /*
>> +     * If we reach this point with checksums in inprogress state, we notify
>> +     * the user that they need to manually restart the process to enable
>> +     * checksums.
>> +     */
>> +    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>> +        ereport(WARNING,
>> +                (errmsg("checksum state is \"in progress\" with no worker"),
>> +                 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or
pg_enable_data_checksums()functions."))); 
>
> Again, string is "inprogress", not "in progress", not sure if that
> matters.

I think it does; we need to be consistent in user-facing naming.  Updated.

>> diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
>> new file mode 100644
>> index 0000000000..6aa71bcf30
>> --- /dev/null
>> +++ b/src/backend/postmaster/checksumhelper.c
>> @@ -0,0 +1,631 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * checksumhelper.c
>> + *      Backend worker to walk the database and write checksums to pages
>
> Backend or Background?

“Background” is the right term here, fixed.

>> +static bool
>> +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
>> +{
>> +    BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
>> +    BlockNumber b;
>> +
>> +    for (b = 0; b < numblocks; b++)
>> +    {
>> +        Buffer        buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
>> +        Page        page;
>> +        PageHeader    pagehdr;
>> +        uint16        checksum;
>> +
>> +        /* Need to get an exclusive lock before we can flag as dirty */
>> +        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
>> +
>> +        /* Do we already have a valid checksum? */
>> +        page = BufferGetPage(buf);
>> +        pagehdr = (PageHeader) page;
>> +        checksum = pg_checksum_page((char *) page, b);
>
> This looks like it does not take the segment number into account;
> however it is also unclear to me what the purpose of this is, as
> checksum is never validated against the pagehdr, and nothing is done
> with it.

> I guess the above block running pg_checksum_page() is a leftover from
> previous versions of the patch and should be removed…

Correct, the checksum and pagehdr were used in the previous optimization to skip
forced writes where the checksum was valid.  That optimization was however
based on a faulty assumption and was removed in the v3 patch.  The leftover
variables are now removed.

>> +         * it. Get a fresh list of databases to detect the second case with.
>
> That sentence looks unfinished or at least is unclear to me.

Updated to indicate what we mean by “second case”.

>> +         * Any database that still exists but failed we retry for a limited
>
> I'm not a native speaker, but this looks wrong to me as well, maybe "We
> retry any database that still exists but failed for a limited [...]”?

Updated and extended a bit for clarity.

Fixing this review comment also made me realize that checksumhelper.c was using
%s for outputting the database name in errmsg(), but \”%s\” is what we commonly
use.  Updated all errmsg() invocations to quote the database name.

>> diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
>> index 0fe98a550e..4bb2b7e6ec 100644
>> --- a/src/bin/pg_upgrade/controldata.c
>> +++ b/src/bin/pg_upgrade/controldata.c
>> @@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
>>      */
>>
>>     /*
>> +     * If checksums have been turned on in the old cluster, but the
>> +     * checksumhelper have yet to finish, then disallow upgrading. The user
>> +     * should either let the process finish, or turn off checksums, before
>> +     * retrying.
>> +     */
>> +    if (oldctrl->data_checksum_version == 2)
>> +        pg_fatal("transition to data checksums not completed in old cluster\n");
>> +
>> +    /*
>>      * We might eventually allow upgrades from checksum to no-checksum
>>      * clusters.
>>      */
>
> See below about src/bin/pg_upgrade/pg_upgrade.h having
> data_checksum_version be a bool.
>
> I checked pg_upgrade (from master to master though), and could find no
> off-hand issues, i.e. it reported all issues correctly.

data_checksum_version is indeed defined bool, but rather than being an actual
bool it’s a bool backed by a char via a typedef in c.h.  This is why it works
to assign 0, 1 or 2 without issues.

That being said, I agree that it reads odd now that checksum version isn’t just
0 or 1 which mentally translate neatly to false or true.  Since this is the
patch introducing version 2, I changed data_checksum_version to a uint32 as
that makes the intent much clearer.

>> +check:
>> +       $(prove_check)
>> +
>> +installcheck:
>> +       $(prove_installcheck)
>
> If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
> error:

That's because the check and installcheck targets are copy-pasteos; there are
no tests for pg_verify_checksums currently, so the targets should not be in the
Makefile.  Fixed.

>> +    if (DataDir == NULL)
>> +    {
>> +        if (optind < argc)
>> +            DataDir = argv[optind++];
>> +        else
>> +            DataDir = getenv("PGDATA");
>> +    }
>> +
>> +    /* Complain if any arguments remain */
>> +    if (optind < argc)
>> +    {
>> +        fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
>> +                progname, argv[optind]);
>> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> +                progname);
>> +        exit(1);
>> +    }
>> +
>> +    if (DataDir == NULL)
>> +    {
>> +        fprintf(stderr, _("%s: no data directory specified\n"), progname);
>> +        fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
>> +        exit(1);
>> +    }
>
> Those two if (DataDir == NULL) checks could maybe be put together into
> one block.

Moved the check into the first block, as it makes the code clearer and doesn’t
change the order in which the error messages for missing datadir and
too-many-arguments will be output.
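For illustration, the folded-together resolution could look roughly like the standalone sketch below. It mirrors the names in the patch, but `resolve_datadir` is a made-up helper and this is not the actual pg_verify_checksums code; the point is just that a NULL return leaves a single place to emit the "no data directory specified" error.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Illustrative sketch only: resolve the data directory from the first
 * remaining command-line argument, falling back to the PGDATA
 * environment variable.  A NULL return means no data directory was
 * specified anywhere, so the caller can emit the error in one place.
 */
static char *
resolve_datadir(int argc, char **argv, int *optind)
{
	char	   *DataDir = NULL;

	if (*optind < argc)
		DataDir = argv[(*optind)++];	/* consume the positional argument */
	else
		DataDir = getenv("PGDATA");		/* may still be NULL */

	return DataDir;
}
```

Any arguments remaining after this call would then trigger the existing "too many command-line arguments" complaint, keeping the message order unchanged.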

>> diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
>> new file mode 100644
>> index 0000000000..7f296264a9
>> --- /dev/null
>> +++ b/src/include/postmaster/checksumhelper.h
>> @@ -0,0 +1,31 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * checksumhelper.h
>> + *      header file for checksum helper deamon
>
> "deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
> worker" is better?

Yes, "background worker” is better, updated.

>> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
>> index 85dd10c45a..bd46bf2ce6 100644
>> --- a/src/include/storage/bufpage.h
>> +++ b/src/include/storage/bufpage.h
>> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>>  */
>> #define PG_PAGE_LAYOUT_VERSION        4
>> #define PG_DATA_CHECKSUM_VERSION    1
>> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2
>
> I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
> 1, but I assumed it was a version, like, if we ever decide to use a
> different checksumming algorithm, we'd bump it to 2.
>
> Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
> is convenient, but is there some strategy what to do about this in case
> the PG_DATA_CHECKSUM_VERSION needs to be increased?

I don’t think this has been discussed in any thread dealing with enabling
checksums in an online cluster, where using the version was mentioned (or my
archive search is failing me).  If another algorithm was added, I assume we’d
need something like the proposed checksumhelper to allow migrations from
version 1.  Just as we now use version 2 to indicate that we are going from
version 0 to version 1, the same can be used for going from version 1 to 3 as
long as there is a source and destination version recorded.

I don’t know what such a process would look like, but I don’t see that we are
blocking a future new checksum algorithm by using version 2 for inprogress
(although I agree that using the version here is closer to convenient than
elegant).
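To make the versioning question concrete, the values under discussion are sketched below. Note that the "off" define and the helper function are illustrative only and not part of the patch (today's code simply stores 0 for disabled); the strings match the off/inprogress/on states used by the patch.

```c
#include <stdint.h>

/*
 * Sketch of the data_checksum_version values discussed above.  0 and 1
 * keep their existing meanings; 2 is the proposed "inprogress" marker.
 * PG_DATA_CHECKSUM_OFF_VERSION and data_checksum_state() are purely
 * illustrative, not part of the patch.
 */
#define PG_DATA_CHECKSUM_OFF_VERSION		0
#define PG_DATA_CHECKSUM_VERSION			1
#define PG_DATA_CHECKSUM_INPROGRESS_VERSION	2

/* Map a control-file version to a user-facing state name. */
static const char *
data_checksum_state(uint32_t version)
{
	switch (version)
	{
		case PG_DATA_CHECKSUM_OFF_VERSION:
			return "off";
		case PG_DATA_CHECKSUM_VERSION:
			return "on";
		case PG_DATA_CHECKSUM_INPROGRESS_VERSION:
			return "inprogress";
		default:
			return "unknown";
	}
}
```

A future algorithm bump would presumably add a new "on" value (say 3) plus another in-progress marker, which is where recording source and destination versions becomes necessary.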

Attached is v4 of this patch, which addresses this review; I’m flipping the
status in the CF app back to Needs Review.

cheers ./daniel


Attachments

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
> > On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:
> > I am aware that this is discussed already, but as-is the user experience
> > is pretty bad, I think pg_enable_data_checksums() should either bail
> > earlier if it cannot connect to all databases, or it should be better
> > documented.
> 
> Personally I think we should first attempt to solve the "allow-connections in
> background workers” issue which has been raised on another thread.  For now I’m
> documenting this better.

I had a look at that thread and it seems stalled; I am a bit worried
that this will not be solved before the end of the CF.

So I think unless the above gets solved, pg_enable_data_checksums()
should error out with the hint.  I've had a quick look and it seems one
can partly duplicate the check from BuildDatabaseList() (or optionally
move it) to the beginning of StartChecksumHelperLauncher(), see
attached.
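The shape of that check can be modeled with a self-contained sketch. The real patch scans pg_database through the catalog access machinery; the struct and function names below are made up for illustration and only capture the decision being made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for a pg_database row; not a real catalog struct. */
typedef struct
{
	const char *datname;
	bool		datallowconn;
} DatabaseEntry;

/*
 * Return the name of the first database that refuses connections, or
 * NULL if all of them can be connected to.  The launcher would raise
 * an ERROR with a hint when this returns non-NULL, so the user sees
 * the failure immediately instead of in the server log.
 */
static const char *
first_unconnectable_database(const DatabaseEntry *dbs, size_t ndbs)
{
	for (size_t i = 0; i < ndbs; i++)
	{
		if (!dbs[i].datallowconn)
			return dbs[i].datname;
	}
	return NULL;
}
```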

That results in:

postgres=# SELECT pg_enable_data_checksums();
ERROR:  Database "template0" does not allow connections.
HINT:  Allow connections using ALTER DATABASE and try again.
postgres=# 

Which I think is much nicer than what we have right now:

postgres=# SELECT pg_enable_data_checksums();
 pg_enable_data_checksums 
--------------------------
 t
(1 row)

postgres=# \q
postgres@fock:~$ tail -3 pg1.log 
2018-03-18 14:00:08.512 CET [25514] ERROR:  Database "template0" does not allow connections.
2018-03-18 14:00:08.512 CET [25514] HINT:  Allow connections using ALTER DATABASE and try again.
2018-03-18 14:00:08.513 CET [24930] LOG:  background worker "checksum helper launcher" (PID 25514) exited with exit code 1
 

> Attached v4 of this patch, which addresses this review, and flipping status
> back in the CF app back to Needs Review.
 
Thanks!

The following errmsg() calls capitalize the error message without the first
word being a specific term, which I believe is not project style:

+            (errmsg("Checksum helper is currently running, cannot disable checksums"),
+                        (errmsg("Database \"%s\" dropped, skipping", db->dbname)));
+            (errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+                    (errmsg("Database \"%s\" does not allow connections.", NameStr(pgdb->datname)),
+            (errmsg("Checksum worker starting for database oid %d", dboid)));
+            (errmsg("Checksum worker completed in database oid %d", dboid)));

Also, in src/backend/postmaster/checksumhelper.c there are few
multi-line comments (which are not function comments) that do not have a
full stop at the end, which I think is also project style:

+         * Failed to set means somebody else started

Could be changed to a one-line (/* ... */) comment?

+     * Force a checkpoint to get everything out to disk

Should have a full stop.

+         * checksummed, so skip

Should have a full stop.

+     * Enable vacuum cost delay, if any

Could be changed to a one-line comment?

+     * Create and set the vacuum strategy as our buffer strategy

Could be changed to a one-line comment?

Apart from that, I previously complained about the error in
pg_verify_checksums:

+                               fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+                                               progname, fn, blockno, header->pd_checksum, csum);

I still propose something like in backend/storage/page/bufpage.c's
PageIsVerified(), e.g.:

|fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
|         progname, fn, blockno, csum, header->pd_checksum);

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.


Cheers,

Michael


Attachments

Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi!

> On 18 March 2018, at 19:02, Michael Banck <michael.banck@credativ.de> wrote:
>
> Otherwise, I had a quick look over v4 and found no further issues.
> Hopefully I will be able to test it on some bigger test databases next
> week.
>
> I'm switching the state back to 'Waiting on Author'; if you think the
> above points are moot, maybe switch it back to 'Needs Review' as Andrey
> Borodin also marked himself down as reviewer and might want to have
> another look as well.

Yep, I'm already doing another pass over the code. Hope to finish tomorrow.
My 2 cents: there's a typo in the word "connections":
+     <literal>template0</literal> is by default not accepting connetions, to

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 18 Mar 2018, at 15:02, Michael Banck <michael.banck@credativ.de> wrote:
> On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
>>> On 10 Mar 2018, at 16:09, Michael Banck <michael.banck@credativ.de> wrote:
>>> I am aware that this is discussed already, but as-is the user experience
>>> is pretty bad, I think pg_enable_data_checksums() should either bail
>>> earlier if it cannot connect to all databases, or it should be better
>>> documented.
>>
>> Personally I think we should first attempt to solve the "allow-connections in
>> background workers” issue which has been raised on another thread.  For now I’m
>> documenting this better.
>
> I had a look at that thread and it seems stalled, I am a bit worried
> that this will not be solved before the end of the CF.
>
> So I think unless the above gets solved, pg_enable_data_checksums()
> should error out with the hint.  I've had a quick look and it seems one
> can partly duplicate the check from BuildDatabaseList() (or optionally
> move it) to the beginning of StartChecksumHelperLauncher(), see
> attached.

I’ve incorporated a slightly massaged version of your patch.  While it is a
little ugly to duplicate the logic, it’s hopefully a short-term fix, and I
agree that silently failing is even uglier. Thanks for the proposal!

It should be noted that a database may well be altered to disallow connections
between the point where the checksum helper starts (or gets the database list)
and the point where it tries to checksum that database, so we might still fail
on this very issue with the checks bypassed.  Still, it improves the UX by
catching the low-hanging fruit, of course.

> The following errmsg() capitalize the error message without the first
> word being a specific term, which I believe is not project style:

Fixed.

> Also, in src/backend/postmaster/checksumhelper.c there are few
> multi-line comments (which are not function comments) that do not have a
> full stop at the end, which I think is also project style:

I’ve addressed all of these, but I did leave one as a multi-line which I think
looks better as that.  I also did some spellchecking and general tidying up of
the error messages and comments in the checksum helper.

> Apart from that, I previously complained about the error in
> pg_verify_checksums:

Updated to your suggestion.  While in there I re-wrote a few other error
messages to be consistent in message and quoting.

> Otherwise, I had a quick look over v4 and found no further issues.
> Hopefully I will be able to test it on some bigger test databases next
> week.

Thanks a lot for your reviews!

cheers ./daniel


Attachments

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 18 Mar 2018, at 17:21, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>> On 18 March 2018, at 19:02, Michael Banck <michael.banck@credativ.de> wrote:
>>
>> Otherwise, I had a quick look over v4 and found no further issues.
>> Hopefully I will be able to test it on some bigger test databases next
>> week.
>>
>> I'm switching the state back to 'Waiting on Author'; if you think the
>> above points are moot, maybe switch it back to 'Needs Review' as Andrey
>> Borodin also marked himself down as reviewer and might want to have
>> another look as well.
>
> Yep, I'm already doing another pass on the code. Hope to finish tomorrow.
> My 2 cents: there's a typo in the word "connections"
> +     <literal>template0</literal> is by default not accepting connetions, to

Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
(version 5). Thanks!

cheers ./daniel

Re: Online enabling of checksums

From
Heikki Linnakangas
Date:
Hi,

The patch looks good to me at a high level. Some notes below. I didn't 
read through the whole thread, so sorry if some of these have been 
discussed already.

> +void
> +ShutdownChecksumHelperIfRunning(void)
> +{
> +    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
> +    {
> +        /* Launcher not started, so nothing to shut down */
> +        return;
> +    }
> +
> +    ereport(ERROR,
> +            (errmsg("checksum helper is currently running, cannot disable checksums"),
> +             errhint("Restart the cluster or wait for the worker to finish.")));
> +}

Is there no way to stop the checksum helper once it's started? That 
seems rather user-unfriendly. I can imagine it being a pretty common 
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to 
realize that you forgot to set the cost limit, and that it's hurting 
queries too much. At that point, you want to abort.

> + * This is intended to create the worklist for the workers to go through, and
> + * as we are only concerned with already existing databases we need to ever
> + * rebuild this list, which simplifies the coding.

I can't parse this sentence.

> +extern bool DataChecksumsEnabledOrInProgress(void);
> +extern bool DataChecksumsInProgress(void);
> +extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit 
long. And doing this in PageIsVerified looks a bit weird:

 > if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()
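Heikki's naming suggestion can be sketched as a tiny self-contained model of the three-state flag (the enum and variable names here are illustrative, not taken from the patch):

```c
/* Illustrative model only -- not the actual PostgreSQL implementation. */
#include <stdbool.h>

typedef enum
{
    CHECKSUMS_OFF,
    CHECKSUMS_INPROGRESS,
    CHECKSUMS_ON
} ChecksumState;

static ChecksumState data_checksum_state = CHECKSUMS_OFF;

/* Checksums should be computed on write in both "inprogress" and "on". */
static bool
DataChecksumNeedWrite(void)
{
    return data_checksum_state != CHECKSUMS_OFF;
}

/* Checksums should only be verified on read once the state is "on". */
static bool
DataChecksumNeedVerify(void)
{
    return data_checksum_state == CHECKSUMS_ON;
}
```

With predicates like these, the awkward `EnabledOrInProgress() && !InProgress()` combination collapses into a single call at each read or write site.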


> +     <literal>template0</literal> is by default not accepting connections, to
> +     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that 
it's bad.

> +CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
> +        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
> +  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
> +  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear 
to me when they would return what. I'd suggest marking them as 'void' 
instead.

> --- a/src/include/storage/bufpage.h
> +++ b/src/include/storage/bufpage.h
> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>   */
>  #define PG_PAGE_LAYOUT_VERSION        4
>  #define PG_DATA_CHECKSUM_VERSION    1
> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION        2
>  

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. 
They're not actually stored in the page header, as you might think. I 
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to 
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk 
format. I think it was a bit weird even before this patch, but now it's 
worse. At least a better comment would be in order, or maybe move these 
elsewhere.

Feature request: it'd be nice to update the 'ps status' with 
set_ps_display, to show a rudimentary progress indicator. Like the name 
and block number of the relation being processed. It won't tell you how 
much is left to go, but at least it will give a warm fuzzy feeling to 
the DBA that something is happening.
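As a hedged sketch of that feature request, the progress line itself could be built like this (the function and parameter names are hypothetical; in the real backend the resulting string would be handed to set_ps_display() or reported via pg_stat_activity):

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a rudimentary progress indicator for the
 * checksum worker.  "relname", "blkno" and "nblocks" are illustrative
 * parameter names, not taken from the patch.
 */
static void
format_checksum_progress(char *buf, size_t buflen,
                         const char *relname, unsigned blkno, unsigned nblocks)
{
    snprintf(buf, buflen, "checksumming \"%s\", block %u/%u",
             relname, blkno, nblocks);
}
```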

I didn't review the offline tool, pg_verify_checksums.

- Heikki



Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi, Daniel!

> On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
> (version 5). Thanks!


I've been hacking a bit in a neighboring thread,
and came across one interesting thing. There was a patch in this CF on enabling checksums for SLRU. The thing is, CLOG
is not protected with checksums right now. But the bad thing about it is that there's no reserved place for checksums in
SLRU.
And this conversion from a page without a checksum to a page with a checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU
eventually.

What do you think about this problem?

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!

> On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
> (version 5). Thanks!


I've been hacking a bit in neighboring thread.
And come across one interesting thing. There was a patch on this CF on enabling checksums for SLRU. The thing is CLOG is not protected with checksums right now. But the bad thing about it is that there's no reserved place for checksums in SLRU.
And this conversion from page without checksum to page with checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.


--

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Mon, Mar 19, 2018 at 12:24 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!

> On 19 Mar 2018, at 4:01, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> Fixed in patch just posted in 84693D0C-772F-45C2-88A1-85B4983A5780@yesql.se
> (version 5). Thanks!


I've been hacking a bit in neighboring thread.
And come across one interesting thing. There was a patch on this CF on enabling checksums for SLRU. The thing is CLOG is not protected with checksums right now. But the bad thing about it is that there's no reserved place for checksums in SLRU.
And this conversion from page without checksum to page with checksum is quite impossible online.

If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.


Makes no sense to *not* allow it, of course. Meaning yes, that should be handled.

We don't need to convert from "page format with no support for checksums" (pre-11) to "page format with support for checksums" (11+) online.

We do need to convert from "page format with support for checksums but no checksums enabled" (11+) to "checksums enabled" online.


--

Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi Magnus!

On 19 Mar 2018, at 16:57, Magnus Hagander <magnus@hagander.net> wrote:

On Mon, Mar 19, 2018 at 12:24 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Mon, Mar 19, 2018 at 11:40 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi, Daniel!
If we commit online checksums before SLRU checksums, we will need very neat hacks if we decide to protect SLRU eventually.

What do you think about this problem?

One would be adjusted to work with the other, yes. It makes no sense to now allow online enabling once SLRU protection is in there, and it doesn't make sense for either of these patches to be blocking the other one for commit, though it would of course be best to get both included.


Makes no sense to *not* allow it, of course. Meaning yes, that should be handled.

We don't need to convert from "page format with no support for checksums" (pre-11) to "page format with support for checksums" (11+) online.

We do need to convert from "page format with support for checksums but no checksums enabled" (11+) to "checksums enabled" online.

Oh, yes, you are right. Everything is fine, sorry for the noise.

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Andrey Borodin
Date:
Hi!

> On 19 Mar 2018, at 11:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it
being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to
set the cost limit, and that it's hurting queries too much. At that point, you want to abort.
I've tried to pg_cancel_backend() and it worked.
But only if I cancel "checksum helper launcher" and then "checksum helper worker". If I cancel the helper first, it
spawns a new one.

Magnus, Daniel, is it safe to cancel worker or launcher?

BTW, I have some questions on pg_verify_checksums.
It does not check the catalog version. Is it true that it will work for any?
Also, pg_verify_checksums will stop on short reads. But we do not stop on a wrong checksum; maybe we should not stop on
short reads either?

I agree with all of Heikki's comments.

Besides these I have no other questions, patch looks good.

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Tue, Mar 20, 2018 at 10:29 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Hi!

> On 19 Mar 2018, at 11:27, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.
I've tried to pg_cancel_backend() and it worked.
But only if I cancel "checksum helper launcher" and then "checksum helper worker". If I cancel the helper first, it spawns a new one.

Magnus, Daniel, is it safe to cancel worker or launcher?

It should be perfectly safe yes. However, it's not very user friendly, because you have to go around and cancel both (meaning you actually have to cancel them in the right order, or a new one will be launched).

Daniel is working on a proper way to stop things.

 

BTW, I have some questions on pg_verify_checksums.
It does not check the catalog version. Is it true that it will work for any?

Yes. The actual page format for checksums has not changed since checksums were introduced. I have successfully used it to validate checksums on v10 clusters for example.

 
Also, pg_verify_checksums will stop on short reads. But we do not stop on a wrong checksum; maybe we should not stop on short reads either?


These are very different scenarios though -- it's explicitly intended to validate checksums, and short reads is a different issue. I prefer the way it does it now, but I am not strongly locked into that position and can be convinced otherwise :)

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Mon, Mar 19, 2018 at 7:27 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hi,

The patch looks good to me at a high level. Some notes below. I didn't read through the whole thread, so sorry if some of these have been discussed already.

+void
+ShutdownChecksumHelperIfRunning(void)
+{
+       if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+       {
+               /* Launcher not started, so nothing to shut down */
+               return;
+       }
+
+       ereport(ERROR,
+                       (errmsg("checksum helper is currently running, cannot disable checksums"),
+                        errhint("Restart the cluster or wait for the worker to finish.")));
+}

Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.

Agreed. You could do it with pg_terminate_backend(), but you'd have to do it in the right order, etc.

Attached patch fixes this by making it possible to abort the process by executing pg_disable_data_checksums() during the process. In this case the live workers will abort, and the checksums will be switched off again.
 


+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit long. And doing this in PageIsVerified looks a bit weird:

> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()

Agreed. We also need DataChecksumsInProgress() to make it work properly, but that makes the names a lot more clear. Adjusted as such. 

 
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that it's bad.

Do you have any opinion on the thread about how to handle that one? As in which way to go about solving it? (The second thread linked from the CF entry - it wasn't linked before as intended, but it is now)



+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear to me when they would return what. I'd suggest marking them as 'void' instead.

Agreed, changed.


--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION         4
 #define PG_DATA_CHECKSUM_VERSION       1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION            2
 

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. They're not actually stored in the page header, as you might think. I think the idea was to keep PG_DATA_CHECKSUM_VERSION close to PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk format. I think it was a bit weird even before this patch, but now it's worse. At least a better comment would be in order, or maybe move these elsewhere.

True, but they apply at the page level? I'm not sure about the original reasoning for putting them there, and figured it wasn't the responsibility of this patch to move them. But we can certainly do that.

But what would be an appropriate place elsewhere? First thought would be pg_control.h, but that would then not be consistent with e.g. wal_level (which is declared in xlog.h and not pg_control.h).


Feature request: it'd be nice to update the 'ps status' with set_ps_display, to show a rudimentary progress indicator. Like the name and block number of the relation being processed. It won't tell you how much is left to go, but at least it will give a warm fuzzy feeling to the DBA that something is happening.

In the attached patch it sets this information in pg_stat_activity. I think that makes more sense than the ps display, and I think it is more consistent with other ways we use them (e.g. autovacuum doesn't update its ps title for every table, but it does update pg_stat_activity).


I didn't review the offline tool, pg_verify_checksums.

PFA a patch that takes into account these comments and we believe all other pending ones as well.


--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I see enable_data_checksums() does this:

    if (cost_limit <= 0)
        ereport(ERROR,
                (errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Mon, Mar 26, 2018 at 10:09 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I see enable_data_checksums() does this:

    if (cost_limit <= 0)
        ereport(ERROR,
                (errmsg("cost limit must be a positive value")));

Is there a reason not to allow -1 (no limit), just like for vacuum_cost?


Eh. vacuum_cost_limit cannot be set to -1 (1 is the lowest). Neither can vacuum_cost_delay -- it is set to *0* to disable it (which is how the cost_delay parameter is handled here as well).

Are you thinking autovacuum_vacuum_cost_limit where -1 means "use vacuum_cost_limit"?

The reason to disallow cost_limit=0 is to avoid divide-by-zero. We could allow -1 and have it mean "use vacuum_cost_limit", but I'm not sure how relevant that really would be in this context?
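The divide-by-zero concern can be illustrated with a minimal model of vacuum-style cost accounting (the names below are illustrative, and the arithmetic only mirrors the vacuum delay logic, where the cost limit appears as a divisor -- hence the requirement that it be positive):

```c
/* Accumulated cost since the last sleep; persists across calls. */
static int cost_balance = 0;

/* Returns milliseconds to sleep after accounting for one page's cost. */
static int
checksum_throttle(int cost_delay, int cost_limit, int page_cost)
{
    int msec = 0;

    cost_balance += page_cost;
    if (cost_delay > 0 && cost_balance >= cost_limit)
    {
        /* cost_limit is a divisor here, so it must be > 0 */
        msec = cost_delay * cost_balance / cost_limit;
        cost_balance = 0;
    }
    return msec;
}
```

With cost_delay = 0 (the default) the throttling branch is never taken, matching how the delay parameter disables throttling in the patch.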

--

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 03/27/2018 08:56 AM, Magnus Hagander wrote:
> On Mon, Mar 26, 2018 at 10:09 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> 
>     Hi,
> 
>     I see enable_data_checksums() does this:
> 
>         if (cost_limit <= 0)
>             ereport(ERROR,
>                     (errmsg("cost limit must be a positive value")));
> 
>     Is there a reason not to allow -1 (no limit), just like for vacuum_cost?
> 
> 
> Eh. vaccum_cost_limit cannot be set to -1 (1 is the lowest). Neither can
> vacuum_cost_delay -- it is set to *0* to disable it (which is how the
> cost_delay parameter is handled here as well).
> 
> Are you thinking autovacuum_vacuum_cost_limit where -1 means "use
> vacuum_cost_limit"?
> 
> The reason to disallow cost_limit=0 is to avoid divide-by-zero. We could
> allow -1 and have it mean "use vacuum_cost_limit", but I'm not sure how
> relevant that really would be in this context?
> 

D'oh! You're right, of course.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I've just noticed the patch does not update

    src/backend/storage/page/README

which is in fact about checksums. Most of it remains valid, but it also
mentions that currently it's an initdb-time choice.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle the more rare case.

Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

    ERROR:  could not open relation with OID 16632
    LOG:  background worker "checksumhelper worker" (PID 27152) exited
          with exit code 1

Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

    foreach(lc, RelationList)
    {
        ChecksumHelperRelation *rel
             = (ChecksumHelperRelation *) lfirst(lc);

        if (!ProcessSingleRelationByOid(rel->reloid, strategy))
        {
            ChecksumHelperShmem->success = ABORTED;
            break;
        }
        else
            ChecksumHelperShmem->success = SUCCESSFUL;
    }
    list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

The launcher however does not realize anything went wrong, because the
flag still says SUCCESSFUL. And so it merrily switches checksums to
"on", leading to this on the rest of the relations:

    WARNING:  page verification failed, calculated checksum 58644 but
              expected 0
    ERROR:  invalid page in block 0 of relation base/16631/16653

Yikes!

IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworder.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

    ChecksumHelperResult local_success = SUCCESFUL;

    foreach(lc, RelationList)
    {
        ChecksumHelperRelation *rel
             = (ChecksumHelperRelation *) lfirst(lc);

        if (!ProcessSingleRelationByOid(rel->reloid, strategy))
        {
            local_success = ABORTED;
            break;
        }
    }
    list_free_deep(RelationList);

    ChecksumHelperShmem->success = local_success;

That is, leave the flag in shared memory set to FAILED until the very
last moment, and only when everything went fine set it to SUCCESSFUL.
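The two fixes together can be modelled in a self-contained sketch (stand-in names and types only; the real code would use try_relation_open() and the bgworker's shared-memory flag):

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { FAILED, ABORTED, SUCCESSFUL } HelperResult;

/* Stand-in for try_relation_open(): false when the relation was dropped. */
static bool
relation_exists(unsigned oid, const unsigned *catalog, size_t ncat)
{
    for (size_t i = 0; i < ncat; i++)
        if (catalog[i] == oid)
            return true;
    return false;
}

/*
 * Walk the worklist: (a) skip relations that were dropped since the list
 * was built, instead of erroring out, and (b) keep the shared flag at
 * FAILED until the whole list is done, publishing the result only at the
 * very end, so a crash mid-loop can never read as success.
 */
static HelperResult
process_worklist(const unsigned *worklist, size_t nwork,
                 const unsigned *catalog, size_t ncat,
                 HelperResult *shared_flag)
{
    HelperResult local_success = SUCCESSFUL;

    *shared_flag = FAILED;      /* stays FAILED if we die mid-loop */
    for (size_t i = 0; i < nwork; i++)
    {
        if (!relation_exists(worklist[i], catalog, ncat))
            continue;           /* dropped relation: skip, don't error */
        /* ... checksum the relation's blocks here ... */
    }
    *shared_flag = local_success;
    return local_success;
}
```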


BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

    ERROR:  failed to enable checksums in "test", giving up (attempts
            639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.

BTW both ChecksumHelperRelation and ChecksumHelperDatabase have
'success' field which is actually unused (and uninitialized).

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

  BEGIN;
  CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
  -- run pg_enable_data_checksums() from another session
  SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

  test=# SELECT COUNT(*) FROM t;
  WARNING:  page verification failed, calculated checksum 27170 but
            expected 0
  ERROR:  invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

  ERROR:  cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled anyway.

Not great, I guess :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Thanks!
 

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle the more rare case.

Uh, yes. I could've sworn we had code for that, but I fully agree with your assessment that it's not there :)



Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

    ERROR:  could not open relation with OID 16632
    LOG:  background worker "checksumhelper worker" (PID 27152) exited
          with exit code 1

Yeah. We need to trap that error an not crash and burn.


Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

    foreach(lc, RelationList)
    {
        ChecksumHelperRelation *rel
             = (ChecksumHelperRelation *) lfirst(lc);

        if (!ProcessSingleRelationByOid(rel->reloid, strategy))
        {
            ChecksumHelperShmem->success = ABORTED;
            break;
        }
        else
            ChecksumHelperShmem->success = SUCCESSFUL;
    }
    list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

Indeed, that's just a very simple bug. It shouldn't be set to SUCCESSFUL until *all* tables have been processed. I believe the code needs to be this:


IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworder.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

Not entirely sure what you mean by "even a simple does oid exist"? I mean, if we check for the file, that won't help us -- there will still be a tiny race between the check and us opening it, won't there?

However, we have try_relation_open().  Which is documented as:
 * Same as relation_open, except return NULL instead of failing
 * if the relation does not exist.

So I'm pretty sure it's just a matter of using try_relation_open() instead, and checking for NULL?


(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

<snip>
And now I see your code, which was below the fold when I first read it, after having written a very similar fix myself. I'm glad this code looks mostly identical to what I suggested above, so I think that's a good solution.

 
BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

No, I don't like that at all. We need to handle them gracefully, by skipping them - crash and restart is not acceptable for something that common.


Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

    ERROR:  failed to enable checksums in "test", giving up (attempts
            639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.

Eh. I don't have that "(attempts" part in my code at all. Is that either from some earlier version of the patch, or did you add that yourself for testing?



BTW both ChecksumHelperRelation and ChecksumHelperDatabase have
'success' field which is actually unused (and uninitialized).

Correct. These are old leftovers from the "partial restart" logic, and should be removed.


But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

  BEGIN;
  CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
  -- run pg_enable_data_checksums() from another session
  SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, it won't have
checksums updated and the cluster will get checksums enabled. Which
means this:

Ugh. Interestingly enough I just put that on my TODO list *yesterday* that I forgot to check that specific case :/



  test=# SELECT COUNT(*) FROM t;
  WARNING:  page verification failed, calculated checksum 27170 but
            expected 0
  ERROR:  invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

I was thinking of that as one way to deal with it, yes.

I guess a reasonable way to do that would be to do it as part of BuildRelationList() -- basically have that one wait until there is no other running transaction in that specific database before we take the snapshot?

A first thought I had was to try to just take an access exclusive lock on pg_class for a very short time, but a transaction that does CREATE TABLE doesn't actually keep its lock on that table, so there is no conflict.


And if you try this with a temporary table (not hidden in transaction,
so the bgworker can see it), the worker will fail with this:

  ERROR:  cannot access temporary tables of other sessions

But of course, this is just another way how to crash without updating
the result for the launcher, so checksums may end up being enabled anyway.

Yeah, there will be plenty of side-effect issues from that crash-with-wrong-status case. Fixing that will at least make things safer -- in that checksums won't be enabled when not put on all pages. 

I have attached a patch that fixes the "easy" ones per your first comments. No solution for the open-transaction yet, but I wanted to put the rest out there -- especially if you have automated tests you can send it through.

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

On 03/31/2018 02:02 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>
> ...
> 
>     (a) Should not be difficult to do, I think. We don't have relation_open
>     with a missing_ok flag, but implementing something like that should not
>     be difficult. Even a simple "does OID exist" should be enough.
> 
> 
> Not entirely sure what you mean with "even a simple does oid exist"
> means? I mean, if we check for the file, that won't help us -- there
> will still be a tiny race between the check and us opening it won't it?
> 

I meant to say "even a simple check if the OID still exists" but it was
a bit too late / not enough caffeine issue. You're right there would be
a tiny window of race condition - it'd be much shorter, possibly enough
to make the error+restart approach acceptable.

> However, we have try_relation_open().  Which is documented as:
>  *Same as relation_open, except return NULL instead of failing
>  *if the relation does not exist.
> 
> So I'm pretty sure it's just a matter of using try_relation_open()
> instead, and checking for NULL?
> 

Oh, right. I thought we had a relation_open variant that handles this,
but I was looking for one with a missing_ok flag and so I missed it.
try_relation_open should do the trick when it comes to dropped tables.

> 
>     (b) But just handling dropped relations is not enough, because I could
>     simply kill the bgworker directly, and it would have exactly the same
>     consequences. What needs to happen is something like this:
> 
> 
> <snip>
> And now I see your code, which was below-fold when I first read. After
> having writing a very similar fix myself. I'm glad this code looks
> mostly identical to what I suggested above, so I think that's a good
> solution.
> 

Good ;-)

>  
> 
>     BTW I don't think handling dropped relations by letting the bgworker
>     crash and restart is an acceptable approach. That would pretty much mean
>     any DDL changes are prohibited on the system while the checksum process
>     is running, which is not quite possible (e.g. for systems doing stuff
>     with temporary tables).
> 
> 
> No, I don't like that at all. We need to handle them gracefully, by
> skipping them - crash and restart is not acceptable for something that
> common.
> 

Yeah, I was just thinking aloud.

> 
>     Which however reminds me I've also ran into a bug in the automated retry
>     system, because you may get messages like this:
> 
>         ERROR:  failed to enable checksums in "test", giving up (attempts
>                 639968292).
> 
>     This happens because BuildDatabaseList() does just palloc() and does not
>     initialize the 'attempts' field. It may get initialized to 0 by chance,
>     but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
>     high value.
> 
> 
> Eh. I don't have that "(attempts" part in my code at all. Is that either
> from some earlier version of the patch, or did you add that yourself for
> testing?
> 

Apologies, you're right - I tweaked the message a bit (just adding the
number of attempts to it). The logic however remains the same, and the
bug is real.

>
>     But wait - there is more ;-) BuildRelationList is using heap_beginscan
>     with the regular snapshot, so it does not see uncommitted transactions.
>     So if you do this:
> 
>       BEGIN;
>       CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
>       -- run pg_enable_data_checksums() from another session
>       SELECT COUNT(*) FROM t;
> 
>     then the table will be invisible to the checksum worker, it won't have
>     checksums updated and the cluster will get checksums enabled. Which
>     means this:
> 
> 
> Ugh. Interestingly enough I just put that on my TODO list *yesterday*
> that I forgot to check that specific case :/
> 

But I was faster in reporting it ;-)

> 
>       test=# SELECT COUNT(*) FROM t;
>       WARNING:  page verification failed, calculated checksum 27170 but
>                 expected 0
>       ERROR:  invalid page in block 0 of relation base/16677/16683
> 
>     Not sure what's the best way to fix this - maybe we could wait for all
>     running transactions to end, before starting the work.
> 
> 
> I was thinking of that as one way to deal with it, yes.
> 
> I guess a reasonable way to do that would be to do it as part of
> BuildRelationList() -- basically have that one wait until there is no
> other running transaction in that specific database before we take the
> snapshot?
> 
> A first thought I had was to try to just take an access exclusive lock
> on pg_class for a very short time, but a transaction that does create
> table doesn't actually keep it's lock on that table so there is no conflict.
> 

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something.

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

> 
>     And if you try this with a temporary table (not hidden in transaction,
>     so the bgworker can see it), the worker will fail with this:
> 
>       ERROR:  cannot access temporary tables of other sessions
> 
>     But of course, this is just another way how to crash without updating
>     the result for the launcher, so checksums may end up being enabled
>     anyway.
> 
> 
> Yeah, there will be plenty of side-effect issues from that
> crash-with-wrong-status case. Fixing that will at least make things
> safer -- in that checksums won't be enabled when not put on all pages. 
> 

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that they do not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and that usage also
includes CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.

> I have attached a patch that fixes the "easy" ones per your first
> comments. No solution for the open-transaction yet, but I wanted to put
> the rest out there -- especially if you have automated tests you can
> send it through.
> 

I don't have automated tests, but I'll take a look.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

On 03/31/2018 02:02 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
>
>     But wait - there is more ;-) BuildRelationList is using heap_beginscan
>     with the regular snapshot, so it does not see uncommitted transactions.
>     So if you do this:
>
>       BEGIN;
>       CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
>       -- run pg_enable_data_checksums() from another session
>       SELECT COUNT(*) FROM t;
>
>     then the table will be invisible to the checksum worker, it won't have
>     checksums updated and the cluster will get checksums enabled. Which
>     means this:
>
>
> Ugh. Interestingly enough I just put that on my TODO list *yesterday*
> that I forgot to check that specific case :/
>

But I was faster in reporting it ;-)

Indeed you were :)

 
>       test=# SELECT COUNT(*) FROM t;
>       WARNING:  page verification failed, calculated checksum 27170 but
>                 expected 0
>       ERROR:  invalid page in block 0 of relation base/16677/16683
>
>     Not sure what's the best way to fix this - maybe we could wait for all
>     running transactions to end, before starting the work.
>
>
> I was thinking of that as one way to deal with it, yes.
>
> I guess a reasonable way to do that would be to do it as part of
> BuildRelationList() -- basically have that one wait until there is no
> other running transaction in that specific database before we take the
> snapshot?
>
> A first thought I had was to try to just take an access exclusive lock
> on pg_class for a very short time, but a transaction that does create
> table doesn't actually keep it's lock on that table so there is no conflict.
>

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something. 

I do think just waiting for all running transactions to complete is
fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
too, to some extent). So we have a precedent / working code we can copy.

Thinking again, I don't think it should be done as part of BuildRelationList(). We should just do it once in the launcher before starting; that'll be both easier and cleaner. Anything started after that will have checksums on it, so we should be fine.

PFA one that does this.


>     And if you try this with a temporary table (not hidden in transaction,
>     so the bgworker can see it), the worker will fail with this:
>
>       ERROR:  cannot access temporary tables of other sessions
>
>     But of course, this is just another way how to crash without updating
>     the result for the launcher, so checksums may end up being enabled
>     anyway.
>
>
> Yeah, there will be plenty of side-effect issues from that
> crash-with-wrong-status case. Fixing that will at least make things
> safer -- in that checksums won't be enabled when not put on all pages. 
>

Sure, the outcome with checksums enabled incorrectly is a consequence of
bogus status, and fixing that will prevent that. But that wasn't my main
point here - not articulated very clearly, though.

The bigger question is how to handle temporary tables gracefully, so
that they do not terminate the bgworker like this at all. This might be
an even bigger issue than dropped relations, considering that temporary
tables are a pretty common part of applications (and that usage also
includes CREATE/DROP).

For some clusters it might mean the online checksum enabling would
crash+restart infinitely (well, until reaching MAX_ATTEMPTS).

Unfortunately, try_relation_open() won't fix this, as the error comes
from ReadBufferExtended. And it's not a matter of simply creating a
ReadBuffer variant without that error check, because temporary tables
use local buffers.

I wonder if we could just go and set the checksums anyway, ignoring the
local buffers. If the other session does some changes, it'll overwrite
our changes, this time with the correct checksums. But it seems pretty
dangerous (I mean, what if they're writing stuff while we're updating
the checksums? Considering the various short-cuts for temporary tables,
I suspect that would be a boon for race conditions.)

Another option would be to do something similar to running transactions,
i.e. wait until all temporary tables (that we've seen at the beginning)
disappear. But we're starting to wait on more and more stuff.

If we do this, we should clearly log which backends we're waiting for,
so that the admins can go and interrupt them manually.


Yeah, waiting for all transactions at the beginning is pretty simple.

Making the worker simply ignore temporary tables would also be easy.

One of the bigger issues here is that temporary tables are *session* scoped, not transaction scoped, so we'd actually need the other session to finish, not just the transaction.

I guess what we could do is something like this:

1. Don't process temporary tables in the checksumworker, period. Instead, build a list of any temporary tables that existed when the worker started in this particular database (basically anything that we got in our scan). Once we have processed the complete database, keep re-scanning pg_class until those particular tables are gone (search by oid).

That means that any temporary tables that are created *while* we are processing a database are ignored, but they should already be receiving checksums.

It definitely leads to a potential issue with long running temp tables. But as long as we look at the *actual tables* (by oid), we should be able to handle long-running sessions once they have dropped their temp tables.

Does that sound workable to you?

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> ...
> 
>     I do think just waiting for all running transactions to complete is
>     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>     too, to some extent). So we have a precedent / working code we can copy.
> 
> 
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
> 
> PFA one that does this.
> 

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

> 
>     >     And if you try this with a temporary table (not hidden in transaction,
>     >     so the bgworker can see it), the worker will fail with this:
>     >
>     >       ERROR:  cannot access temporary tables of other sessions
>     >
>     >     But of course, this is just another way how to crash without updating
>     >     the result for the launcher, so checksums may end up being enabled
>     >     anyway.
>     >
>     >
>     > Yeah, there will be plenty of side-effect issues from that
>     > crash-with-wrong-status case. Fixing that will at least make things
>     > safer -- in that checksums won't be enabled when not put on all pages. 
>     >
> 
>     Sure, the outcome with checksums enabled incorrectly is a consequence of
>     bogus status, and fixing that will prevent that. But that wasn't my main
>     point here - not articulated very clearly, though.
> 
>     The bigger question is how to handle temporary tables gracefully, so
>     that it does not terminate the bgworker like this at all. This might be
>     even bigger issue than dropped relations, considering that temporary
>     tables are pretty common part of applications (and it also includes
>     CREATE/DROP).
> 
>     For some clusters it might mean the online checksum enabling would
>     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> 
>     Unfortunately, try_relation_open() won't fix this, as the error comes
>     from ReadBufferExtended. And it's not a matter of simply creating a
>     ReadBuffer variant without that error check, because temporary tables
>     use local buffers.
> 
>     I wonder if we could just go and set the checksums anyway, ignoring the
>     local buffers. If the other session does some changes, it'll overwrite
>     our changes, this time with the correct checksums. But it seems pretty
>     dangerous (I mean, what if they're writing stuff while we're updating
>     the checksums? Considering the various short-cuts for temporary tables,
>     I suspect that would be a boon for race conditions.)
> 
>     Another option would be to do something similar to running transactions,
>     i.e. wait until all temporary tables (that we've seen at the beginning)
>     disappear. But we're starting to wait on more and more stuff.
> 
>     If we do this, we should clearly log which backends we're waiting for,
>     so that the admins can go and interrupt them manually.
> 
> 
> 
> Yeah, waiting for all transactions at the beginning is pretty simple.
> 
> Making the worker simply ignore temporary tables would also be easy.
> 
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
> 
> I guess what we could do is something like this:
> 
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
> 
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
> 
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
> 
> Does that sound workable to you?
> 

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> ...
>
>     I do think just waiting for all running transactions to complete is
>     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>     too, to some extent). So we have a precedent / working code we can copy.
>
>
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
>
> PFA one that does this.
>

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.

 
>     >     And if you try this with a temporary table (not hidden in transaction,
>     >     so the bgworker can see it), the worker will fail with this:
>     >
>     >       ERROR:  cannot access temporary tables of other sessions
>     >
>     >     But of course, this is just another way how to crash without updating
>     >     the result for the launcher, so checksums may end up being enabled
>     >     anyway.
>     >
>     >
>     > Yeah, there will be plenty of side-effect issues from that
>     > crash-with-wrong-status case. Fixing that will at least make things
>     > safer -- in that checksums won't be enabled when not put on all pages. 
>     >
>
>     Sure, the outcome with checksums enabled incorrectly is a consequence of
>     bogus status, and fixing that will prevent that. But that wasn't my main
>     point here - not articulated very clearly, though.
>
>     The bigger question is how to handle temporary tables gracefully, so
>     that it does not terminate the bgworker like this at all. This might be
>     even bigger issue than dropped relations, considering that temporary
>     tables are pretty common part of applications (and it also includes
>     CREATE/DROP).
>
>     For some clusters it might mean the online checksum enabling would
>     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>
>     Unfortunately, try_relation_open() won't fix this, as the error comes
>     from ReadBufferExtended. And it's not a matter of simply creating a
>     ReadBuffer variant without that error check, because temporary tables
>     use local buffers.
>
>     I wonder if we could just go and set the checksums anyway, ignoring the
>     local buffers. If the other session does some changes, it'll overwrite
>     our changes, this time with the correct checksums. But it seems pretty
>     dangerous (I mean, what if they're writing stuff while we're updating
>     the checksums? Considering the various short-cuts for temporary tables,
>     I suspect that would be a boon for race conditions.)
>
>     Another option would be to do something similar to running transactions,
>     i.e. wait until all temporary tables (that we've seen at the beginning)
>     disappear. But we're starting to wait on more and more stuff.
>
>     If we do this, we should clearly log which backends we're waiting for,
>     so that the admins can go and interrupt them manually.
>
>
>
> Yeah, waiting for all transactions at the beginning is pretty simple.
>
> Making the worker simply ignore temporary tables would also be easy.
>
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
>
> I guess what we could do is something like this:
>
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
>
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
>
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
>
> Does that sound workable to you?
>

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp tables. That's also a predictable amount of information -- logging all temp tables may, as you say, lead to an insane amount of data.

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed that too.

--
Attachments

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> ...
>
>     I do think just waiting for all running transactions to complete is
>     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>     too, to some extent). So we have a precedent / working code we can copy.
>
>
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
>
> PFA one that does this.
>

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

Yeah, makes sense. Updated.

 
>     >     And if you try this with a temporary table (not hidden in transaction,
>     >     so the bgworker can see it), the worker will fail with this:
>     >
>     >       ERROR:  cannot access temporary tables of other sessions
>     >
>     >     But of course, this is just another way how to crash without updating
>     >     the result for the launcher, so checksums may end up being enabled
>     >     anyway.
>     >
>     >
>     > Yeah, there will be plenty of side-effect issues from that
>     > crash-with-wrong-status case. Fixing that will at least make things
>     > safer -- in that checksums won't be enabled when not put on all pages. 
>     >
>
>     Sure, the outcome with checksums enabled incorrectly is a consequence of
>     bogus status, and fixing that will prevent that. But that wasn't my main
>     point here - not articulated very clearly, though.
>
>     The bigger question is how to handle temporary tables gracefully, so
>     that it does not terminate the bgworker like this at all. This might be
>     even bigger issue than dropped relations, considering that temporary
>     tables are pretty common part of applications (and it also includes
>     CREATE/DROP).
>
>     For some clusters it might mean the online checksum enabling would
>     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>
>     Unfortunately, try_relation_open() won't fix this, as the error comes
>     from ReadBufferExtended. And it's not a matter of simply creating a
>     ReadBuffer variant without that error check, because temporary tables
>     use local buffers.
>
>     I wonder if we could just go and set the checksums anyway, ignoring the
>     local buffers. If the other session does some changes, it'll overwrite
>     our changes, this time with the correct checksums. But it seems pretty
>     dangerous (I mean, what if they're writing stuff while we're updating
>     the checksums? Considering the various short-cuts for temporary tables,
>     I suspect that would be a boon for race conditions.)
>
>     Another option would be to do something similar to running transactions,
>     i.e. wait until all temporary tables (that we've seen at the beginning)
>     disappear. But we're starting to wait on more and more stuff.
>
>     If we do this, we should clearly log which backends we're waiting for,
>     so that the admins can go and interrupt them manually.
>
>
>
> Yeah, waiting for all transactions at the beginning is pretty simple.
>
> Making the worker simply ignore temporary tables would also be easy.
>
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
>
> I guess what we could do is something like this:
>
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
>
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
>
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
>
> Does that sound workable to you?
>

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs we
are waiting for, and which sessions may need the DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

Yeah, agreed. I think it makes sense to show the *number* of temp tables. That's also a predictable amount of information -- logging all temp tables may, as you say, lead to an insane amount of data.

PFA a patch that does this. I've also added some docs for it.

And I also noticed pg_verify_checksums wasn't installed, so fixed that too.


PFA a rebase on top of the just committed verify-checksums patch. 

--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
> 
>     On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
>     <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
> 
>         On 03/31/2018 05:05 PM, Magnus Hagander wrote:
>         > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
>         > <tomas.vondra@2ndquadrant.com
>         <mailto:tomas.vondra@2ndquadrant.com>
>         <mailto:tomas.vondra@2ndquadrant.com
>         <mailto:tomas.vondra@2ndquadrant.com>>> wrote:
>         >
>         > ...
>         >
>         >     I do think just waiting for all running transactions to complete is
>         >     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>         >     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>         >     too, to some extent). So we have a precedent / working code we can copy.
>         >
>         >
>         > Thinking again, I don't think it should be done as part of
>         > BuildRelationList(). We should just do it once in the launcher before
>         > starting, that'll be both easier and cleaner. Anything started after
>         > that will have checksums on it, so we should be fine.
>         >
>         > PFA one that does this.
>         >
> 
>         Seems fine to me. I'd however log waitforxid, not the oldest one. If
>         you're a DBA and you want to make the checksumming to proceed,
>         knowing
>         the oldest running XID is useless for that. If we log
>         waitforxid, it can
>         be used to query pg_stat_activity and interrupt the sessions
>         somehow.
> 
> 
>     Yeah, makes sense. Updated.
> 
>      
> 
>         >     >     And if you try this with a temporary table (not
>         hidden in transaction,
>         >     >     so the bgworker can see it), the worker will fail
>         with this:
>         >     >
>         >     >       ERROR:  cannot access temporary tables of other
>         sessions
>         >     >
>         >     >     But of course, this is just another way how to crash
>         without updating
>         >     >     the result for the launcher, so checksums may end up
>         being enabled
>         >     >     anyway.
>         >     >
>         >     >
>         >     > Yeah, there will be plenty of side-effect issues from that
>         >     > crash-with-wrong-status case. Fixing that will at least
>         make things
>         >     > safer -- in that checksums won't be enabled when not put
>         on all pages. 
>         >     >
>         >
>         >     Sure, the outcome with checksums enabled incorrectly is a
>         consequence of
>         >     bogus status, and fixing that will prevent that. But that
>         wasn't my main
>         >     point here - not articulated very clearly, though.
>         >
>         >     The bigger question is how to handle temporary tables
>         gracefully, so
>         >     that it does not terminate the bgworker like this at all.
>         This might be
>         >     even bigger issue than dropped relations, considering that
>         temporary
>         >     tables are pretty common part of applications (and it also
>         includes
>         >     CREATE/DROP).
>         >
>         >     For some clusters it might mean the online checksum
>         enabling would
>         >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>         >
>         >     Unfortunately, try_relation_open() won't fix this, as the
>         error comes
>         >     from ReadBufferExtended. And it's not a matter of simply
>         creating a
>         >     ReadBuffer variant without that error check, because
>         temporary tables
>         >     use local buffers.
>         >
>         >     I wonder if we could just go and set the checksums anyway,
>         ignoring the
>         >     local buffers. If the other session does some changes,
>         it'll overwrite
>         >     our changes, this time with the correct checksums. But it
>         seems pretty
>         >     dangerous (I mean, what if they're writing stuff while
>         we're updating
>         >     the checksums? Considering the various short-cuts for
>         temporary tables,
>         >     I suspect that would be a boon for race conditions.)
>         >
>         >     Another option would be to do something similar to running
>         transactions,
>         >     i.e. wait until all temporary tables (that we've seen at
>         the beginning)
>         >     disappear. But we're starting to wait on more and more stuff.
>         >
>         >     If we do this, we should clearly log which backends we're
>         waiting for,
>         >     so that the admins can go and interrupt them manually.
>         >
>         >
>         >
>         > Yeah, waiting for all transactions at the beginning is pretty
>         simple.
>         >
>         > Making the worker simply ignore temporary tables would also be
>         easy.
>         >
>         > One of the bigger issues here is temporary tables are
>         *session* scope
>         > and not transaction, so we'd actually need the other session
>         to finish,
>         > not just the transaction.
>         >
>         > I guess what we could do is something like this:
>         >
>         > 1. Don't process temporary tables in the checksumworker, period.
>         > Instead, build a list of any temporary tables that existed
>         when the
>         > worker started in this particular database (basically anything
>         that we
>         > got in our scan). Once we have processed the complete
>         database, keep
>         > re-scanning pg_class until those particular tables are gone
>         (search by oid).
>         >
>         > That means that any temporary tables that are created *while*
>         we are
>         > processing a database are ignored, but they should already be
>         receiving
>         > checksums.
>         >
>         > It definitely leads to a potential issue with long running
>         temp tables.
>         > But as long as we look at the *actual tables* (by oid), we
>         should be
>         > able to handle long-running sessions once they have dropped
>         their temp
>         > tables.
>         >
>         > Does that sound workable to you?
>         >
> 
>         Yes, that's pretty much what I meant by 'wait until all
>         temporary tables
>         disappear'. Again, we need to make it easy to determine which
>         OIDs are
>         we waiting for, which sessions may need DBA's attention.
> 
>         I don't think it makes sense to log OIDs of the temporary
>         tables. There
>         can be many of them, and in most cases the connection/session is
>         managed
>         by the application, so the only thing you can do is kill the
>         connection.
> 
> 
>     Yeah, agreed. I think it makes sense to show the *number* of temp
>     tables. That's also a predictable amount of information -- logging
>     all temp tables may as you say lead to an insane amount of data.
> 
>     PFA a patch that does this. I've also added some docs for it.
> 
>     And I also noticed pg_verify_checksums wasn't installed, so fixed
>     that too.
> 
> 
> PFA a rebase on top of the just committed verify-checksums patch. 
> 

This seems OK in terms of handling errors in the worker and passing it
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

 WARNING:  attempts = -1684366952
 WARNING:  attempts = 1010514489
 WARNING:  attempts = -1145390664
 WARNING:  attempts = 1162101570

I guess those are not the droids we're looking for?

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  database "template0" does not allow connections
    HINT:  Allow connections using ALTER DATABASE and try again.
    test=# alter database template0 allow_connections = true;
    ALTER DATABASE
    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: already running
    test=# select pg_disable_data_checksums();
     pg_disable_data_checksums
    ---------------------------

    (1 row)

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: has been cancelled

At which point the only thing you can do is restarting the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Tue, Apr 03, 2018 at 02:05:04PM +0200, Magnus Hagander wrote:
> PFA a rebase on top of the just committed verify-checksums patch.

For the record, I am on vacation this week and won't be able to do
further in-depth review or testing of this patch before the end of the
commitfest, sorry.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
>
>     On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
>     <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
>         >     >     And if you try this with a temporary table (not
>         hidden in transaction,
>         >     >     so the bgworker can see it), the worker will fail
>         with this:
>         >     >
>         >     >       ERROR:  cannot access temporary tables of other
>         sessions
>         >     >
>         >     >     But of course, this is just another way how to crash
>         without updating
>         >     >     the result for the launcher, so checksums may end up
>         being enabled
>         >     >     anyway.
>         >     >
>         >     >
>         >     > Yeah, there will be plenty of side-effect issues from that
>         >     > crash-with-wrong-status case. Fixing that will at least
>         make things
>         >     > safer -- in that checksums won't be enabled when not put
>         on all pages. 
>         >     >
>         >
>         >     Sure, the outcome with checksums enabled incorrectly is a
>         consequence of
>         >     bogus status, and fixing that will prevent that. But that
>         wasn't my main
>         >     point here - not articulated very clearly, though.
>         >
>         >     The bigger question is how to handle temporary tables
>         gracefully, so
>         >     that it does not terminate the bgworker like this at all.
>         This might be
>         >     even bigger issue than dropped relations, considering that
>         temporary
>         >     tables are pretty common part of applications (and it also
>         includes
>         >     CREATE/DROP).
>         >
>         >     For some clusters it might mean the online checksum
>         enabling would
>         >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>         >
>         >     Unfortunately, try_relation_open() won't fix this, as the
>         error comes
>         >     from ReadBufferExtended. And it's not a matter of simply
>         creating a
>         >     ReadBuffer variant without that error check, because
>         temporary tables
>         >     use local buffers.
>         >
>         >     I wonder if we could just go and set the checksums anyway,
>         ignoring the
>         >     local buffers. If the other session does some changes,
>         it'll overwrite
>         >     our changes, this time with the correct checksums. But it
>         seems pretty
>         >     dangerous (I mean, what if they're writing stuff while
>         we're updating
>         >     the checksums? Considering the various short-cuts for
>         temporary tables,
>         >     I suspect that would be a boon for race conditions.)
>         >
>         >     Another option would be to do something similar to running
>         transactions,
>         >     i.e. wait until all temporary tables (that we've seen at
>         the beginning)
>         >     disappear. But we're starting to wait on more and more stuff.
>         >
>         >     If we do this, we should clearly log which backends we're
>         waiting for,
>         >     so that the admins can go and interrupt them manually.
>         >
>         >
>         >
>         > Yeah, waiting for all transactions at the beginning is pretty
>         simple.
>         >
>         > Making the worker simply ignore temporary tables would also be
>         easy.
>         >
>         > One of the bigger issues here is temporary tables are
>         *session* scope
>         > and not transaction, so we'd actually need the other session
>         to finish,
>         > not just the transaction.
>         >
>         > I guess what we could do is something like this:
>         >
>         > 1. Don't process temporary tables in the checksumworker, period.
>         > Instead, build a list of any temporary tables that existed
>         when the
>         > worker started in this particular database (basically anything
>         that we
>         > got in our scan). Once we have processed the complete
>         database, keep
>         > re-scanning pg_class until those particular tables are gone
>         (search by oid).
>         >
>         > That means that any temporary tables that are created *while*
>         we are
>         > processing a database are ignored, but they should already be
>         receiving
>         > checksums.
>         >
>         > It definitely leads to a potential issue with long running
>         temp tables.
>         > But as long as we look at the *actual tables* (by oid), we
>         should be
>         > able to handle long-running sessions once they have dropped
>         their temp
>         > tables.
>         >
>         > Does that sound workable to you?
>         >
>
>         Yes, that's pretty much what I meant by 'wait until all
>         temporary tables
>         disappear'. Again, we need to make it easy to determine which
>         OIDs are
>         we waiting for, which sessions may need DBA's attention.
>
>         I don't think it makes sense to log OIDs of the temporary
>         tables. There
>         can be many of them, and in most cases the connection/session is
>         managed
>         by the application, so the only thing you can do is kill the
>         connection.
>
>
>     Yeah, agreed. I think it makes sense to show the *number* of temp
>     tables. That's also a predictable amount of information -- logging
>     all temp tables may as you say lead to an insane amount of data.
>
>     PFA a patch that does this. I've also added some docs for it.
>
>     And I also noticed pg_verify_checksums wasn't installed, so fixed
>     that too.
>
>
> PFA a rebase on top of the just committed verify-checksums patch. 
>

This seems OK in terms of handling errors in the worker and passing it
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

 WARNING:  attempts = -1684366952
 WARNING:  attempts = 1010514489
 WARNING:  attempts = -1145390664
 WARNING:  attempts = 1162101570

I guess those are not the droids we're looking for?

When looking at that, and after a quick discussion, we decided it's better to completely remove the retry logic. As you mentioned in an earlier mail, we had all this logic to retry databases (where failure is unlikely) but not relations (where it is likely). The attached patch simplifies it to only detect the "database was dropped" case (which is fine), and to treat every other kind of failure as permanent, simply not turning on checksums in those cases.



Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed. (The whole memset-to-zero)


    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  database "template0" does not allow connections
    HINT:  Allow connections using ALTER DATABASE and try again.
    test=# alter database template0 allow_connections = true;
    ALTER DATABASE
    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: already running
    test=# select pg_disable_data_checksums();
     pg_disable_data_checksums
    ---------------------------

    (1 row)

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: has been cancelled

Turns out that wasn't the problem. The problem was that we *set* it before erroring out with the "does not allow connections" message, but never cleared it. In that case, it would be listed as launcher-is-running even though the launcher was never started.

Basically the check for datallowconn was put in the wrong place. That check should go away completely once we merge (because we should also merge the part that allows us to bypass it), but for now I have moved the check to the correct place.


At which point the only thing you can do is restarting the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Not at all.



Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

Thanks. This is included in the attached update, along with the above fixes and some other small touches from Daniel. 



--
Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 4/5/18 11:07 AM, Magnus Hagander wrote:
> 
> 
> On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>
> ...
>
>     It however still fails to initialize the attempts field after allocating
>     the db entry in BuildDatabaseList, so if you try running with
>     -DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
> 
>      WARNING:  attempts = -1684366952
>      WARNING:  attempts = 1010514489
>      WARNING:  attempts = -1145390664
>      WARNING:  attempts = 1162101570
> 
>     I guess those are not the droids we're looking for?
> 
> 
> When looking at that and after a quick discussion, we just decided it's
> better to completely remove the retry logic. As you mentioned in some
> earlier mail, we had all this logic to retry databases (unlikely) but
> not relations (likely). Attached patch simplifies it to only detect the
> "database was dropped" case (which is fine), and consider every other
> kind of failure a permanent one and just not turn on checksums in those
> cases.
> 

OK, works for me.

> 
> 
>     Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
>     initialized. I think it only ever gets set in launcher_exit(), but that
>     does not seem sufficient. I suspect it's the reason for this behavior:
> 
> 
> It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
> (The whole memset-to-zero)
> 

OK, seems fine now.

> 
>         test=# select pg_enable_data_checksums(10, 10);
>         ERROR:  database "template0" does not allow connections
>         HINT:  Allow connections using ALTER DATABASE and try again.
>         test=# alter database template0 allow_connections = true;
>         ALTER DATABASE
>         test=# select pg_enable_data_checksums(10, 10);
>         ERROR:  could not start checksumhelper: already running
>         test=# select pg_disable_data_checksums();
>          pg_disable_data_checksums
>         ---------------------------
> 
>         (1 row)
> 
>         test=# select pg_enable_data_checksums(10, 10);
>         ERROR:  could not start checksumhelper: has been cancelled
> 
> 
> Turns out that wasn't the problem. The problem was that we *set* it
> before erroring out with the "does not allow connections", but never
> cleared it. In that case, it would be listed as launcher-is-running even
> though the launcher was never started.
> 
> Basically the check for datallowconn was put in the wrong place. That
> check should go away completely once we merge (because we should also
> merge the part that allows us to bypass it), but for now I have moved
> the check to the correct place.
> 

Ah, OK. I was just guessing.

> 
> 
>     Attached is a diff with a couple of minor comment tweaks, and correct
>     initialization of the attempts field.
> 
> 
> Thanks. This is included in the attached update, along with the above
> fixes and some other small touches from Daniel. 
> 

This patch version seems fine to me. I'm inclined to mark it RFC.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andrey Borodin
Date:

> On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> This patch version seems fine to me. I'm inclined to mark it RFC.
+1
The patch works fine for me. I've tried different combinations of backend cancelation, and the only suspicious thing I found is that you can start multiple workers by cancelling the launcher and not cancelling the worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for an existing launcher for a reason; maybe it should check for a worker too?

I think it's worth capitalizing WAL in "re-write the page to wal".

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> This patch version seems fine to me. I'm inclined to mark it RFC.
+1
The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for worker too?

I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think (whereas duplicate launchers could cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

I think it worth to capitalize WAL in "re-write the page to wal".

In the comment, right? Yeah, easy fix.

--

Re: Online enabling of checksums

From
Andrey Borodin
Date:

> On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I
found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic
behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for
worker too?
>
> I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think
(whereas duplicate launchers could cause interesting things to happen).
>
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in a leg then by calling pg_disable_data_checksums(), but this is extremely stupid for
a user.

Best regards, Andrey Borodin.

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for worker too?
>
> I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think (whereas duplicate launchers could cause interesting things to happen).
>
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in a leg then by calling pg_disable_data_checksums(), but this is extremely stupid for a user.


Ah, I didn't consider query cancel. I'm not sure how much we should actually care about it, but it's easy enough to trap that signal and just do a clean shutdown on it, so I've done that.

PFA a patch that does that, and also rebased over the datallowconn patch just landed (which also removes some docs).
 


--
Attachments

Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 7:30 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:


> On 5 Apr 2018, at 19:58, Magnus Hagander <magnus@hagander.net> wrote:
>
>
>
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend cancelation and the only suspicious thing I found is that you can start multiple workers by cancelling launcher and not cancelling worker. Is it problematic behavior? If we run pg_enable_data_checksums() it checks for existing launcher for a reason, m.b. it should check for worker too?
>
> I don't think it's a problem in itself -- it will cause pointless work, but not actually cause any problems I think (whereas duplicate launchers could cause interesting things to happen).
>
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(10000,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  |     backend_type
-------+-----------------------
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in a leg then by calling pg_disable_data_checksums(), but this is extremely stupid for a user


Ah, didn't consider query cancel. I'm not sure how much  we should actually care about it, but it's easy enough to trap that signal and just do a clean shutdown on it, so I've done that.

PFA a patch that does that, and also rebased over the datallowconn patch just landed (which also removes some docs).
 

I have now pushed this latest version with some minor text adjustments and a catversion bump.

Thanks for all the reviews! 


--

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.
> 
> Thanks for all the reviews!

I want to be on the record that I think merging a nontrival feature that
got submitted 2018-02-21, just before the start of the last last CF, is
an abuse of process, and not cool.  We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

- Andres


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments and
> > a catversion bump.
> > 
> > Thanks for all the reviews!
> 
> I want to be on the record that I think merging a nontrival feature that
> got submitted 2018-02-21, just before the start of the last last CF, is
> an abuse of process, and not cool.  We've other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
> 
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

And even worse, without even announcing an intent to commit and giving
people a chance to object.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments and
> > a catversion bump.
> >
> > Thanks for all the reviews!
>
> I want to be on the record that I think merging a nontrival feature that
> got submitted 2018-02-21, just before the start of the last last CF, is
> an abuse of process, and not cool.  We've other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
>
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

I can understand those arguments, and if that's the view of others as well, I can of course revert that.


And even worse, without even announcing an intent to commit and giving
people a chance to object.

At least this patch was posted on the lists before commit, unlike many others from many different people. And AFAIK there has never been such a rule.

--

Re: Online enabling of checksums

From
Andres Freund
Date:

On April 5, 2018 1:20:52 PM PDT, Magnus Hagander <magnus@hagander.net> wrote:
>On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund <andres@anarazel.de>
>wrote:
>
>And even worse, without even announcing an intent to commit and giving
>> people a chance to object.
>>
>
>At least this patch was posted on the lists before commit, unlike many
>others from many different people. And AFAIK there has never been such
>a
>rule.

The more debatable a decision is, the more important it IMO becomes to give people a chance to object. I don't think
there needs to be a hard rule to always announce an intent to commit.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 04/05/2018 01:12 PM, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
>> I have now pushed this latest version with some minor text adjustments and
>> a catversion bump.
>>
>> Thanks for all the reviews!
> I want to be on the record that I think merging a nontrivial feature that
> got submitted 2018-02-21, just before the start of the last CF, is
> an abuse of process, and not cool.  We have other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
>
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

Perhaps I am missing something but there has been a lot of public 
discussion on this feature for the last 7 weeks, in which you barely 
participated. I certainly understand wanting some notice before commit 
but there has been lots of discussion, multiple people publicly 
commenting on the patch and Magnus has been very receptive to all 
feedback (that I have seen). Perhaps we are being a bit sensitive because 
of another patch that is actually ramrodding the process and we need to 
take a step back?

Thanks,

JD


>
> - Andres
>

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-05 13:51:41 -0700, Joshua D. Drake wrote:
> On 04/05/2018 01:12 PM, Andres Freund wrote:
> > I want to be on the record that I think merging a nontrivial feature that
> > got submitted 2018-02-21, just before the start of the last CF, is
> > an abuse of process, and not cool.  We have other people working hard to
> > follow the process, and circumventing it like this just signals to
> > people trying to follow the rules that they're fools.
> > 
> > Merging ~2kloc patches like that is going to cause pain. And even if
> > not, it causes procedural damage.
> 
> Perhaps I am missing something but there has been a lot of public discussion
> on this feature for the last 7 weeks of which you barely participated.

I've commented weeks ago about my doubts, and Robert concurred:
http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com


> I certainly understand wanting some notice before commit but there has
> been lots of discussion, multiple people publicly commenting on the
> patch and Magnus has been very receptive to all feedback (that I have
> seen).

It's perfectly reasonable to continue review / improvement cycles of a
patch, even if it's not going to get in the current release. What does
that have to do with what I am concerned about?


> Perhaps we are being a sensitive because of another patch that is
> actually ramrodding the process and we need to take a step back?

No. See link above.

Please don't use "we" in this childishness-implying fashion.

- Andres


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 04/05/2018 02:01 PM, Andres Freund wrote:
>
> No. See link above.
>
> Please don't use "we" in this childishness-implying fashion.

The term "we" was used on purpose because I too was annoyed and I was 
trying to be objective, non-combative and productive.

JD





Re: Online enabling of checksums

From
Peter Geoghegan
Date:
On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:
>>At least this patch was posted on the lists before commit, unlike many
>>others from many different people. And AFAIK there has never been such
>>a
>>rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing from its spirit.

> The more debatable a decision is, the more important it IMO becomes to give people a chance to object. Don't think
> there needs to be a hard rule to always announce an intent to commit.

+1

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

-- 
Peter Geoghegan


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Apr 5, 2018 at 11:09 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund <andres@anarazel.de> wrote:
>>At least this patch was posted on the lists before commit, unlike many
>>others from many different people. And AFAIK there has never been such
>>a
>>rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing from its spirit.

> The more debatable a decision is, the more important it IMO becomes to give people a chance to object. Don't think there needs to be a hard rule to always announce an intent to commit.

+1

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

I would argue that this is a pretty isolated patch. A large majority of the code is completely isolated from the rest. And I would argue that this reduces the risk of the patch substantially.

(And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

--

Re: Online enabling of checksums

From
Peter Geoghegan
Date:
On Thu, Apr 5, 2018 at 1:51 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> Perhaps I am missing something but there has been a lot of public discussion
> on this feature for the last 7 weeks, in which you barely participated. I
> certainly understand wanting some notice before commit but there has been
> lots of discussion, multiple people publicly commenting on the patch and
> Magnus has been very receptive to all feedback (that I have seen). Perhaps
> we are being a bit sensitive because of another patch that is actually
> ramrodding the process and we need to take a step back?

I wish it was just one patch. I can think of several.



Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them.  The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

Just plonking a barrier into DataChecksumsNeedWrite() etc is a
possibility, but it's also not free...

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them.  The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

So just to be clear about the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress" flag and continues to write without checksums?
 
That seems like quite a long time to me -- is that really a problem? I'm guessing you're seeing a shorter path between the two that I can't see right now (I'll blame the late evening...)?

--

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
> > Is there any sort of locking that guarantees that worker processes see
> > an up2date value of
> > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > there's not. So you can afaict end up with checksums being computed by
> > the worker, but concurrent writes missing them.  The window is going to
> > be at most one missed checksum per process (as the unlocking of the page
> > is a barrier) and is probably not easy to hit, but that's dangerous
> > enough.
> >
> 
> So just to be clear about the case you're worried about. It's basically:
> Session #1 - sets checksums to inprogress
> Session #1 - starts dynamic background worker ("launcher")
> Launcher reads and enumerates pg_database
> Launcher starts worker in first database
> Worker processes first block of data in database
> And at this point, Session #2 has still not seen the "checksums inprogress"
> flag and continues to write without checksums?

Yes.  I think there are some variations of that, but yes, that's pretty
much it.


> That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.


> I'm guessing you're seeing a shorter path between the two that I can't
> see right now (I'll blame the late evening...)?

I don't think it matters terribly much how long that path is.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:


Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

old cluster uses data checksums but the new one does not
Failure, exiting


This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

            regards, tom lane


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 11:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:


Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok

old cluster uses data checksums but the new one does not
Failure, exiting


This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

It's not supposed to.

Without checking into it (just about off to bed now), one guess is that it's actually a leftover from a previous stage -- what state is the cluster actually in when it does that upgrade? Because the specific checksums tests do leave their cluster with checksums on, which I think would perhaps be the outcome of the testmodules-install-check-C test. The actual definitions of those tests are somewhere in the buildfarm client code, right?

In that case, the easy fix is probably to have the checksums tests actually turn off the checksums again when they're done.
 
--

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 05 Apr 2018, at 23:13, Magnus Hagander <magnus@hagander.net> wrote:

> (And yes, we've noticed it's failing in isolationtester on a number of boxes -- Daniel is currently investigating)

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag() is
defined to return true if it could set the flag.  When running without atomics, don’t we
need to do something like the below diff to make these APIs match?

--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-       return TAS((slock_t *) &ptr->sema);
+       return TAS((slock_t *) &ptr->sema) == 0;
 }

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

AFAICT there are no callers of this other than the online checksum code, and
this is not executed by the tests when running without atomics, which could
explain why nothing else is broken.

Before continuing the debugging, does this theory hold any water?  This isn’t
code I’m deeply familiar with so would appreciate any pointers.
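For illustration, the inverted-return issue can be reduced to a tiny self-contained C sketch (the TAS() and slock_t here are single-threaded stand-ins, not the real s_lock.h/atomics.c definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the spinlock fallback: like s_lock.h's TAS(),
 * it returns zero when the lock was successfully acquired. */
typedef int slock_t;

static int
TAS(slock_t *lock)
{
    int old = *lock;            /* single-threaded stand-in, no real atomicity */
    *lock = 1;
    return old;                 /* 0 => acquired, nonzero => already held */
}

/* pg_atomic_test_set_flag() is specified to return true when the flag was
 * successfully set, so the raw TAS() result must be inverted, as in the
 * diff above. */
static bool
test_set_flag_fixed(slock_t *flag)
{
    return TAS(flag) == 0;
}

/* The pre-patch variant returned TAS() directly, giving callers the
 * opposite answer. */
static bool
test_set_flag_buggy(slock_t *flag)
{
    return (bool) TAS(flag);
}
```

With the fix, the first caller on a clear flag gets true and every later caller gets false; the buggy version reports exactly the reverse.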

cheers ./daniel

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
> > Is there any sort of locking that guarantees that worker processes see
> > an up2date value of
> > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > there's not. So you can afaict end up with checksums being computed by
> > the worker, but concurrent writes missing them.  The window is going to
> > be at most one missed checksum per process (as the unlocking of the page
> > is a barrier) and is probably not easy to hit, but that's dangerous
> > enough.
> >
>
> So just to be clear about the case you're worried about. It's basically:
> Session #1 - sets checksums to inprogress
> Session #1 - starts dynamic background worker ("launcher")
> Launcher reads and enumerates pg_database
> Launcher starts worker in first database
> Worker processes first block of data in database
> And at this point, Session #2 has still not seen the "checksums inprogress"
> flag and continues to write without checksums?

Yes.  I think there are some variations of that, but yes, that's pretty
much it.


> That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.

Oh, that's not my intention either -- I just wanted to make sure I was thinking about the same issue you were.

Since you know a lot more about that type of interlocks than I do :) We already wait for all running transactions to finish before we start doing anything. Obviously transactions != buffer writes (and we have things like the checkpointer/bgwriter to consider). Is there something else that we could safely just *wait* for? I have no problem whatsoever if this is a long wait (given the total time). I mean to the point of "what if we just stick a sleep(10) in there" level waiting.

Or can that somehow be cleanly solved using some of the new atomic operators? Or is that likely to cause the same kind of overhead as throwing a barrier in there?


-- 

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 04/06/2018 11:25 AM, Magnus Hagander wrote:
> 
> 
> On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund <andres@anarazel.de
> <mailto:andres@anarazel.de>> wrote:
> 
>     Hi,
> 
>     On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
>     > On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund <andres@anarazel.de <mailto:andres@anarazel.de>> wrote:
>     > > Is there any sort of locking that guarantees that worker processes see
>     > > an up2date value of
>     > > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
>     > > there's not. So you can afaict end up with checksums being computed by
>     > > the worker, but concurrent writes missing them.  The window is going to
>     > > be at most one missed checksum per process (as the unlocking of the page
>     > > is a barrier) and is probably not easy to hit, but that's dangerous
>     > > enough.
>     > >
>     >
>     > So just to be clear of the case you're worried about. It's basically:
>     > Session #1 - sets checksums to inprogress
>     > Session #1 - starts dynamic background worker ("launcher")
>     > Launcher reads and enumerates pg_database
>     > Launcher starts worker in first database
>     > Worker processes first block of data in database
>     > And at this point, Session #2 has still not seen the "checksums inprogress"
>     > flag and continues to write without checksums?
> 
>     Yes.  I think there are some variations of that, but yes, that's pretty
>     much it.
> 
> 
>     > That seems like quite a long time to me -- is that really a problem?
> 
>     We don't generally build locking models that are only correct based on
>     likelihood. Especially not without a lengthy comment explaining that
>     analysis.
> 
> 
> Oh, that's not my intention either -- I just wanted to make sure I
> was thinking about the same issue you were.
> 

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

So I'm not sure I see the problem here. But perhaps LockBufHdr is not a
memory barrier?


> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start
> doing anything. Obviously transactions != buffer writes (and we have
> things like the checkpointer/bgwriter to consider). Is there something
> else that we could safely just *wait* for? I have no problem whatsoever
> if this is a long wait (given the total time). I mean to the point of
> "what if we just stick a sleep(10) in there" level waiting.
> 
> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as
> throwing a barrier in there?
> 

Perhaps the easiest thing we could do is walk shared buffers and do
LockBufHdr/UnlockBufHdr, which would guarantee no session is in the process
of writing out a buffer with a possibly stale checksum flag. Of course,
it's a bit brute-force-ish, but it's not that different from the waits
for running transactions and temporary tables.
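A minimal sketch of that brute-force walk (FakeBufferDesc and the fake_* lock functions are simulated stand-ins, not the real BufferDesc/LockBufHdr API):

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS 8

/* Hypothetical miniature of a buffer header with its spinlock. */
typedef struct { bool locked; } FakeBufferDesc;

static FakeBufferDesc buffers[NBUFFERS];

/* Stand-ins for LockBufHdr/UnlockBufHdr; the real ones imply memory
 * barriers, which is what makes the walk below meaningful. */
static void
fake_LockBufHdr(FakeBufferDesc *buf)
{
    assert(!buf->locked);
    buf->locked = true;
}

static void
fake_UnlockBufHdr(FakeBufferDesc *buf)
{
    assert(buf->locked);
    buf->locked = false;
}

/* After flipping the checksum flag, acquire and release every buffer
 * header lock once.  Any writer that grabbed a header before the flag
 * change must have released it (and thus passed a barrier) by the time
 * we obtain its lock, so no in-flight write can still be using the
 * stale flag afterwards. */
static void
wait_for_stale_flag_writers(void)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        fake_LockBufHdr(&buffers[i]);
        fake_UnlockBufHdr(&buffers[i]);
    }
}
```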


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 11:25:59 +0200, Magnus Hagander wrote:
> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start doing
> anything. Obviously transactions != buffer writes (and we have things like
> the checkpointer/bgwriter to consider). Is there something else that we
> could safely just *wait* for? I have no problem whatsoever if this is a
> long wait (given the total time). I mean to the point of "what if we just
> stick a sleep(10) in there" level waiting.

I don't think anything just related to "time" is reasonable in any sort
of way. On an overloaded system you can see very long stalls of processes
that have done a lot of work. Locking protocols should be correct, and
that's that.


> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as throwing
> a barrier in there?

Worse.

I wonder if we could introduce "MegaExpensiveRareMemoryBarrier()" that
goes through pgproc and signals every process with a signal that
requires the other side to do an operation implying a memory barrier.

That's actually not hard to do (e.g. every latch operation qualifies);
the problem is that signal delivery isn't synchronous. So you need some
acknowledgement protocol.

I think you could introduce a procsignal message that does a memory
barrier and then sets PGPROC->barrierGeneration to
ProcArrayStruct->barrierGeneration. MegaExpensiveRareMemoryBarrier()
increments ProcArrayStruct->barrierGeneration, signals everyone, and
then waits till every PGPROC->barrierGeneration has surpassed
ProcArrayStruct->barrierGeneration.
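As a rough single-process sketch of that acknowledgement protocol (the barrierGeneration names follow the mail; the inline "signal delivery" and the elided memory barrier are simplifications, not PostgreSQL code):

```c
#include <assert.h>

#define NPROCS 4

/* Hypothetical miniatures of PGPROC and the shared generation counter. */
typedef struct { unsigned barrierGeneration; } FakePGPROC;

static FakePGPROC procs[NPROCS];
static unsigned shared_barrierGeneration;

/* What each backend's procsignal handler would do: execute a memory
 * barrier (elided in this single-threaded sketch), then acknowledge the
 * current generation. */
static void
absorb_barrier(FakePGPROC *proc)
{
    /* pg_memory_barrier(); */
    proc->barrierGeneration = shared_barrierGeneration;
}

/* MegaExpensiveRareMemoryBarrier(): bump the generation, signal every
 * process, then wait until all of them have caught up.  Here the
 * "signal" is delivered inline; in reality delivery is asynchronous,
 * which is exactly why the wait loop is needed. */
static void
MegaExpensiveRareMemoryBarrier(void)
{
    shared_barrierGeneration++;
    for (int i = 0; i < NPROCS; i++)
        absorb_barrier(&procs[i]);      /* stand-in for signalling the proc */
    for (int i = 0; i < NPROCS; i++)    /* stand-in for the wait loop */
        assert(procs[i].barrierGeneration >= shared_barrierGeneration);
}
```

Once the emitter's wait loop completes, every backend has executed a barrier, so a subsequent unlocked read such as DataChecksumsNeedWrite() cannot observe a stale value.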

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
> > Oh, that's not my intention either -- I just wanted to make sure I
> > was thinking about the same issue you were.

> I agree we shouldn't rely on chance here - if we might read a stale
> value, we need to fix that of course.

It's perfectly possible that some side-conditions mitigate this. What
concerns me is that
a) Nobody appears to have raised this issue beforehand, besides an
   unlocked read of a critical variable being a fairly obvious
   issue. This kind of thing needs to be carefully thought about.
b) If there's some "side channel" interlock, it's not documented.

I noticed the issue because of an IM question about the general feature,
and I did a three minute skim and saw the read without a comment.


> I'm not quite sure I fully understand the issue, though. I assume both
> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
> happen the process would need to be already past LockBufHdr when the
> checksum version is updated. In which case it can use a stale version
> when writing the buffer out. Correct?

Yes, they are memory barriers.


> I wonder if that's actually a problem, considering the checksum worker
> will then overwrite all data with correct checksums anyway. So the other
> process would have to overwrite the buffer after checksum worker, at
> which point it'll have to go through LockBufHdr.

Again, I'm not sure if there's some combination of issues that make this
not a problem in practice. I just *asked* if there's something
preventing this from being a problem.

The really problematic case would be if it is possible for some process
to wait long enough, without executing a barrier implying operation,
that it'd try to write out a page that the checksum worker has already
passed over.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 04/06/2018 07:22 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
>>> Oh, that's not my intention either -- I just wanted to make sure I
>>> was thinking about the same issue you were.
> 
>> I agree we shouldn't rely on chance here - if we might read a stale
>> value, we need to fix that of course.
> 
> It's perfectly possible that some side-conditions mitigate this. What
> concerns me is that
> a) Nobody appears to have raised this issue beforehand, besides an
>    unlocked read of a critical variable being a fairly obvious
>    issue. This kind of thing needs to be carefully thought about.
> b) If there's some "side channel" interlock, it's not documented.
> 
> I noticed the issue because of an IM question about the general feature,
> and I did a three minute skim and saw the read without a comment.
> 

All I can say is that I did consider this issue while reviewing the
patch, and I've managed to convince myself it's not an issue (using the
logic that I've just outlined here). Which is why I haven't raised it as
an issue, because I don't think it is.

You're right it might have been mentioned explicitly, of course.

In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
interlock. It's a pretty direct and intentional interlock, I think.

> 
>> I'm not quite sure I fully understand the issue, though. I assume both
>> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
>> happen the process would need to be already past LockBufHdr when the
>> checksum version is updated. In which case it can use a stale version
>> when writing the buffer out. Correct?
> 
> Yes, they are memory barriers.
> 

Phew! ;-)

> 
>> I wonder if that's actually a problem, considering the checksum worker
>> will then overwrite all data with correct checksums anyway. So the other
>> process would have to overwrite the buffer after checksum worker, at
>> which point it'll have to go through LockBufHdr.
> 
> Again, I'm not sure if there's some combination of issues that make this
> not a problem in practice. I just *asked* if there's something
> preventing this from being a problem.
> 
> The really problematic case would be if it is possible for some process
> to wait long enough, without executing a barrier implying operation,
> that it'd try to write out a page that the checksum worker has already
> passed over.
> 

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

regards



Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:
> In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
> interlock. It's a pretty direct and intentional interlock, I think.

I mean it's a side-channel as far as DataChecksumsNeedWrite() is
concerned. You're banking on all callers using a barrier implying
operation around it.


> Sure. But what would that be? I can't think of anything. A process that
> modifies a buffer (or any other piece of shared state) without holding
> some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 04/06/2018 07:46 PM, Andres Freund wrote:
> On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:
>> In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
>> interlock. It's a pretty direct and intentional interlock, I think.
> 
> I mean it's a side-channel as far as DataChecksumsNeedWrite() is 
> concerned. You're banking on all callers using a barrier implying 
> operation around it.

Ah, OK.

> 
>> Sure. But what would that be? I can't think of anything. A process that
>> modifies a buffer (or any other piece of shared state) without holding
>> some sort of lock seems broken by default.
> 
> You can quite possibly already *hold* a lock if it's not an exclusive
> one.
> 

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.


regards



Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
> On 04/06/2018 07:46 PM, Andres Freund wrote:
> >> Sure. But what would that be? I can't think of anything. A process that
> >> modifies a buffer (or any other piece of shared state) without holding
> >> some sort of lock seems broken by default.
> > 
> > You can quite possibly already *hold* a lock if it's not an exclusive
> > one.
> > 
> 
> Sure, but if you're holding the buffer lock when the checksum version is
> changed, then the checksumhelper is obviously not running yet. In which
> case it will update the checksum on the buffer later.

The buffer content lock itself doesn't generally give any such guarantee
afaict, as it's required that the content lock is held in shared mode
during IO. ProcessSingleRelationFork() happens to use exclusive mode
(which could and possibly should be optimized), so that's probably
sufficient from that end though.

I'm mainly disconcerted this isn't well discussed & documented.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Robert Haas
Date:
On Thu, Apr 5, 2018 at 5:01 PM, Andres Freund <andres@anarazel.de> wrote:
> I've commented weeks ago about my doubts, and Robert concurred:
> http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com

Yes, and I expressed some previous reservations as well.  Granted, my
reservations about this patch are less than for MERGE, and granted,
too, what is Magnus supposed to do about a couple of committers
expressing doubts about whether something really ought to be
committed?  Is that an absolute bar?  It wasn't phrased as such, nor
do we really have the authority.  At the same time, those concerns
didn't generate much discussion and, at least in my case, are not
withdrawn merely because time has passed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 14:14:40 -0400, Robert Haas wrote:
> and granted, too, what is Magnus supposed to do about a couple of
> committers expressing doubts about whether something really ought to
> be committed?  Is that an absolute bar?  It wasn't phrased as such,
> nor do we really have the authority.  At the same time, those concerns
> didn't generate much discussion and, at least in my case, are not
> withdrawn merely because time has passed.

Yea, I don't think they're an absolute blocker. But imo more than
sufficient reason to give the list a heads-up of a day or two that the
patch is intended to be committed.

I'd only pointed the message out because JD said something about me not
having participated in the earlier discussion.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 04/06/2018 08:13 PM, Andres Freund wrote:
> On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
>> On 04/06/2018 07:46 PM, Andres Freund wrote:
>>>> Sure. But what would that be? I can't think of anything. A process that
>>>> modifies a buffer (or any other piece of shared state) without holding
>>>> some sort of lock seems broken by default.
>>>
>>> You can quite possibly already *hold* a lock if it's not an exclusive
>>> one.
>>>
>>
>> Sure, but if you're holding the buffer lock when the checksum version is
>> changed, then the checksumhelper is obviously not running yet. In which
>> case it will update the checksum on the buffer later.
> 
> The buffer content lock itself doesn't generally give any such
> guarantee afaict, as it's required that the content lock is held in
> shared mode during IO. ProcessSingleRelationFork() happens to use
> exclusive mode (which could and possibly should be optimized), so
> that's probably sufficient from that end though.
> 

Yes.

> I'm mainly disconcerted this isn't well discussed & documented.
> 

Agreed, no argument here.


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 04/06/2018 08:13 PM, Andres Freund wrote:
> On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
>> On 04/06/2018 07:46 PM, Andres Freund wrote:
>>>> Sure. But what would that be? I can't think of anything. A process that
>>>> modifies a buffer (or any other piece of shared state) without holding
>>>> some sort of lock seems broken by default.
>>>
>>> You can quite possibly already *hold* a lock if it's not an exclusive
>>> one.
>>>
>>
>> Sure, but if you're holding the buffer lock when the checksum version is
>> changed, then the checksumhelper is obviously not running yet. In which
>> case it will update the checksum on the buffer later.
> 
> The buffer content lock itself doesn't generally give any such guarantee
> afaict, as it's required that the content lock is held in shared mode
> during IO. ProcessSingleRelationFork() happens to use exclusive mode
> (which could and possibly should be optimized), so that's probably
> sufficient from that end though.
> 

Oh, I've just realized the phrasing of my previous message was rather
confusing. What I meant to say is this:

  Sure, but the checksum version is changed before the checksumhelper
  launcher/worker is even started. So if you're holding the buffer lock
  at that time, then the buffer is essentially guaranteed to be updated
  by the worker later.

Sorry if it seemed I'm suggesting the buffer lock itself guarantees
something about the worker startup.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Looking into the isolationtester failure on piculet, which builds using
> --disable-atomics, and locust which doesn’t have atomics, the code for
> pg_atomic_test_set_flag seems a bit odd.
> 
> TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
> defined to return True if it could set.  When running without atomics, don’t we
> need to do something like the below diff to make these APIs match?  :
> 
> --- a/src/backend/port/atomics.c
> +++ b/src/backend/port/atomics.c
> @@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
>  bool
>  pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
>  {
> -       return TAS((slock_t *) &ptr->sema);
> +       return TAS((slock_t *) &ptr->sema) == 0;
>  }

Yes, this looks wrong.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Applying this makes the _cancel test pass, moving the failure instead to the
> following _enable test (which matches what coypu and mylodon are seeing).

FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
get the buildfarm green again.

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

The flag informing whether the worker has started is cleared via an
on_shmem_exit() hook:

static void
launcher_exit(int code, Datum arg)
{
    ChecksumHelperShmem->abort = false;
    pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
}

but the wait in the test is done via functions like:

    CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
    DECLARE
        enabled boolean;
    BEGIN
        LOOP
            SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
            IF enabled THEN
                EXIT;
            END IF;
            PERFORM pg_sleep(1);
        END LOOP;
        RETURN enabled;
    END;
    $$ LANGUAGE plpgsql;

    INSERT INTO t1 (b, c) VALUES (generate_series(1,10000), 'starting values');

    CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
    DECLARE
        enabled boolean;
    BEGIN
        PERFORM pg_sleep(1);
        SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
        RETURN enabled;
    END;
    $$ LANGUAGE plpgsql;

which just waits for setting checksums to have finished.  It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting.  Then followup tests fail
because the process is still running.

Also:
    CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
    DECLARE
        counter integer;
    BEGIN
        FOR counter IN 1..30 LOOP
            PERFORM count(a) FROM t1;
            PERFORM pg_sleep(0.2);
        END LOOP;
        RETURN True;
    END;
    $$ LANGUAGE plpgsql;
}

really?  Let's just force the test to take at least 6s purely from
sleeping?

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 14:33:48 -0700, Andres Freund wrote:
> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> > Looking into the isolationtester failure on piculet, which builds using
> > --disable-atomics, and locust which doesn’t have atomics, the code for
> > pg_atomic_test_set_flag seems a bit odd.
> > 
> > TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
> > defined to return True if it could set.  When running without atomics, don’t we
> > need to do something like the below diff to make these APIs match?  :
> > 
> > --- a/src/backend/port/atomics.c
> > +++ b/src/backend/port/atomics.c
> > @@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
> >  bool
> >  pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> >  {
> > -       return TAS((slock_t *) &ptr->sema);
> > +       return TAS((slock_t *) &ptr->sema) == 0;
> >  }
> 
> Yes, this looks wrong.

And the reason the tests fail reliably after is because the locking
model around ChecksumHelperShmem->launcher_started arguably is broken:

    /* If the launcher isn't started, there is nothing to shut down */
    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
        return;

This uses a non-concurrency safe primitive. Which then spuriously
triggers:

#define PG_HAVE_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
    /*
     * Can't do this efficiently in the semaphore based implementation - we'd
     * have to try to acquire the semaphore - so always return true. That's
     * correct, because this is only an unlocked test anyway. Do this in the
     * header so compilers can optimize the test away.
     */
    return true;
}

No one can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Robert Haas
Date:
On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:
> no one can entirely quibble with the rationale that this is ok (I'll
> post a patch cleaning up the atomics simulation of flags in a bit), but
> this is certainly not a correct locking strategy.

I think we have enough evidence at this point to conclude that this
patch, along with MERGE, should be reverted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 07 Apr 2018, at 00:23, Andres Freund <andres@anarazel.de> wrote:

> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
>> Applying this makes the _cancel test pass, moving the failure instead to the
>> following _enable test (which matches what coypu and mylodon are seeing).
>
> FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
> get the buildfarm green again.

Sorry about that, I’m a bit slow due to various $family situations at the
moment.

> I'm fairly certain that the bug here is a simple race condition in the
> test (not the main code!):

I wonder if it may perhaps be a case of both?

> It's
> exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> make sure that a process has finished exiting.  Then followup tests fail
> because the process is still running

I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination.  Checksumhelper cancellation uses pg_atomic_unlocked_test_flag() to
test if it’s running when asked to abort, something which seems unsafe to do in
semaphore simulation as it always returns true.  If I for debugging synthesize
a flag test with testset/clear, the tests pass green (with the upstream patch
for pg_atomic_test_set_flag_impl() applied).  Cancelling with semaphore sim is
thus doomed to never work IIUC.  Or it’s a red herring.

As Magnus mentioned upstream, rewriting to not use an atomic flag is probably
the best option, once the current failure is understood.

> really?  Let's just force the test take at least 6s purely from
> sleeping?

The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester?  I have failed to find a good way to repeat a step like
that, but I might be missing something.

cheers ./daniel

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> > I'm fairly certain that the bug here is a simple race condition in the
> > test (not the main code!):
> 
> I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.


> > It's
> > exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> > make sure that a process has finished exiting.  Then followup tests fail
> > because the process is still running
> 
> I can reproduce the error when building with --disable-atomics, and it seems
> that all the failing members either do that, lack atomic.h, lack atomics or a
> combination.

atomics.h isn't important, it's just relevant for solaris (IIRC).  Only
one of the failing ones lacks atomics afaict. See

On 2018-04-06 14:19:09 -0700, Andres Freund wrote:
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
> ? Those all don't seem fall under that? Having proper atomics?

So there it's the timing. Note that they didn't always fail either.


> > really?  Let's just force the test take at least 6s purely from
> > sleeping?
> 
> The test needs continuous reading in a session to try and trigger any bugs in
> read access on the cluster during checksumming, is there a good way to do that
> in the isolationtester?  I have failed to find a good way to repeat a step like
> that, but I might be missing something.

IDK, I know this isn't right.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:
> 
> On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
>>> I'm fairly certain that the bug here is a simple race condition in the
>>> test (not the main code!):
>> 
>> I wonder if it may perhaps be a case of both?
> 
> See my other message about the atomic fallback bit.

Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
suspicion.

cheers ./daniel


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:
> > no one can entirely quibble with the rationale that this is ok (I'll
> > post a patch cleaning up the atomics simulation of flags in a bit), but
> > this is certainly not a correct locking strategy.
>
> I think we have enough evidence at this point to conclude that this
> patch, along with MERGE, should be reverted.

I'm not sure that I see some issues around getting the locking correct
when starting/stopping the process is really evidence of a major problem
with the patch- yes, it obviously needs to be fixed and it would have
been unfortunate if we hadn't caught it, but a good bit of effort
appears to have been taken to ensure that exactly this is tested (which
is in part why the buildfarm is failing) and this evidently found an
existing bug, which is hardly this patch's fault.

In short, I don't agree (yet..) that this needs reverting.

I'm quite sure that bringing up MERGE in this thread and saying it needs
to be reverted without even having the committer of that feature on the
CC list isn't terribly useful and conflates two otherwise unrelated
patches and efforts.  Let's try to use the threads the way they're
intended and keep our responses to each on their respective threads.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund <andres@anarazel.de> wrote:
> > > no one can entirely quibble with the rationale that this is ok (I'll
> > > post a patch cleaning up the atomics simulation of flags in a bit), but
> > > this is certainly not a correct locking strategy.
> > 
> > I think we have enough evidence at this point to conclude that this
> > patch, along with MERGE, should be reverted.
> 
> I'm not sure that I see some issues around getting the locking correct
> when starting/stopping the process is really evidence of a major problem
> with the patch-

Note that there've been several other things mentioned in the
thread. I'll add some more in a bit.


> yes, it obviously needs to be fixed and it would have been unfortuante
> if we hadn't caught it, but a good bit of effort appears to have been
> taken to ensure that exactly this is tested (which is in part why the
> buildfarm is failing) and this evidently found an existing bug, which
> is hardly this patch's fault.

THAT is the problem. It costs people that haven't been involved in the
feature time. I've friggin started debugging this because nobody else
could be bothered. Even though I'd planned to spend that time on other
patches that have been submitted far ahead in time.


> I'm quite sure that bringing up MERGE in this thread and saying it needs
> to be reverted without even having the committer of that feature on the
> CC list isn't terribly useful and conflates two otherwise unrelated
> patches and efforts.

Robert also mentioned it on the other thread, so... And no, they're not
unrelated matters, in that it's pushing half-baked stuff.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-07 01:27:13 +0200, Daniel Gustafsson wrote:
> > On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:
> > 
> > On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> >>> I'm fairly certain that the bug here is a simple race condition in the
> >>> test (not the main code!):
> >> 
> >> I wonder if it may perhaps be a case of both?
> > 
> > See my other message about the atomic fallback bit.
> 
> Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
> suspicion.

But note it fails because the code using it is WRONG. There's a reason
there's "unlocked" in the name. But even leaving that aside, it would
probably *still* be wrong if it were locked.

It seems *extremely* dubious that we'll allow to re-enable the checksums
while a worker is still doing stuff for the old cycle in the
background.  Consider what happens if the checksum helper is currently
doing RequestCheckpoint() (something that can certainly take a *LONG*
while). Another process disables checksums. Pages get written out
without checksums. Yet another process re-enables checksums. Helper
process does SetDataChecksumsOn(). Which succeeds because

    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
    {
        LWLockRelease(ControlFileLock);
        elog(ERROR, "Checksums not in inprogress mode");
    }

succeeds.  Boom. Cluster with partially set checksums but marked as
valid.


Greetings,

Andres Freund


Re: Online enabling of checksums

From
Stephen Frost
Date:
Andres,

* Andres Freund (andres@anarazel.de) wrote:
> On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:
> > I'm quite sure that bringing up MERGE in this thread and saying it needs
> > to be reverted without even having the committer of that feature on the
> > CC list isn't terribly useful and conflates two otherwise unrelated
> > patches and efforts.
>
> Robert also mentioned it on the other thread, so... And no, they're not
> unrelated matters, in that it's pushing half baked stuff.

Apparently I've missed where he specifically called for it to be
reverted then, which is fine, and my apologies for missing it amongst
the depth of that particular thread.  I do think that specifically
asking for it to be reverted is distinct from expressing concerns about
it.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Andres Freund
Date:
Here's a pass through the patch:
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
         Assert(RedoRecPtr < Insert->RedoRecPtr);
         RedoRecPtr = Insert->RedoRecPtr;
     }
-    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());

     if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)

Why does this need an in-progress specific addition? Given that we
unconditionally log FPWs for all pages, and

#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)

it's not clear what this achieves?  At the very least needs a comment.


@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
     Assert(ControlFile != NULL);
     return (ControlFile->data_checksum_version > 0);
 }

+bool
+DataChecksumsNeedVerify(void)
+{
+    Assert(ControlFile != NULL);
+
+    /*
+     * Only verify checksums if they are fully enabled in the cluster. In
+     * inprogress state they are only updated, not verified.
+     */
+    return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+    Assert(ControlFile != NULL);
+    return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}

As previously mentioned, the locking model around this is unclear.  It's
probably fine due to to surrounding memory barriers, but that needs to
be very very explicitly documented.


+void
+SetDataChecksumsOn(void)
+{
+    Assert(ControlFile != NULL);
+
+    LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+    {
+        LWLockRelease(ControlFileLock);
+        elog(ERROR, "Checksums not in inprogress mode");
+    }
+
+    ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+    UpdateControlFile();
+    LWLockRelease(ControlFileLock);
+
+    XlogChecksums(PG_DATA_CHECKSUM_VERSION);


As I've explained in
https://www.postgresql.org/message-id/20180406235126.d4sg4dtgicdpucnj@alap3.anarazel.de
this appears to be unsafe.  There's no guarantee that the
ControlFile->data_checksum_version hasn't intermittently set to 0.


@@ -7788,6 +7863,16 @@ StartupXLOG(void)
      */
     CompleteCommitTsInitialization();

+    /*
+     * If we reach this point with checksums in inprogress state, we notify
+     * the user that they need to manually restart the process to enable
+     * checksums.
+     */
+    if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+        ereport(WARNING,
+                (errmsg("checksum state is \"inprogress\" with no worker"),
+                 errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or
pg_enable_data_checksums() functions.")));
 
+

Hm, so one manually has to take action here. Any reason we can't just
re-start the worker? Also, this'll be issued on standbys etc too, that
seems misleading?

+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+    /*
+     * If we don't need to write new checksums, then clearly they are already
+     * disabled.
+     */
+    if (!DataChecksumsNeedWrite())
+        ereport(ERROR,
+                (errmsg("data checksums already disabled")));
+
+    ShutdownChecksumHelperIfRunning();
+
+    SetDataChecksumsOff();
+
+    PG_RETURN_VOID();

Unsafe, see SetDataChecksumsOn comment above.

Shouldn't this be named with a pg_? We normally do that for SQL callable
functions, no? See the preceding functions.

This function is marked PROPARALLEL_SAFE. That can't be right?

Also, shouldn't this refuse to work if called in recovery mode? Erroring
out with
ERROR:  XX000: cannot make new WAL entries during recovery
doesn't seem right.


+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)

This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
on it.


+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)

entry point sounds a bit like it's the bgw invoked routine...


+/*
+ * ShutdownChecksumHelperIfRunning
+ *        Request shutdown of the checksumhelper
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+    /* If the launcher isn't started, there is nothing to shut down */
+    if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+        return;

Using an unlocked op here without docs seems wrong.  See also mail
referenced above.


+static bool
+ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
...
+    BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);

Shouldn't this have a comment explaining that this is safe because all
pages that concurrently get added to the end are going to be checksummed
by the extender?


+        /* Need to get an exclusive lock before we can flag as dirty */
+        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

Hm. So we'll need exclusive locks on all pages in the database. On busy
inner tables that's going to be painful. It shouldn't be too hard to
reduce this to share locks.


+        START_CRIT_SECTION();
+        MarkBufferDirty(buf);
+        log_newpage_buffer(buf, false);

Hm. So we always log buffers as non-standard ones? That's going to cause
quite the increase in FPW space.


+    elog(DEBUG2, "Checksumhelper done with relation %d: %s",
+         relationId, (aborted ? "aborted" : "finished"));

Hm. Wouldn't it be advisable to include actual relation names?  It's
pretty annoying to keep track this way.


+void
+ChecksumHelperLauncherMain(Datum arg)
....

+    /*
+     * Create a database list.  We don't need to concern ourselves with
+     * rebuilding this list during runtime since any database created after
+     * this process started will be running with checksums turned on from the
+     * start.
+     */

Why is this true? What if somebody runs CREATE DATABASE while the
launcher / worker are processing a different database? It'll copy the
template database on the filesystem level, and it very well might not
yet have checksums set?  Afaict the second time we go through this list
that's not cought.


+        if (processing == SUCCESSFUL)
+        {
+            pfree(db->dbname);
+            pfree(db);
...

Why bother with that and other deallocations here and in other places?
Launcher's going to quit anyway...


+        else if (processing == FAILED)
+        {
+            /*
+             * Put failed databases on the remaining list.
+             */
+            remaining = lappend(remaining, db);

Uh. So we continue checksumming the entire cluster if one database
failed? Given there's no restart capability at all, that seems hard to
defend?


+    CurrentDatabases = BuildDatabaseList();
+
+    foreach(lc, remaining)
+    {
+        ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) lfirst(lc);
+        bool        found = false;
+
+        foreach(lc2, CurrentDatabases)
+        {
+            ChecksumHelperDatabase *db2 = (ChecksumHelperDatabase *) lfirst(lc2);
+
+            if (db->dboid == db2->dboid)
+            {
+                found = true;

This is an O(N^2) comparison logic? There's clusters with tens of
thousands of databases... The comparison costs are low, but still.


+        /*
+         * Foreign tables have by definition no local storage that can be
+         * checksummed, so skip.
+         */
+        if (pgc->relkind == RELKIND_FOREIGN_TABLE)
+            continue;
+

That strikes me as a dangerous form of test. Shouldn't we instead check
whether a relfilenode exists? I'll note that this test currently
includes plain views.  It's just the smgrexist() test that makes it
work.


- Andres


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> +    /*
> +     * Create a database list.  We don't need to concern ourselves with
> +     * rebuilding this list during runtime since any database created after
> +     * this process started will be running with checksums turned on from the
> +     * start.
> +     */
> 
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set?  Afaict the second time we go through this list
> that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0



further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects isolationtests to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.


The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set.  If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.


If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.


Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> +     /*
> +      * Create a database list.  We don't need to concern ourselves with
> +      * rebuilding this list during runtime since any database created after
> +      * this process started will be running with checksums turned on from the
> +      * start.
> +      */
>
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set?  Afaict the second time we go through this list
> that's not cought.

*caught

It's indeed trivial to reproduce this, just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0



further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database wide objects around, but the
cluster-wide stuff needs to be cleaned up.


The tests don't actually make sure that no checksum launcher / apply is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set.  If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.


If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

While I do think that it's still definitely fixable in time for 11, I won't argue for it. Will revert.

Note however that I'm sans-laptop until Sunday, so I will revert it then or possibly Monday. 

--

Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

On Sat, Apr 07, 2018 at 08:57:03AM +0200, Magnus Hagander wrote:
> On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres@anarazel.de> wrote:
> > If it's not obvious: This isn't ready, should be reverted, cleaned up,
> > and re-submitted for v12.
> 
> While I do think that it's still definitely fixable in time for 11, I won't
> argue for it. Will revert.

:(

Can the pg_verify_checksums command be kept at least, please? 

AFAICT this one is not contentious, the code is isolated, it's really
useful, orthogonal to online checksum activation and arguably could've
been committed as a separate patch anyway.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-07 10:14:49 +0200, Michael Banck wrote:
> Can the pg_verify_checksums command be kept at least, please? 
> 
> AFAICT this one is not contentious, the code is isolated, it's really
> useful, orthogonal to online checksum activation and arguably could've
> been committed as a separate patch anyway.

I've not looked at it in any meaningful amount of detail, but it does
seem a lot lower risk from here.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactivate the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactive the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for the time being. 

--

Re: Online enabling of checksums

From
Robert Haas
Date:
On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund <andres@anarazel.de> wrote:
> This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query.  It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in).  It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Tue, Apr 10, 2018 at 6:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund <andres@anarazel.de> wrote:
> This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query.  It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in).  It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.


That's a good summary, thanks!

Just to be clear in this case though -- the function itself doesn't write out *anything*. It only starts a background worker that later does the writing. The background worker itself is not parallelized, so the risk in this particular use case would be that we ended up starting multiple workers (or just failed), I think.

But the summary is very good to have regardless! :) 

--

Re: Online enabling of checksums

From
Magnus Hagander
Date:


On Mon, Apr 9, 2018 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactive the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for the time being. 


PFA an updated version of the patch for the next CF. We believe this one takes care of all the things pointed out so far. 

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by simply requiring a restart of PostgreSQL to initiate the conversion background worker. That is definitely going to guarantee a memory barrier. It's certainly not ideal, but restarting the cluster is still a *lot* better than having to do the entire conversion offline. This can of course be improved upon in the future, but for now we stuck to the safe way.
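As a point of reference, the checksum state machine from the original design proposal (quoted at the top of this thread) is small enough to sketch. The Python below is purely illustrative, not PostgreSQL source; only the state names and allowed transitions come from the proposal.

```python
# Illustrative sketch (not PostgreSQL code) of the checksum state machine
# from the design proposal: off -> inprogress -> on, with disable allowed
# from either "inprogress" or "on".
ALLOWED_TRANSITIONS = {
    ("off", "inprogress"),   # pg_enable_data_checksums()
    ("inprogress", "on"),    # checksumhelper finished every database
    ("inprogress", "off"),   # pg_disable_data_checksums(), or giving up
    ("on", "off"),           # pg_disable_data_checksums()
}

def transition(current, target):
    """Return the new state, refusing any transition the design forbids."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal checksum state change: {current} -> {target}")
    return target
```

Note in particular that there is no direct "off" to "on" edge: the cluster always passes through "inprogress", where checksums are written but not yet verified.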

The concurrent create-database-from-one-that-had-no-checksums is handled by simply looping over the list of databases as long as new databases show up, and waiting for all open transactions to finish at the right moment to ensure there is no concurrently running one as we get the database list.
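The loop described above can be sketched as follows. This is an illustrative Python sketch of the algorithm only; the callback names are hypothetical stand-ins for the actual C code in the launcher.

```python
# Sketch of "loop over the database list until no new databases show up".
# All three callbacks are hypothetical: in the real patch this logic runs
# in the checksumhelper launcher inside the server.
def enable_checksums(list_databases, process_database, wait_for_open_transactions):
    done = set()
    while True:
        # Wait out open transactions so a concurrent CREATE DATABASE from a
        # not-yet-processed template cannot slip past the snapshot of the list.
        wait_for_open_transactions()
        pending = [db for db in list_databases() if db not in done]
        if not pending:
            break                      # list is stable: every database processed
        for db in pending:
            process_database(db)       # verify/rewrite checksums for all relations
            done.add(db)
    return done
```

A database created mid-run simply shows up as `pending` on the next iteration, so the launcher converges once no new databases appear.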

Since the worker is now a regular background worker started from postmaster, the cost-delay parameters had to be made GUCs instead of function arguments.

(And the more or less broken isolation tests are simply removed)

--
Attachments

Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hello

I tried build this patch and got error during make docs
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"

Both new GUCs, checksumhelper_cost_delay and checksumhelper_cost_limit, are mentioned in postgresql.conf with the
special value -1 (-1 to use vacuum_cost_limit), but this value is not mentioned in the docs. I also noticed that the
code and documentation describe different defaults.
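For context, the vacuum-style cost throttling these GUCs borrow can be sketched like this. This is illustrative Python only; the constants and the exact -1 fallback semantics are assumptions modeled on the vacuum cost delay mechanism, not the patch's actual code.

```python
# Toy model of cost-based throttling as used by vacuum and, per this
# thread, the checksumhelper. Constants are illustrative assumptions.
import time

VACUUM_COST_LIMIT = 200    # stand-in for the vacuum_cost_limit default
COST_PER_PAGE = 10         # assumed cost charged per page processed

def effective_limit(checksumhelper_cost_limit):
    # -1 means "fall back to vacuum_cost_limit", per the discussion above
    if checksumhelper_cost_limit == -1:
        return VACUUM_COST_LIMIT
    return checksumhelper_cost_limit

def process_pages(n_pages, cost_limit, cost_delay_ms, sleep=time.sleep):
    """Charge a cost per page; once the accumulated cost reaches the
    limit, nap for cost_delay_ms and reset the balance. Returns the
    number of naps taken. A delay of 0 disables throttling."""
    balance = 0
    naps = 0
    for _ in range(n_pages):
        balance += COST_PER_PAGE
        if cost_delay_ms > 0 and balance >= cost_limit:
            sleep(cost_delay_ms / 1000.0)
            naps += 1
            balance = 0
    return naps
```

The point of the model: the delay bounds the I/O rate of the background worker without affecting correctness, which is why a -1 fallback to the vacuum settings is a plausible default.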
 
Also, I found one "<literal>in progress</literal>" in the pg_enable_data_checksums() description. In other places the
status is called "inprogress" (without a space).
 

>    VacuumPageHit = 0;
>    VacuumPageMiss = 0;
>    VacuumPageDirty = 0;
Hm, why are these settings set to 0 in the checksumhelper process?

> /*
> * Force a checkpoint to get everything out to disk. XXX: this should
> * probably not be an IMMEDIATE checkpoint, but leave it there for now for
> * testing
> */
> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
We must not forget that.

regards, Sergei


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, failed

Hello
As I wrote a few weeks ago, I cannot build the documentation due to these errors:
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-delay"
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID "runtime-checksumhelper-cost-limit"

After removing these xrefs for test purposes, the patch passes check-world.

regards, Sergei

The new status of this patch is: Waiting on Author

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:
>
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            tested, failed
>
> Hello
> As i wrote few weeks ago i can not build documentation due errors:
>> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-delay"
>> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-limit"
>
> After remove such xref for test purposes patch pass check-world.

Hi!

Thanks for reviewing. I’ve updated the patch to fix the above-mentioned incorrect
linkends, and also addressed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

cheers ./daniel


Attachments

Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hello
Thank you for the update! I only did a quick test for now: the patch applied and built cleanly. But I get a reproducible error during
check-world:

> t/001_standby_checksum.pl .. 6/10 
> #   Failed test 'ensure checksums are enabled on standby'
> #   at t/001_standby_checksum.pl line 84.
> #          got: 'inprogress'
> #     expected: 'on'

In the standby log I found this error:
> 2018-07-25 13:13:05.463 MSK [16544] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000003 has already been removed
 

The checksumhelper obviously writes a lot of WAL. The test passes if I change the restart order to restart the standby first:
> $node_standby_1->restart();
> $node_master->restart();
Or we need to set up a replication slot.

Also, we get this log record after startup:
> data checksums in pending state, starting background worker to enable
even in recovery, although the background worker actually starts only at recovery end (or promote). I think it would be
better to use a DEBUG ereport in the postmaster and LOG in the checksumhelper.
 

regards, Sergei

25.07.2018, 12:35, "Daniel Gustafsson" <daniel@yesql.se>:
>>  On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:
>>
>>  The following review has been posted through the commitfest application:
>>  make installcheck-world: tested, failed
>>  Implements feature: not tested
>>  Spec compliant: not tested
>>  Documentation: tested, failed
>>
>>  Hello
>>  As i wrote few weeks ago i can not build documentation due errors:
>>>  postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-delay"
>>>  postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-limit"
>>
>>  After remove such xref for test purposes patch pass check-world.
>
> Hi!,
>
> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
> linkends as well as fixed the comments you made in a previous review.
>
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.
>
> cheers ./daniel


Re: Online enabling of checksums

From
Robert Haas
Date:
On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander <magnus@hagander.net> wrote:
> PFA an updated version of the patch for the next CF. We believe this one
> takes care of all the things pointed out so far.
>
> For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
> simply requiring a restart of PostgreSQL to initiate the conversion
> background. That is definitely going to guarantee a memory barrier. It's
> certainly not ideal, but restarting the cluster is still a *lot* better than
> having to do the entire conversion offline. This can of course be improved
> upon in the future, but for now we stuck to the safe way.

Honestly, I feel like the bar for this feature ought to be higher than that.

(I half-expect a vigorous discussion of whether I have set the bar for
the features I've developed in the right place or not, but I think
that's not really a fair response.  If somebody thinks some feature I
implemented should've been more baked, they might be right, but that's
not what this thread is about.  I'm giving you MY opinion about THIS
patch, nothing more or less.)

Why can't we do better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online enabling of checksums

From
Andres Freund
Date:

On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
>On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander <magnus@hagander.net>
>wrote:
>> PFA an updated version of the patch for the next CF. We believe this
>one
>> takes care of all the things pointed out so far.
>>
>> For this version, we "implemented" the
>MegaExpensiveRareMemoryBarrier() by
>> simply requiring a restart of PostgreSQL to initiate the conversion
>> background. That is definitely going to guarantee a memory barrier.
>It's
>> certainly not ideal, but restarting the cluster is still a *lot*
>better than
>> having to do the entire conversion offline. This can of course be
>improved
>> upon in the future, but for now we stuck to the safe way.
>
>Honestly, I feel like the bar for this feature ought to be higher than
>that.
>
>(I half-expect a vigorous discussion of whether I have set the bar for
>the features I've developed in the right place or not, but I think
>that's not really a fair response.  If somebody thinks some feature I
>implemented should've been more baked, they might be right, but that's
>not what this thread is about.  I'm giving you MY opinion about THIS
>patch, nothing more or less.)

+1

>Why can't we do better?

I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can
(sketch: use a procsignal-based acknowledgment protocol, using a 64-bit integer. Useful for plenty of other things).
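A rough illustration of the acknowledgment protocol sketched here: bump a shared 64-bit generation counter, signal every backend, and wait until each backend has absorbed at least that generation. The Python below is only a toy model of the idea; every name is hypothetical, and the real thing would live in shared memory with actual procsignal-based signaling.

```python
# Toy model of a generation-counter acknowledgment barrier. Hypothetical
# names; not an actual PostgreSQL API.
class BarrierState:
    def __init__(self, backend_pids):
        self.generation = 0                          # shared 64-bit counter
        self.acked = {pid: 0 for pid in backend_pids}

def emit_barrier(state, signal_backend):
    """Bump the generation and poke every backend (e.g. via procsignal)."""
    state.generation += 1
    for pid in state.acked:
        signal_backend(pid)
    return state.generation

def backend_absorb(state, pid):
    """Runs in each backend at its next safe interrupt point."""
    state.acked[pid] = state.generation

def barrier_reached(state, generation):
    """True once every backend has absorbed at least `generation`."""
    return all(g >= generation for g in state.acked.values())
```

The emitter would simply wait until `barrier_reached` is true before relying on the new checksum setting, which is what makes the cluster-wide restart unnecessary.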

Andres


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Wed, Jul 25, 2018 at 11:35:31AM +0200, Daniel Gustafsson wrote:
> > On 24 Jul 2018, at 11:05, Sergei Kornilov <sk@zsrv.org> wrote:
> > 
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, failed
> > Implements feature:       not tested
> > Spec compliant:           not tested
> > Documentation:            tested, failed
> > 
> > Hello
> > As i wrote few weeks ago i can not build documentation due errors:
> >> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-delay"
> >> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend references an unknown ID
"runtime-checksumhelper-cost-limit"
> > 
> > After remove such xref for test purposes patch pass check-world.
> 
> Hi!,
> 
> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
> linkends as well as fixed the comments you made in a previous review.
> 
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
                                                    ^

Is "helper" the right word?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 07/31/2018 12:45 PM, Bruce Momjian wrote:
>
>> Hi!,
>>
>> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
>> linkends as well as fixed the comments you made in a previous review.
>>
>> The CF-builder-bot is red, but it’s because it’s trying to apply the already
>> committed patch which is in the attached datallowconn thread.
> I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
>                                                      ^
>
> Is "helper" the right word?

Based on other terminology within postgresql.conf, should it be
"checksum_worker_cost_delay"?

JD

>

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Tue, Jul 31, 2018 at 12:52:40PM -0700, Joshua Drake wrote:
> On 07/31/2018 12:45 PM, Bruce Momjian wrote:
> >
> >>Hi!,
> >>
> >>Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
> >>linkends as well as fixed the comments you made in a previous review.
> >>
> >>The CF-builder-bot is red, but it’s because it’s trying to apply the already
> >>committed patch which is in the attached datallowconn thread.
> >I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
> >                                                     ^
> >
> >Is "helper" the right word?
> 
> Based on other terminology within the postgresql.conf should it be
> "checksum_worker_cost_delay"?

+1

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 31 Jul 2018, at 21:52, Joshua D. Drake <jd@commandprompt.com> wrote:
>
> On 07/31/2018 12:45 PM, Bruce Momjian wrote:

>>> Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
>>> linkends as well as fixed the comments you made in a previous review.
>>>
>>> The CF-builder-bot is red, but it’s because it’s trying to apply the already
>>> committed patch which is in the attached datallowconn thread.
>> I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
>>                                                     ^
>> Is "helper" the right word?

IIRC, “helper” was chosen to signal that it’s a single process where “worker”
may be thought of as a process of which there can be many.

> Based on other terminology within the postgresql.conf should it be "checksum_worker_cost_delay”?

Yes, I think it makes sense to rename it to “worker” to align better with the
postgres nomenclature. Will fix.

cheers ./daniel

Re: Online enabling of checksums

From
Daniel Gustafsson
Date:
> On 26 Jul 2018, at 19:35, Andres Freund <andres@anarazel.de> wrote:
> On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:

>> Why can't we do better?
>
> I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can
> (sketch: use procsignal based acknowledgment protocol, using a 64 bit integer. Useful for plenty other things).

Not really arguing for or against, but just to understand the reasoning before
starting hacking.  Why do we feel that a restart (intended for safety here) in
this case is a burden on a use-once process?  Is it from a usability or
technical point of view?  Just want to make sure we are on the same page before
digging in, so as not to hack on this patch in a direction which isn’t what is
requested.

cheers ./daniel

Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> > On 26 Jul 2018, at 19:35, Andres Freund <andres@anarazel.de> wrote:
> > On July 26, 2018 10:03:39 AM PDT, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
> 
> >> Why can't we do better?
> > 
> > I don't think it's that hard to do better. IIRC I even outlined something before the freeze. If not, I certainly can
> > (sketch: use procsignal based acknowledgment protocol, using a 64 bit integer. Useful for plenty other things).
 
> 
> Not really arguing for or against, but just to understand the reasoning before
> starting hacking.  Why do we feel that a restart (intended for safety here) in
> this case is a burden on a use-once process?  Is it from a usability or
> technical point of view?  Just want to make sure we are on the same page before
> digging in to not hack on this patch in a direction which isn’t what is
> requested.

Having to restart the server at some arbitrary-seeming point in the middle of
enabling checksums makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
>> Not really arguing for or against, but just to understand the reasoning before
>> starting hacking.  Why do we feel that a restart (intended for safety here) in
>> this case is a burden on a use-once process?  Is it from a usability or
>> technical point of view?  Just want to make sure we are on the same page before
>> digging in to not hack on this patch in a direction which isn’t what is
>> requested.

> Having, at some arbitrary seeming point in the middle of enabling
> checksums to restart the server makes it harder to use and to schedule.
> The restart is only needed to fix a relatively small issue, and doesn't
> save that much code.

Without taking a position on the merits ... I don't see how you can
claim "it doesn't save that much code" when we don't have a patch to
compare to that doesn't require the restart.  Maybe it will turn out
not to be much code, but we don't know that now.

            regards, tom lane


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-07-31 17:28:41 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> >> Not really arguing for or against, but just to understand the reasoning before
> >> starting hacking.  Why do we feel that a restart (intended for safety here) in
> >> this case is a burden on a use-once process?  Is it from a usability or
> >> technical point of view?  Just want to make sure we are on the same page before
> >> digging in to not hack on this patch in a direction which isn’t what is
> >> requested.
> 
> > Having, at some arbitrary seeming point in the middle of enabling
> > checksums to restart the server makes it harder to use and to schedule.
> > The restart is only needed to fix a relatively small issue, and doesn't
> > save that much code.
> 
> Without taking a position on the merits ... I don't see how you can
> claim "it doesn't save that much code" when we don't have a patch to
> compare to that doesn't require the restart.  Maybe it will turn out
> not to be much code, but we don't know that now.

IIRC I outlined a solution around the feature freeze, and I've since
offered to go into further depth if needed. And I'd pointed out the
issue at hand. So while I'd obviously not want to predict a specific
linecount, I'm fairly sure I have a reasonable guesstimate about the
complexity.

- Andres


Re: Online enabling of checksums

From
Alvaro Herrera
Date:
On 2018-Jul-31, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:
> > On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> >> Not really arguing for or against, but just to understand the reasoning before
> >> starting hacking.  Why do we feel that a restart (intended for safety here) in
> >> this case is a burden on a use-once process?  Is it from a usability or
> >> technical point of view?  Just want to make sure we are on the same page before
> >> digging in to not hack on this patch in a direction which isn’t what is
> >> requested.
> 
> > Having, at some arbitrary seeming point in the middle of enabling
> > checksums to restart the server makes it harder to use and to schedule.
> > The restart is only needed to fix a relatively small issue, and doesn't
> > save that much code.
> 
> Without taking a position on the merits ... I don't see how you can
> claim "it doesn't save that much code" when we don't have a patch to
> compare to that doesn't require the restart.  Maybe it will turn out
> not to be much code, but we don't know that now.

The ability to get checksums enabled is a killer feature; the ability to
do it with no restart ... okay, it's better than requiring a restart,
but it's not *that* big a deal.

In the spirit of supporting incremental development, I think it's quite
sensible to get the current thing done, then see what it takes to get
the next thing done.  Each is an improvement on its own merits.  And it
doesn't have to be made by the same people.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:
> In the spirit of supporting incremental development, I think it's quite
> sensible to get the current thing done, then see what it takes to get
> the next thing done.  Each is an improvement on its own merits.  And it
> doesn't have to be made by the same people.

I just don't buy this. An earlier version of this feature was committed
to v11 without the restart, over objections. There's now extra state in
the control file to support the restart based system, there's extra
tests, extra docs. And it'd not be much code to just make it work
without the restart. The process around this patchset is just plain
weird.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Michael Banck
Date:
Hi,

Am Dienstag, den 31.07.2018, 18:56 -0400 schrieb Alvaro Herrera:
> The ability to get checksums enabled is a killer feature; the ability to
> do it with no restart ... okay, it's better than requiring a restart,
> but it's not *that* big a deal.

Well, it's a downtime and service interruption from the client's POV,
and arguably we should remove the 'online' from the patch subject then.
You can activate checksums on an offline instance already via the
pg_checksums extensions to pg_verify_checksums that Michael Paquier and
I wrote independently; of course, that downtime will be linerarily
longer the more data you have.

If this was one week before feature freeze, I would agree with you that
it makes sense to ship it with the restart requirement rather than not
shipping it at all. But we're several commitfests away from v12, so
making an effort to having this work without a downtime looks like a
reasonable requirement to me.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 08/01/2018 10:40 AM, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 31.07.2018, 18:56 -0400 schrieb Alvaro Herrera:
>> The ability to get checksums enabled is a killer feature; the
>> ability to do it with no restart ... okay, it's better than
>> requiring a restart, but it's not *that* big a deal.
> 
> Well, it's a downtime and service interruption from the client's POV,
> and arguably we should remove the 'online' from the patch subject
> then. You can activate checksums on an offline instance already via
> the pg_checksums extensions to pg_verify_checksums that Michael
> Paquier and I wrote independently; of course, that downtime will be
> linerarily longer the more data you have.
> 

IMHO the main work still happens while the instance is running, so I
don't see why the restart would make it "not online".

But keeping or not keeping "online" is not the main dilemma faced by
this patch, I think. That is, if we drop "online" from the name I doubt
it'll make it any more acceptable for those objecting to having to
restart the cluster.

> If this was one week before feature freeze, I would agree with you 
> that it makes sense to ship it with the restart requirement rather 
> than not shipping it at all. But we're several commitfests away from 
> v12, so making an effort to having this work without a downtime
> looks like a reasonable requirement to me.
> 

Why would all those pieces have to be committed at once? Why not
commit what we have now (with the restart) and then remove the
restriction in a later commit?

I understand the desire to be able to enable checksums without a
restart, but kinda agree with Alvaro regarding incremental development.

In a way, the question is how far can we reasonably push the patch
author(s) to implement stuff we consider desirable, but he/she/they
decided it's not worth the time investment at this point.

To me, it seems like an immensely useful feature even with the restart,
and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hello
I think one restart is acceptable for such a feature. I doubt users will often want to disable and re-enable checksums. In many cases
checksums will be enabled once for the entire life of the cluster. We need more downtime for minor updates (4/year) and for
changes to PGC_POSTMASTER settings (max_connections or autovacuum workers, for example).
 

regards, Sergei


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 10:40:24 +0200, Michael Banck wrote:
> If this was one week before feature freeze, I would agree with you that
> it makes sense to ship it with the restart requirement rather than not
> shipping it at all. But we're several commitfests away from v12, so
> making an effort to having this work without a downtime looks like a
> reasonable requirement to me.

My problem isn't just that I shouldn't think this should be committed
without at least a firm commitment to do better, my problem is that I
think the "restart" approach is just using the entirely wrong hammer to
solve the problem at hand.  At the very least it's very problematic in
respect to replicas, which need to know about the setting too, and can
have similar problems the restart on the primary is supposed to
prevent.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:
> On 08/01/2018 10:40 AM, Michael Banck wrote:
> > If this was one week before feature freeze, I would agree with you 
> > that it makes sense to ship it with the restart requirement rather 
> > than not shipping it at all. But we're several commitfests away from 
> > v12, so making an effort to having this work without a downtime
> > looks like a reasonable requirement to me.
> > 
> 
> Why would all those pieces had to be committed at once? Why not to
> commit what we have now (with the restart) and then remove the
> restriction in a later commit?

Sure, if all the pieces existed in various degrees of solidness (with
the earlier pieces committable, but later ones needing work), I'd feel
*much* less concerned about it.


> In a way, the question is how far can we reasonably push the patch
> author(s) to implement stuff we consider desirable, but he/she/they
> decided it's not worth the time investment at this point.

We push people to only implement something really consistent all the
time.


> To me, it seems like an immensely useful feature even with the restart,
> and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

I think it's a problem, my problem is more that I don't think it's really
a solution to the problem.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Alvaro Herrera
Date:
Hello

On 2018-Aug-01, Andres Freund wrote:

> My problem isn't just that I shouldn't think this should be committed
> without at least a firm commitment to do better,

I take "I think this shouldn't be committed" is what you meant.

I'm not sure I agree with this line of argument.  The reality is that
real life or diverging priorities preclude people from working on
$stuff.  This is a useful feature-1 we have here, and if we stall it
until we have feature-2, we may not get either until a year later.
That's not a great outcome.  We didn't wait for partitioning, parallel
query, DDL progress reporting, logical replication, JIT, wait events (to
name but a few) to solve world hunger in order to start getting
committed.  We move forward step by step, and that's a good thing.

Firm commitments are great things to have, and if the firmness leads to
feature-2 being part of the same release, great, but if it's not firm
enough, we can have feature-2 the next release (or whenever).  Even if
there's no such commitment, feature-1 is useful on its own.

> my problem is that I think the "restart" approach is just using the
> entirely wrong hammer to solve the problem at hand.  At the very least
> it's very problematic in respect to replicas, which need to know about
> the setting too, and can have similar problems the restart on the
> primary is supposed to prevent.

If we define "restart" to mean taking all the servers down
simultaneously, that can be planned.  For users that cannot do that,
that's too bad, they'll have to wait for the next release in order to
enable checksums (assuming they fund the necessary development).  But
there are many systems where it *is* possible to take everything down
for five seconds, then back up.  They can definitely take advantage of
checksummed data.

Currently, the only way to enable checksums is to initdb and create a
new copy of the data from a logical backup, which could take hours or
even days if data is large, or use logical replication.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 08/01/2018 05:58 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:
>> On 08/01/2018 10:40 AM, Michael Banck wrote:
>>> If this was one week before feature freeze, I would agree with you
>>> that it makes sense to ship it with the restart requirement rather
>>> than not shipping it at all. But we're several commitfests away from
>>> v12, so making an effort to having this work without a downtime
>>> looks like a reasonable requirement to me.
>>>
>>
>> Why would all those pieces have to be committed at once? Why not
>> commit what we have now (with the restart) and then remove the
>> restriction in a later commit?
> 
> Sure, if all the pieces existed in various degrees of solidness (with
> the earlier pieces committable, but later ones needing work), I'd feel
> *much* less concerned about it.
> 

That's not what I meant, sorry for not being clearer. My point was that 
I see the "without restart" as desirable but optional, and would not 
mind treating it as a future improvement.

> 
>> In a way, the question is how far can we reasonably push the patch
>> author(s) to implement stuff we consider desirable, but he/she/they
>> decided it's not worth the time investment at this point.
> 
> We push people to only implement something really consistent all the
> time.
> 

Sure, but it's a somewhat subjective matter - to me this limitation does 
not make this particular patch inconsistent. If we can remove it, great. 
If not, it's still an immensely useful improvement.

> 
>> To me, it seems like an immensely useful feature even with the restart,
>> and I don't think the restart is a major burden for most systems (it can
>> be, if your system has no maintenance windows, of course).
> 
> I think it's a problem, my problem is more that I don't think it's really
> a solution to the problem.
> 

Sure, if there are issues with this approach, that would make it 
unacceptable. I'm not sure why it would be an issue for replicas (which 
is what you mention elsewhere), considering those don't write data and 
so can't fail to update a checksum?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-08-01 12:20:12 -0400, Alvaro Herrera wrote:
> Hello
> 
> On 2018-Aug-01, Andres Freund wrote:
> 
> > My problem isn't just that I shouldn't think this should be committed
> > without at least a firm commitment to do better,
> 
> I take "I think this shouldn't be committed" is what you meant.

Yep.


> I'm not sure I agree with this line of argument.  The reality is that
> real life or diverging priorities preclude people from working on
> $stuff.

Right.


> This is a useful feature-1 we have here, and if we stall it
> until we have feature-2, we may not get either until a year later.
> That's not a great outcome.  We didn't wait for partitioning, parallel
> query, DDL progress reporting, logical replication, JIT, wait events (to
> name but a few) to solve world hunger in order to start getting
> committed.  We move forward step by step, and that's a good thing.

But we asked that they implement something consistent, and rejected many
that were not deemed to be that.


> > my problem is that I think the "restart" approach is just using the
> > entirely wrong hammer to solve the problem at hand.  At the very least
> > it's very problematic in respect to replicas, which need to know about
> > the setting too, and can have similar problems the restart on the
> > primary is supposed to prevent.
> 
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.

It really can't realistically. That'd essentially mean breaking PITR.
You'd have to schedule the restart of any replicas to happen after a
specific record.  And what if there's replicas that are on a delay? What
if there's data centers that are currently offline?

And again, this isn't hard to do properly. I don't get why we're talking
about an at least operationally complex workaround when the proper
solution isn't hard.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 18:25:48 +0200, Tomas Vondra wrote:
> Sure, if there are issues with this approach, that would make it
> unacceptable. I'm not sure why it would be an issue for replicas (which is
> what you mention elsewhere), considering those don't write data and so can't
> fail to update a checksum?

Standbys compute checksums on writeout as well, no? We compute checksums
not at buffer modification, but at writeout time. And replay just marks
buffers dirty, it doesn't directly write to disk.

Architecturally there'd also be hint bits as a source, but I think we
probably neutered them enough for that not to be a problem during
replay.

And then there's also promotions.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Tue, Jul 31, 2018 at 04:05:23PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:
> > In the spirit of supporting incremental development, I think it's quite
> > sensible to get the current thing done, then see what it takes to get
> > the next thing done.  Each is an improvement on its own merits.  And it
> > doesn't have to be made by the same people.
> 
> I just don't buy this. An earlier version of this feature was committed
> to v11 without the restart, over objections. There's now extra state in
> the control file to support the restart based system, there's extra
> tests, extra docs. And it'd not be much code to just make it work
> without the restart. The process around this patchset is just plain
> weird.

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running.  We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.  Forcing a restart is probably part of that
primitive orchestration.  We will probably have similar challenges if we
ever allowed Postgres to change its data format on the fly.  These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> This patchset is weird because it is perhaps our first case of trying to
> change the state of the server while it is running.  We just don't have
> an established protocol for how to orchestrate that, so we are limping
> along toward a solution.

There's a number of GUCs that do this? Even in related areas,
cf. full_page_writes.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Bruce Momjian
Date:
On Wed, Aug  1, 2018 at 09:39:43AM -0700, Andres Freund wrote:
> On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> > This patchset is weird because it is perhaps our first case of trying to
> > change the state of the server while it is running.  We just don't have
> > an established protocol for how to orchestrate that, so we are limping
> > along toward a solution.
> 
> There's a number of GUCs that do this? Even in related areas,
> cf. full_page_writes.

How is that taking the server from one state to the next in a
non-instantaneous way?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 12:45:27 -0400, Bruce Momjian wrote:
> On Wed, Aug  1, 2018 at 09:39:43AM -0700, Andres Freund wrote:
> > On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> > > This patchset is weird because it is perhaps our first case of trying to
> > > change the state of the server while it is running.  We just don't have
> > > an established protocol for how to orchestrate that, so we are limping
> > > along toward a solution.
> > 
> > There's a number of GUCs that do this? Even in related areas,
> > cf. full_page_writes.
> 
> How is that taking the server from one state to the next in a
> non-instantaneous way?

Because it requires coordination around checkpoints, it can only really
fully take effect after the start of a checkpoint.  And it even has both
primary and replica differences...

Greetings,

Andres Freund


Re: Online enabling of checksums

From
"Joshua D. Drake"
Date:
On 08/01/2018 09:20 AM, Alvaro Herrera wrote:
>
>> my problem is that I think the "restart" approach is just using the
>> entirely wrong hammer to solve the problem at hand.  At the very least
>> it's very problematic in respect to replicas, which need to know about
>> the setting too, and can have similar problems the restart on the
>> primary is supposed to prevent.
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.

People in mission critical environments do not "restart all servers". 
They fail over to a secondary to do maintenance on a primary. When you 
have a system where you literally lose thousands of dollars every minute 
the database is down you can't do what you are proposing. When you have 
a system that if the database is down for longer than X minutes, you 
actually lose a whole day because all of the fabricators have to 
revalidate before they begin work, you can't do that. Granted that is 
not the majority (which you mention) but let's not forget them.

The one place where a restart does happen and will continue to happen 
for around 5 (3 if you incorporate pg_logical and 9.6) more years is 
upgrades. Although we have logical replication for upgrades now, we are 
5 years away from the majority of users being on a version of PostgreSQL 
that supports logical replication for upgrades. So, I can see an 
argument for an incremental approach because people could enable 
checksums as part of their upgrade restart.

> For users that cannot do that,
> that's too bad, they'll have to wait for the next release in order to
> enable checksums (assuming they fund the necessary development).  But

I have to say, as a proponent of funded development for longer than most, 
I like to see this refreshing take on the fact that all of this does take 
money.

> there are many systems where it *is* possible to take everything down
> for five seconds, then back up.  They can definitely take advantage of
> checksummed data.

This is a good point.

> Currently, the only way to enable checksums is to initdb and create a
> new copy of the data from a logical backup, which could take hours or
> even days if data is large, or use logical replication.

Originally, I was going to -1 how this is being implemented. I too wish 
we had the "ALTER DATABASE ENABLE CHECKSUM" or equivalent without a 
restart. However, being able to just restart is a huge step forward from 
what we have now.

Lastly, I think Alvaro has a point with the incremental development and 
I also think some others on this thread need to, "show me the patch" 
instead of being armchair directors of development.

JD



-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****



Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 10:34:55 -0700, Joshua D. Drake wrote:
> Lastly, I think Alvaro has a point with the incremental development and I
> also think some others on this thread need to, "show me the patch" instead
> of being armchair directors of development.

Oh, FFS. I pointed out the issue that led to the restart being
introduced (reminder, in the committed but then reverted version that
didn't exist). I explained how the much less intrusive version would
work. I think it's absurd to describe that as "armchair directors of
development".

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hi

> They fail over to a secondary to do maintenance on a primary.
But this is not a problem even in the patch's current state. We can restart
the replica before failover and it works. I tested this behavior during my
review.

We can:
- call pg_enable_data_checksums() on the master
- wait for data_checksums to change to inprogress on the replica
- restart the replica - we can restart it before the promote, right?
- promote this replica
- the checksum helper is launched now and works on this promoted cluster

regards, Sergei


Re: Online enabling of checksums

From
Andres Freund
Date:
Hi,

On 2018-08-01 21:03:22 +0300, Sergei Kornilov wrote:
> > They fail over to a secondary to do maintenance on a primary.
> But this is not a problem even in the patch's current state. We can
> restart the replica before failover and it works. I tested this
> behavior during my review.
> We can:
> - call pg_enable_data_checksums() on the master
> - wait for data_checksums to change to inprogress on the replica

That's *precisely* the problem. What if your replicas are delayed
(e.g. recovery_min_apply_delay)? How would you schedule that restart
properly?  What if you later need to do PITR?


> - restart replica - we can restart replica before promote, right?
> - promote this replica
> - checksum helper is launched now and working on this promoted cluster

This doesn't test the consequences of the restart being skipped, nor
does it review the correctness at the code level.

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Sergei Kornilov
Date:
Hi

> This doesn't test the consequences of the restart being skipped, nor
> does it review on a code level the correctness.
I checked more than one thing during my review. I looked at the code too: the bgworker in checksumhelper.c is registered with:
> bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
and it then processes the whole cluster (even if checksumhelper ran before but exited before it completed). Or does BgWorkerStart_RecoveryFinished not guarantee starting only after recovery has finished?

Before starting any real work (and after recovery ends), checksumhelper checks the current cluster status again:

> +     * If a standby was restarted when in pending state, a background worker
> +     * was registered to start. If it's later promoted after the master has
> +     * completed enabling checksums, we need to terminate immediately and not
> +     * do anything. If the cluster is still in pending state when promoted,
> +     * the background worker should start to complete the job.

> What if your replicas are delayed (e.g. recovery_min_apply_delay)?
> What if you later need to do PITR?
If we start after replaying pg_enable_data_checksums and before it has completed, we plan to start the bgworker when recovery finishes. If we have replayed the checksumhelper finish, we _can_ start checksumhelper again, and this is handled during checksumhelper startup.

The behavior seems correct to me. Am I missing something very wrong?

regards, Sergei


Re: Online enabling of checksums

From
Tomas Vondra
Date:
Hi,

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> While looking at the online checksum verification patch (which I guess
> will get committed before this one), it occurred to me that disabling
> checksums may need to be more elaborate, to protect against someone
> using the stale flag value (instead of simply switching to "off"
> assuming that's fine).
>
> The signals etc. seem good enough for our internal stuff, but what if
> someone uses the flag in a different way? E.g. the online checksum
> verification runs as an independent process (i.e. not a backend) and
> reads the control file to find out if the checksums are enabled or not.
> So if we just switch from "on" to "off" that will break.
>
> Of course, we may also say "Don't disable checksums while online
> verification is running!" but that's not ideal.

I'm not really sure what else we could say here..?  I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:
On 09/29/2018 02:19 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> While looking at the online checksum verification patch (which I guess
>> will get committed before this one), it occurred to me that disabling
>> checksums may need to be more elaborate, to protect against someone
>> using the stale flag value (instead of simply switching to "off"
>> assuming that's fine).
>>
>> The signals etc. seem good enough for our internal stuff, but what if
>> someone uses the flag in a different way? E.g. the online checksum
>> verification runs as an independent process (i.e. not a backend) and
>> reads the control file to find out if the checksums are enabled or not.
>> So if we just switch from "on" to "off" that will break.
>>
>> Of course, we may also say "Don't disable checksums while online
>> verification is running!" but that's not ideal.
> 
> I'm not really sure what else we could say here..?  I don't particularly
> see an issue with telling people that if they disable checksums while
> they're running a tool that's checking the checksums that they're going
> to get odd results.
> 

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Stephen Frost
Date:
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> While looking at the online checksum verification patch (which I guess
> >> will get committed before this one), it occurred to me that disabling
> >> checksums may need to be more elaborate, to protect against someone
> >> using the stale flag value (instead of simply switching to "off"
> >> assuming that's fine).
> >>
> >> The signals etc. seem good enough for our internal stuff, but what if
> >> someone uses the flag in a different way? E.g. the online checksum
> >> verification runs as an independent process (i.e. not a backend) and
> >> reads the control file to find out if the checksums are enabled or not.
> >> So if we just switch from "on" to "off" that will break.
> >>
> >> Of course, we may also say "Don't disable checksums while online
> >> verification is running!" but that's not ideal.
> >
> > I'm not really sure what else we could say here..?  I don't particularly
> > see an issue with telling people that if they disable checksums while
> > they're running a tool that's checking the checksums that they're going
> > to get odd results.
>
> I don't know, to be honest. I was merely looking at the online
> verification patch and realized that if someone disables checksums it
> won't notice it (because it only reads the flag once, at the very
> beginning) and will likely produce bogus errors.
>
> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> fix it. The checkpoint LSN is read from the same controlfile as the
> flag, so we know the checksums were enabled during that checkpoint. So
> if we ignore failures with a newer LSN, that should do the trick, no?
>
> So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Thanks!

Stephen

Attachments

Re: Online enabling of checksums

From
Tomas Vondra
Date:

On 09/29/2018 06:51 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 09/29/2018 02:19 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>>>> While looking at the online checksum verification patch (which I guess
>>>> will get committed before this one), it occurred to me that disabling
>>>> checksums may need to be more elaborate, to protect against someone
>>>> using the stale flag value (instead of simply switching to "off"
>>>> assuming that's fine).
>>>>
>>>> The signals etc. seem good enough for our internal stuff, but what if
>>>> someone uses the flag in a different way? E.g. the online checksum
>>>> verification runs as an independent process (i.e. not a backend) and
>>>> reads the control file to find out if the checksums are enabled or not.
>>>> So if we just switch from "on" to "off" that will break.
>>>>
>>>> Of course, we may also say "Don't disable checksums while online
>>>> verification is running!" but that's not ideal.
>>>
>>> I'm not really sure what else we could say here..?  I don't particularly
>>> see an issue with telling people that if they disable checksums while
>>> they're running a tool that's checking the checksums that they're going
>>> to get odd results.
>>
>> I don't know, to be honest. I was merely looking at the online
>> verification patch and realized that if someone disables checksums it
>> won't notice it (because it only reads the flag once, at the very
>> beginning) and will likely produce bogus errors.
>>
>> Although, maybe it won't - it now uses a checkpoint LSN, so that might
>> fix it. The checkpoint LSN is read from the same controlfile as the
>> flag, so we know the checksums were enabled during that checkpoint. So
>> if we ignore failures with a newer LSN, that should do the trick, no?
>>
>> So perhaps that's the right "protocol" to handle this?
> 
> I certainly don't think we need to do anything more.
> 

Not sure I agree. I'm not suggesting we absolutely have to write a huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into docs.

FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online enabling of checksums

From
Andres Freund
Date:
On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
> 
> 
> On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > Greetings,
> > 
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >>>> While looking at the online checksum verification patch (which I guess
> >>>> will get committed before this one), it occurred to me that disabling
> >>>> checksums may need to be more elaborate, to protect against someone
> >>>> using the stale flag value (instead of simply switching to "off"
> >>>> assuming that's fine).
> >>>>
> >>>> The signals etc. seem good enough for our internal stuff, but what if
> >>>> someone uses the flag in a different way? E.g. the online checksum
> >>>> verification runs as an independent process (i.e. not a backend) and
> >>>> reads the control file to find out if the checksums are enabled or not.
> >>>> So if we just switch from "on" to "off" that will break.
> >>>>
> >>>> Of course, we may also say "Don't disable checksums while online
> >>>> verification is running!" but that's not ideal.
> >>>
> >>> I'm not really sure what else we could say here..?  I don't particularly
> >>> see an issue with telling people that if they disable checksums while
> >>> they're running a tool that's checking the checksums that they're going
> >>> to get odd results.
> >>
> >> I don't know, to be honest. I was merely looking at the online
> >> verification patch and realized that if someone disables checksums it
> >> won't notice it (because it only reads the flag once, at the very
> >> beginning) and will likely produce bogus errors.
> >>
> >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> >> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
> >> if we ignore failures with a newer LSN, that should do the trick, no?
> >>
> >> So perhaps that's the right "protocol" to handle this?
> > 
> > I certainly don't think we need to do anything more.
> > 
> 
> Not sure I agree. I'm not suggesting we absolutely have to write a huge
> amount of code to deal with this issue, but I hope we agree we need to
> at least understand the issue so that we can put warnings into docs.
> 
> FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
> too AFAICS, and it seems rather unfriendly to just start reporting
> checksum errors during backup in that case.
> 
> But as I mentioned, maybe there's no problem at all and using the
> checkpoint LSN deals with it automatically.

Given that this patch has not been developed in a few months, I don't
see why this has an active 2019-01 CF entry? I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/

Greetings,

Andres Freund


Re: Online enabling of checksums

From
Magnus Hagander
Date:
On Thu, Jan 31, 2019 at 11:57 AM Andres Freund <andres@anarazel.de> wrote:
On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
>
>
> On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > Greetings,
> >
> > * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> >>>> While looking at the online checksum verification patch (which I guess
> >>>> will get committed before this one), it occurred to me that disabling
> >>>> checksums may need to be more elaborate, to protect against someone
> >>>> using the stale flag value (instead of simply switching to "off"
> >>>> assuming that's fine).
> >>>>
> >>>> The signals etc. seem good enough for our internal stuff, but what if
> >>>> someone uses the flag in a different way? E.g. the online checksum
> >>>> verification runs as an independent process (i.e. not a backend) and
> >>>> reads the control file to find out if the checksums are enabled or not.
> >>>> So if we just switch from "on" to "off" that will break.
> >>>>
> >>>> Of course, we may also say "Don't disable checksums while online
> >>>> verification is running!" but that's not ideal.
> >>>
> >>> I'm not really sure what else we could say here..?  I don't particularly
> >>> see an issue with telling people that if they disable checksums while
> >>> they're running a tool that's checking the checksums that they're going
> >>> to get odd results.
> >>
> >> I don't know, to be honest. I was merely looking at the online
> >> verification patch and realized that if someone disables checksums it
> >> won't notice it (because it only reads the flag once, at the very
> >> beginning) and will likely produce bogus errors.
> >>
> >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> >> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
> >> if we ignore failures with a newer LSN, that should do the trick, no?
> >>
> >> So perhaps that's the right "protocol" to handle this?
> >
> > I certainly don't think we need to do anything more.
> >
>
> Not sure I agree. I'm not suggesting we absolutely have to write a huge
> amount of code to deal with this issue, but I hope we agree we need to
> at least understand the issue so that we can put warnings into docs.
>
> FWIW pg_basebackup (in the default "verify checksums" mode) has this issue
> too AFAICS, and it seems rather unfriendly to just start reporting
> checksum errors during backup in that case.
>
> But as I mentioned, maybe there's no problem at all and using the
> checkpoint LSN deals with it automatically.

Given that this patch has not been developed in a few months, I don't
see why this has an active 2019-01 CF entry? I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/


Unfortunately, I agree. I wish that wasn't the case, but due to things outside my control, that's what happened.

So yes, let's punt it.

//Magnus