Re: Online enabling of checksums

From	Tomas Vondra
Subject	Re: Online enabling of checksums
Date
Msg-id	82296c92-7124-dc5b-4f00-053f302d7233@2ndquadrant.com
In reply to	Re: Online enabling of checksums  (Magnus Hagander <magnus@hagander.net>)
Responses	Re: Online enabling of checksums  (Magnus Hagander <magnus@hagander.net>)
List	pgsql-hackers
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus@hagander.net
> <mailto:magnus@hagander.net>> wrote:
> 
>     On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
>     <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>>
>     wrote:
> 
>         On 03/31/2018 05:05 PM, Magnus Hagander wrote:
>         > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
>         > <tomas.vondra@2ndquadrant.com
>         <mailto:tomas.vondra@2ndquadrant.com>
>         <mailto:tomas.vondra@2ndquadrant.com
>         <mailto:tomas.vondra@2ndquadrant.com>>> wrote:
>         >
>         > ...
>         >
>         >     I do think just waiting for all running transactions to complete is
>         >     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
>         >     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
>         >     too, to some extent). So we have a precedent / working code we can copy.
>         >
>         >
>         > Thinking again, I don't think it should be done as part of
>         > BuildRelationList(). We should just do it once in the launcher before
>         > starting, that'll be both easier and cleaner. Anything started after
>         > that will have checksums on it, so we should be fine.
>         >
>         > PFA one that does this.
>         >
> 
>         Seems fine to me. I'd however log waitforxid, not the oldest one.
>         If you're a DBA and you want to make the checksumming proceed,
>         knowing the oldest running XID is useless for that. If we log
>         waitforxid, it can be used to query pg_stat_activity and
>         interrupt the sessions somehow.
> 
> 
>     Yeah, makes sense. Updated.
> 
>      
> 
>         >     >     And if you try this with a temporary table (not
>         hidden in transaction,
>         >     >     so the bgworker can see it), the worker will fail
>         with this:
>         >     >
>         >     >       ERROR:  cannot access temporary tables of other
>         sessions
>         >     >
>         >     >     But of course, this is just another way how to crash
>         without updating
>         >     >     the result for the launcher, so checksums may end up
>         being enabled
>         >     >     anyway.
>         >     >
>         >     >
>         >     > Yeah, there will be plenty of side-effect issues from that
>         >     > crash-with-wrong-status case. Fixing that will at least
>         make things
>         >     > safer -- in that checksums won't be enabled when not put
>         on all pages. 
>         >     >
>         >
>         >     Sure, the outcome with checksums enabled incorrectly is a
>         consequence of
>         >     bogus status, and fixing that will prevent that. But that
>         wasn't my main
>         >     point here - not articulated very clearly, though.
>         >
>         >     The bigger question is how to handle temporary tables
>         gracefully, so
>         >     that it does not terminate the bgworker like this at all.
>         This might be
>         >     even bigger issue than dropped relations, considering that
>         temporary
>         >     tables are pretty common part of applications (and it also
>         includes
>         >     CREATE/DROP).
>         >
>         >     For some clusters it might mean the online checksum
>         enabling would
>         >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>         >
>         >     Unfortunately, try_relation_open() won't fix this, as the
>         error comes
>         >     from ReadBufferExtended. And it's not a matter of simply
>         creating a
>         >     ReadBuffer variant without that error check, because
>         temporary tables
>         >     use local buffers.
>         >
>         >     I wonder if we could just go and set the checksums anyway,
>         ignoring the
>         >     local buffers. If the other session does some changes,
>         it'll overwrite
>         >     our changes, this time with the correct checksums. But it
>         seems pretty
>         >     dangerous (I mean, what if they're writing stuff while
>         we're updating
>         >     the checksums? Considering the various short-cuts for
>         temporary tables,
>         >     I suspect that would be a boon for race conditions.)
>         >
>         >     Another option would be to do something similar to running
>         transactions,
>         >     i.e. wait until all temporary tables (that we've seen at
>         the beginning)
>         >     disappear. But we're starting to wait on more and more stuff.
>         >
>         >     If we do this, we should clearly log which backends we're
>         waiting for,
>         >     so that the admins can go and interrupt them manually.
>         >
>         >
>         >
>         > Yeah, waiting for all transactions at the beginning is pretty
>         simple.
>         >
>         > Making the worker simply ignore temporary tables would also be
>         easy.
>         >
>         > One of the bigger issues here is temporary tables are
>         *session* scope
>         > and not transaction, so we'd actually need the other session
>         to finish,
>         > not just the transaction.
>         >
>         > I guess what we could do is something like this:
>         >
>         > 1. Don't process temporary tables in the checksumworker, period.
>         > Instead, build a list of any temporary tables that existed
>         when the
>         > worker started in this particular database (basically anything
>         that we
>         > got in our scan). Once we have processed the complete
>         database, keep
>         > re-scanning pg_class until those particular tables are gone
>         (search by oid).
>         >
>         > That means that any temporary tables that are created *while*
>         we are
>         > processing a database are ignored, but they should already be
>         receiving
>         > checksums.
>         >
>         > It definitely leads to a potential issue with long running
>         temp tables.
>         > But as long as we look at the *actual tables* (by oid), we
>         should be
>         > able to handle long-running sessions once they have dropped
>         their temp
>         > tables.
>         >
>         > Does that sound workable to you?
>         >
> 
>         Yes, that's pretty much what I meant by 'wait until all temporary
>         tables disappear'. Again, we need to make it easy to determine
>         which OIDs we are waiting for, and which sessions may need the
>         DBA's attention.
> 
>         I don't think it makes sense to log OIDs of the temporary
>         tables. There
>         can be many of them, and in most cases the connection/session is
>         managed
>         by the application, so the only thing you can do is kill the
>         connection.
> 
> 
>     Yeah, agreed. I think it makes sense to show the *number* of temp
>     tables. That's also a predictable amount of information -- logging
>     all temp tables may as you say lead to an insane amount of data.
> 
>     PFA a patch that does this. I've also added some docs for it.
> 
>     And I also noticed pg_verify_checksums wasn't installed, so fixed
>     that too.
> 
> 
> PFA a rebase on top of the just committed verify-checksums patch. 
> 

This seems OK in terms of handling errors in the worker and passing them
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It does, however, still fail to initialize the attempts field after
allocating the db entry in BuildDatabaseList, so if you run with
-DRANDOMIZE_ALLOCATED_MEMORY it ends up with garbage values like this:

 WARNING:  attempts = -1684366952
 WARNING:  attempts = 1010514489
 WARNING:  attempts = -1145390664
 WARNING:  attempts = 1162101570

I guess those are not the droids we're looking for?

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  database "template0" does not allow connections
    HINT:  Allow connections using ALTER DATABASE and try again.
    test=# alter database template0 allow_connections = true;
    ALTER DATABASE
    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: already running
    test=# select pg_disable_data_checksums();
     pg_disable_data_checksums
    ---------------------------

    (1 row)

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: has been cancelled

At which point the only thing you can do is to restart the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments
