Re: RFC: Add 'taint' field to pg_control.

From: Craig Ringer
Subject: Re: RFC: Add 'taint' field to pg_control.
Msg-id: CAMsr+YGJqHDP=HkLxAukhVz0R56MTfEj1++t8M-AWb+xFTwZqA@mail.gmail.com
In response to: RFC: Add 'taint' field to pg_control.  (Andres Freund <andres@anarazel.de>)
Responses: Re: RFC: Add 'taint' field to pg_control.  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 1 March 2018 at 05:43, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand.  Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
>
> What do others think?


A huge +1 from me for the idea. I can't even count the number of black box "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has turned out to be down to the user doing something very silly and not saying anything about it.

It's only some flags, so putting it in pg_control is arguably somewhat wasteful but so minor as to be of no real concern. And that's probably the best way to make sure it follows the cluster around no matter what backup/restore/copy mechanisms are used and how "clever" they try to be.

What I'd _really_ love would be to blow the scope of this up a bit and turn it into a key-events cluster journal, recording key param switches, recoveries (and lsn ranges), pg_upgrade runs, etc. But then we'd run into people with weird workloads who managed to make it some massive file, we'd have to make sure we had a way to stop it getting left out of copies/backups, and it'd generally be irritating. So let's not do that. Proper support for class-based logging and multiple outputs would be a good solution for this at some future point.

What you propose is simple enough to be quick to implement, adds no admin overhead, and will be plenty useful enough.

I'd like to add "postmaster.pid was absent when the cluster started" to this list, please. Sure, it's not conclusive, and there are legit reasons why that might be the case, but so often it's somebody kill -9'ing the postmaster, then removing the postmaster.pid and starting up again without killing the workers....
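To make the shape of the proposal concrete, here is a rough sketch of how such a field could be modelled as a sticky bitmask: one bit per item on the list above, plus the postmaster.pid case. Every name in it (the TAINT_* macros, taint_set, taint_describe) is made up for illustration; nothing like this exists in PostgreSQL, and a real patch would hang the field off the pg_control struct and update it under the usual control-file locking.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical taint bits; these names are illustrative, not PostgreSQL's. */
#define TAINT_RESETXLOG_USED        (UINT32_C(1) << 0)
#define TAINT_FSYNC_OFF             (UINT32_C(1) << 1)
#define TAINT_FULL_PAGE_WRITES_OFF  (UINT32_C(1) << 2)
#define TAINT_PG_UPGRADE_USED       (UINT32_C(1) << 3)
#define TAINT_PIDFILE_ABSENT        (UINT32_C(1) << 4)

typedef struct TaintName
{
    uint32_t    bit;
    const char *name;
} TaintName;

static const TaintName taint_names[] = {
    {TAINT_RESETXLOG_USED, "resetxlog_used"},
    {TAINT_FSYNC_OFF, "fsync_off"},
    {TAINT_FULL_PAGE_WRITES_OFF, "full_page_writes_off"},
    {TAINT_PG_UPGRADE_USED, "pg_upgrade_used"},
    {TAINT_PIDFILE_ABSENT, "pidfile_absent"},
};

/* Bits are sticky: they can be set but this sketch offers no way to clear them. */
static inline uint32_t
taint_set(uint32_t taint, uint32_t bit)
{
    return taint | bit;
}

/* Write a comma-separated list of the set bits' names into buf; returns buf. */
static char *
taint_describe(uint32_t taint, char *buf, size_t buflen)
{
    size_t      used = 0;

    if (buflen == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(taint_names) / sizeof(taint_names[0]); i++)
    {
        int         n;

        if ((taint & taint_names[i].bit) == 0)
            continue;

        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? "," : "", taint_names[i].name);
        if (n < 0 || (size_t) n >= buflen - used)
            break;              /* output truncated; buf stays NUL-terminated */
        used += (size_t) n;
    }
    return buf;
}
```

Something like taint_describe() is what I'd imagine a pg_controldata-style tool printing, so the flags show up in the first thing people paste into a bug report.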

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
