Thread: crash testing suggestions for 12 beta 1

crash testing suggestions for 12 beta 1

From

Jeff Janes

Date:

May 23, 2019, 15:24:12

Now that beta is out, I wanted to do some crash-recovery testing where I inject PANIC-inducing faults and see if it recovers correctly. A long-lived Perl process keeps track of what it should find after the crash, and verifies that it finds it. You will probably be familiar with the general theme from examples like the threads below. Would anyone like to nominate some areas to focus on?

I think the pluggable storage refactoring work will get inherently tested, so I'm not planning on designing tests specifically for that (unless there is a non-core plugin I should test with). Making the ctid be tie-breakers in btree index is also tested inherently (plus I think Peter tested that pretty thoroughly himself with similar methods). I've already tested declarative partitioning where the tuples do a lot of migrating, and tested prepared transactions.

Any other suggestions for changes that might be risky and should be specifically targeted for testing?

https://www.postgresql.org/message-id/CAMkU=1xEUuBphDwDmB1WjN4+td4kpnEniFaTBxnk1xzHCw8_OQ@mail.gmail.com

https://www.postgresql.org/message-id/CAMkU=1xBP8cqdS5eK8APHL=X6RHMMM2vG5g+QamduuTsyCwv9g@mail.gmail.com

Cheers,

Jeff
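
The bookkeeping described above (a long-lived process that remembers what it should find after the crash) can be sketched roughly as follows. This is not Jeff's Perl harness, which is not shown in this thread; it is a minimal Python illustration with no database connection, and every name in it (`ExpectedState`, `begin_commit`, `commit_acknowledged`, `verify`) is hypothetical. The sketch assumes one write in flight at a time and treats a COMMIT that was sent but never acknowledged as "in doubt": after recovery it may or may not be present, while every acknowledged write must be.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ExpectedState:
    """Tracks what the table must contain after crash recovery (hypothetical)."""

    # key -> value for every write whose COMMIT was acknowledged
    committed: Dict[int, int] = field(default_factory=dict)
    # (key, value) of a write whose COMMIT was sent but not yet acknowledged
    in_doubt: Optional[Tuple[int, int]] = None

    def begin_commit(self, key: int, value: int) -> None:
        """Record a write just before sending COMMIT to the server."""
        self.in_doubt = (key, value)

    def commit_acknowledged(self) -> None:
        """Promote the in-doubt write once the server confirms the COMMIT."""
        if self.in_doubt is None:
            raise RuntimeError("no commit in progress")
        key, value = self.in_doubt
        self.committed[key] = value
        self.in_doubt = None

    def verify(self, found: Dict[int, int]) -> bool:
        """Check the post-recovery contents against the acceptable outcomes.

        With an in-doubt write there are two acceptable outcomes: the
        write is absent or the write is present. On success the tracker
        adopts whichever outcome was observed.
        """
        candidates = [dict(self.committed)]
        if self.in_doubt is not None:
            key, value = self.in_doubt
            with_write = dict(self.committed)
            with_write[key] = value
            candidates.append(with_write)
        for candidate in candidates:
            if found == candidate:
                self.committed = candidate
                self.in_doubt = None
                return True
        return False
```

In a real harness `found` would come from querying the recovered server; here it is just a dictionary so the logic can be exercised on its own.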

Re: crash testing suggestions for 12 beta 1

From

Peter Geoghegan

Date:

May 23, 2019, 15:55:02

On Thu, May 23, 2019 at 8:24 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> Now that beta is out, I wanted to do some crash-recovery testing where I inject PANIC-inducing faults and see if it
> recovers correctly.

Thank you for doing this. It's important work.

> Making the ctid be tie-breakers in btree index is also tested inherently (plus I think Peter tested that pretty
> thoroughly himself with similar methods).

As you may know, the B-Tree code has a tendency to soldier on when an
index is corrupt. "Moving right" tends to conceal problems beyond
concurrent page splits. I didn't do very much fault injection type
testing with the B-Tree enhancements, but I did lean on amcheck
heavily during development. Note that a new, extremely thorough option
called "rootdescend" verification was added following the v12 work:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c1afd175b5b2e5c44f6da34988342e00ecdfb518

It probably wouldn't add noticeable overhead to use this during your
testing, and maybe to combine it with the "heapallindexed" option,
while using the bt_index_parent_check() variant -- that will detect
almost any imaginable index corruption. Admittedly, amcheck didn't
find any bugs in my code after the first couple of versions of the
patch series, so this approach seems unlikely to find any problems
now. Even still, it wouldn't be very difficult to do this extra step.
It seems worthwhile to be thorough here, given that we depend on the
B-Tree code so heavily.

--
Peter Geoghegan
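
For a test harness that wants to run the check Peter describes after each recovery, the statement could be generated along these lines. This is a minimal sketch, assuming the amcheck extension is installed in the target database (CREATE EXTENSION amcheck) and that the server is PostgreSQL 12, where bt_index_parent_check() accepts the "heapallindexed" and "rootdescend" arguments. The helper names `quote_literal` and `amcheck_query` and the index name in the usage line are hypothetical, and the quoting assumes standard_conforming_strings is on.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes.

    Assumes standard_conforming_strings = on, so backslashes need no escaping.
    """
    return "'" + value.replace("'", "''") + "'"


def amcheck_query(index_name: str) -> str:
    """Build a bt_index_parent_check() call with both thorough options on.

    heapallindexed verifies that every heap tuple that should be indexed
    has a matching index tuple; rootdescend re-finds each leaf tuple by
    searching from the root page.
    """
    return (
        "SELECT bt_index_parent_check("
        f"index => {quote_literal(index_name)}::regclass, "
        "heapallindexed => true, "
        "rootdescend => true);"
    )


if __name__ == "__main__":
    # Example index name only; substitute the indexes under test.
    print(amcheck_query("pgbench_accounts_pkey"))
```

The generated statement raises an error if amcheck detects corruption, so a harness can treat any failure of the query as a failed verification step.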

Re: crash testing suggestions for 12 beta 1

From

Alvaro Herrera

Date:

June 5, 2019, 21:11:38

On 2019-May-23, Jeff Janes wrote:

> Now that beta is out, I wanted to do some crash-recovery testing where I
> inject PANIC-inducing faults and see if it recovers correctly. A long-lived
> Perl process keeps track of what it should find after the crash, and
> verifies that it finds it.  You will probably be familiar with the general
> theme from examples like the threads below.  Would anyone like to nominate
> some areas to focus on?

Thanks for the offer!  Your work has shown its value in previous cycles.

REINDEX CONCURRENTLY would be one good area to focus on, I think, as
well as ALTER TABLE ATTACH PARTITION.  Maybe also INCLUDE columns in
GiST, and the stuff in commits 9155580fd, fe280694d and 7df159a62.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: crash testing suggestions for 12 beta 1

From

Peter Geoghegan

Date:

June 5, 2019, 21:32:49

On Wed, Jun 5, 2019 at 2:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> REINDEX CONCURRENTLY would be one good area to focus on, I think, as
> well as ALTER TABLE ATTACH PARTITION.  Maybe also INCLUDE columns in
> GiST, and the stuff in commits 9155580fd, fe280694d and 7df159a62.

Those all seem like good things to target.

Forgive me for droning on about amcheck once more, but maybe it'll
help: amcheck has the capability to detect at least two historic bugs
in CREATE INDEX CONCURRENTLY that made it into stable releases. The
"heapallindexed" verification option's bt_tuple_present_callback()
function has a header comment that talks about this. In short, any
"unhandled" broken hot chain (i.e. broken hot chains that are somehow
not correctly detected and handled) should be reported as corrupt by
amcheck with the "heapallindexed" check, provided the tuple is visible
to verification's heap scan.

The CREATE INDEX CONCURRENTLY bug that Pavan found a couple of years
back while testing the WARM patch is one example. A bug that was
fallout from the DROP INDEX CONCURRENTLY work is another historic
example. Alvaro will recall that this same check had a role in the
"freeze the dead" business.

-- 
Peter Geoghegan

Re: crash testing suggestions for 12 beta 1

From

Michael Paquier

Date:

June 6, 2019, 07:31:43

On Wed, Jun 05, 2019 at 02:32:49PM -0700, Peter Geoghegan wrote:
> Forgive me for droning on about amcheck once more, but maybe it'll
> help: amcheck has the capability to detect at least two historic bugs
> in CREATE INDEX CONCURRENTLY that made it into stable releases. The
> "heapallindexed" verification option's bt_tuple_present_callback()
> function has a header comment that talks about this. In short, any
> "unhandled" broken hot chain (i.e. broken hot chains that are somehow
> not correctly detected and handled) should be reported as corrupt by
> amcheck with the "heapallindexed" check, provided the tuple is visible
> to verification's heap scan.
>
> The CREATE INDEX CONCURRENTLY bug that Pavan found a couple of years
> back while testing the WARM patch is one example. A bug that was
> fallout from the DROP INDEX CONCURRENTLY work is another historic
> example. Alvaro will recall that this same check had a role in the
> "freeze the dead" business.

REINDEX CONCURRENTLY is mostly a mapping of CREATE INDEX CONCURRENTLY
+ relation swapping + DROP INDEX CONCURRENTLY separated by multiple
transactions.  In my opinion, the swapping part which renames the
indexes and switches the dependencies is the most interesting of the
whole set because that's completely new.

Are you planning to make sanity checks using pg_catcheck or such?
--
Michael

