Thread: crash testing suggestions for 12 beta 1
Now that beta is out, I wanted to do some crash-recovery testing where I inject PANIC-inducing faults and see if it recovers correctly. A long-lived Perl process keeps track of what it should find after the crash, and verifies that it finds it. You will probably be familiar with the general theme from examples like the threads below.

Would anyone like to nominate some areas to focus on? I think the pluggable storage refactoring work will get tested inherently, so I'm not planning on designing tests specifically for that (unless there is a non-core plugin I should test with). Making the ctid be tie-breakers in btree index is also tested inherently (plus I think Peter tested that pretty thoroughly himself with similar methods). I've already tested declarative partitioning where the tuples do a lot of migrating, and tested prepared transactions.

Any other suggestions for changes that might be risky and should be specifically targeted for testing?

Cheers,
Jeff
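
The bookkeeping described above (remember what should be there after the crash, then verify it) can be sketched roughly as follows. This is only an illustration, written in Python rather than the Perl the actual harness uses; the class and method names are hypothetical, and the real test scripts are not shown in this thread. The key point is that a write whose commit was never acknowledged before the crash is in doubt: after recovery, either the old or the new value is acceptable, but nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedState:
    """Hypothetical tracker for what a crash test should find after recovery."""

    # key -> last value whose COMMIT was acknowledged by the server
    confirmed: dict = field(default_factory=dict)
    # key -> value sent to the server but not yet acknowledged
    in_doubt: dict = field(default_factory=dict)

    def begin_write(self, key, value):
        # Record the write before sending it, so a crash mid-commit is covered.
        self.in_doubt[key] = value

    def commit_acknowledged(self, key):
        # The server confirmed the commit: the new value is now the only valid one.
        self.confirmed[key] = self.in_doubt.pop(key)

    def acceptable_values(self, key):
        # None stands for "row absent" when the key was never confirmed.
        values = {self.confirmed.get(key)}
        if key in self.in_doubt:
            values.add(self.in_doubt[key])
        return values

    def verify(self, found):
        # Return the keys whose post-recovery value is neither old nor in-doubt.
        keys = set(self.confirmed) | set(self.in_doubt)
        return sorted(k for k in keys
                      if found.get(k) not in self.acceptable_values(k))
```

After each crash and recovery, the harness would read the table back into a dict and pass it to `verify()`; an empty result means recovery produced an acceptable state.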

On Thu, May 23, 2019 at 8:24 AM Jeff Janes <jeff.janes@gmail.com> wrote:
> Now that beta is out, I wanted to do some crash-recovery testing where I inject PANIC-inducing faults and see if it recovers correctly.

Thank you for doing this. It's important work.

> Making the ctid be tie-breakers in btree index is also tested inherently (plus I think Peter tested that pretty thoroughly himself with similar methods).

As you may know, the B-Tree code has a tendency to soldier on when an index is corrupt. "Moving right" tends to conceal problems beyond concurrent page splits. I didn't do very much fault injection type testing with the B-Tree enhancements, but I did lean on amcheck heavily during development. Note that a new, extremely thorough option called "rootdescend" verification was added following the v12 work:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c1afd175b5b2e5c44f6da34988342e00ecdfb518

It probably wouldn't add noticeable overhead to use this during your testing, and maybe to combine it with the "heapallindexed" option, while using the bt_index_parent_check() variant -- that will detect almost any imaginable index corruption. Admittedly, amcheck didn't find any bugs in my code after the first couple of versions of the patch series, so this approach seems unlikely to find any problems now. Even still, it wouldn't be very difficult to do this extra step. It seems worthwhile to be thorough here, given that we depend on the B-Tree code so heavily.

--
Peter Geoghegan
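
As a rough sketch of the extra verification step suggested above, a test harness could run bt_index_parent_check() with both "heapallindexed" and "rootdescend" enabled against every valid B-Tree index after each recovery. The helper below only builds the SQL text; the function name build_amcheck_query is hypothetical, and the query assumes PostgreSQL 12 with the amcheck extension installed (CREATE EXTENSION amcheck). Note that bt_index_parent_check() takes a ShareLock on the index and its table, so it blocks concurrent writes while it runs.

```python
def build_amcheck_query(heapallindexed: bool = True,
                        rootdescend: bool = True) -> str:
    """Build SQL that runs bt_index_parent_check() on every valid B-Tree index.

    Hypothetical helper: it only produces the query text; running it is left
    to whatever database driver the test harness already uses.
    """
    h = "true" if heapallindexed else "false"
    r = "true" if rootdescend else "false"
    return f"""
SELECT c.relname,
       bt_index_parent_check(index => c.oid,
                             heapallindexed => {h},
                             rootdescend => {r})
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_am am ON am.oid = c.relam
WHERE am.amname = 'btree'
  -- skip temp relations, which may belong to another session
  AND c.relpersistence <> 't'
  -- only ready, valid indexes can be checked
  AND c.relkind = 'i' AND i.indisready AND i.indisvalid
""".strip()
```

Any corruption is reported as an error raised by the query, so the harness only needs to treat a failed statement as a test failure.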

On 2019-May-23, Jeff Janes wrote:
> Now that beta is out, I wanted to do some crash-recovery testing where I
> inject PANIC-inducing faults and see if it recovers correctly. A long-lived
> Perl process keeps track of what it should find after the crash, and
> verifies that it finds it. You will probably be familiar with the general
> theme from examples like the threads below. Would anyone like to nominate
> some areas to focus on?

Thanks for the offer! Your work has shown its value in previous cycles.

REINDEX CONCURRENTLY would be one good area to focus on, I think, as well as ALTER TABLE ATTACH PARTITION. Maybe also INCLUDE columns in GiST, and the stuff in commits 9155580fd, fe280694d and 7df159a62.

Thanks,

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

On Wed, Jun 5, 2019 at 2:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> REINDEX CONCURRENTLY would be one good area to focus on, I think, as
> well as ALTER TABLE ATTACH PARTITION. Maybe also INCLUDE columns in
> GiST, and the stuff in commits 9155580fd, fe280694d and 7df159a62.

Those all seem like good things to target.

Forgive me for droning on about amcheck once more, but maybe it'll help: amcheck has the capability to detect at least two historic bugs in CREATE INDEX CONCURRENTLY that made it into stable releases. The "heapallindexed" verification option's bt_tuple_present_callback() function has a header comment that talks about this. In short, any "unhandled" broken hot chain (i.e. broken hot chains that are somehow not correctly detected and handled) should be reported as corrupt by amcheck with the "heapallindexed" check, provided the tuple is visible to verification's heap scan.

The CREATE INDEX CONCURRENTLY bug that Pavan found a couple of years back while testing the WARM patch is one example. A bug that was fallout from the DROP INDEX CONCURRENTLY work is another historic example. Alvaro will recall that this same check had a role in the "freeze the dead" business.

--
Peter Geoghegan

On Wed, Jun 05, 2019 at 02:32:49PM -0700, Peter Geoghegan wrote:
> Forgive me for droning on about amcheck once more, but maybe it'll
> help: amcheck has the capability to detect at least two historic bugs
> in CREATE INDEX CONCURRENTLY that made it into stable releases. The
> "heapallindexed" verification option's bt_tuple_present_callback()
> function has a header comment that talks about this. In short, any
> "unhandled" broken hot chain (i.e. broken hot chains that are somehow
> not correctly detected and handled) should be reported as corrupt by
> amcheck with the "heapallindexed" check, provided the tuple is visible
> to verification's heap scan.
>
> The CREATE INDEX CONCURRENTLY bug that Pavan found a couple of years
> back while testing the WARM patch is one example. A bug that was
> fallout from the DROP INDEX CONCURRENTLY work is another historic
> example. Alvaro will recall that this same check had a role in the
> "freeze the dead" business.

REINDEX CONCURRENTLY is mostly a mapping of CREATE INDEX CONCURRENTLY + relation swapping + DROP INDEX CONCURRENTLY separated by multiple transactions. In my opinion, the swapping part which renames the indexes and switches the dependencies is the most interesting of the whole set because that's completely new. Are you planning to make sanity checks using pg_catcheck or such?

--
Michael