Discussion: Perl modules for testing/viewing/corrupting/repairing your heap files
Hackers,

Recently, as part of testing something else, I had need of a tool to create surgically precise corruption within heap pages. I wanted to make the corruption from within TAP tests, so I wrote the tool as a set of perl modules.

The modules allow you to "tie" a perl array to a heap file, in essence thinking of the file as an array of heap pages. Each page within the file manifests as a tied perl hash, where each of the page header fields is an element in the hash, and the tuples in the page are an array of tied hashes, with each field in the tuple header as a field in that tied hash.

This is all done in pure perl. There is no eXtended Subroutine component of this.

The body of each tuple (stuff beyond the tuple header) is thought of merely as binary data. I haven't done any work to decode it into perl datastructures equivalent to integer, text, timestamp, etc., nor have I needed that functionality as yet. That seems doable as an extension of this work, at least if the caller passes tuple descriptor type information into the `tie @file` command.

Stuff like the following example works in the implementation already completed. Note in particular that the file is bound in O_RDWR mode. That means it all gets written back to the underlying file and truly updates (corrupts) your data. It all also works in O_RDONLY mode, in which case the updates are made to a copy of the data in perl's memory, but none of it goes back to disk. Of course, nothing forces you to update anything. You could use this to read the fields from the file/page/tuple without making modifications.

    #!/usr/bin/perl

    use HeapTuple;
    use HeapPage;
    use HeapFile;
    use Fcntl;

    my @file;
    tie @file, 'HeapFile',
        path => 'base/12925/3599',
        pagesize => 8192,
        mode => O_RDWR;
    for my $page (@file)
    {
        $page->{pd_lsn_xrecoff}++;
        print $page->{pd_checksum}, "\n";
        for (@{$page->{'tuples'}})
        {
            $_->{HEAP_COMBOCID} = 1 if ($_->{HEAP_HASNULL});
            $_->{t_xmin} = $_->{t_xmax} if $_->{HEAP_XMAX_COMMITTED};
        }
    }
    untie @file;

In my TAP test usage of these modules, I tend to fall into the pattern of:

    my $node = get_new_node('master');
    $node->init;
    my $pgdata = $node->data_dir;
    $node->safe_psql('postgres', 'create table public.test (bar text)');
    my $path = join('/', $pgdata, $node->safe_psql(
        'postgres', "SELECT pg_relation_filepath('public.test')"));
    $node->stop;

    my @file;
    tie @file, 'HeapFile', path => $path, pagesize => 8192, mode => O_RDWR;
    # do some corruption
    $node->start;
    # do some queries against the corrupt table, see what happens

For kicks, I just ran this one-liner and got many screenfuls of data. I'll just include the tail end:

    perl -e 'use HeapFile; tie @file, "HeapFile", path => "pgdata/base/12925/1255"; print(scalar(%$_)) for(@file);'

    BODY AS HEX ===> PRINTABLE ASCII
    ff 0f 06 00 00 00 00 00 ===> . . . . . . . .
    47 20 00 00 46 06 46 43 ===> q 2 . . p l p g
    49 47 06 05 3f 3d 06 06 ===> s q l _ c a l l
    05 44 3d 06 40 06 41 48 ===> _ h a n d l e r
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 50 03 00 00 00 00 ===> . . . ? . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    42 00 00 00 00 4c 4b 00 ===> f . . . . v u .
    00 00 00 00 00 08 00 00 ===> . . . . . . . .
    3c 00 00 00 01 00 00 00 ===> ` . . . . . . .
    00 00 00 00 01 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    02 46 06 46 43 49 47 06 ===> + p l p g s q l
    05 3f 3d 06 06 05 44 3d ===> _ c a l l _ h a
    06 40 06 41 48 15 18 06 ===> n d l e r ! $ l
    45 3e 40 45 48 02 46 06 ===> i b d i r / p l
    46 43 49 47 06 ===> p g s q l

    b6 01 00 00 t_xmin: 438
    00 00 00 00 t_xmax: 0
    02 00 00 00 t_field3: 2
    00 00 bi_hi: 0
    50 00 bi_lo: 80
    06 00 ip_posid: 6
    1d 00 t_infomask2: 29
        Natts: 29
        HEAP_KEYS_UPDATED: 0
        HEAP_HOT_UPDATED: 0
        HEAP_ONLY_TUPLE: 0
    03 0b t_infomask: 2819
        HEAP_HASNULL: 1
        HEAP_HASVARWIDTH: 1
        HEAP_HASEXTERNAL: 0
        HEAP_HASOID_OLD: 0
        HEAP_XMAX_KEYSHR_LOCK: 0
        HEAP_COMBOCID: 0
        HEAP_XMAX_EXCL_LOCK: 0
        HEAP_XMAX_LOCK_ONLY: 0
        HEAP_XMIN_COMMITTED: 1
        HEAP_XMIN_INVALID: 1
        HEAP_XMAX_COMMITTED: 0
        HEAP_XMAX_INVALID: 1
        HEAP_XMAX_IS_MULTI: 0
        HEAP_UPDATED: 0
        HEAP_MOVED_OFF: 0
        HEAP_MOVED_IN: 0
    20 t_hoff: 32
    ffff0f06 NULL_BITFIELD: 11111111111111111111000001100
    OID_OLD:
    BODY AS HEX ===> PRINTABLE ASCII
    ff 0f 06 00 00 00 00 00 ===> . . . . . . . .
    48 20 00 00 46 06 46 43 ===> r 2 . . p l p g
    49 47 06 05 45 06 06 45 ===> s q l _ i n l i
    06 41 05 44 3d 06 40 06 ===> n e _ h a n d l
    41 48 00 00 00 00 00 00 ===> e r . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 50 03 00 00 00 00 ===> . . . ? . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    42 00 00 01 00 4c 4b 00 ===> f . . . . v u .
    01 00 00 00 00 08 00 00 ===> . . . . . . . .
    46 00 00 00 01 00 00 00 ===> p . . . . . . .
    00 00 00 00 01 00 00 00 ===> . . . . . . . .
    01 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 08 00 00 02 46 06 46 ===> . . . . / p l p
    43 49 47 06 05 45 06 06 ===> g s q l _ i n l
    45 06 41 05 44 3d 06 40 ===> i n e _ h a n d
    06 41 48 15 18 06 45 3e ===> l e r ! $ l i b
    40 45 48 02 46 06 46 43 ===> d i r / p l p g
    49 47 06 ===> s q l

    b6 01 00 00 t_xmin: 438
    00 00 00 00 t_xmax: 0
    03 00 00 00 t_field3: 3
    00 00 bi_hi: 0
    50 00 bi_lo: 80
    07 00 ip_posid: 7
    1d 00 t_infomask2: 29
        Natts: 29
        HEAP_KEYS_UPDATED: 0
        HEAP_HOT_UPDATED: 0
        HEAP_ONLY_TUPLE: 0
    03 0b t_infomask: 2819
        HEAP_HASNULL: 1
        HEAP_HASVARWIDTH: 1
        HEAP_HASEXTERNAL: 0
        HEAP_HASOID_OLD: 0
        HEAP_XMAX_KEYSHR_LOCK: 0
        HEAP_COMBOCID: 0
        HEAP_XMAX_EXCL_LOCK: 0
        HEAP_XMAX_LOCK_ONLY: 0
        HEAP_XMIN_COMMITTED: 1
        HEAP_XMIN_INVALID: 1
        HEAP_XMAX_COMMITTED: 0
        HEAP_XMAX_INVALID: 1
        HEAP_XMAX_IS_MULTI: 0
        HEAP_UPDATED: 0
        HEAP_MOVED_OFF: 0
        HEAP_MOVED_IN: 0
    20 t_hoff: 32
    ffff0f06 NULL_BITFIELD: 11111111111111111111000001100
    OID_OLD:
    BODY AS HEX ===> PRINTABLE ASCII
    ff 0f 06 00 00 00 00 00 ===> . . . . . . . .
    49 20 00 00 46 06 46 43 ===> s 2 . . p l p g
    49 47 06 05 4c 3d 06 45 ===> s q l _ v a l i
    40 3d 4a 06 48 00 00 00 ===> d a t o r . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    00 00 50 03 00 00 00 00 ===> . . . ? . . . .
    00 00 00 00 00 00 00 00 ===> . . . . . . . .
    42 00 00 01 00 4c 4b 00 ===> f . . . . v u .
    01 00 00 00 00 08 00 00 ===> . . . . . . . .
    46 00 00 00 01 00 00 00 ===> p . . . . . . .
    00 00 00 00 01 00 00 00 ===> . . . . . . . .
    01 00 00 00 00 00 00 00 ===> . . . . . . . .
    01 00 00 00 19 46 06 46 ===> . . . . % p l p
    43 49 47 06 05 4c 3d 06 ===> g s q l _ v a l
    45 40 3d 4a 06 48 15 18 ===> i d a t o r ! $
    06 45 3e 40 45 48 02 46 ===> l i b d i r / p
    06 46 43 49 47 06 ===> l p g s q l

Is there any interest in this stuff, and if so, where should it live? I'm happy to reorganize this a bit if there is general interest in such a submission.
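As a rough cross-check of the header fields in the dump, the first complete tuple header shown above can be decoded by hand with Perl's built-in unpack. This is only a sketch and is not taken from the HeapTuple module: it assumes little-endian storage, 8-byte MAXALIGN, and the fixed 23-byte header layout and flag values from PostgreSQL's htup_details.h. The helper expected_hoff and the hash names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed 23-byte header of the first complete tuple in the dump, followed by
# its 4-byte null bitmap.  Assumes little-endian storage.
my $raw = pack('H*',
    'b6010000' .    # t_xmin
    '00000000' .    # t_xmax
    '02000000' .    # t_field3
    '0000' .        # bi_hi
    '5000' .        # bi_lo
    '0600' .        # ip_posid
    '1d00' .        # t_infomask2
    '030b' .        # t_infomask
    '20' .          # t_hoff
    'ffff0f06');    # t_bits (null bitmap)

my @fields = qw(t_xmin t_xmax t_field3 bi_hi bi_lo ip_posid
                t_infomask2 t_infomask t_hoff);
my %hdr;
@hdr{@fields} = unpack('V V V v v v v v C', $raw);

# A few of the t_infomask bits, with values as in htup_details.h.
my %INFOMASK = (
    HEAP_HASNULL        => 0x0001,
    HEAP_HASVARWIDTH    => 0x0002,
    HEAP_XMIN_COMMITTED => 0x0100,
    HEAP_XMIN_INVALID   => 0x0200,
    HEAP_XMAX_COMMITTED => 0x0400,
    HEAP_XMAX_INVALID   => 0x0800,
);
my %flag;
$flag{$_} = ($hdr{t_infomask} & $INFOMASK{$_}) ? 1 : 0 for keys %INFOMASK;

# The attribute count lives in the low 11 bits of t_infomask2.
my $natts = $hdr{t_infomask2} & 0x07FF;    # HEAP_NATTS_MASK

# Expected t_hoff: 23 fixed bytes, plus one bitmap bit per attribute when
# HEAP_HASNULL is set, rounded up to a multiple of 8 (assumed MAXALIGN).
sub expected_hoff {
    my ($n, $hasnull) = @_;
    my $len = 23;
    $len += int(($n + 7) / 8) if $hasnull;
    return ($len + 7) & ~7;
}

printf "%-12s %d\n", $_, $hdr{$_} for @fields;
printf "natts=%d hasnull=%d expected t_hoff=%d\n",
    $natts, $flag{HEAP_HASNULL}, expected_hoff($natts, $flag{HEAP_HASNULL});
```

Under those assumptions the decoded values agree with the dump: 29 attributes with HEAP_HASNULL set need a 4-byte bitmap, and 23 + 4 rounds up to the reported t_hoff of 32.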

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Not having received any feedback on this, I've dusted the modules off for submission as-is.

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments

On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> Recently, as part of testing something else, I had need of a tool to create
> surgically precise corruption within heap pages. I wanted to make the
> corruption from within TAP tests, so I wrote the tool as a set of perl modules.

There is also pg_hexedit:

https://github.com/petergeoghegan/pg_hexedit

--
Peter Geoghegan

> On Apr 14, 2020, at 6:17 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 8, 2020 at 3:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> Recently, as part of testing something else, I had need of a tool to create
>> surgically precise corruption within heap pages. I wanted to make the
>> corruption from within TAP tests, so I wrote the tool as a set of perl modules.
>
> There is also pg_hexedit:
>
> https://github.com/petergeoghegan/pg_hexedit

I steered away from software released under the GPL, such as pg_hexedit, owing to difficulties in getting anything I develop accepted. (That's a hard enough problem without licensing issues.) I'm not taking a political stand for or against the GPL here, just a pragmatic position that I wouldn't be able to integrate pg_hexedit into a postgres submission.

(Thanks for writing pg_hexedit, BTW. I'm not criticizing it.)

The purpose of these perl modules is not the viewing of files, but the intentional and targeted corruption of files from within TAP tests. There are limited examples of tests in the postgres source tree that intentionally corrupt files, and as I read them, they employ a blunt force trauma approach:

In src/bin/pg_basebackup/t/010_pg_basebackup.pl:

> # induce corruption
> system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
> open $file, '+<', "$pgdata/$file_corrupt1";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;
> system_or_bail 'pg_ctl', '-D', $pgdata, 'start';

In src/bin/pg_checksums/t/002_actions.pl:

> # Time to create some corruption
> open my $file, '+<', "$pgdata/$file_corrupted";
> seek($file, $pageheader_size, 0);
> syswrite($file, "\0\0\0\0\0\0\0\0\0");
> close $file;

These blunt force trauma tests are fine, as far as they go. But I wanted to be able to do things like

    # Corrupt the tuple to look like it has lots of attributes, some of
    # them null.  This falsely creates the impression that the t_bits
    # array is longer than just one byte, but t_hoff still says otherwise.
    $tup->{HEAP_HASNULL} = 1;
    $tup->{HEAP_NATTS_MASK} = 0x3FF;
    $tup->{t_bits} = 0xAA;

or

    # Same as above, but this time t_hoff plays along
    $tup->{HEAP_HASNULL} = 1;
    $tup->{HEAP_NATTS_MASK} = 0x3FF;
    $tup->{t_bits} = 0xAA;
    $tup->{t_hoff} = 32;

That's hard to do from a TAP test without modules like this, as you have to calculate by hand the offsets where you're going to write the corruption, and the bit pattern you are going to write to that location. Even if you do all that, nobody else is likely going to be able to read and maintain your tests.

I'd like an easy way from within TAP tests to selectively corrupt files, to test whether various parts of the system fail gracefully in the presence of corruption. What happens when a child partition is corrupted? Does that impact queries that only access other partitions? What kinds of corruption cause pg_upgrade to fail? ...to expand the scope of the corruption? What happens to logical replication when there is corruption on the primary? ...on the standby? What kinds of corruption cause a query to return data from neighboring tuples that the querying role has no permission to view? What happens when a NAS is only intermittently corrupt?

The modules I've submitted thus far are incomplete for this purpose. They don't yet handle toast tables, btree, hash, gist, gin, fsm, or vm, and I might be forgetting a few other things in the list. Before I go and implement all of that, I thought perhaps others would express preferences about how this should all work, even stuff like, "Don't bother implementing that in perl, as I'm reimplementing the entire testing structure in COBOL", or similarly unexpected feedback.

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
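
To make the "calculate by hand" point in the message above concrete, here is roughly what the first of those two corruptions looks like without a tied-hash interface. This is an illustrative sketch, not code from the submitted modules: it builds a scratch file holding a single made-up tuple header rather than touching a real relation, hard-codes offsets derived from the HeapTupleHeaderData layout (t_infomask2 at byte 18, t_infomask at 20, t_hoff at 22, t_bits at 23), and assumes little-endian storage. The helper patch_u16 and the variable $tuple_start are hypothetical; in a real page the tuple's starting offset would first have to be read from its line pointer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Byte offsets within the tuple header and the masks involved; each of
# these has to be worked out by hand from htup_details.h.
use constant {
    OFF_INFOMASK2   => 18,
    OFF_INFOMASK    => 20,
    OFF_T_BITS      => 23,
    HEAP_NATTS_MASK => 0x07FF,
    HEAP_HASNULL    => 0x0001,
};

# Scratch file standing in for a heap file: one made-up tuple header with
# two attributes, no nulls, t_hoff = 24, padded out to 32 bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} pack('V V V v v v v v C', 438, 0, 2, 0, 80, 6, 2, 0x0902, 24),
    "\0" x 9;
close $tmp or die "close: $!";

# Read-modify-write one little-endian 16-bit field at $offset.
sub patch_u16 {
    my ($fh, $offset, $clear, $set) = @_;
    sysseek($fh, $offset, SEEK_SET) or die "seek: $!";
    sysread($fh, my $buf, 2) == 2 or die "short read";
    my $val = (unpack('v', $buf) & ~$clear & 0xFFFF) | $set;
    sysseek($fh, $offset, SEEK_SET) or die "seek: $!";
    syswrite($fh, pack('v', $val)) == 2 or die "short write";
    return $val;
}

my $tuple_start = 0;    # in a real page this comes from the line pointer

open my $fh, '+<', $path or die "open: $!";
binmode $fh;

# Claim 0x3FF attributes, set HEAP_HASNULL, and write one bitmap byte,
# leaving t_hoff untouched so that it disagrees with the bitmap length.
patch_u16($fh, $tuple_start + OFF_INFOMASK2, HEAP_NATTS_MASK, 0x3FF);
patch_u16($fh, $tuple_start + OFF_INFOMASK, 0, HEAP_HASNULL);
sysseek($fh, $tuple_start + OFF_T_BITS, SEEK_SET) or die "seek: $!";
syswrite($fh, pack('C', 0xAA)) == 1 or die "short write";

# Read the header back to confirm what was written.
sysseek($fh, $tuple_start, SEEK_SET) or die "seek: $!";
sysread($fh, my $after, 24) == 24 or die "short read";
my ($infomask2, $infomask, $t_hoff, $t_bits) = unpack('x18 v v C C', $after);
close $fh;

printf "t_infomask2=0x%04x t_infomask=0x%04x t_hoff=%d t_bits=0x%02x\n",
    $infomask2, $infomask, $t_hoff, $t_bits;
```

Each of the three hash assignments in the message turns into an offset, a mask, and a byte-order decision here, which is the maintenance burden being described.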

On Wed, Apr 15, 2020 at 7:22 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> I steered away from software released under the GPL, such as pg_hexedit, owing to difficulties in getting anything I develop accepted. (That's a hard enough problem without licensing issues.) I'm not taking a political stand for or against the GPL here, just a pragmatic position that I wouldn't be able to integrate pg_hexedit into a postgres submission.
>
> (Thanks for writing pg_hexedit, BTW. I'm not criticizing it.)

The only reason that pg_hexedit is under the GPL is that it's derived from pg_filedump, which was and is also GPL 2. Note that pg_filedump is hosted on community resources, and is something that index access methods know about and try not to break (grep for pg_filedump in the Postgres source code). pg_hexedit supports all index access methods with the core distribution, including even the unpopular ones, like SP-GiST.

> That's hard to do from a TAP test without modules like this, as you have to calculate by hand the offsets where you're going to write the corruption, and the bit pattern you are going to write to that location. Even if you do all that, nobody else is likely going to be able to read and maintain your tests.

Logical corruption is almost inherently a once-off thing. I think that a tool like pg_hexedit is useful for seeing how the system behaves with certain novel kinds of logical corruption, which it will tolerate to varying degrees and with diverse symptoms. Pretty much for investigating on a once-off basis.

I have occasionally wished for an SQL-like interface to bufpage.c routines like PageIndexTupleDelete(), PageRepairFragmentation(), etc. That would probably be a great deal more maintainable than what you propose to do. It's not really equivalent, of course, but it would give tests a way to dynamically manipulate/damage pages at the "logical level". That seems like the thing that's hard to simulate right now.

--
Peter Geoghegan