On Mon, Mar 16, 2020 at 12:38:35PM -0700, Andres Freund wrote:
> > We have ruled out (?) below-PG hardware problems by a
> > successful run of:
> >
> > cp -rv —preserve=all /var/lib/postgresql/9.6 /tmp/
>
> FWIW, I don't think that rules out hardware problems at all. In plenty
> cases of corruption you can just end up with corrupted on-disk data
> (swapped blocks, zeroed blocks, randomly different values ...).
... hence the (?) ...
> But obviously it is not at all guaranteed that is the case. Could you
> describe the "history" of the database? Replication set up, failovers,
> etc?
No replication, no failovers.
There may have been hard shutdowns as in power failure but
there's no history of that to relate.
> A plpgsql function that returns the rows one-by-one and catches the
> exception is probably your best bet.
We have done that (in Python) for good measure during
recovery procedures.
> It could roughly look something like:
>
> CREATE OR REPLACE FUNCTION salvage(p_tblname regclass)
> RETURNS SETOF text
> LANGUAGE plpgsql AS
> $$
> DECLARE
> v_row record;
> BEGIN
> FOR v_row IN EXECUTE 'SELECT * FROM '||p_tblname::text LOOP
> BEGIN
> -- this forces detoasting
> RETURN NEXT v_row::text;
> EXCEPTION WHEN internal_error OR data_corrupted OR index_corrupted THEN
> -- add pkey or something else
> RAISE NOTICE 'failed to return data';
> END;
> END LOOP;
> END
> $$
>
> should work. You can call it like
> SELECT (salvaged_rec.rec).* FROM (SELECT salvaged_text::salvage_me FROM salvage('salvage_me') AS salvaged_text) AS
salvaged_rec(rec)
Thanks for taking the time. Would something like this be a
useful addition to the adminpack extension ?
Best,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B