Обсуждение: Unlinked files in PGDATA/base following unclean shutdown
Our application is running Postgres 7.4, (working on conversion to 8.3 right now). Our testing involves various forms of violence, including shutting off power and kill -9 postmaster. Occasionally we observe a form of database corruption in which one of the files storing a table or index disappears. The logs will contain ERRORs that look like this: could not open relation "some_table_name": No such file or directory When this happens, and I cross-reference the pg_class.oid with the expected file under PGDATA, the file is missing (and does not appear to be in lost+found). I have fsync set to true, and wal_sync_method set to fsync. A few questions about this: 1) Why is this happening? 2) To help investigate this problem, I've written a script to cross-reference pg_class and the files in PGDATA/base. (I know that I should use pg_class.relfilenode instead of pg_class.oid -- I'll fix that.) The question is how to check for consistency in the case of large tables, which are split into multiple segments, (e.g. 123456.1, 123456.2). I.e., how can I find out how many segments there should be? Any chance it's as simple as (pg_class.relpages + SUITABLE_CONSTANT - 1) / SUITABLE_CONSTANT? Jack
Jack Orenstein <jack.orenstein@hds.com> writes: > Our application is running Postgres 7.4, (working on conversion to 8.3 > right now). Our testing involves various forms of violence, including > shutting off power and kill -9 postmaster. Do you have reason to trust either your kernel or your disk drives under such abuse? In general I'd bet on the drives being at fault a lot sooner than any other part of the food chain. Complete disappearance of files that should be there is a fault at the filesystem level or lower, anyway. > The question is how to check for consistency in the case of > large tables, which are split into multiple segments, (e.g. 123456.1, > 123456.2). I.e., how can I find out how many segments there should be? The kernel-defined EOF is the truth, the whole truth, and nothing but the truth. There is no other authority. All segments before the last one should be exactly 1GB, but the last one can be anything up to that. Consult the comments in md.c for more details. (I think 7.4 may treat some corner cases differently from 8.3 anyway.) regards, tom lane
Tom Lane wrote: > Jack Orenstein <jack.orenstein@hds.com> writes: >> The question is how to check for consistency in the case of >> large tables, which are split into multiple segments, (e.g. 123456.1, >> 123456.2). I.e., how can I find out how many segments there should be? > > The kernel-defined EOF is the truth, the whole truth, and nothing but > the truth. There is no other authority. > > All segments before the last one should be exactly 1GB, but the last > one can be anything up to that. Consult the comments in md.c for > more details. (I think 7.4 may treat some corner cases differently > from 8.3 anyway.) > > regards, tom lane > Just to be clear, you mean 2**30 (1,073,741,824) bytes, not 10**9? Jack
Jack Orenstein wrote: > Tom Lane wrote: >> All segments before the last one should be exactly 1GB, but the last >> one can be anything up to that. Consult the comments in md.c for >> more details. (I think 7.4 may treat some corner cases differently >> from 8.3 anyway.) > > Just to be clear, you mean 2**30 (1,073,741,824) bytes, not 10**9? It's RELSEG_SIZE * BLCKSZ, where #define RELSEG_SIZE 131072 #define BLCKSZ 8192 (both from pg_config.h) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support