Обсуждение: BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)

Поиск
Список
Период
Сортировка

BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)

От
"Aleksandr Dushein"
Дата:
The following bug has been logged online:

Bug reference:      5718
Logged by:          Aleksandr Dushein
Email address:      adushein@rutube.ru
PostgreSQL version: 9.0.1
Operating system:   FreeBSD 8.1-RELEASE #0: Mon Jul 19 02:36:49 UTC 2010
root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
Description:        Cannot start postgres (FATAL:  invalid cache id: 19)
Details:

Hello. I was upgraded my database from 8.4.4 то 9.0.1 one week ago by
pg_dump/pg_restore. Yesterday database stop responding with message ERROR:
database is not accepting commands to avoid wraparound data loss
I trying to restart in single mode and fail with the same error.


%postgres --single -D /data/pgdata -d 5
DEBUG:  TZ "Africa/Algiers" scores 0: at 1263427200 2010-01-14 01:00:00 std
versus 2010-01-14 03:00:00 std
...skipped...
DEBUG:  TZ "UCT" scores 0: at 1263427200 2010-01-14 00:00:00 std versus
2010-01-14 03:00:00 std
DEBUG:  TZ "UTC" scores 0: at 1263427200 2010-01-14 00:00:00 std versus
2010-01-14 03:00:00 std
DEBUG:  TZ "Universal" scores 0: at 1263427200 2010-01-14 00:00:00 std
versus 2010-01-14 03:00:00 std
DEBUG:  TZ "W-SU" gets max score 5200
DEBUG:  TZ "Zulu" scores 0: at 1263427200 2010-01-14 00:00:00 std versus
2010-01-14 03:00:00 std
DEBUG:  Reject TZ "Mideast/Riyadh87": uses leap seconds
DEBUG:  Reject TZ "Mideast/Riyadh88": uses leap seconds
DEBUG:  Reject TZ "Mideast/Riyadh89": uses leap seconds
DEBUG:  TZ "posixrules" scores 0: at 1263427200 2010-01-13 19:00:00 std
versus 2010-01-14 03:00:00 std
DEBUG:  invoking IpcMemoryCreate(size=1103020032)
DEBUG:  removing file "pg_notify/0000"
DEBUG:  InitPostgres
DEBUG:  my backend id is 1
LOG:  database system was shut down at 2010-10-19 11:25:38 MSD
DEBUG:  checkpoint record is at 433/C474D3C8
DEBUG:  redo record is at 433/C474D3C8; shutdown TRUE
DEBUG:  next transaction ID: 0/2146484301; next OID: 106178
DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
DEBUG:  oldest unfrozen transaction ID: 654, in database 16385
DEBUG:  transaction ID wrap limit is 2147484301, limited by database with
OID 16385
FATAL:  invalid cache id: 19
DEBUG:  shmem_exit(1): 8 callbacks to make
DEBUG:  proc_exit(1): 3 callbacks to make
DEBUG:  exit(1)
DEBUG:  shmem_exit(-1): 0 callbacks to make
DEBUG:  proc_exit(-1): 0 callbacks to make


---------config (only uncommented lines)-------------
%cat /data/pgdata/postgresql.conf | grep -v '^[         ]*#' | grep -v '^$'
listen_addresses='*'
max_connections = 100                   # (change requires restart)
shared_buffers = 1024MB                 # min 128kB
temp_buffers = 256MB
work_mem = 128MB
maintenance_work_mem = 512MB
fsync = off
wal_buffers = 512kB
log_destination = 'syslog'
silent_mode = on
update_process_title = off
datestyle = 'iso, mdy'
lc_messages = 'C'                       # locale for system error message
lc_monetary = 'en_US.UTF-8'                     # locale for monetary
formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number
formatting
lc_time = 'en_US.UTF-8'                         # locale for time
formatting
default_text_search_config = 'pg_catalog.english'

---------sysctl (only uncommented lines)-------------
analyzer-db-1# cat /etc/sysctl.conf | grep -v '^[       ]*#' | grep -v '^$'
kern.ipc.shmall=1048576
kern.ipc.shmmax=4294967296
kern.ipc.semmap=256
kern.ipc.shm_use_phys=1
kern.ipc.somaxconn=4096
kern.ipc.maxsockbuf=16777216
kern.maxfiles=65536
net.inet.tcp.blackhole=0
net.inet.udp.blackhole=0
net.inet.tcp.delayed_ack=0
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=32768
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
net.inet.tcp.keepidle=300000
net.inet.tcp.keepintvl=30000
net.inet.tcp.msl=10000
kern.ipc.nmbclusters=65536


analyzer-db-1# cat /boot/loader.conf
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256


---------disk utilization -------------

analyzer-db-1# df -h
Filesystem              Size    Used   Avail Capacity  Mounted on
/dev/mfid0s1a           989M    268M    642M    29%    /
devfs                   1.0K    1.0K      0B   100%    /dev
/dev/mfid0s1d           1.9G     14K    1.8G     0%    /tmp
/dev/mfid0s1f           8.7G    1.2G    6.8G    15%    /usr
/dev/mfid0s1e            15G    101M     14G     1%    /var
/dev/mfid0s2d           483G     54G    391G    12%    /data
/dev/mfid1s1d           541G    228G    269G    46%    /data2
/dev/mfid2s1d           1.1T    639G    356G    64%    /data3
10.1.13.31:/vol/vol5    1.0T    721G    303G    70%    /data/backup
analyzer-db-1# du -sch /data*/pgdata/
 54G    /data/pgdata/
201G    /data2/pgdata/
523G    /data3/pgdata/
778G    total

Re: BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)

От
Alvaro Herrera
Дата:
Excerpts from Aleksandr Dushein's message of mié oct 20 06:44:09 -0300 2010:

> FATAL:  invalid cache id: 19

Hmm, try removing pg_internal.init from directories under "base/" and
restart single mode.  (You then need to run vacuum on the specified
database to get out of the other problem).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)

От
Tom Lane
Дата:
"Aleksandr Dushein" <adushein@rutube.ru> writes:
> Description:        Cannot start postgres (FATAL:  invalid cache id: 19)

> Hello. I was upgraded my database from 8.4.4 то 9.0.1 one week ago by
> pg_dump/pg_restore. Yesterday database stop responding with message ERROR:
> database is not accepting commands to avoid wraparound data loss
> I trying to restart in single mode and fail with the same error.

Ugh.  The "invalid cache id" failure is my fault.  You can fix that by
applying this patch to 9.0:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=c98cd9bdb618794643b670c8240c0aad213340c7
after which you'll probably need to do a manual vacuum with a standalone
backend before the DB will restart.

However, the bigger question is how you got into this state in the first
place.  Even if you'd had autovacuum turned off, it should have
protected you against a wraparound.  I suspect that pg_upgrade is
failing to preserve datfrozenxid and related values properly.  Do you
by any chance still have the 8.4 database available for examination?
Do you have an idea how many transactions you executed since converting?

Bruce: exactly what is pg_upgrade's scheme for getting the right
frozenxid values into the new catalogs, anyway?

            regards, tom lane

Re: BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)

От
Alvaro Herrera
Дата:
Excerpts from Tom Lane's message of mié oct 20 13:57:16 -0300 2010:
> "Aleksandr Dushein" <adushein@rutube.ru> writes:
> > Description:        Cannot start postgres (FATAL:  invalid cache id: 19)
>
> > Hello. I was upgraded my database from 8.4.4 то 9.0.1 one week ago by
> > pg_dump/pg_restore. Yesterday database stop responding with message ERROR:
> > database is not accepting commands to avoid wraparound data loss
> > I trying to restart in single mode and fail with the same error.
>
> Ugh.  The "invalid cache id" failure is my fault.  You can fix that by
> applying this patch to 9.0:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=c98cd9bdb618794643b670c8240c0aad213340c7
> after which you'll probably need to do a manual vacuum with a standalone
> backend before the DB will restart.
>
> However, the bigger question is how you got into this state in the first
> place.  Even if you'd had autovacuum turned off, it should have
> protected you against a wraparound.  I suspect that pg_upgrade is
> failing to preserve datfrozenxid and related values properly.

This exact symptom was just reported twice in pgsql-es-ayuda.  Upgrading
to 9.0.4 fixed the problem.  One of them used pg_upgrade from a previous
version, the other one did not.  I'm not sure that it's worthwhile to
expore the underlying cause, but it seems clear that pg_upgrade is not
the only possibly guilty party.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: BUG #5718: Cannot start postgres (FATAL: invalid cache id: 19)

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> This exact symptom was just reported twice in pgsql-es-ayuda.  Upgrading
> to 9.0.4 fixed the problem.  One of them used pg_upgrade from a previous
> version, the other one did not.  I'm not sure that it's worthwhile to
> expore the underlying cause, but it seems clear that pg_upgrade is not
> the only possibly guilty party.

Yeah, it seems like we are getting into XID-wraparound-approaching
states a lot more easily, or a lot quicker, than I would expect.
The original theory about that, which turned out to be true, was that
pg_upgrade was bollixing relfrozenxid values.  But that makes at least
two reports from people who (claimed they) didn't use pg_upgrade.
So maybe there is something else going on.  I see no way to investigate
on the strength of the available evidence though ...

            regards, tom lane