Discussion: Moved postgres, now won't start

Moved postgres, now won't start

From: Madison Kelly
Date:
Hi all,

   I've created a small 2-node (Debian Etch, PgSQL8.1) cluster using a
(shared) DRBD8 partition formatted as ext3 running in Primary/Secondary
mode.

   I shut down postgresql-8.1 and moved '/etc/postgresql' and
'/etc/postgresql-common' to '/ha/etc' (where '/ha' is the DRBD partition's
mount point). Then I created symlinks pointing to the directories under
'/ha' and restarted PostgreSQL. Everything *seemed* okay until I tried to
connect to a database (i.e. 'template1' as 'postgres'). Then I got this error:

$ psql template1
psql: FATAL:  could not open file "global/pg_database": No such file or directory

   When I try connecting to another DB as a user with an (md5) password,
it correctly recognizes whether the password is right or not. Also, the file:

# cat /var/lib/postgresql/8.1/main/global/pg_database
"postgres" 10793 1663 499 499
"template1" 1 1663 499 499
"template0" 10792 1663 499 499

   exists and is readable, as you can see.

   Any idea what's wrong? Does it not like the symlink
'/var/lib/postgres' -> '/ha/var/lib/postgres'?

   Thanks!

Madison

Re: Moved postgres, now won't start

From: Tom Lane
Date:
Madison Kelly <linux@alteeve.com> writes:
>    I've created a small 2-node (Debian Etch, PgSQL8.1) cluster using a
> (shared) DRBD8 partition formatted as ext3 running in Primary/Secondary
> mode.

>    I shut down postgresql-8.1 and moved '/etc/postgresql' and
> '/etc/postgresql-common' to '/ha/etc' (where '/ha' is the DRBD partition's
> mount point). Then I created symlinks pointing to the directories under
> '/ha' and restarted PostgreSQL. Everything *seemed* okay until I tried to
> connect to a database (i.e. 'template1' as 'postgres'). Then I got this error:

> $ psql template1
> psql: FATAL:  could not open file "global/pg_database": No such file or directory

I think that's the first actual file access that happens during the
connect sequence (everything before that is done with in-memory caches
in the postmaster).  So what I'm wondering is whether you *really* shut
down and restarted the postmaster, or whether you are trying to connect
to the same old postmaster process that has now had all its files
deleted out from under it.
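One way to test this theory without a reboot, sketched in Python under stated assumptions: on Linux, once a directory has been removed (a `mv` onto a different filesystem, such as the DRBD mount, copies the tree and then deletes the original), /proc reports a process's working directory as `<old path> (deleted)`. Because the server opens relative names like global/pg_database against its data directory, a postmaster whose cwd carries that suffix is working in files that are gone. The function name is hypothetical, matching on `postmaster` assumes the 8.1 binary name, and reading other users' /proc entries needs root.

```python
import os


def find_stale_postmasters(proc_root="/proc"):
    """Return (pid, cwd) pairs for postmaster processes whose working
    directory has been removed (Linux marks it with a ' (deleted)' suffix).

    Assumption: the 8.1 server shows up with argv[0] ending in 'postmaster';
    child processes retitle themselves 'postgres: ...' and are not matched.
    """
    stale = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
            cwd = os.readlink(os.path.join(base, "cwd"))
        except OSError:
            continue  # process exited, or permission denied (run as root)
        if b"postmaster" not in os.path.basename(argv[0]):
            continue
        if cwd.endswith(" (deleted)"):
            stale.append((int(entry), cwd))
    return stale
```

Run as root; for example, `print(find_stale_postmasters())` should list an old postmaster that is still running in the removed directory, and an empty list once it is gone.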

            regards, tom lane

Re: Moved postgres, now won't start

From: Zoltan Boszormenyi
Date:
Hi,

Madison Kelly wrote:
> Hi all,
>
>   I've created a small 2-node (Debian Etch, PgSQL8.1) cluster using a
> (shared) DRBD8 partition formatted as ext3 running in
> Primary/Secondary mode.
>
>   I shut down postgresql-8.1 and moved '/etc/postgresql' and
> '/etc/postgresql-common' to '/ha/etc' (where '/ha' is the DRBD
> partition's mount point). Then I created symlinks pointing to the
> directories under '/ha' and restarted PostgreSQL. Everything *seemed*
> okay until I tried to connect to a database (i.e. 'template1' as
> 'postgres'). Then I got this error:
>
> $ psql template1
> psql: FATAL:  could not open file "global/pg_database": No such file or directory
>
>   When I try connecting to another DB as a user with an (md5)
> password, it correctly recognizes whether the password is right or not.
> Also, the file:
>
> # cat /var/lib/postgresql/8.1/main/global/pg_database
> "postgres" 10793 1663 499 499
> "template1" 1 1663 499 499
> "template0" 10792 1663 499 499
>
>   exists and is readable, as you can see.
>
>   Any idea what's wrong? Does it not like the symlink
> '/var/lib/postgres' -> '/ha/var/lib/postgres'?
>
>   Thanks!
>
> Madison

Do you use SELinux?
Look for "avc denied" messages in the logs to see if it's the case.

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/



Re: Moved postgres, now won't start

From: Madison Kelly
Date:
Tom Lane wrote:
> I think that's the first actual file access that happens during the
> connect sequence (everything before that is done with in-memory caches
> in the postmaster).  So what I'm wondering is whether you *really* shut
> down and restarted the postmaster, or whether you are trying to connect
> to the same old postmaster process that has now had all its files
> deleted out from under it.
>
>             regards, tom lane

Thank you for your reply!

Before the move:

# /etc/init.d/postgresql-8.1 status
Version Cluster   Port Status Owner    Data directory               Log file
8.1     main      5432 online postgres /var/lib/postgresql/8.1/main /var/log/postgresql/postgresql-8.1-main.log
# /etc/init.d/postgresql-8.1 stop
Stopping PostgreSQL 8.1 database server: main.
nicole:/etc/postgresql/8.1/main# /etc/init.d/postgresql-8.1 status
Version Cluster   Port Status Owner    Data directory               Log file
8.1     main      5432 down   postgres /var/lib/postgresql/8.1/main /var/log/postgresql/postgresql-8.1-main.log

I hope that doesn't get too mangled. Unless I am misunderstanding
"stop", I think it was stopped. I then made the moves/symlinks mentioned
in my first post and restarted.

For double certainty, I switched to the slave node after shutting down
postgres on the master node and double-checked that it was still 'down'
as well.

Madison

Re: Moved postgres, now won't start

From: Madison Kelly
Date:
Zoltan Boszormenyi wrote:
> Do you use SELinux?
> Look for "avc denied" messages in the logs to see if it's the case.

   No, I don't (unless I missed it and Debian Etch uses it by default
now). To be sure, I checked the log files and only saw this:

2007-07-16 13:58:03 EDT LOG:  incomplete startup packet
2007-07-16 13:58:04 EDT LOG:  could not open temporary statistics file "global/pgstat.tmp": No such file or directory
2007-07-16 13:59:03 EDT FATAL:  could not open file "global/pg_database": No such file or directory
2007-07-16 13:59:04 EDT LOG:  could not open temporary statistics file "global/pgstat.tmp": No such file or directory
2007-07-16 14:00:03 EDT FATAL:  could not open file "global/pg_database": No such file or directory

   This repeats over and over. I tried shutting down PostgreSQL again and
got this at the shell:

# /etc/init.d/postgresql-8.1 stop
Stopping PostgreSQL 8.1 database server: main* pg_ctl: postmaster does not shut down
(does not shutdown gracefully, now stopping immediately)pg_ctl: could not send stop signal (PID: 19958): No such process
Insecure dependency in kill while running with -T switch at /usr/bin/pg_ctlcluster line 370.
(does not shutdown, killing the process)
  failed!

   And this in the logs:

2007-07-16 14:28:00 EDT LOG:  received fast shutdown request
2007-07-16 14:28:00 EDT LOG:  shutting down
2007-07-16 14:28:00 EDT PANIC:  could not open control file "global/pg_control": No such file or directory
2007-07-16 14:28:00 EDT LOG:  background writer process (PID 19960) was terminated by signal 6
2007-07-16 14:28:00 EDT LOG:  terminating any other active server processes
2007-07-16 14:28:00 EDT LOG:  all server processes terminated; reinitializing
2007-07-16 14:28:00 EDT LOG:  could not open file "postmaster.pid": No such file or directory
2007-07-16 14:28:00 EDT PANIC:  could not open control file "global/pg_control": No such file or directory
2007-07-16 14:28:00 EDT LOG:  could not open temporary statistics file "global/pgstat.tmp": No such file or directory


   Lastly, to be very sure, I grepped for that string, with no
results:

nicole:/var/log# grep "avc denied" * -Rni
nicole:/var/log#
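Note that SELinux denials are usually logged in the form `avc:  denied  { read } for ...`, with a colon and extra spaces after `avc`, so a literal search for "avc denied" may not match them even on a system where SELinux is active. A minimal Python sketch of a more tolerant search, assuming plain-text logs under /var/log (compressed .gz rotations are not opened); the function and pattern names are hypothetical.

```python
import os
import re

# Tolerates the colon and variable spacing in "avc:  denied  { ... }".
AVC_DENIED = re.compile(r"avc:?\s+denied", re.IGNORECASE)


def find_avc_denials(log_dir="/var/log"):
    """Yield (path, line_number, line) for every line that looks like an
    SELinux AVC denial. Unreadable files are skipped; undecodable bytes
    are replaced rather than raising."""
    for dirpath, _dirnames, filenames in os.walk(log_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if AVC_DENIED.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # permission denied, vanished file, etc.
```

For example, `for hit in find_avc_denials(): print(hit)` prints each matching log line with its file and line number.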

   Thanks for the reply!

Madison

Re: Moved postgres, now won't start

From: Madison Kelly
Date:
Tom Lane wrote:
> Madison Kelly <linux@alteeve.com> writes:
>>    I've created a small 2-node (Debian Etch, PgSQL8.1) cluster using a
>> (shared) DRBD8 partition formatted as ext3 running in Primary/Secondary
>> mode.
>
>>    I shut down postgresql-8.1 and moved '/etc/postgresql' and
>> '/etc/postgresql-common' to '/ha/etc' (where '/ha' is the DRBD partition's
>> mount point). Then I created symlinks pointing to the directories under
>> '/ha' and restarted PostgreSQL. Everything *seemed* okay until I tried to
>> connect to a database (i.e. 'template1' as 'postgres'). Then I got this error:
>
>> $ psql template1
>> psql: FATAL:  could not open file "global/pg_database": No such file or directory
>
> I think that's the first actual file access that happens during the
> connect sequence (everything before that is done with in-memory caches
> in the postmaster).  So what I'm wondering is whether you *really* shut
> down and restarted the postmaster, or whether you are trying to connect
> to the same old postmaster process that has now had all its files
> deleted out from under it.

To test your idea, I rebooted both cluster nodes and it works now.

How could I have done this without requiring a reboot? Is there a way to
tell postgres to create an entirely new connection?

Thanks!!

Madison

Re: Moved postgres, now won't start

From: Tom Lane
Date:
Madison Kelly <linux@alteeve.com> writes:
>    This repeats over and over. I tried shutting down PostgreSQL again and
> got this at the shell:

> # /etc/init.d/postgresql-8.1 stop
> Stopping PostgreSQL 8.1 database server: main* pg_ctl: postmaster does not shut down
> (does not shutdown gracefully, now stopping immediately)pg_ctl: could not send stop signal (PID: 19958): No such process
> Insecure dependency in kill while running with -T switch at /usr/bin/pg_ctlcluster line 370.
> (does not shutdown, killing the process)
>   failed!

>    And this in the logs:

> 2007-07-16 14:28:00 EDT LOG:  received fast shutdown request
> 2007-07-16 14:28:00 EDT LOG:  shutting down
> 2007-07-16 14:28:00 EDT PANIC:  could not open control file "global/pg_control": No such file or directory
> 2007-07-16 14:28:00 EDT LOG:  background writer process (PID 19960) was terminated by signal 6
> 2007-07-16 14:28:00 EDT LOG:  terminating any other active server processes
> 2007-07-16 14:28:00 EDT LOG:  all server processes terminated; reinitializing
> 2007-07-16 14:28:00 EDT LOG:  could not open file "postmaster.pid": No such file or directory
> 2007-07-16 14:28:00 EDT PANIC:  could not open control file "global/pg_control": No such file or directory
> 2007-07-16 14:28:00 EDT LOG:  could not open temporary statistics file "global/pgstat.tmp": No such file or directory

I think this proves my theory --- that all looks like leftover processes
trying to work in an installation that isn't there anymore.  (Except I
have no idea what the "insecure dependency" bit is about.)

What I suspect happened is that you moved the directories before you
actually shut down the old postmaster, and then the initscript's "stop"
command would have failed because it couldn't find the postmaster.pid file.
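pg_ctl locates the running server through postmaster.pid in the data directory, whose first line is the postmaster's PID; once that file has been moved away, the stop command has nothing reliable to signal. A minimal Python sketch of a guarded relocation under that assumption: refuse to move the directory while postmaster.pid names a live process, then leave a symlink at the old path. The helper names are hypothetical, and the check is only as good as the pid file: a postmaster that has already lost its pid file would not be detected this way.

```python
import errno
import os
import shutil


def postmaster_pid(data_dir):
    """Return the PID on the first line of postmaster.pid, or None if the
    file is absent (the server is presumed stopped)."""
    try:
        with open(os.path.join(data_dir, "postmaster.pid")) as f:
            return int(f.readline().strip())
    except FileNotFoundError:
        return None


def pid_alive(pid):
    """Signal 0 checks existence/permission without delivering a signal."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        return e.errno == errno.EPERM  # exists, but owned by another user
    return True


def relocate_with_symlink(old_path, new_path, data_dir):
    """Move old_path to new_path and leave a symlink at old_path, but only
    if no live postmaster is recorded in data_dir."""
    pid = postmaster_pid(data_dir)
    if pid is not None and pid_alive(pid):
        raise RuntimeError(f"postmaster (PID {pid}) still running; stop it first")
    shutil.move(old_path, new_path)  # rename on one fs; copy + delete across fs
    os.symlink(new_path, old_path)
```

A stale postmaster.pid whose PID no longer exists is treated as "stopped", which matches how a crashed server leaves its data directory.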

You could get rid of the old postmaster by doing "ps auxww | grep post"
to determine its PID and then "kill -QUIT postmaster_pid".  The real
problem you're likely to have is that if you moved the directories while
anything was happening, you'll have an inconsistent snapshot of the
database files, probably meaning database corruption.  There isn't
anything much you can do about that at this stage (although REINDEXing
your more active tables might not be a bad idea, once you've got the
thing talking to you again).  I hope you have a reasonably recent backup
to resort to, in case it emerges that things are hopelessly messed up.
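Tom's two commands can also be scripted. A minimal Python sketch, assuming the 8.1 server appears in `ps` as `.../bin/postmaster` while its children retitle themselves `postgres: ...`: pick the postmaster PID out of `ps -eo pid,args` output and send it SIGQUIT, which PostgreSQL treats as an immediate shutdown. The function names and the dry-run default are illustrative, not part of any PostgreSQL tool.

```python
import os
import signal
import subprocess


def postmaster_pids(ps_output):
    """Parse `ps -eo pid,args` output and return PIDs whose command is a
    postmaster binary. The header row and 'postgres: ...' children are
    not matched."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)  # PID, then the full command line
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # header ("PID COMMAND") or blank/malformed line
        command = fields[1].split()[0]
        if os.path.basename(command) == "postmaster":
            pids.append(int(fields[0]))
    return pids


def quit_postmasters(dry_run=True):
    """Send SIGQUIT (immediate shutdown) to each postmaster found."""
    out = subprocess.run(["ps", "-eo", "pid,args"], capture_output=True,
                         text=True, check=True).stdout
    for pid in postmaster_pids(out):
        print(f"{'would send' if dry_run else 'sending'} SIGQUIT to {pid}")
        if not dry_run:
            os.kill(pid, signal.SIGQUIT)
```

Calling `quit_postmasters()` only prints what it would do; pass `dry_run=False` (as root or the postgres user) to actually send the signal.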

            regards, tom lane

Re: Moved postgres, now won't start

From: Alvaro Herrera
Date:
Tom Lane wrote:

> I think this proves my theory --- that all looks like leftover processes
> trying to work in an installation that isn't there anymore.  (Except I
> have no idea what the "insecure dependency" bit is about.)

"Insecure dependency" comes from Perl's taint mode: pg_ctlcluster is
written in Perl and runs with the -T switch.

--
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
Tulio: oh, what could this button be for, Juan Carlos?
Policarpo: No, get away, don't touch the console!
Juan Carlos: I'll press it again and again.