Thread: what happened to my database

what happened to my database

From
"Medi Montaseri"
Date:
Hi,

I am faced with a database disappearance and am seeking some explanations outside of gremlins.
I had a database running at

cat /etc/sysconfig/pgsl/postmaster
PGDATA=/qmsvol/pg_8.1.9/data
PGLOG=/var/log/pgsql/pgstartup.log

Where /qmsvol is an iSCSI block device
A couple of days ago, my server was rebooted and by the time I got to it my database was deleted, gone, zapped, not there any more.

I looked at my pgstartup.log where I see the following....

postmaster cannot access the server configuration file "/qmsvol/pg_8.1.9/postgresql.conf": Permission denied
over 17 times, followed by...
The database cluster will be initialized with locale en_US.UTF-8.

I think the following happened...
Since my PGDATA was on an iSCSI device, by the time /etc/rc3.d/S64postgresql was executed, the device below it was not yet available. Question: why does the error say "Permission denied" rather than "file not found"? In the meantime, pg_ctl kept trying and finally concluded that the data directory was blank, so this must be an out-of-box case and it was fine to initdb the PGDATA. As it called initdb to do the job, the iSCSI volume below it came online, and by then the bomb had already been dropped.

Now I need to find some facts to support this...
Where else can I look for forensics?

Thanks
Medi




Re: what happened to my database

From
Steve Holdoway
Date:
On Wed, 28 May 2008 18:37:06 -0700
"Medi Montaseri" <montaseri@gmail.com> wrote:

> Hi,
>
> I am faced with a database disappearance and seeking some explanations
> outside of gremlins.
> I had a database running at
>
> cat /etc/sysconfig/pgsl/postmaster
> PGDATA=/qmsvol/pg_8.1.9/data
> PGLOG=/var/log/pgsql/pgstartup.log
>
> Where /qmsvol is an iSCSI block device
> A couple of days ago, my server was rebooted and by the time I got to it my
> database was deleted, gone, zapped, not there any more.
>
> I looked at my pgstartup.log where I see the following....
>
> postmaster cannot access the server configuration file
> "/qmsvol/pg_8.1.9/postgresql.conf": Permission denied
> over 17 times, followed by...
> The database cluster will be initialized with locale en_US.UTF-8.
>
> I think the following happened...
> Since my PGDATA was on an iSCSI device, by the time /etc/rc3.d/S64postgresql
> was executed, the device below it was not yet available. Question: why does
> the error say "Permission denied" rather than "file not found"? In the
> meantime, pg_ctl kept trying and finally concluded that the data directory
> was blank, so this must be an out-of-box case and it was fine to initdb the
> PGDATA. As it called initdb to do the job, the iSCSI volume below it came
> online, and by then the bomb had already been dropped.
>
> Now I need to find some facts to support this...
When you mount a partition on Linux, it does this by overlaying its root directory on the existing one on the parent
volume. Ownerships and permissions are also replaced. I expect that the /qmsvol directory will be owned by root, with
fairly restrictive access rights. This will not be the case for the root ( . ) directory on the external device, which will
be postgres-friendly.
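
To see which of the two directories is visible at a given moment, one can check whether /qmsvol is currently a mount point. The following is only a minimal sketch, assuming GNU stat (the -c option); the helper name is_mounted is made up for illustration. It compares the device number of a directory with that of its parent, which differ when another filesystem is mounted there (a bind mount of the same filesystem would not be detected this way).

```shell
#!/bin/sh
# is_mounted DIR (hypothetical helper, GNU stat assumed):
#   returns 0 if DIR is on a different device than its parent directory,
#   returns 1 if both are on the same device,
#   returns 2 if DIR cannot be examined (for example, it does not exist).
is_mounted() {
    dir_dev=$(stat -c %d "$1" 2>/dev/null) || return 2
    parent_dev=$(stat -c %d "$1/.." 2>/dev/null) || return 2
    [ "$dir_dev" != "$parent_dev" ]
}

# Example: report what is behind the mount point right now.
if is_mounted /qmsvol; then
    echo "/qmsvol is mounted"
else
    echo "/qmsvol is NOT mounted (or does not exist)"
fi
```

Running this from the init script just before the postmaster starts, and logging the result, would show whether the volume was there at that point of the boot sequence.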
> Where else can I look for forensics
I don't think you need any more! To fix this, I'd do 2 things. First, start postgres much later in the boot sequence:
  cd /etc/rc3.d ; mv S64postgresql S99postgresql
( and the same in rc5.d if you're using a gui at all ).

and do the converse (move it earlier) for whichever script mounts your external devices. Also add a test that the device
is mounted in the start) block of /etc/init.d/postgresql... something simple like

    while [ ! -d /qmsvol/pg_8.1.9/data ]
    do
        sleep 5
    done

( well, something that can't hang forever would be preferable! ).
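
A bounded version of that loop could look like the sketch below. It is only an illustration: the function name wait_for_file, the 120-second limit and the choice to test for the PG_VERSION file (which initdb writes into every data directory) rather than for the directory itself are assumptions, not part of the original startup script.

```shell
#!/bin/sh
# wait_for_file FILE TIMEOUT [INTERVAL]
# Poll until FILE exists. Give up and return 1 once TIMEOUT seconds have
# been spent waiting. INTERVAL defaults to 5 seconds.
wait_for_file() {
    file=$1
    timeout=$2
    interval=${3:-5}
    waited=0
    while [ ! -f "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Possible use in the start) block (path taken from this thread):
#   wait_for_file /qmsvol/pg_8.1.9/data/PG_VERSION 120 || {
#       echo "PGDATA is not available, not starting postgres" >&2
#       exit 1
#   }
```

Testing for a file that only exists on the real volume avoids being fooled by an empty directory of the same name underneath the mount point.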

>
> Thanks
> Medi
>

hth,

Steve

--
Steve Holdoway <steve.holdoway@firetrust.com>

Re: what happened to my database

From
Tom Lane
Date:
Steve Holdoway <steve.holdoway@firetrust.com> writes:
> "Medi Montaseri" <montaseri@gmail.com> wrote:
>> I think the following happened...
>> Since my PGDATA was on an iSCSI device, by the time /etc/rc3.d/S64postgresql
>> was executed, the device below it was not yet available. Question: why does
>> the error say "Permission denied" rather than "file not found"? In the
>> meantime, pg_ctl kept trying and finally concluded that the data directory
>> was blank, so this must be an out-of-box case and it was fine to initdb the
>> PGDATA. As it called initdb to do the job, the iSCSI volume below it came
>> online, and by then the bomb had already been dropped.
>>
>> Now I need to find some facts to support this...
> When you mount a partition on Linux, it does this by overlaying its root directory on the existing one on the
> parent volume. Ownerships and permissions are also replaced. I expect that the /qmsvol directory will be owned by root,
> with fairly restrictive access rights. This will not be the case for the root ( . ) directory on the external device, which
> will be postgres-friendly.
>> Where else can I look for forensics
> I don't think you need any more! To fix this, I'd do 2 things. First, start postgres much later in the boot sequence:
>   cd /etc/rc3.d ; mv S64postgresql S99postgresql
> ( and the same in rc5.d if you're using a gui at all ).

The other thing to do is remove the auto-initdb behavior in your startup
script.  We've done that in recent releases because of prior reports of
this type of problem.  The OP's script is evidently still old-school,
though.
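
As a rough illustration of a startup script without auto-initdb, the start) block could refuse to go on unless the data directory already looks initialized. This is a sketch, not the code from any actual package: the function names pgdata_ready and start_guard are invented here, and the check relies only on the PG_VERSION file and the base/ subdirectory that initdb creates.

```shell
#!/bin/sh
# pgdata_ready DIR: succeed only if DIR already looks like an initialized cluster.
pgdata_ready() {
    [ -f "$1/PG_VERSION" ] && [ -d "$1/base" ]
}

# start_guard DIR: never run initdb. If DIR is not an initialized cluster,
# print a hint on stderr and return 1 so the caller can abort the start.
start_guard() {
    if pgdata_ready "$1"; then
        return 0
    fi
    echo "$1 is missing or empty; not starting postgres." >&2
    echo "If this really is a new installation, run initdb by hand." >&2
    return 1
}

# Possible use in the start) block, before pg_ctl is invoked:
#   start_guard "$PGDATA" || exit 1
```

With a guard like this, a volume that is late to mount leads to a failed start and a log message instead of a freshly initialized, empty cluster.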

            regards, tom lane

Re: what happened to my database

From
"Medi Montaseri"
Date:
Yes, this type of presumptuous behavior to wipe out a production database based on a few checks is too risky...

Behavior one:
The first time out of the box, pg_ctl does not find any database files, so it tells the user: "Sorry, I did not find any database to start; see initdb."
Result: we have a semi-unhappy user/admin who says, "What is initdb?"

Behavior two:
In order to enhance the out-of-box experience, we have wiped out a production environment, leading to many unhappy staff and customers....

PG developers...I am not impressed at all...

Medi



On Wed, May 28, 2008 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The other thing to do is remove the auto-initdb behavior in your startup
> script.  We've done that in recent releases because of prior reports of
> this type of problem.  The OP's script is evidently still old-school,
> though.
>
>             regards, tom lane

Re: what happened to my database

From
"Scott Marlowe"
Date:
On Wed, May 28, 2008 at 11:14 PM, Medi Montaseri <montaseri@gmail.com> wrote:
> Yes, this type of presumptuous behavior to wipe out a production database
> based on a few checks is too risky...
>
> Behavior one:
> The first time out of the box, pg_ctl does not find any database files, so it
> tells the user: "Sorry, I did not find any database to start; see initdb."
> Result: we have a semi-unhappy user/admin who says, "What is initdb?"
>
> Behavior two:
> In order to enhance the out-of-box experience, we have wiped out a
> production environment, leading to many unhappy staff and customers....
>
> PG developers...I am not impressed at all...

In defense of the pg developers, the behaviour you describe was
removed long ago BECAUSE of the issues you mention.

The fact is that pg developers can't police every distro out there to
make sure they've removed such hinky behaviour from their startup
scripts.  So, the persons to NOT be impressed with at all are the
folks who maintain your OS's postgresql packaging, not the pg
developers.

Course, you can always switch to MySQL, or Oracle, or MSSQL where
nothing like that ever happens.  uh huh.

Re: what happened to my database

From
"Medi Montaseri"
Date:


On Tue, Jun 10, 2008 at 12:49 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, May 28, 2008 at 11:14 PM, Medi Montaseri <montaseri@gmail.com> wrote:
>> Yes, this type of presumptuous behavior to wipe out a production database
>> based on a few checks is too risky...
>>
>> Behavior one:
>> The first time out of the box, pg_ctl does not find any database files, so it
>> tells the user: "Sorry, I did not find any database to start; see initdb."
>> Result: we have a semi-unhappy user/admin who says, "What is initdb?"
>>
>> Behavior two:
>> In order to enhance the out-of-box experience, we have wiped out a
>> production environment, leading to many unhappy staff and customers....
>>
>> PG developers...I am not impressed at all...
>
> In defense of the pg developers, the behaviour you describe was
> removed long ago BECAUSE of the issues you mention.
>
> The fact is that pg developers can't police every distro out there to
> make sure they've removed such hinky behaviour from their startup
> scripts.  So, the persons to NOT be impressed with at all are the
> folks who maintain your OS's postgresql packaging, not the pg
> developers.

I stand corrected.

> Course, you can always switch to MySQL, or Oracle, or MSSQL where
> nothing like that ever happens.  uh huh.

Never. I would rather stay and fix it than run away to a different country.

Thanks

Re: what happened to my database

From
"Scott Marlowe"
Date:
On Tue, Jun 10, 2008 at 1:57 PM, Medi Montaseri <montaseri@gmail.com> wrote:
>> Course, you can always switch to MySQL, or Oracle, or MSSQL where
>> nothing like that ever happens.  uh huh.
>
> Never. I would rather stay and fix it than run away to a different country.

Me too.  Sorry, I really shoulda tossed a smiley on the end there... :)
There's one now...