Обсуждение: db replication and errors

Поиск
Список
Период
Сортировка

db replication and errors

От
Benjamin
Дата:
Hello everyone,

The scene :
Currently, we r using,  Postgresql v 7.3, on RedHat 9
machines.

We provide a hot standby at all times for the Primary
server.

So, we have to maintain a duplicate of  the entire
database.
The problem arises, when the standby boots.

I wanted to know, whenever the standby boots up, what
all files shud it copy frm the Primary.
All files, under the data directory??
Viz., global, base, clog and xlog?

What are those pg_internal.init, pgstat.stat files?
What's the difference between global and base
directories?
If the async feature is used on the primary, shud we
copy on the xlog and clog files onto the backup as
well?

As of now, we shut down postmaster, on the Primary
whenever the standby boots up, and then copy all the above said files,
from the primary to the standby.

Duz this ensure, all data is written onto the disk b4
postmaster shut downs?

Quite a few times, I have encountered errors, like, xlogflush is not
satisfied,
bogus attribute number for <some num , eg. -2>,  catalog is missing,
cache lookup failed.
I am pretty sure, these are related to the copying of files I described
above.

Any links for me to find more abt these files, and these typical error
conditions encountered?
The archive lists, did give me some information abt these errors. But
cudnt make out much.

These and more... one at a time.

Regards


--

Benjamin Jacob.

Disclaimer :
------------------------------------------------------------------------------
If you are not the intended recipient of this transmission to whom it is
addressed, or have received this transmission in error, you are hereby
notified that any dissemination, distribution or copying of this transmission
is strictly prohibited. Please notify us immediately and delete this e-mail
from your system. The sender does not accept liability for any errors or
omissions in the contents of this message which arise as a result of e-mail
transmission, which cannot be guaranteed to be secure or error-free as
information could be intercepted, corrupted, lost, destroyed, arrive late or
incomplete, arrive at wrong address or contain viruses. If verification
is required please request a hard-copy version.  This e-mail contains only the
personal opinions of the sender and does not represent an official
communication from NetYantra of any manner.
------------------------------------------------------------------------------



Re: db replication and errors

От
Richard Huxton
Дата:
Benjamin wrote:
>
> Hello everyone,
>
> The scene :
> Currently, we r using,  Postgresql v 7.3, on RedHat 9
> machines.
>
> We provide a hot standby at all times for the Primary
> server.
>
> So, we have to maintain a duplicate of  the entire
> database.
> The problem arises, when the standby boots.
>
> I wanted to know, whenever the standby boots up, what
> all files shud it copy frm the Primary.
> All files, under the data directory??
> Viz., global, base, clog and xlog?

Yes - all files.

> What are those pg_internal.init, pgstat.stat files?
> What's the difference between global and base
> directories?

The "base" directory is the main data store - you can see what the
numbers refer to by using the oid2name utility in contrib/

> If the async feature is used on the primary, shud we
> copy on the xlog and clog files onto the backup as
> well?

What is the "async feature"?

> As of now, we shut down postmaster, on the Primary
> whenever the standby boots up, and then copy all the above said files,
> from the primary to the standby.
>
> Duz this ensure, all data is written onto the disk b4
> postmaster shut downs?

Provided the postmaster shuts down cleanly, and you've synced to disk
then all should be OK.

> Quite a few times, I have encountered errors, like, xlogflush is not
> satisfied,
> bogus attribute number for <some num , eg. -2>,  catalog is missing,
> cache lookup failed.

One of 4 things could be at fault:
  1. Files aren't being sync'ed to disk
  2. You aren't copying the right files
  3. The versions of PG don't match
  4. The platforms you are running on are different (e.g. Sun-Sparc vs x86)

It might be worth looking at "slony" to run a replication setup, rather
than copying files.

--
   Richard Huxton
   Archonet Ltd

Re: db replication and errors

От
Benjamin
Дата:
Thanx for the pointers, Richard.

>> If the async feature is used on the primary, shud we
>> copy on the xlog and clog files onto the backup as
>> well?
>
>
> What is the "async feature"?

I meant fsync.
I meant to ask, if FSYNC is enabled, is all pending data written onto
the disk?

>
>> As of now, we shut down postmaster, on the Primary
>> whenever the standby boots up, and then copy all the above said files,
>> from the primary to the standby.
>>
>> Duz this ensure, all data is written onto the disk b4
>> postmaster shut downs?
>
>
> Provided the postmaster shuts down cleanly, and you've synced to disk
> then all should be OK.

Wot decides this "sync" n how do i check it?

>> Quite a few times, I have encountered errors, like, xlogflush is not
>> satisfied,
>> bogus attribute number for <some num , eg. -2>,  catalog is missing,
>> cache lookup failed.
>
>
> One of 4 things could be at fault:
>  1. Files aren't being sync'ed to disk
>  2. You aren't copying the right files
>  3. The versions of PG don't match
>  4. The platforms you are running on are different (e.g. Sun-Sparc vs
> x86)

The latter two are not the case, I use Redhat 9 on all the machines,
with PG VERSION 7.3
The former two, yes, I agree, cud be the cause of problems.

I would like to know, where to look on such errors.eg for cache lookup
failure, wot triggers that??how do i get abt tracking down the issue?

>
> It might be worth looking at "slony" to run a replication setup,
> rather than copying files.

Did think of slony previously. But slony has the limitation of not being
able to replicate large objects, rite?
How large are these large objects supposed to be?
Run-time replication is not an issue, as I have other mechanisms for
that, which are part of this server, and they work fine.
The only problem I am facing now, is of the case when the standby is
booting up. I have to ensure an absolutely correct copying of files.

I want to know how do i go about diagnosing problems, if and when they
arise.
I have come across pg_filedump. But cant really make out much frm the
output that pg_filedump produces.

A long way to go... I agree.



--

Benjamin Jacob.

Disclaimer :
------------------------------------------------------------------------------
If you are not the intended recipient of this transmission to whom it is
addressed, or have received this transmission in error, you are hereby
notified that any dissemination, distribution or copying of this transmission
is strictly prohibited. Please notify us immediately and delete this e-mail
from your system. The sender does not accept liability for any errors or
omissions in the contents of this message which arise as a result of e-mail
transmission, which cannot be guaranteed to be secure or error-free as
information could be intercepted, corrupted, lost, destroyed, arrive late or
incomplete, arrive at wrong address or contain viruses. If verification
is required please request a hard-copy version.  This e-mail contains only the
personal opinions of the sender and does not represent an official
communication from NetYantra of any manner.
------------------------------------------------------------------------------



Re: db replication and errors

От
Richard Huxton
Дата:
Benjamin wrote:
> Thanx for the pointers, Richard.
>
>>> If the async feature is used on the primary, shud we
>>> copy on the xlog and clog files onto the backup as
>>> well?
>>
>>
>>
>> What is the "async feature"?
>
>
> I meant fsync.
> I meant to ask, if FSYNC is enabled, is all pending data written onto
> the disk?

Yes. Turn it on if you want your data to survive a power failure.
Oh, and make sure your disks aren't write-caching even when you sync.
Search the list archives for cache and IDE for plenty of discussion.

>>
>>> As of now, we shut down postmaster, on the Primary
>>> whenever the standby boots up, and then copy all the above said files,
>>> from the primary to the standby.
>>>
>>> Duz this ensure, all data is written onto the disk b4
>>> postmaster shut downs?
>>
>> Provided the postmaster shuts down cleanly, and you've synced to disk
>> then all should be OK.
>
>
> Wot decides this "sync" n how do i check it?
>
>>> Quite a few times, I have encountered errors, like, xlogflush is not
>>> satisfied,
>>> bogus attribute number for <some num , eg. -2>,  catalog is missing,
>>> cache lookup failed.
>>
>>
>> One of 4 things could be at fault:
>>  1. Files aren't being sync'ed to disk
>>  2. You aren't copying the right files
>>  3. The versions of PG don't match
>>  4. The platforms you are running on are different (e.g. Sun-Sparc vs
>> x86)
>
>
> The latter two are not the case, I use Redhat 9 on all the machines,
> with PG VERSION 7.3
> The former two, yes, I agree, cud be the cause of problems.
>
> I would like to know, where to look on such errors.eg for cache lookup
> failure, wot triggers that??how do i get abt tracking down the issue?

A cache lookup failure is usually due to the OID of an object changing,
where you drop/recreate a temporary table and a function is still
referring to its old OID.
In your case, I'm not sure what's causing the problem. It could be
you've not copied the table definitions over and you've updated your
schema on the original machine.

>> It might be worth looking at "slony" to run a replication setup,
>> rather than copying files.
>
> Did think of slony previously. But slony has the limitation of not being
> able to replicate large objects, rite?
> How large are these large objects supposed to be?

Um, large as you like. See the manuals for discussion of large object
support. I'm guessing you're not using it.

> Run-time replication is not an issue, as I have other mechanisms for
> that, which are part of this server, and they work fine.
> The only problem I am facing now, is of the case when the standby is
> booting up. I have to ensure an absolutely correct copying of files.

If you've got a replicated version of the database why bother copying
the files?
Thinking about it, why copy the files at all anyway? If the server is
still running why has PG stopped?

> I want to know how do i go about diagnosing problems, if and when they
> arise.
> I have come across pg_filedump. But cant really make out much frm the
> output that pg_filedump produces.

If you have *any* problems, then the file copy didn't work. Bin it and
restore from backup. It's only when you don't have a backup that it's
worse messing with pg_filedump.

--
   Richard Huxton
   Archonet Ltd

Re: db replication and errors

От
Benjamin
Дата:
> Run-time replication is not an issue, as I have other mechanisms for
> that, which are part of this server, and they work fine.
>
>> The only problem I am facing now, is of the case when the standby is
>> booting up. I have to ensure an absolutely correct copying of files.
>
>
> If you've got a replicated version of the database why bother copying
> the files?
> Thinking about it, why copy the files at all anyway? If the server is
> still running why has PG stopped?


Well, the standby has to be a replica. So there would be occasion, say
on a machine crash or something, when the standby reboots, or it is
installed much later than the Primary. So when it is booted, I have to
make sure it is an exact replica of the Primary.

>
>> I want to know how do i go about diagnosing problems, if and when
>> they arise.
>> I have come across pg_filedump. But cant really make out much frm the
>> output that pg_filedump produces.
>
>
> If you have *any* problems, then the file copy didn't work. Bin it and
> restore from backup. It's only when you don't have a backup that it's
> worse messing with pg_filedump.
>
will work on that for sure..

Now, the server as such, and the copying mechanism itself has been sort
of cleaned up, and possible error points in the copying of files are
being logged. Now things seem a lot brighter!!

thanx for the help.

--

Benjamin Jacob.

Disclaimer :
------------------------------------------------------------------------------
If you are not the intended recipient of this transmission to whom it is
addressed, or have received this transmission in error, you are hereby
notified that any dissemination, distribution or copying of this transmission
is strictly prohibited. Please notify us immediately and delete this e-mail
from your system. The sender does not accept liability for any errors or
omissions in the contents of this message which arise as a result of e-mail
transmission, which cannot be guaranteed to be secure or error-free as
information could be intercepted, corrupted, lost, destroyed, arrive late or
incomplete, arrive at wrong address or contain viruses. If verification
is required please request a hard-copy version.  This e-mail contains only the
personal opinions of the sender and does not represent an official
communication from NetYantra of any manner.
------------------------------------------------------------------------------