Re: Re: "missing chunk number XX for toast value YY in pg_toast ..." after pg_basebackup.

Поиск
Список
Период
Сортировка
От Adrian Klaver
Тема Re: Re: "missing chunk number XX for toast value YY in pg_toast ..." after pg_basebackup.
Дата
Msg-id 56E0AFAD.2000405@aklaver.com
обсуждение исходный текст
Ответ на Re: Re: "missing chunk number XX for toast value YY in pg_toast ..." after pg_basebackup.  (<fredrik@huitfeldt.com>)
Ответы Re: Re: "missing chunk number XX for toast value YY in pg_toast ..." after pg_basebackup.  (<fredrik@huitfeldt.com>)
Список pgsql-general
On 03/09/2016 10:41 AM, fredrik@huitfeldt.com wrote:
> Hi Adrian,
>
> thank you very much for your patience. I apologise for the missing
> information.
>
> On 9 March 2016 16:13:00 +01:00, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>> On 03/09/2016 04:56 AM, fredrik@huitfeldt.com
>> <mailto:fredrik@huitfeldt.com> wrote:
>>
>>     Hi Adrian,
>>
>>     thank you very much for your response.
>>
>>     I ran the "VACUUM ANALYZE" command on the master node.
>>
>>     Regarding log messages.
>>
>>     Here is the contents of the log (excluding
>>     connections/disconnections):
>>
>>
>> Assuming the below is from the replica database.
> the "LOG: recovery was paused" message was indeed from the replica.
>>
>>
>>     2016-02-22 02:30:08 GMT 24616 LOG: recovery has paused
>>
>>
>> So what happened to cause the above?
>
> we automatically pause recovery on the replica before running pg_dump.
> This is in order to make certain that we get a consistent dump of the
> database.

Still muddling through this, but to recap and be clear in my mind:

1) This only started occurring with 9.1.15, but worked in previous
versions of 9.1.
So what was the last version of 9.1 that worked?

2) You seed a replica with pg_basebackup.

3) You set up synchronous streaming replication to the replica.

4) You pause the replication and use pg_dump to dump the replica.

5) At this point the error in the subject has occurred twice since you
switched to 9.1.5

6) Running full vacuum analyze on the master solves the problem.
How is it solved?
In other words do you resume replication after the vacuum, or before?
Then do you redo the pg_dump?
Or do you start over with a new pg_basebackup?

>>
>> I am not seeing anything below that indicates the recovery started again.
> the reason why we do not see a matching "resume" is that the pg_dump
> failed and our error handling was insufficient.
>>
>>     2016-02-22 02:30:08 GMT 24616 HINT: Execute pg_xlog_replay_resume() to
>>     continue.
>>     2016-02-22 02:37:19 GMT 23859 DBNAME ERROR: missing chunk number 0 for
>>     toast value 2747579 in pg_toast_22066
>>     2016-02-22 02:37:19 GMT 23859 DBNAME STATEMENT: COPY public.room_shape
>>     (room_uuid, data) TO stdout;
>>     2016-02-22 02:37:41 GMT 2648 DBNAME LOG: could not receive data from
>>     client: Connection reset by peer
>>     2016-02-22 02:37:41 GMT 2648 DBNAME LOG: unexpected EOF on client
>>     connection
>>
>>
>> What does the log from the master show?
> It doesnt seem to show much. It does have these repeated messages, however:
>
> 2016-02-22 02:12:18 GMT 30908  LOG:  using stale statistics instead of
> current ones because stats collector is not responding
> 2016-02-22 02:13:01 GMT 30908  LOG:  using stale statistics instead of
> current ones because stats collector is not responding
> 2016-02-22 02:13:52 GMT 30908  LOG:  using stale statistics instead of
> current ones because stats collector is not responding
>
> There are lots of these mesages within the timeframe. There seems to be
> a couple of them every 2-4 hours.

This is usually a sign of resource starvation. I see this on an old
machine, at night, when I run some intensive file system backups. I
figured it out by looking at my crontab. The problems such as they are
is that the messages fill up logs and your statistics become, as the
message says, stale for how ever long the collector does not respond.

>
>>
>>
>>     Best regards,
>>     Fredrik Huitfeldt
>>
>>
>>     On 7 March 2016 16:35:29 +01:00, Adrian Klaver
>>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>>
>>         On 03/06/2016 10:18 PM, fredrik@huitfeldt.com
>>         <mailto:fredrik@huitfeldt.com>
>>         <mailto:fredrik@huitfeldt.com <mailto:fredrik@huitfeldt.com>>
>>         wrote:
>>
>>         HI All,
>>
>>         i would really appreciate any help I can get on this issue.
>>
>>         basically, a pg_basebackup + streaming attach, led to a database
>>         that we
>>         could not read from afterwards.
>>
>>
>>         From original post:
>>
>>         http://www.postgresql.org/message-id/1456919678340.31300.116900@webmail2
>>
>>         "The issue remained until we ran a full vacuum analyze on the
>>         cluster."
>>
>>         Which cluster was that, the master or the slave?
>>
>>         "I have logfiles from the incident, but I cannot see anything
>>         out of
>>         the ordinary (despite having a fair amount of experience
>>         investigating
>>         postgresql logs)."
>>
>>
>>         Can we see the section before and after ERROR?
>>
>>
>>         Beset regards,
>>         Fredrik
>>
>>         PS please advise if this is better posted on another list.
>>
>>
>>
>>         --
>>         Adrian Klaver
>>         adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>>         <mailto:adrian.klaver@aklaver.com
>>         <mailto:adrian.klaver@aklaver.com>>
>>
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> Best regards,
> Fredrik
>


--
Adrian Klaver
adrian.klaver@aklaver.com


В списке pgsql-general по дате отправления:

Предыдущее
От: Kevin Grittner
Дата:
Сообщение: Re: Fwd: How does PostgreSQL serializable transaction works? (SELECT/UPDATE vs INSERT conflict)
Следующее
От:
Дата:
Сообщение: Re: Email address VERP problems (was RE: Does a call to a language handler provide a context/session, and somewhere to keep session data?