Re: emergency outage requiring database restart

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: emergency outage requiring database restart
Дата
Msg-id CAHyXU0z7O7TqXOk9mj7A9KxT6r5aczjCBZQbvgP7vsrjfDeH6g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Ответы Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-hackers
On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Confirmation of problem re-occurrence will come in a few days.    I'm
>> much more likely to believe 6+sigma occurrence (storage, freak bug,
>> etc) should it prove the problem goes away post rebuild.
>
> ok, no major reported outage yet, but just got:
>
> 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in
> block 12 of relation base/203883/1259


*) still on 9.5.2 :(  not in control of the package update process

*) getting more of these:
[root@rcdylsdbmpf001 data]# cat
/var/lib/pgsql/9.5/data_checksum/pg_log/postgresql-26.log  | grep
"invalid page"
2016-10-26 11:26:42 CDT [postgres@castaging]: ERROR:  invalid page in
block 10 of relation base/203883/1259
2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in
block 12 of relation base/203883/1259
2016-10-26 12:16:44 CDT [postgres@castaging]: ERROR:  invalid page in
block 13 of relation base/203883/1259
2016-10-26 12:18:58 CDT [postgres@castaging]: ERROR:  invalid page in
block 15 of relation base/203883/1259
2016-10-26 12:19:12 CDT [postgres@castaging]: ERROR:  invalid page in
block 7 of relation base/203883/2662

castaging=# select relname from pg_class where oid in(1259, 2662);     relname
────────────────────pg_class_oid_indexpg_class

*) only one database in the cluster is damaged.   I do not suspect
storage at this point

*) Currently I can execute pg_dump with no problems.   So far impact
is not severe but soon I have to do dump/restore

*) age(xmin) from pg_class gives all reasonable values (approx 757k)
that aren't 2147483647

*) All cases of statement executing getting this error in a pl/pgsql
routine that does the following:
BEGIN
... <several postgres_fdw querying same 1 record 1 column tabel>
 DROP TABLE IF EXISTS foo; CREATE TEMP TABLE foo ON COMMIT DROP AS ... SELECT <sqsh function push sql script to sql
sever>... INTO Result; 
 DROP TABLE IF EXISTS bar; CREATE TEMP TABLE bar ON COMMIT DROP AS ...
PERFORM  <sqsh function removing script>;

EXCEPTION WHEN OTHERS THEN    PERFORM  <sqsh function removing script>;    RAISE;
END;

*) page verification miss is numerically close
[root@rcdylsdbmpf001 ~]# cat
/var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page
verification"
2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37251 but expected 37244
2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37249 but expected 37244
2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 44363 but expected 44364
2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 49525 but expected 49539
2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37345 but expected 37340

*) sample log leading into the first error (all queries are logged)
2016-10-26 11:26:40 CDT [postgres@castaging]: LOG:  execute <unnamed>:
SELECT PushMarketSample($1::TEXT) AS published
2016-10-26 11:26:40 CDT [postgres@castaging]: DETAIL:  parameters: $1 = '8840'
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  statement: START
TRANSACTION ISOLATION LEVEL REPEATABLE READ
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  statement: SAVEPOINT s2
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  execute <unnamed>:
DECLARE c1 CURSOR FOR       SELECT period FROM public.dataentryperiod
2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c1
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  statement: CLOSE c1
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  execute <unnamed>:
DECLARE c2 CURSOR FOR       SELECT period FROM public.dataentryperiod
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c2
2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG:  statement: CLOSE c2
2016-10-26 11:26:41 CDT [rms@mpf2]: ERROR:  relation "tblagent" does
not exist at character 15
2016-10-26 11:26:41 CDT [rms@mpf2]: STATEMENT:  select * from tblagent
2016-10-26 11:26:41 CDT [rms@mpf2]: LOG:  execute <unnamed>: SELECT
CURRENT_SCHEMA()
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  execute <unnamed>:
DECLARE c3 CURSOR FOR       SELECT period FROM public.dataentryperiod
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c3
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: CLOSE c3
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  execute <unnamed>:
DECLARE c4 CURSOR FOR       SELECT period FROM public.dataentryperiod
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: FETCH 100 FROM c4
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: CLOSE c4
2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37251 but expected 37244
2016-10-26 11:26:42 CDT [postgres@castaging]: CONTEXT:  SQL statement
"CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS         SELECT           ApartmentId,
TotalUnits,          SubmarketId,           SUM(Rent * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Rent,
SUM(Occupancy* UnitCount) / NULLIF(SUM(UnitCount), 0) AS Occupancy         FROM AptSample         GROUP BY 1, 2, 3"
 PL/pgSQL function pushmarketsample(text,date,integer) line 103 
at SQL statement
2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG:  statement: ROLLBACK TO
SAVEPOINT s2; RELEASE SAVEPOINT s2
2016-10-26 11:26:42 CDT [postgres@castaging]: ERROR:  invalid page in
block 10 of relation base/203883/1259
2016-10-26 11:26:42 CDT [postgres@castaging]: CONTEXT:  SQL statement
"CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS         SELECT           ApartmentId,
TotalUnits,          SubmarketId,           SUM(Rent * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Rent, 


merlin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Improving RLS planning
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Default setting for autovacuum_freeze_max_age