Re: emergency outage requiring database restart

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: emergency outage requiring database restart
Дата
Msg-id CAHyXU0wLgMvD_KVJyfZhACBpkfDbPEawkqbx2EObYxMt2O=kMA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: emergency outage requiring database restart  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Re: emergency outage requiring database restart  (Kevin Grittner <kgrittn@gmail.com>)
Список pgsql-hackers
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>
>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
>> castaging-#   SELECT ...
>> ERROR:  42809: "pg_cast_oid_index" is an index
>> LINE 11:   FROM ApartmentSample s
>>                 ^
>> LOCATION:  heap_openrv_extended, heapam.c:1304
>>
>> should I be restoring from backups?
>
> It's pretty clear to me that you've got catalog corruption here.  You
> can try to fix things manually as they emerge, but that sounds like a
> fool's errand.

Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
to be to the system catalogs with at least two critical tables dropped
or inaccessible in some fashion.  A lot of the OIDs seemed to be
pointing at the wrong thing.  Couple more datapoints here.

*) This database is OLTP, doing ~ 20 tps avg (but very bursty)
*) Another database on the same cluster was not impacted.  However
it's more olap style and may not have been written to during the
outage

Now, this infrastructure running this system is running maybe 100ish
postgres clusters and maybe 1000ish sql server instances with
approximately zero unexplained data corruption issues in the 5 years
I've been here.  Having said that, this definitely smells and feels
like something on the infrastructure side.  I'll follow up if I have
any useful info.

merlin



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Query cancel seems to be broken in master since Oct 17
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Query cancel seems to be broken in master since Oct 17