Re: Memory Errors

From Merlin Moncure
Subject Re: Memory Errors
Date
Msg-id AANLkTi=E8itOJPzs-3kCADm+e3ww9J29GSjQ=zO4auJa@mail.gmail.com
In reply to Re: Memory Errors  (Sam Nelson <samn@consistentstate.com>)
Responses Re: Memory Errors  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Wed, Sep 8, 2010 at 4:03 PM, Sam Nelson <samn@consistentstate.com> wrote:
> It figures I'd have an idea right after posting to the mailing list.
> Yeah, running COPY foo TO stdout; gets me a list of data before erroring
> out, so I did a copy (select * from foo order by id asc) to stdout; to see
> if I could make some kind of guess as to whether this was related to a
> single row or something else.
> I got the id of the last row the copy to command was able to grab normally
> and tried to figure out the next id.  The following started to make me think
> along the lines of some kinda bad corruption (even before getting responses
> that agreed with that):
> Assuming that the last id copied was 1500:
> 1) select * from foo where id = (select min(id) from foo where id > 1500);
> Results in 0 rows
> 2) select min(id) from foo where id > 1500;
> Results in, for example, 200000
> 3) select max(id) from foo where id > 1500;
> Results in, for example, 90000 (a much lower number than returned by min)
> 4) select id from foo where id > 1500 order by id asc limit 10;
> Results in (for example):
> 200000
> 202000
> 210273
> 220980
> 15005
> 15102
> 15104
> 15110
> 15111
> 15113
> So ... yes, it seems that those four ids are somehow part of the problem.
> They're on amazon EC2 boxes (yeah, we're not too fond of the EC2 boxes
> either), so memtest isn't available, but no new corruption has cropped up
> since they stopped killing the waiting queries (I just double checked - they
> were getting corrupted rows constantly, and we haven't gotten one since that
> script stopped killing queries).

That's actually a startling indictment of EC2 -- how were you killing
your queries exactly?  You say this is repeatable?  What's your
setting of full_page_writes?
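
One quick sanity check (just a sketch, reusing the foo/id names from
your example and assuming id has a btree index): compare the answer the
index gives with a forced sequential scan.  If they disagree, the index
is almost certainly corrupt, and a REINDEX may clear the symptom even
though it won't explain the underlying damage.

show full_page_writes;

set enable_indexscan = off;
set enable_bitmapscan = off;
select min(id) from foo where id > 1500;  -- answer straight from the heap

reset enable_indexscan;
reset enable_bitmapscan;
select min(id) from foo where id > 1500;  -- answer via the index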

One way to identify and potentially nuke bad records of this kind is
to do something like:

select length(field1) from foo order by 1 desc limit 5;

where field1 is the first varlena field (text, bytea, etc.) in
left-to-right column order.  Look for bogusly high values and move on
to the next field if you don't find any.  Once you hit a bad value,
try deleting the record by its key.
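
A rough sketch of that loop (foo, id, and field1 just stand in for the
real table, key, and column names, and 12345 for whatever key the scan
turns up):

select id, length(field1) as len
from foo
order by 2 desc nulls last
limit 5;

-- repeat for each varlena column; if a row shows an absurd length,
-- delete it by its key:
delete from foo where id = 12345;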

Once you've found and deleted them all, immediately pull off a dump,
then rebuild the table.  As always, take a filesystem-level copy
before doing this type of surgery...
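
If a plain pg_dump of the table succeeds at that point, the rebuild
itself can be as simple as copying the surviving rows into a fresh
table and swapping names (a sketch only, again using foo as the table
name and assuming nothing else depends on it):

begin;
create table foo_new (like foo including defaults including indexes);
insert into foo_new select * from foo;
alter table foo rename to foo_old;   -- keep the old copy around for a while
alter table foo_new rename to foo;
commit;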

merlin
