Discussion: Re: [SQL] PostgreSQL server terminated by signal 11

Re: [SQL] PostgreSQL server terminated by signal 11

From
"Daniel Caune"
Date:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, July 28, 2006 09:38
> To: Daniel Caune
> Cc: pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
> > (gdb) bt
> > #0  0x08079e2a in slot_attisnull ()
> > #1  0x0807a1d0 in slot_getattr ()
> > #2  0x080c6c73 in FormIndexDatum ()
> > #3  0x080c6ef1 in IndexBuildHeapScan ()
> > #4  0x0809b44d in btbuild ()
> > #5  0x0825dfdd in OidFunctionCall3 ()
> > #6  0x080c4f95 in index_build ()
> > #7  0x080c68eb in index_create ()
> > #8  0x08117e36 in DefineIndex ()
>
> Hmph.  gdb is lying to you, because slot_getattr doesn't call
> slot_attisnull.
> This isn't too unusual in a non-debug build, because the symbol table is
> incomplete (no mention of non-global functions).
>
> Given that this doesn't happen right away, but only after it's been
> processing for awhile, we can assume that FormIndexDatum has been
> successfully iterated many times already, which seems to eliminate
> theories like the slot or the keycol value being bogus.  I'm pretty well
> convinced now that we're looking at a problem with corrupted data.  Can
> you do a SELECT * FROM (or COPY FROM) the table without error?
>
>             regards, tom lane

The statement "copy gslog_event to stdout;" leads to "ERROR:  invalid memory alloc request size 4294967293" after a while.

  (...)
  354964834       2006-07-19 10:53:42.813+00      (...)
  354964835       2006-07-19 10:53:44.003+00      (...)
  ERROR:  invalid memory alloc request size 4294967293
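
A side note on the numbers, not part of the original thread: PostgreSQL refuses any single memory allocation of 1 GB or more with this error, so an absurd size like this usually means a length word was read from damaged bytes. The minimal Python sketch below (the helper name `describe_alloc_size` is hypothetical) prints such a size as hex, as a signed 32-bit integer, and as its four bytes in little-endian (x86) order. It assumes that byte order, and the reported request is not necessarily byte-for-byte the value stored on disk, so treat the output only as a hint.

```python
import struct

def describe_alloc_size(size: int) -> dict:
    """Describe a reported allocation size: hex, signed 32-bit value, printable bytes."""
    raw = struct.pack("<I", size)           # the 4 bytes in little-endian (x86) order
    signed = struct.unpack("<i", raw)[0]    # the same bytes read as a signed int32
    # Show printable ASCII bytes as characters and everything else as '.'
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return {"hex": f"0x{size:08X}", "signed": signed, "bytes": text}

# The two sizes reported in this thread (gslog_event and player tables).
for reported in (4294967293, 1918988375):
    print(reported, describe_alloc_size(reported))
```

For 4294967293 this gives 0xFFFFFFFD, which is -3 as a signed integer. For 1918988375, the size later reported for the player table, all four bytes are printable ASCII ("Wpar"), which may hint that text ended up where a length was expected.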


I then tried "select * from gslog_event where gslog_event_id >= 354964834 and gslog_event_id <= 354964900;":

  354964834 | 2006-07-19 10:53:42.813+00 | (...)
  354964835 | 2006-07-19 10:53:44.003+00 | (...)
  354964837 | 2006-07-19 10:53:44.113+00 | (...)
  354964838 | 2006-07-19 10:53:44.223+00 | (...)
  (...)
  (66 rows)


The statement "select * from gslog_event;" leads to "Killed"...  Ouch! The psql client just exits (and the postgres server crashes too)!

The statement "select * from gslog_event where gslog_event_id <= 354964834;" succeeds.


I ran the same kind of tests on some other tables that contain less data, and they also seem to be corrupted:

  copy player to stdout
  ERROR:  invalid memory alloc request size 1918988375

  select * from player where id >=771042 and id<=771043;
  ERROR:  invalid memory alloc request size 1918988375

  select max(length(username)) from player;
  ERROR:  invalid memory alloc request size 1918988375

  select max(length(username)) from player where id <= 771042;
   max
  -----
    15

  select max(length(username)) from player where id >= 771050;
   max
  -----
    15

  select max(length(username)) from player where id >= 771044 and id <= 771050;
   max
  -----
    13
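
The id-range narrowing above can be automated by halving only the ranges that fail. This is a minimal Python sketch, assuming a hypothetical callable `range_ok(lo, hi)` that runs a query such as `select max(length(username)) from player where id >= lo and id <= hi` and returns False when that query raises an error; the database driver itself is not shown.

```python
from typing import Callable, List

def find_bad_ids(lo: int, hi: int, range_ok: Callable[[int, int], bool]) -> List[int]:
    """Return every id in [lo, hi] for which range_ok fails, splitting failing ranges in half."""
    if range_ok(lo, hi):
        return []                  # the whole range reads cleanly, nothing to report
    if lo == hi:
        return [lo]                # a single id that still fails
    mid = (lo + hi) // 2
    # Recurse left first so the result comes back in ascending order.
    return find_bad_ids(lo, mid, range_ok) + find_bad_ids(mid + 1, hi, range_ok)
```

With a stand-in predicate that fails only for ranges containing 771043, `find_bad_ids(771042, 771050, ...)` returns `[771043]`, matching the row isolated by hand in this thread. Ids missing from the table are harmless, since they simply make no range fail.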

Finally:

  select * from player where id=771043;
  ERROR:  invalid memory alloc request size 1918988375

  select id from player where id=771043;
     id
  --------
   771043
  (1 row)

  agora=> select username from player where id=771043;
  ERROR:  invalid memory alloc request size 1918988375
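
The last test shows that id can be read while username cannot, which suggests the damage is in the username value of that row. Because PostgreSQL walks a row's columns in order to reach a given one, a damaged variable-length value can also make every later column unreadable, so the first failing column is the most telling. A small sketch for repeating that probe on every column of a suspect row, assuming a hypothetical `column_ok(column)` callable that executes the generated query and reports whether it succeeded; `quote_identifier` and `column_query` are illustrative helpers, not part of any library.

```python
from typing import Callable, Dict, Iterable

def quote_identifier(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def column_query(table: str, column: str, row_id: int) -> str:
    """Build a query reading one column of one row (assumes the key column is named id)."""
    return f"SELECT {quote_identifier(column)} FROM {quote_identifier(table)} WHERE id = {int(row_id)}"

def probe_columns(columns: Iterable[str], column_ok: Callable[[str], bool]) -> Dict[str, bool]:
    """Map each column name to whether reading it on its own succeeded."""
    return {col: column_ok(col) for col in columns}
```

For example, `column_query("player", "username", 771043)` yields `SELECT "username" FROM "player" WHERE id = 771043`, and a probe over `["id", "username"]` for this row would be expected to report id as readable and username as failing.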


I'm also pretty much convinced that there is some corrupted data, especially in varchar columns.  Before dropping the corrupted
rows, is there a way to read part of the corrupted data?

Thanks, Tom, for your great support.  I'm just afraid that I wasted your time...  Anyway, I'll write a FAQ entry that provides
some information about the kind of problem we faced.

Regards,


--
Daniel

Re: [SQL] PostgreSQL server terminated by signal 11

From
Tom Lane
Date:
"Daniel Caune" <daniel.caune@ubisoft.com> writes:
> The statement "copy gslog_event to stdout;" leads to "ERROR:  invalid memory alloc request size 4294967293" after a while.
> ...
> I did other tests on some other tables that contain less data but that seem also corrupted:

This is a bit scary as it suggests a systemic problem.  You should
definitely try to find out exactly what the corruption looks like.
It's usually not hard to home in on where the first corrupted row is
--- you do
    SELECT ctid, * FROM tab LIMIT n;
and determine the largest value of n that won't trigger a failure.
The corrupted region is then just after the last ctid you see.
You can look at those blocks with "pg_filedump -i -f" and see if
anything pops out.  Check the PG archives for previous discussions
of dealing with corrupted data.

            regards, tom lane
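
The LIMIT procedure above is a binary search for the last n that succeeds. A minimal Python sketch, assuming a hypothetical `limit_ok(n)` that runs `SELECT ctid, * FROM tab LIMIT n` and returns False on error, and assuming every run scans the table in the same physical order (no concurrent writes), so that once some n fails every larger n fails too. `parse_ctid` turns the last ctid seen into a block number, and `file_segment` maps that block to a segment file and a block offset within it, assuming a default build with 8 kB blocks and 1 GB segment files (131072 blocks each), which matters for large tables.

```python
import re
from typing import Callable, Tuple

def largest_ok_limit(limit_ok: Callable[[int], bool], hi: int) -> int:
    """Largest n in [0, hi] with limit_ok(n) True, assuming failure is monotone in n."""
    lo = 0                          # LIMIT 0 reads no rows, so it is assumed to succeed
    while lo < hi:
        mid = (lo + hi + 1) // 2    # round up so the loop always makes progress
        if limit_ok(mid):
            lo = mid                # mid rows are readable; search higher
        else:
            hi = mid - 1            # the failure is at or before row mid
    return lo

def parse_ctid(ctid: str) -> Tuple[int, int]:
    """Split a ctid string such as '(1234,5)' into (block number, line pointer)."""
    m = re.fullmatch(r"\((\d+),(\d+)\)", ctid.strip())
    if m is None:
        raise ValueError(f"not a ctid: {ctid!r}")
    return int(m.group(1)), int(m.group(2))

def file_segment(block: int, blocks_per_segment: int = 131072) -> Tuple[int, int]:
    """Return (segment number, block offset in that segment file); segment k > 0 is '<relfilenode>.k'."""
    return divmod(block, blocks_per_segment)
```

The block after the last good ctid, and the block holding it, are then the ones to feed to pg_filedump.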