Thread: Re: [SQL] PostgreSQL server terminated by signal 11

Re: [SQL] PostgreSQL server terminated by signal 11

From
"Daniel Caune"
Date:

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, July 27, 2006 16:06
> To: Daniel Caune
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > My PostgreSQL server running on a Linux machine is terminated by signal
> > 11 whenever I try to create some indexes on a table, which contains
> > quite a lot of data.
>
> Judging from your examples it's got something to do with the partial
> index WHERE clause.  What PG version is this exactly?  If you leave out
> different parts of the WHERE, does it still crash?  Does the crash
> happen immediately after you give the command, or does it run for
> a while?  It might be worth getting a stack trace from the failure
> (best way is to attach to the running backend with gdb, provoke the
> crash, and do "bt" --- search for "gdb" in the archives if you need
> details).
>
>             regards, tom lane

It has been quite a long time since I last used gdb! :-)  Anyway, I proceeded as described below; correct me if I did something wrong.

> ps -eaf | grep postgres

postgres  2792  2789  0 21:50 pts/2    00:00:00 su postgres
postgres  2793  2792  0 21:50 pts/2    00:00:00 bash
postgres  2902     1  7 22:17 ?        00:01:10 postgres: dbo agora [local] idle
postgres  2952     1  2 22:32 ?        00:00:00 /usr/lib/postgresql/8.1/bin/postmaster -D /var/lib/postgresql/8.1/main -c unix_socket_directory=/var/run/postgresql -c config_file=/etc/postgresql/8.1/main/postgresql.conf -c hba_file=/etc/postgresql/8.1/main/pg_hba.conf -c ident_file=/etc/postgresql/8.1/main/pg_ident.conf
postgres  2954  2952  0 22:32 ?        00:00:00 postgres: writer process
postgres  2955  2952  0 22:32 ?        00:00:00 postgres: stats buffer process
postgres  2956  2955  0 22:32 ?        00:00:00 postgres: stats collector process

I connected to the postgres server using psql and retrieved the backend pid by executing the statement "SELECT pg_backend_pid();"

I started gdb under the UNIX account postgres and attached it to the backend process, using the pid returned by that statement.

I ran the command responsible for creating the index and entered "continue" in gdb to let the command execute.  After a while, the server crashed:

  Program received signal SIGSEGV, Segmentation fault.
  0x08079e2a in slot_attisnull ()
  (gdb)
  Continuing.

  Program terminated with signal SIGSEGV, Segmentation fault.
  The program no longer exists.

I can't do "bt" since the program no longer exists.  How can I provide more information, such as a stack trace?

--
Daniel

Re: [SQL] PostgreSQL server terminated by signal 11

From
Tom Lane
Date:
"Daniel Caune" <daniel.caune@ubisoft.com> writes:
> I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command.  After a while, the server
> crashes:

>   Program received signal SIGSEGV, Segmentation fault.
>   0x08079e2a in slot_attisnull ()
>   (gdb)
>   Continuing.

>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The program no longer exists.

> I can't do "bt" since the program no longer exists.

I think you typed one carriage return too many and the thing re-executed
the last command, ie, the continue.  Try it again.

The lack of arguments shown for slot_attisnull suggests that all we're
going to get is a list of function names, without line numbers or
argument values.  If that's not enough to figure out the problem, can
you rebuild with --enable-debug to get a more useful stack trace?

            regards, tom lane

Re: [SQL] PostgreSQL server terminated by signal 11

From
"D'Arcy J.M. Cain"
Date:
On Thu, 27 Jul 2006 19:00:27 -0400
"Daniel Caune" <daniel.caune@ubisoft.com> wrote:
> I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command.  After a while, the server
> crashes:
>
>   Program received signal SIGSEGV, Segmentation fault.
>   0x08079e2a in slot_attisnull ()

That's a pretty small function.  I don't see much room for error.  This
diff in src/backend/access/common/heaptuple.c seems like the most
likely place to catch it.

RCS file: /cvsroot/pgsql/src/backend/access/common/heaptuple.c,v
retrieving revision 1.110
diff -u -p -u -r1.110 heaptuple.c
--- heaptuple.c 14 Jul 2006 14:52:16 -0000      1.110
+++ heaptuple.c 27 Jul 2006 23:37:54 -0000
@@ -1470,8 +1470,13 @@ slot_getsomeattrs(TupleTableSlot *slot,
 bool
 slot_attisnull(TupleTableSlot *slot, int attnum)
 {
-       HeapTuple       tuple = slot->tts_tuple;
-       TupleDesc       tupleDesc = slot->tts_tupleDescriptor;
+       HeapTuple       tuple;
+       TupleDesc       tupleDesc;
+
+       assert(slot != NULL);
+
+       tuple =  slot->tts_tuple;
+       tupleDesc = slot->tts_tupleDescriptor;

        /*
         * system attributes are handled by heap_attisnull

Of course, you still have to find out what's calling it with slot set
to NULL if that turns out to be the problem.  It may also be that slot
is not NULL but set to garbage.  You could also add a notice there.
Two, in fact.  One to display the address of slot and one to display
the value of slot->tts_tuple or slot->tts_tupleDescriptor.  If the
first shows a non-NULL value and the second causes your crash, that
tells you that the value of slot was probably trashed before the
function was called.

Do this in conjunction with Tom Lane's suggestion of "--enable-debug" for
more information.
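
The two-notice idea can be tried outside the server first. What follows is a minimal standalone sketch, not PostgreSQL code: `MockSlot` and `report_slot` are hypothetical names, the struct models only the two fields named above, and `fprintf` stands in for whatever notice mechanism the backend would use. The point is the ordering: the pointer value is printed and flushed before anything is dereferenced, so the first line should survive even if the second access faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for TupleTableSlot: only the two fields
 * mentioned in the thread are modeled. */
typedef struct MockSlot
{
    void *tts_tuple;
    void *tts_tupleDescriptor;
} MockSlot;

/*
 * Print the slot address first, then the fields it points to.
 * Returns 0 if slot is NULL, 1 if the descriptor is NULL, 2 otherwise.
 */
int
report_slot(MockSlot *slot, FILE *out)
{
    assert(out != NULL);

    /* Notice 1: the pointer value only; nothing is dereferenced yet. */
    fprintf(out, "slot = %p\n", (void *) slot);
    fflush(out);    /* write this line out before any possible crash */

    if (slot == NULL)
        return 0;

    /* Notice 2: first dereference; a garbage pointer would fault here. */
    fprintf(out, "tts_tuple = %p, tts_tupleDescriptor = %p\n",
            slot->tts_tuple, slot->tts_tupleDescriptor);
    fflush(out);

    return (slot->tts_tupleDescriptor == NULL) ? 1 : 2;
}
```

If only the first line appears before the crash and it shows a non-NULL address, the pointer itself is the likely suspect rather than the fields behind it.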

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Re: [SQL] PostgreSQL server terminated by signal 11

From
Daniel CAUNE
Date:
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
> On behalf of Tom Lane
> Sent: Thursday, July 27, 2006 19:26
> To: Daniel Caune
> Cc: pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command.  After a while, the server
> crashes:
>
> >   Program received signal SIGSEGV, Segmentation fault.
> >   0x08079e2a in slot_attisnull ()
> >   (gdb)
> >   Continuing.
>
> >   Program terminated with signal SIGSEGV, Segmentation fault.
> >   The program no longer exists.
>
> > I can't do "bt" since the program no longer exists.
>
> I think you typed one carriage return too many and the thing re-executed
> the last command, ie, the continue.  Try it again.
>

OK, I'll try that tomorrow morning.  Perhaps I can set a conditional breakpoint on function slot_attisnull that triggers when parameter slot is null (or slot->tts_tupleDescriptor is null).

> The lack of arguments shown for slot_attisnull suggests that all we're
> going to get is a list of function names, without line numbers or
> argument values.  If that's not enough to figure out the problem, can
> you rebuild with --enable-debug to get a more useful stack trace?
>

Well, I installed PostgreSQL using apt-get, but it won't be a problem to get the source from the CVS repository and build a postgres binary with the option you suggested.  Just give me the time to do that. :-)

Thanks,


--
Daniel


Re: [SQL] PostgreSQL server terminated by signal 11

From
Daniel CAUNE
Date:
> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
> On behalf of D'Arcy J.M. Cain
> Sent: Thursday, July 27, 2006 19:49
> To: Daniel Caune
> Cc: tgl@sss.pgh.pa.us; pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> On Thu, 27 Jul 2006 19:00:27 -0400
> "Daniel Caune" <daniel.caune@ubisoft.com> wrote:
> > I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command.  After a while, the server
> crashes:
> >
> >   Program received signal SIGSEGV, Segmentation fault.
> >   0x08079e2a in slot_attisnull ()
>
> That's a pretty small function.  I don't see much room for error.  This
> diff in src/backend/access/common/heaptuple.c seems like the most
> likely place to catch it.
>
> RCS file: /cvsroot/pgsql/src/backend/access/common/heaptuple.c,v
> retrieving revision 1.110
> diff -u -p -u -r1.110 heaptuple.c
> --- heaptuple.c 14 Jul 2006 14:52:16 -0000      1.110
> +++ heaptuple.c 27 Jul 2006 23:37:54 -0000
> @@ -1470,8 +1470,13 @@ slot_getsomeattrs(TupleTableSlot *slot,
>  bool
>  slot_attisnull(TupleTableSlot *slot, int attnum)
>  {
> -       HeapTuple       tuple = slot->tts_tuple;
> -       TupleDesc       tupleDesc = slot->tts_tupleDescriptor;
> +       HeapTuple       tuple;
> +       TupleDesc       tupleDesc;
> +
> +       assert(slot != NULL);
> +
> +       tuple =  slot->tts_tuple;
> +       tupleDesc = slot->tts_tupleDescriptor;
>
>         /*
>          * system attributes are handled by heap_attisnull
>
> Of course, you still have to find out what's calling it with slot set
> to NULL if that turns out to be the problem.  It may also be that slot
> is not NULL but set to garbage.  You could also add a notice there.
> Two, in fact.  One to display the address of slot and one to display
> the value of slot->tts_tuple or slot->tts_tupleDescriptor.  If the
> first shows a non NULL value and the second causes your crash that
> tells you that the value of slot is probably trashed before
> calling the function.
>

Yes, I was afraid to go that deep, but it's time! :-))

Actually it seems, from the source code, that a null slot->tts_tuple won't lead to a segmentation fault in function slot_attisnull, while a null slot or a null slot->tts_tupleDescriptor will.  I will trace the function to try to discover what goes wrong behind the scenes.
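
That reading can be written down as a small decision model. This is a hedged sketch only, not the PostgreSQL source: `ModelSlot`, `ModelDesc`, `Outcome` and `classify_attisnull` are hypothetical names, and the control flow assumes the function reads an attribute count (called `natts` here) from the descriptor before it tests the tuple pointer, and that a null tuple is caught by an explicit check. The real function has further branches (system attributes, already-cached values) that are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the PostgreSQL definitions. */
typedef struct ModelDesc
{
    int natts;                      /* assumed attribute count field */
} ModelDesc;

typedef struct ModelSlot
{
    void      *tts_tuple;
    ModelDesc *tts_tupleDescriptor;
} ModelSlot;

typedef enum
{
    WOULD_SEGFAULT,     /* an unguarded pointer would be dereferenced */
    OUT_OF_RANGE,       /* attnum beyond the descriptor: treated as null */
    EXPLICIT_ERROR,     /* null tuple caught by an explicit check */
    ASK_TUPLE           /* the tuple itself would be consulted */
} Outcome;

/* Classify what the modeled function would do, without crashing. */
Outcome
classify_attisnull(const ModelSlot *slot, int attnum)
{
    assert(attnum > 0);             /* system attributes are not modeled */

    if (slot == NULL)
        return WOULD_SEGFAULT;      /* reading slot->... would fault */
    if (slot->tts_tupleDescriptor == NULL)
        return WOULD_SEGFAULT;      /* reading ->natts would fault */
    if (attnum > slot->tts_tupleDescriptor->natts)
        return OUT_OF_RANGE;
    if (slot->tts_tuple == NULL)
        return EXPLICIT_ERROR;      /* reported as an error, not a crash */
    return ASK_TUPLE;
}
```

Under these assumptions only the first two cases end in SIGSEGV, which matches the observation that a null tuple alone should not crash the backend.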

> Do this in conjunction with Tom Lane's suggestion of "--enable-debug" for
> more information.
>
OK

--
Daniel


Re: [SQL] PostgreSQL server terminated by signal 11

From
Tom Lane
Date:
Daniel CAUNE <d.caune@free.fr> writes:
> Actually it seems, from the source code, that a null slot->tts_tuple
> won't lead to a segmentation fault in function slot_attisnull, while
> slot and slot->tts_tupleDescriptor will.

I'll bet on D'Arcy's theory that slot is being passed in as NULL.
Exactly why remains to be seen ... we need that stack trace!

            regards, tom lane