Discussion: server process segfaulting

server process segfaulting

From: James Gregory
Hi all,

I have a bit of a problem I need some help with. I have a piece of
software (a web thing) that uses postgresql, and there is a particular
piece of code in it which seems to crash the postgres server with
SIGSEGV. The logs are below.

So first of all, are there any common gotchas that make postgres crash
that I'm not aware of?

The logs from the point where it is dying are below. The last queries
before the segfault are coming from a trigger I wrote in plpython to do
referential integrity checking for inherited tables (I posted about it
before writing said code).

Which leads me to believe that this is probably a problem with plpython.
So does anyone know anything about plpython and segfaults?

Next question. I found this:


http://snaga.org/pgsql/cvsweb.cgi/pgsql/src/pl/plpython/TODO?rev=1.1.1.1&content-type=text/x-cvsweb-markup&hideattic=0&only_with_tag=DT0_0

In point 3 it seems to suggest that if the schema of any of the tables
change, then the plpython functions will need to be recreated. It
doesn't actually say whether or not "making postgres unhappy" ==
segfault. I would like to try this and see if it will fix my problem,
but I'm more than a little concerned about postgres removing all my
triggers if I drop the function. Will postgres drop the triggers? If it
does, is there an easy way to do what that document is suggesting and
rebuild the triggers as I go?

And ultimately, if plpython can't be made to work for this task, what's
the best way forward? I had a quick look at the plpython source and I
don't think it's something I'll be able to hack on in the short term. Am
I better off writing a C module to do what I need to do?

Any feedback much appreciated.

oh, and

 PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2
(Mandrake Linux 9.1 3.2.2-3mdk)

from

$ rpm -q postgresql postgresql-server postgresql-contrib
postgresql-7.3.2-5mdk
postgresql-server-7.3.2-5mdk
postgresql-contrib-7.3.2-5mdk

Thanks,

James.


The log:

May 14 18:11:31 pirate postgres[11599]: [43] NOTICE:  ('running foreign
key check',)
May 14 18:11:31 pirate postgres[11599]: [44] LOG:  query: select * from
referential_constraint where foreign_key ilike $1
May 14 18:11:31 pirate postgres[11599]: [45] LOG:  query: select
count(*) as count from "domain" where "id" = $1
May 14 18:11:31 pirate postgres[11421]: [7] LOG:  server process (pid
11599) was terminated by signal 11
May 14 18:11:31 pirate postgres[11421]: [8] LOG:  terminating any other
active server processes
May 14 18:11:31 pirate postgres[11421]: [9] LOG:  all server processes
terminated; reinitializing shared memory and semaphores
May 14 18:11:31 pirate postgres[11600]: [10] LOG:  database system was
interrupted at 2003-05-14 18:02:23 EST
May 14 18:11:31 pirate postgres[11600]: [11] LOG:  checkpoint record is
at 0/345728C
May 14 18:11:31 pirate postgres[11600]: [12] LOG:  redo record is at
0/345728C; undo record is at 0/0; shutdown TRUE
May 14 18:11:31 pirate postgres[11600]: [13] LOG:  next transaction id:
55444; next oid: 157552
May 14 18:11:31 pirate postgres[11600]: [14] LOG:  database system was
not properly shut down; automatic recovery in progress
May 14 18:11:32 pirate postgres[11600]: [15] LOG:  redo starts at
0/34572CC
May 14 18:11:32 pirate postgres[11600]: [16] LOG:  ReadRecord: record
with zero length at 0/345F590
May 14 18:11:32 pirate postgres[11600]: [17] LOG:  redo done at
0/345F4DC
May 14 18:11:34 pirate postgres[11600]: [18] LOG:  database system is
ready



Re: server process segfaulting

From: Tom Lane
James Gregory <james@anchor.net.au> writes:
> The logs from the point where it is dying are below. The last queries
> before the segfault are coming from a trigger I wrote in plpython to do
> referential integrity checking for inherited tables (I posted about it
> before writing said code).

Um.  There was a report that plpython triggers get confused if you try
to apply the same trigger procedure to multiple tables (it tries to use
the first table's row descriptor with all the other tables, and yes that
can lead to a segfault).

AFAIR this is still unfixed in CVS tip --- someone had volunteered to
produce a fix, but it has not materialized yet.  In the meantime, you
need to make a separate trigger function for each table :-(
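
A minimal sketch of that workaround, assuming the trigger body is the same
for every table and only the function name has to differ. The table names,
the fk_check_ naming scheme, the placeholder body and the language name
"plpython" are all assumptions for illustration; adjust them to whatever
your installation and schema actually use. The script only prints SQL, it
does not touch the database.

```python
# Sketch: emit one trigger function per table so that no plpython function
# is ever shared between tables with different row types.
# The body, the table names, and the naming scheme are hypothetical.

TRIGGER_BODY = 'plpy.notice("running foreign key check")\nreturn None'


def per_table_trigger_sql(table, body=TRIGGER_BODY, language="plpython"):
    """Return CREATE FUNCTION / CREATE TRIGGER statements for one table."""
    func = "fk_check_%s" % table
    # The body is passed as a single-quoted SQL string, so any single
    # quotes inside it must be doubled.
    quoted_body = body.replace("'", "''")
    return (
        "CREATE FUNCTION %s() RETURNS trigger AS '\n%s\n' LANGUAGE %s;\n"
        'CREATE TRIGGER %s BEFORE INSERT OR UPDATE ON "%s"\n'
        "    FOR EACH ROW EXECUTE PROCEDURE %s();\n"
    ) % (func, quoted_body, language, func, table, func)


def all_trigger_sql(tables):
    """Join the per-table statements into one script."""
    return "\n".join(per_table_trigger_sql(t) for t in tables)


if __name__ == "__main__":
    print(all_trigger_sql(["domain", "host"]))
```

Each table gets its own function object, so a row descriptor cached for
one table is never reused for another.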

> In point 3 it seems to suggest that if the schema of any of the tables
> change, then the plpython functions will need to be recreated.

I don't think you need to recreate them, just start a fresh session.
The cached row descriptors are only cached within a backend.

            regards, tom lane

Re: server process segfaulting

From: James Gregory
On Thu, 2003-05-15 at 01:53, Tom Lane wrote:
> James Gregory <james@anchor.net.au> writes:
> > The logs from the point where it is dying are below. The last queries
> > before the segfault are coming from a trigger I wrote in plpython to do
> > referential integrity checking for inherited tables (I posted about it
> > before writing said code).
>
> Um.  There was a report that plpython triggers get confused if you try
> to apply the same trigger procedure to multiple tables (it tries to use
> the first table's row descriptor with all the other tables, and yes that
> can lead to a segfault).

Is it only plpython that has the problem? If I wanted to fix this where
would I start looking? presumably pgsql/src/plpython/plpython.c. Do you
have a link with more info about the bug by any chance?

Many thanks for your help. My code is exhibiting exactly that behaviour,
so it sounds like that's what the problem is.

James.



Re: server process segfaulting

From: Tom Lane
James Gregory <james@anchor.net.au> writes:
> On Thu, 2003-05-15 at 01:53, Tom Lane wrote:
>> Um.  There was a report that plpython triggers get confused if you try
>> to apply the same trigger procedure to multiple tables (it tries to use
>> the first table's row descriptor with all the other tables, and yes that
>> can lead to a segfault).

> Is it only plpython that has the problem?

I'm not sure.  It's only been reported against plpython, but it seems
possible that our other PLs might have the same bug.  I'd only be
willing to bet that plpgsql doesn't have it, because that's the most
heavily used PL and someone woulda noticed by now...

> If I wanted to fix this where
> would I start looking? presumably pgsql/src/plpython/plpython.c. Do you
> have a link with more info about the bug by any chance?

Not offhand, but if you search the PG list archives you will find the bug
report.  I think it was back around the beginning of this year.

If fading memory serves, I suggested a quick-hack solution of including
the target table's OID into the Python name of the function (so that
triggers on different tables are automatically different Python objects)
but whoever it was that was promising to do the legwork wanted to look
for a cleaner approach.

At this point I've lost faith in whoever-it-was, and would gladly accept
a patch based on the quick-hack approach.
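
The idea behind the quick hack can be shown with a toy model in plain
Python. Nothing below is plpython's real code: the cache class, the OIDs
and the "row descriptors" are stand-ins, and the proc_ key format is made
up. It only illustrates why a cache keyed on the function alone hands the
second table the first table's descriptor, and why adding the table OID to
the key avoids that.

```python
# Toy model of the bug and the quick-hack fix. All names and numbers are
# hypothetical; a "row descriptor" is just a tuple of column names here.

ROW_DESCRIPTORS = {            # table OID -> column names
    1001: ("id", "name"),
    1002: ("id", "owner", "expires"),
}


class ProcCache:
    def __init__(self, key_includes_table):
        self.key_includes_table = key_includes_table
        self.cache = {}

    def descriptor_for(self, func_oid, table_oid):
        if self.key_includes_table:
            key = "proc_%d_%d" % (func_oid, table_oid)
        else:
            key = "proc_%d" % func_oid
        # The descriptor is stored on first use and reused afterwards.
        if key not in self.cache:
            self.cache[key] = ROW_DESCRIPTORS[table_oid]
        return self.cache[key]


# Keyed on the function only: the second table gets a stale descriptor.
buggy = ProcCache(key_includes_table=False)
buggy.descriptor_for(500, 1001)
stale = buggy.descriptor_for(500, 1002)

# Keyed on function plus table: each table gets its own descriptor.
fixed = ProcCache(key_includes_table=True)
fixed.descriptor_for(500, 1001)
fresh = fixed.descriptor_for(500, 1002)

print(stale)
print(fresh)
```

In the model the stale descriptor is merely wrong; in a C-language handler
reading a tuple through the wrong descriptor is what can end in a segfault.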

            regards, tom lane

Re: server process segfaulting

From: elein
The problem is that the information in the dictionary
element TD[] that is used to store information is
probably shared by all invocations of the function
within the transaction.

It is similar to the problem where all invocations
share a common SD[] for a particular function
in the scope of a connection.

Whether this is a bug or a feature is debatable.
Handling the memory scope is very tricky.

This is an educated guess. I have not looked at
the plpython code itself, although I can vouch
for the behaviour.

elein

On Wednesday 14 May 2003 19:39, Tom Lane wrote:
> James Gregory <james@anchor.net.au> writes:
> > On Thu, 2003-05-15 at 01:53, Tom Lane wrote:
> >> Um.  There was a report that plpython triggers get confused if you try
> >> to apply the same trigger procedure to multiple tables (it tries to use
> >> the first table's row descriptor with all the other tables, and yes that
> >> can lead to a segfault).
>
> > Is it only plpython that has the problem?
>
> I'm not sure.  It's only been reported against plpython, but it seems
> possible that our other PLs might have the same bug.  I'd only be
> willing to bet that plpgsql doesn't have it, because that's the most
> heavily used PL and someone woulda noticed by now...
>
> > If I wanted to fix this where
> > would I start looking? presumably pgsql/src/plpython/plpython.c. Do you
> > have a link with more info about the bug by any chance?
>
> Not offhand, but if you search the PG list archives you will find the bug
> report.  I think it was back around the beginning of this year.
>
> If fading memory serves, I suggested a quick-hack solution of including
> the target table's OID into the Python name of the function (so that
> triggers on different tables are automatically different Python objects)
> but whoever it was that was promising to do the legwork wanted to look
> for a cleaner approach.
>
> At this point I've lost faith in whoever-it-was, and would gladly accept
> a patch based on the quick-hack approach.
>
>             regards, tom lane

--
=============================================================
elein@varlena.com     Database Consulting     www.varlena.com
PostgreSQL General Bits    http:/www.varlena.com/GeneralBits/
   "Free your mind the rest will follow" -- en vogue


Re: server process segfaulting

From: Elein Mustain
We had this problem in spades when implementing the
C interface for Informix.  The problem is in allocation,
but it is a design problem, not a bug.  The problem
is local to plpython and not plpgsql because only
plpython creates storage that can be accessed between
function calls.  Special care must be taken to
ensure NEW and OLD are local to the instance and
not the statement.  And so must some of the context storage.

There are several possible scopes for memory:
session, connection, transaction, statement, and a single call.
The semantics for statement scope are very hairy because
of the theoretically infinite nesting of subselects
and function calls.

The place to attack the problem starts with the
C interfaces.  When allocating memory and attaching
it to the function call info, how long does it last?
Is the function writer expected to clean up?
Allocated data's scope should be well defined to be the single
instance of the function. Understanding these
things and then examining the semantics of python
dictionaries should lead to an understanding of sorts.

I take advantage of the plpython SD storage as
it stands and work around its limitations. This
will be fodder for my talk at OSCON on running
aggregates.
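
As a rough illustration of that kind of use (not the code from the talk,
which is not shown in this thread), here is a plain-Python stand-in for a
running total kept in SD. In a real plpython function SD is supplied by
the language handler and lives as long as the session; here it is an
ordinary module-level dict, and the function names are hypothetical.

```python
# Plain-Python stand-in for plpython's per-function SD dictionary.
# SD here is an ordinary dict so the sketch runs outside the database.

SD = {}


def running_sum(value):
    """Add value to a total kept in SD and return the new total."""
    SD["total"] = SD.get("total", 0) + value
    return SD["total"]


def reset_running_sum():
    """Clear the stored total, since nothing else resets it between queries."""
    SD.pop("total", None)


totals = [running_sum(v) for v in (10, 20, 5)]
reset_running_sum()
after_reset = running_sum(7)

print(totals)
print(after_reset)
```

The explicit reset is the workaround part: because the stored state
outlives a single statement, the caller has to decide when a new running
total starts.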

If anyone really wants to tackle this, be prepared.
The memory scope issues are not simple, but they
should be easier in PostgreSQL than in Informix
because of the fe-be model.

D'Arcy should be involved and I'd really like to
go over scoping issues in more detail and perhaps
help avoid some of the worst pitfalls since I've
already done them.

Hmmm.  I think I could be clearer.  If anyone is
interested I can write something up.

elein

>Date: Tue, 03 Jun 2003 22:47:59 -0400
>From: Tom Lane <tgl@sss.pgh.pa.us>
>To: James Gregory <james@anchor.net.au>
>cc: elein@varlena.com, pgsql-general@postgresql.org
>Subject: Re: [GENERAL] server process segfaulting
>
>James Gregory <james@anchor.net.au> writes:
>> Is it worth tracing that through or is this not the problem?
>
>I believe the problem has been diagnosed as follows: the plpython stuff
>is assuming that any one trigger function will be used with only one
>tuple descriptor.  Apply the same trigger function to two relations with
>different rowtypes, and you get a crash, because the initially-cached
>tuple descriptor is wrong for the second relation.  It has nothing to do
>with storage allocation.
>
>I'm not sure whether the problem occurs with any PL languages besides
>plpython --- it seems like it could be a generic issue.  I don't believe
>that plpgsql suffers from it, because it's too widely used: we'd have
>heard reports if it had the problem.  But pltcl etc could have the same
>problem for all I know.
>
>You can check the archives once Marc gets the pieces put back
>together...
>
>            regards, tom lane
>

Re: server process segfaulting

From: Tom Lane
Elein Mustain <elein@tulip.norcov.com> writes:
> The problem
> is local to plpython and not plpgsql because only
> plpython creates storage that can be accessed between
> function calls.

When I looked at it, I thought that it could be solved trivially by
instantiating a separate Python object for each per-relation version
of a trigger function.  But not being a Python user, I didn't try
to fix it because I couldn't test it very well.

            regards, tom lane