Обсуждение: The enormous s->childXids problem

Поиск
Список
Период
Сортировка

The enormous s->childXids problem

От
Theo Schlossnagle
Дата:
I've tracked the problem I mentioned earlier with my 4.5 million node  
linked list of s->childXids down.  We use plperl to connect to Oracle  
over DBI.  The select function is dbi-link's remote_select.   
remote_select will perform the query and then for each row  
return_next which calls the SPI.xs stuff to do plperl_return_next  
which is wrapped in a PG_TRY block.  I see the value of the try block  
to kick back sensible errors to perl, but creating childXids for  
every row of a setof seems wildly excessive.  What's the harm in  
simply not TRY'ing around there?

I ask with respect to the suitability as general solution and as the  
suitability for my acute issue (of a 5 million row setof returned  
from that).  Will it break anything?

Best regards,
Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/




Re: The enormous s->childXids problem

От
Gregory Stark
Дата:
Theo Schlossnagle <jesus@omniti.com> writes:

> I've tracked the problem I mentioned earlier with my 4.5 million node linked
> list of s->childXids down.  We use plperl to connect to Oracle  over DBI.  The
> select function is dbi-link's remote_select.   remote_select will perform the
> query and then for each row  return_next which calls the SPI.xs stuff to do
> plperl_return_next  which is wrapped in a PG_TRY block.  I see the value of the
> try block  to kick back sensible errors to perl, but creating childXids for
> every row of a setof seems wildly excessive.  What's the harm in  simply not
> TRY'ing around there?

PG_TRY alone just sets up a longmp handler. Often it is used in conjunction
with subtransactions though and I wouldn't be surprised if that was happening
here but it's not quite right there.

If you look in plperl.c you'll see a number of places that do call
BeginInternalSubTransaction (and each one has a comment above that mentions
"sub-transaction"). They use PG_TRY blocks to recover control so they can
abort the subtransaction and throw a perl error.

However plperl_return_next is one of the few cases that *doesn't*. I'm not
sure exactly by what logic it gets an exception. I suppose the idea is that
there aren't very many SQL errors return next can actually trigger.

Anyways, perhaps you're also calling one of the other functions like
plperl_spi_{exec,query,fetchrow,prepare,exec_prepared,query_prepared}? I
wouldn't expect so given that you're actually doing Oracle queries though.

> I ask with respect to the suitability as general solution and as the
> suitability for my acute issue (of a 5 million row setof returned  from that).
> Will it break anything?

If you don't have a subtransaction then you can't really recover from any
error. There will be locks left over, memory allocated that isn't freed etc.
PG_TRY will recover control but you're pretty much stuck rethrowing it
eventually. In other words, any errors will require that you roll back the
whole transaction.

I'm not sure what happens to perl when you longjmp out of the perl interpreter
instead of finishing execution normally. Subtransactions and PG_TRY are both
relatively new and Postgres has had plperl for a lot longer so I imagine
there's a way to get it to work. I don't remember how it used to work though.
Perhaps it set a flag and returned from the perl interpreter normally and then
rethrew the error from outside the perl interpreter? Or perhaps perl is ok
with you longjmping out from inside it.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: The enormous s->childXids problem

От
Tom Lane
Дата:
Theo Schlossnagle <jesus@omniti.com> writes:
> The select function is dbi-link's remote_select.   
> remote_select will perform the query and then for each row  
> return_next which calls the SPI.xs stuff to do plperl_return_next  
> which is wrapped in a PG_TRY block.  I see the value of the try block  
> to kick back sensible errors to perl, but creating childXids for  
> every row of a setof seems wildly excessive.  What's the harm in  
> simply not TRY'ing around there?

Failing.

The real question is why does the subtransaction actually assign itself
an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.
What is it you're doing in there that changes the database?
        regards, tom lane


Re: The enormous s->childXids problem

От
Gregory Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> The real question is why does the subtransaction actually assign itself
> an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.
> What is it you're doing in there that changes the database?

I suspect the answer to that is the same as the answer to what's actually
creating the subtransaction. plperl_return_next doesn't. I think something
must be doing an actual SPI query, not just a return next.

-- 
greg



Re: The enormous s->childXids problem

От
Tom Lane
Дата:
Gregory Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> The real question is why does the subtransaction actually assign itself
>> an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.

> I suspect the answer to that is the same as the answer to what's actually
> creating the subtransaction. plperl_return_next doesn't. I think something
> must be doing an actual SPI query, not just a return next.

The other question on the table is why it didn't respond to QueryCancel
in a reasonable amount of time.  I'd really like to see a complete test
case for this problem ...
        regards, tom lane


Re: The enormous s->childXids problem

От
Robert Treat
Дата:
On Saturday 16 September 2006 20:34, Tom Lane wrote:
> Gregory Stark <gsstark@mit.edu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> The real question is why does the subtransaction actually assign itself
> >> an XID --- a simple RETURN NEXT operation ought not do that, AFAICS.
> >
> > I suspect the answer to that is the same as the answer to what's actually
> > creating the subtransaction. plperl_return_next doesn't. I think
> > something must be doing an actual SPI query, not just a return next.
>
> The other question on the table is why it didn't respond to QueryCancel
> in a reasonable amount of time.  I'd really like to see a complete test
> case for this problem ...
>

I think the plperl was a red herring.  Once dbi-link grabs a recordset, the 
rows are looped over, processed, and then inserted (based on some 
conditionals) into another table. Those inserts are wrapped in a 
begin....exception block, which, since it is in a loop, I suspect is creating 
the large number of childXids in cases where there are a large number of 
inserts.   I haven't tested that theory, but it seems logical, and should be 
easy enough to reproduce with a simple LOOP ... END LOOP in plpgsql.  

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL