Discussion: transaction control in pl/pgsql


transaction control in pl/pgsql

From:
Birgit Laggner
Date:
Dear list,

I have some data (quite a lot of it) and I've written a long function in pl/pgsql
which processes the data in several steps. During a test run my function
aborted because of memory exhaustion. My guess is that everything that
happens during the function's transaction is kept in memory - until
it's full... So, my idea for solving that problem would be to cut the
big function into smaller functions. But I don't want to write 30
function calls at the end - I would rather have one function
which calls all these small functions, so I only have to
write one sql query at the end. What I fear is that, if this
function calls the other functions, everything is still only one transaction
and I get the memory overflow once more.

I've read the documentation regarding this on
http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html :

"It is important not to confuse the use of BEGIN/END for grouping
statements in PL/pgSQL with the similarly-named SQL commands for
transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do
not start or end a transaction. Functions and trigger procedures are
always executed within a transaction established by an outer query —
they cannot start or commit that transaction, since there would be no
context for them to execute in."

Somewhere else I've read: "PostgreSQL does not have nested transactions."

I'm still not sure if I got it right or if there are other possibilities
to solve my problem. Any suggestions would be appreciated!

Thanks and regards,

Birgit.

Re: transaction control in pl/pgsql

From:
Grzegorz Jaśkiewicz
Date:
a) you can't explicitly control transactions in plpgsql. If you need some form of it, use savepoints.
b) you are trying to outsmart the database software, and that is just a big mistake; you should stop doing that completely.
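
Roughly like this at the SQL level (just a sketch, the table name is made up):

    BEGIN;
    INSERT INTO processing_log VALUES ('step 1 done');
    SAVEPOINT before_step2;
    -- risky statement(s) for step 2 here; if one fails:
    --   ROLLBACK TO SAVEPOINT before_step2;
    RELEASE SAVEPOINT before_step2;
    COMMIT;

Inside plpgsql the equivalent is a BEGIN ... EXCEPTION ... END block.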

Re: transaction control in pl/pgsql

From:
Pavel Stehule
Date:
2010/4/1 Birgit Laggner <birgit.laggner@vti.bund.de>:
> Dear list,
>
> I have some data (quite a lot of it) and I've written a long function in pl/pgsql
> which processes the data in several steps. During a test run my function
> aborted because of memory exhaustion. My guess is that everything that
> happens during the function's transaction is kept in memory - until
> it's full... So, my idea for solving that problem would be to cut the
> big function into smaller functions. But I don't want to write 30
> function calls at the end - I would rather have one function
> which calls all these small functions, so I only have to
> write one sql query at the end. What I fear is that, if this
> function calls the other functions, everything is still only one transaction
> and I get the memory overflow once more.

plpgsql gives you implicit subtransactions - every block with a protected
(EXCEPTION) section is evaluated inside a subtransaction. But I don't think
a subtransaction helps in your case. You are doing something that is memory
expensive - for example: a large SRF function in plpgsql, badly used hash
aggregation, maybe use of deferred triggers. A subtransaction doesn't
help with that. Try to use temp tables instead.
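
A minimal sketch of what I mean (untested, the table name is invented):

    CREATE OR REPLACE FUNCTION process_one_step(p_id integer) RETURNS void AS $$
    BEGIN
        BEGIN
            -- the inner block runs in its own subtransaction
            UPDATE work_queue SET processed = true WHERE id = p_id;
        EXCEPTION WHEN OTHERS THEN
            -- only the inner block is rolled back, the function continues
            RAISE NOTICE 'step % failed: %', p_id, SQLERRM;
        END;
    END;
    $$ LANGUAGE plpgsql;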

Regards
Pavel Stehule

>
> I've read the documentation regarding this on
> http://www.postgresql.org/docs/8.4/interactive/plpgsql-structure.html :
>
> "It is important not to confuse the use of BEGIN/END for grouping
> statements in PL/pgSQL with the similarly-named SQL commands for
> transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do
> not start or end a transaction. Functions and trigger procedures are
> always executed within a transaction established by an outer query —
> they cannot start or commit that transaction, since there would be no
> context for them to execute in."
>
> Somewhere else I've read: "PostgreSQL does not have nested transactions."
>
> I'm still not sure if I got it right or if there are other possibilities
> to solve my problem. Any suggestions would be appreciated!
>
> Thanks and regards,
>
> Birgit.
>

Re: transaction control in pl/pgsql

From:
Birgit Laggner
Date:
Hi Grzegorz,

sorry, but that doesn't help me; perhaps you could be a bit clearer:

@a) Does the use of SAVEPOINT avoid memory overflow? I could not find an
explanation about memory use in the documentation of SAVEPOINT.

@b) Do you mean I should not process my data, or that I should not use plpgsql
to do that? In what way am I trying to outsmart the software?

Thanks,

Birgit.


On 01.04.2010 12:35, Grzegorz Jaśkiewicz wrote:
> a) you can't explicitly control transactions in plpgsql. If you need
> some form of it, use savepoints.
> b) you are trying to outsmart the database software, and that is just a
> big mistake; you should stop doing that completely.
>

Re: transaction control in pl/pgsql

From:
Grzegorz Jaśkiewicz
Date:


2010/4/1 Birgit Laggner <birgit.laggner@vti.bund.de>
Hi Grzegorz,

sorry, but that doesn't help me; perhaps you could be a bit clearer:

@a) Does the use of SAVEPOINT avoid memory overflow? I could not find an
explanation about memory use in the documentation of SAVEPOINT.

transactions don't really use a lot of memory, but you want to keep them short, due to possible locking, etc.
 
@b) Do you mean I should not process my data, or that I should not use plpgsql
to do that? In what way am I trying to outsmart the software?


you are trying to save some memory that the database is going to allocate. You can control that much better by writing things the way others do, and by tweaking your config.

What I was trying to say in a) is that you can control transactions to a degree - by using savepoints.

You started talking about saving memory that the database might allocate for transactions. This sounds like 'I am trying to be smarter about things than my DB'. Don't do that. It is good to know how things work and to use that knowledge wisely, but don't try to do the database's job for it - it was designed to take care of memory, transactions, etc.




--
GJ

Re: transaction control in pl/pgsql

From:
Alban Hertroys
Date:
On 1 Apr 2010, at 12:22, Birgit Laggner wrote:

> Dear list,
>
> I have some data (quite a lot of it) and I've written a long function in pl/pgsql
> which processes the data in several steps. During a test run my function
> aborted because of memory exhaustion. My guess is that everything that
> happens during the function's transaction is kept in memory - until
> it's full...

No, transactions operate on the database just like anything else. Running out of memory doing that is not impossible,
but you'd need to create some really big transactions to reach those limits (I think I've been quoted the limit being
2^32 instructions per transaction a looong time ago, just to give you an indication).

You're probably running out of memory for another reason. Now to figure out why...

I've seen an issue with deferred constraints causing this, for example. Do you use deferred constraints?

Another cause that pops up regularly is that people specify too much global work_mem for postgres. work_mem gets
allocated per connection, so the more connections you have the more memory goes to work_mem and other resources may
receive too little, or you cross what's available.
What are your relevant postgres settings (max connections, work_mem, shared mem, etc.) and how much physical, total and
shared memory does your server have?
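
Just as a rough illustration (the numbers are invented, not a recommendation), the relation to watch in postgresql.conf is roughly:

    # each connection may use up to work_mem per sort/hash step, so keep
    #   shared_buffers + max_connections * work_mem
    # comfortably below the physical RAM of the machine
    max_connections = 100
    work_mem = 50MB          # ~5GB for sorts/hashes if all connections are busy
    shared_buffers = 4GB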

Another possibility is that your function stores large amounts of data in variables that are not backed up by database
tables. That means all that data will be stored in memory, and even if it goes to swap at some point (not sure it will,
it would seriously hurt performance) there is a limit.
If this is the case, maybe you could use temporary tables to process that data instead of trying to do it all in
memory.
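
For example something along these lines (just a sketch, the table and column names are made up):

    -- materialise an intermediate step in a temp table instead of a plpgsql variable
    CREATE TEMP TABLE interim_step1 AS
        SELECT id, some_expensive_expression(payload) AS result
        FROM   raw_data;

    ANALYZE interim_step1;  -- helps the planner for the next step

A temp table can spill to disk, so it doesn't have to fit in memory the way a variable does.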

> So, my idea for solving that problem would be to cut the
> big function into smaller functions. But I don't want to write 30
> function calls at the end - I would rather have one function
> which calls all these small functions, so I only have to
> write one sql query at the end.

Splitting up big functions into smaller functions is always a good idea. That's part of general programming paradigms.
It won't cut down the size of your transaction though.

> What I fear is that, if this
> function calls the other functions, everything is still only one transaction
> and I get the memory overflow once more.


If the problem persists, maybe you could post your function somewhere. As it's apparently a rather long function, can
you strip it down to something that still causes it to run out of memory but that will be a bit easier for the people on
this list to wade through?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: transaction control in pl/pgsql

From:
Birgit Laggner
Date:
Hi Alban,

thanks for your detailed answer!

My database settings:
max connections: 20
work_mem: 100MB
shared buffers: 12000MB

Server memory:
physical 32GB
total memory 50GB (incl. swap)
shared memory ??

I am not sure if I use deferred constraints - the only constraints I use
in the function are NOT NULL constraints. But I don't believe my memory
overflow had anything to do with them, because the error occurred during
a loop. There shouldn't be any sudden changes regarding any NOT
NULL constraints between one loop cycle and the next.

My function was the only connection to the database as far as I know -
so, I don't think allocation of memory should be a reason...

I would say: Yes, my function seems to store large amounts of data in
memory. But in my function, I tried to store as much as possible of the
interim results in real tables (not temp tables) instead of storing them
in variables. But my guess is that postgres doesn't write the tables and
therefore keeps everything in memory. (by the way: the swap was used up,
too)

It's really difficult to post only a part of the function, because each
step is based on the result of the previous step. I also guess that every
step on its own wouldn't cause memory overflow, but that it's more like
the memory use adds up with every step. But I will try to cut the function
into little snippets and run them one by one - perhaps the memory overflow
still occurs for one of the snippets...

If you have any ideas ...

Thanks again and regards,

Birgit.





Re: transaction control in pl/pgsql

From:
Alban Hertroys
Date:
On 1 Apr 2010, at 14:38, Birgit Laggner wrote:

> Hi Alban,
>
> thanks for your detailed answer!
>
> My database settings:
> max connections: 20
> work_mem: 100MB
> shared buffers: 12000MB
>
> Server memory:
> physical 32GB
> total memory 50GB (incl. swap)
> shared memory ??

Ok, so the max work_mem in use at any given time is 20 * 100MB ~ 2GB. Considering you have 12GB of shared buffers and 50GB
available in total, that's not your problem.

Btw, the amount of shared memory is something controlled by the kernel (you appear to be on a Linux system). If less shared
memory than your 12GB of shared buffers had been configured in the kernel, Postgres wouldn't have started up at
all, so there's probably no need to look at its value.

> I am not sure if I use deferred constraints - the only constraints I use
> in the function are NOT NULL constraints. But I don't believe my memory
> overflow had anything to do with them, because the error occurred during
> a loop. There shouldn't be any sudden changes regarding any NOT
> NULL constraints between one loop cycle and the next.

Deferred constraints are constraints that aren't checked until the end of the transaction. You can imagine that keeping
track of the things to check on commit can build up. But this doesn't apply to NOT NULL constraints, as the documentation
says:

DEFERRABLE
NOT DEFERRABLE
This controls whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately
after every command. Checking of constraints that are deferrable may be postponed until the end of the transaction
(using the SET CONSTRAINTS command). NOT DEFERRABLE is the default. Only foreign key constraints currently accept this
clause. All other constraint types are not deferrable.
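
To make that concrete, a deferrable check looks roughly like this (sketch only, the table and constraint names are invented):

    ALTER TABLE child
        ADD CONSTRAINT child_parent_fk
        FOREIGN KEY (parent_id) REFERENCES parent (id)
        DEFERRABLE INITIALLY IMMEDIATE;

    BEGIN;
    SET CONSTRAINTS child_parent_fk DEFERRED;  -- checks are queued up...
    -- lots of inserts/updates on child
    COMMIT;                                    -- ...and they all run here

The queued-up checks are what can eat memory in a long transaction; since you only use NOT NULL constraints, that's not what is happening in your case.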


> My function was the only connection to the database as far as I know -
> so, I don't think allocation of memory should be a reason...

Probably not, no.

> I would say: Yes, my function seems to store large amounts of data in
> memory. But in my function, I tried to store as much as possible of the
> interim results in real tables (not temp tables) instead of storing them
> in variables.

Good practice I think. Do any of those variables ever contain a large amount of data? I realised that since you have
one single function, many of your variables stay allocated until the final END statement. If that's what's causing you to
run out of memory then it would help to split the function up into smaller ones, it would reduce the scope of such
variables.

> But my guess is that postgres doesn't write the tables and
> therefore keeps everything in memory. (by the way: the swap was used up,
> too)

No, it certainly doesn't do that; if you're operating on tables (like you say you do) then it writes such things to its
WAL.

Maybe there's something else trying to keep lots of data in memory? Can you tell what is using all that memory?

What happens to the results of your function? Is that a large result set, and is some piece of software on the same
machine buffering all of that in memory, to display it for example? I recall psql can suffer from that problem, but it
has a switch to turn that off (the name eludes me). Web-site scripts also have a tendency to handle data that way.
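
(If I remember right the psql switch is the FETCH_COUNT variable, something like

    \set FETCH_COUNT 1000

which makes psql fetch the result through a cursor in batches instead of buffering the whole result set - but double-check the psql manual.)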

A few wild guesses: did someone for example configure the WAL to be on a memory-disk? Is some other process churning up
memory while you're running your function?


Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.





Re: transaction control in pl/pgsql

From:
Merlin Moncure
Date:
On Thu, Apr 1, 2010 at 6:22 AM, Birgit Laggner
<birgit.laggner@vti.bund.de> wrote:
> Dear list,
>
> I have some data (quite a lot of it) and I've written a long function in pl/pgsql
> which processes the data in several steps. During a test run my function
> aborted because of memory exhaustion. My guess is that everything that
> happens during the function's transaction is kept in memory - until
> it's full... So, my idea for solving that problem would be to cut the
> big function into smaller functions. But I don't want to write 30
> function calls at the end - I would rather have one function
> which calls all these small functions, so I only have to
> write one sql query at the end. What I fear is that, if this
> function calls the other functions, everything is still only one transaction
> and I get the memory overflow once more.

I don't know all the specifics of your case but ultimately there are
limits to what you can reasonably do in a single transaction,
especially if you are writing to the database.  If you push the limit
the database starts to push back.  Transactions generally should be as
short as possible.  Long transactions inhibit the ability of the
database to do certain types of maintenance on itself and have other
issues like bad performance and memory exhaustion.

Regardless of how many separate functions/savepoints/begin/end blocks
your 'outer' function calls, your entire set of work is going to
operate within the context of a single transaction.  This is an ironclad
rule for which (at present) there is no workaround.  For this
reason certain classes of data processing must unhappily be done on
the client side, introducing another language and forcing all the data
back and forth through the protocol.

In the future, it may be possible to execute pl/pgsql-ish type of code
in the backend that allows explicit transaction control. This feature
might be a 'stored procedure', or there might be some other type of
nomenclature to distinguish functions that manage their own
transaction state.
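
Purely as a sketch of what such a thing could look like (hypothetical syntax, nothing you can run today):

    CREATE PROCEDURE process_in_batches() LANGUAGE plpgsql AS $$
    BEGIN
        FOR i IN 1..30 LOOP
            PERFORM process_step(i);   -- imaginary helper doing one chunk of work
            COMMIT;                    -- each chunk committed on its own
        END LOOP;
    END;
    $$;

    CALL process_in_batches();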

merlin

Re: transaction control in pl/pgsql

From:
Birgit Laggner
Date:
Hi Merlin, hi Alban,

thank you both for your helpful answers. I have now split the function
into smaller parts which have to be called separately, one after another.
I will probably write a script for calling all the functions needed.
Not as nice as an all-in-one function, but if there is no other way....
While executing my function snippets I came across a possible explanation
for the memory overflow: the function runs many loop cycles, and for every
cycle I let the function write a notice with the current loop cycle
number. I use pgAdmin for writing and executing most of my postgres
stuff. So I guess writing all these loop cycle notices to the pgAdmin
window led to the exorbitant memory usage. Now I let the function
write a notice only every 100 loop cycles. So far I haven't had any
problems with memory overflow anymore...
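
In case anyone runs into the same thing, the change was basically this (simplified, the loop variable name is invented):

    -- before: RAISE NOTICE on every cycle; now only every 100th cycle
    IF i % 100 = 0 THEN
        RAISE NOTICE 'loop cycle %', i;
    END IF;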

Regards,

Birgit.
