Обсуждение: Problem with aborting entire transactions on error

Поиск

Список

Период

Сортировка

Problem with aborting entire transactions on error

От

Zbigniew

Дата:

10 декабря 2012 г., 06:20:39

Hello,

As I read while googling the web, many people complained about this
before. Couldn't it be made optional (can be even with "default ON")?
I understand, that there are situations, when it is a must - for
example, when the rest of queries rely on the result of first ones -
but there are numerous situations, when just skipping a faulty query
is all we need.

A simple - but very common - example: I wanted to perform really large
number of inserts - using transaction, to make it faster - while being
sure the duplicate entries will be skipped. Of course, this job will
be done best by server itself, which is keeping an eye on "primary
key" of the table. Unfortunately: not during a transaction! Any "dupe"
will trash thousands other (proper) entries immediately.

Why is this? My guess is, there is kind of logic in the code, like this:

if { no error during query } {
  do it
} else {
 withdraw this one
 rollback entire transaction
}

Therefore my request - and, as I saw, of many others - would be just
to introduce a little change:

if { no error during query } {
  do it
} else {
 withdraw this one
 if { ROLLBACK_ON_ERROR } {
   rollback entire transaction
  }
}

(if there's no ROLLBACK_ON_ERROR - it should carry on with the
remaining queries)

Is it really so problematic to introduce such code change, allowing
the users to control this behaviour? Yes, I read about using
"savepoints" - but I think we agree, it's just cumbersome workaround -
and not real solution, like my proposal. All we need is either a
variable to set, or a command, that will allow to modify the present
functionality in the way described above.
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

David Johnston

Дата:

10 декабря 2012 г., 07:04:08

On Dec 9, 2012, at 22:20, Zbigniew <zbigniew2011@gmail.com> wrote:

> Hello,
>
> As I read while googling the web, many people complained about this
> before. Couldn't it be made optional (can be even with "default ON")?
> I understand, that there are situations, when it is a must - for
> example, when the rest of queries rely on the result of first ones -
> but there are numerous situations, when just skipping a faulty query
> is all we need.
>
> A simple - but very common - example: I wanted to perform really large
> number of inserts - using transaction, to make it faster - while being
> sure the duplicate entries will be skipped.

And what happens if one of those inserts errors out for reasons other than a duplicate?

> Of course, this job will
> be done best by server itself, which is keeping an eye on "primary
> key" of the table. Unfortunately: not during a transaction! Any "dupe"
> will trash thousands other (proper) entries immediately.
>
> Why is this? My guess is, there is kind of logic in the code, like this:
>
> if { no error during query } {
>  do it
> } else {
> withdraw this one
> rollback entire transaction
> }
>
> Therefore my request - and, as I saw, of many others - would be just
> to introduce a little change:
>
> if { no error during query } {
>  do it
> } else {
> withdraw this one
> if { ROLLBACK_ON_ERROR } {
>   rollback entire transaction
>  }
> }
>
> (if there's no ROLLBACK_ON_ERROR - it should carry on with the
> remaining queries)
>
> Is it really so problematic to introduce such code change, allowing
> the users to control this behaviour?

Since current belief is that such behavior is unwise no one is willing to give their time to doing so. I'm not sure
whetherthat means that if you supplied such a patch it would be rejected.  Since their is a correct way to solve the
duplicatesscenario (see below) without savepoints you may wish to supply another example if you want to try and
convincepeople. 

> Yes, I read about using
> "savepoints" - but I think we agree, it's just cumbersome workaround -
> and not real solution, like my proposal. All we need is either a
> variable to set, or a command, that will allow to modify the present
> functionality in the way described above.

The true solution is to insert into a staging table that allows duplicates (but ideally contains other constraints that
youdo care about) and then de-dupe and insert into your final table. 

> --
> regards,
> Zbigniew
>

This may be an undesirable instance of the database not allowing you to shoot yourself in the foot but as ignoring
errorsis bad practice motivation to allow it is small.  You should always be able to import the data without errors
intoan explicitly defined table and then write queries to convert between the input format and the final format -
explicitlymaking no coding the necessary translation decisions and procedures.  In that way you always know that your
importroutine is always working as expected and not guessing whether it was the known error condition or an unknown
conditionthe caused a record to go missing. 

David J.

Re: Problem with aborting entire transactions on error

От

"David Johnston"

Дата:

10 декабря 2012 г., 17:16:53

Please reply-to the list, not just myself.

> -----Original Message-----
> From: Zbigniew [mailto:zbigniew2011@gmail.com]
> Sent: Monday, December 10, 2012 6:26 AM
> To: David Johnston
> Subject: Re: [GENERAL] Problem with aborting entire transactions on error
>
> 2012/12/10, David Johnston <polobo@yahoo.com>:
>
> >> A simple - but very common - example: I wanted to perform really
> >> large number of inserts - using transaction, to make it faster -
> >> while being sure the duplicate entries will be skipped.
> >
> > And what happens if one of those inserts errors out for reasons other
> > than a duplicate?
>
> Then maybe there could be something like
> "DON'T_ROLLBACK_ON_DUPLICATE_ERROR"
> instead of "more general" directive?
>
> > Since current belief is that such behavior is unwise
>
> No idea, why common belief is "allowing the user to decide is unwise".
> Is the "common assumption", that the users of Postgres are children
> 4-5 years old?

If it requires coding something to provide the user the desired flexibility
then whether or not such flexibility is wise or unwise is going to go into
said decision.  Also, since you are begging others to solve your own
problems you are in many ways behaving as a child.  Especially with an Open
Source project like PostgreSQL adults are welcomed and encouraged to solve
their own problems by altering the source code and, in the spirit of
community, contributing it back to the project (contribution also means you
do not have to maintain your own custom version of the software).  Please do
not take this as a personal affront; I just simply wanted to extend on the
analogy that you put forward.  I fully recognize the need to ask the
community for help myself (I am also not a coder) but in doing so I know I
am imposing on the charity of others and as such need to make best efforts
to understand and respect their point-of-view and readily acknowledge my own
ignorance in the matter.

>
> > no one is willing to give their time to doing so.
>
> Pay attention, that actually my (and not just mine) request isn't to write
> some new, sophisticated code - but just a little change in existing code
> instead.

If this is your belief then do it yourself.  Your response to my question
above already indicates that what you initially thought was very simple had
at least one complication you did not understand.  There are likely others.

>
> > I'm not sure whether that means that if you supplied such a patch it
> > would be rejected.  Since their is a correct way to solve the
> > duplicates scenario (see below) without savepoints you may wish to
> > supply another example if you want to try and convince people.
>
> I'm afraid, I don't understand. How many examples can I supply for the
> problem, like:
> 1. I need only to avoid duplicated rows (the ones with duplicated primary
> key).
> 2. I don't need any "stronger" reaction for such error - since I know,
there
> won't be any troubles.
> 3. Since server controls primary key, all I need is to make it a bit less
> "nervous" about such attempt to insert dupe, only dropping this one (and
> not everything else together with single faulty entry)?
>
> Whatever will I supply, the scenario basically will be the same - I just
would to
> avoid such "throwing the baby out with the bathwater".
>
> > The true solution is to insert into a staging table that allows
> > duplicates (but ideally contains other constraints that you do care
> > about) and then de-dupe and insert into your final table.
>
> ...which means twice as work for the server (AT LEAST twice). When all I
> needed was just to skip dupes, not aborting the transaction. And it could
be
> perfectly done.
>
> > This may be an undesirable instance of the database not allowing you
> > to shoot yourself in the foot but as ignoring errors is bad practice
> > motivation to allow it is small.
>
> I'm NOT proposing to just ignore errors - read carefully. please - I'm
writing
> about letting user decide, what impact should an error have:
> - only dropping "faulty" query, or...
> - ...aborting the entire transaction (which is undesirable in many
situations)

Except for ETL aborting is almost universally the desired behavior.  I'll
grant you your arguments for why, during ETL, this feature could be useful
(as opposed to performing duplicate work), but I still think that a blind
"ignore duplicates and leave the first instance in place" ETL policy is a
bad one in almost all cases.

>
> > You should always be able to import the data without errors into an
> > explicitly defined table and then write queries to convert between the
> > input format and the final format - explicitly making no coding the
> > necessary translation decisions and procedures.  In that way you
> > always know that your import routine is always working as expected and
> > not guessing whether it was the known error condition or an unknown
> > condition the caused a record to go missing.
>
> Well, looking for solution yesterday I've found several messages of other
> people,
> who complained about the same: that simple duplicate record is
> aborting whole transaction. This makes me think, that indeed this is quite
> common case, when limiting the "server response" (and NOT "ignoring the
> error" completely, which could mean insertion of the duplicated row),
could
> be very useful. There are situations, when all we need is just avoid dupes
-
> that's why we have primary keys - and "too much protection" makes the
> work unnecessarily difficult.
> --
> regards,
> Zbigniew

It is easy to complain but apparently no one feels strongly enough to either
code a solution themselves or sponsor someone else to do so.  As I have not
seen any core coders respond I cannot be certain whether there are
underlying technical issues preventing this but there is at the least a
resource allocation concern since neither code donors nor those sponsored by
clients have made the time to implement this "simple feature".  It may be
more productive, not being a core coder yourself, to simply ask why such a
feature has not been implemented given the apparent demand instead of
asserting (from ignorance) that such an implementation should be very simple
to accomplish.  The later approach (as well as your response to me -
personally) is much more confrontational and contrary (the direct reply at
least) to the posted etiquette for this community.

David J.

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

10 декабря 2012 г., 17:52:48

2012/12/10, David Johnston <polobo@yahoo.com>:

> Please reply-to the list, not just myself.

Sorry.

> If it requires coding something to provide the user the desired flexibility
> then whether or not such flexibility is wise or unwise is going to go into
> said decision.

So?

> Also, since you are begging others to solve your own
> problems

Now, not just a problem of mine - but of many others as well, as I
wrote. And I was "begging" already: "please, read carefully" (I meant:
with understanding).

> you are in many ways behaving as a child.

When someone doesn't design his/her clothes on his own, doesn't sew
boots, bake bread etc. - "behaves like a child"?

>  Especially with an Open
> Source project like PostgreSQL adults are welcomed and encouraged to solve
> their own problems by altering the source code and, in the spirit of
> community, contributing it back to the project (contribution also means you
> do not have to maintain your own custom version of the software).

If I knew the "innards" of Postgres as good, as its present developers
- maybe I could alter the code. But since I don't know them - I'm
perfectly sure, that the ones, who created the code, are able to
introduce such improvement during about 1/20 of the time, I had to
spend trying to do it by myself.

> Please do not take this as a personal affront;

Yes, I took this as personal affront, because IT IS a personal
affront. Do you really think, that your "please, do not take" is
changing this?

> It is easy to complain but apparently no one feels strongly enough to
> either code a solution themselves or sponsor someone else to do so.

From what I see, the development is going on - then my conclusion is,
that there are people "feeling strong enough" - and therefore I wanted
to let them know, where they got wrong.

> As I have not
> seen any core coders respond I cannot be certain whether there are
> underlying technical issues preventing this but there is at the least a
> resource allocation concern since neither code donors nor those sponsored
> by clients have made the time to implement this "simple feature".  It may be
> more productive, not being a core coder yourself, to simply ask why such a
> feature has not been implemented given the apparent demand instead of
> asserting (from ignorance) that such an implementation should be very
> simple
> to accomplish.  The later approach (as well as your response to me -
> personally) is much more confrontational and contrary (the direct reply at
> least) to the posted etiquette for this community.

No idea, what made you so upset with this "direct response" - just
clicked "Reply", and forgot to check the recipient's address.
Actually, the sender address should be pgsql-general@postgresql.org
already.

I've got a feeling, that all you have to say, is: "if this is the way
it is, it means, that this is good, and shouldn't be changed". You are
unable to explain, why - just the "common belief" etc. is your
rationale (while it's not that "common" at all, as I wrote). You have
no arguments against, but: "code it by yourself", "it must be perfect,
when it works so", and so on.

According to you the development can be stopped at this time, since
everything is perfect.
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Thomas Kellerer

Дата:

10 декабря 2012 г., 18:08:10

Zbigniew, 10.12.2012 04:20:
> Yes, I read about using "savepoints" - but I think we agree,
>it's just cumbersome workaround - and not real solution,

I disagree.

It might be a bit cumbersome, but it *is* a proper solution to the problem - not a workaround.

Thomas

Re: Problem with aborting entire transactions on error

От

Chris Angelico

Дата:

10 декабря 2012 г., 18:47:32

On Tue, Dec 11, 2012 at 1:15 AM, David Johnston <polobo@yahoo.com> wrote:
>> -----Original Message-----
>> From: Zbigniew [mailto:zbigniew2011@gmail.com]
>> Sent: Monday, December 10, 2012 6:26 AM
>> To: David Johnston
>> Subject: Re: [GENERAL] Problem with aborting entire transactions on error
>>
>> No idea, why common belief is "allowing the user to decide is unwise".
>> Is the "common assumption", that the users of Postgres are children
>> 4-5 years old?
>
> If it requires coding something to provide the user the desired flexibility
> then whether or not such flexibility is wise or unwise is going to go into
> said decision.

There's a fundamental difference here between PostgreSQL and, for
example, MySQL. With PG, you get a well-built database and your
application is permitted to access it in accordance with the database
admin's rules. With MySQL, your application stores data. There's a
vastly different focus there, and that's why you feel that PostgreSQL
is treating you like a child. The fact is that your application is NOT
in command here; the database is. You cannot violate the database's
rules (eg constraints, permissions lists, etc) in the application.
That means that you can be confident that those constraints will
always be followed!

The other part of the problem is that you're using transactions for
something other than transactional integrity. You're batching up
transactions in order to improve performance. That's certainly
possible, but you're using the database "oddly", so you can't expect
it to behave according to your modified set of rules. That's why the
transaction functions like a transaction, not like a commit-point.

I second the recommendation to look into savepoints. Chances are
that's all you need to do to make this work, but I wouldn't gamble my
job/reputation on it being perfect; so... have backups. Make sure you
can start the whole job over if need be.

ChrisA

Re: Problem with aborting entire transactions on error

От

Adrian Klaver

Дата:

10 декабря 2012 г., 18:47:56

On 12/10/2012 06:52 AM, Zbigniew wrote:
> 2012/12/10, David Johnston <polobo@yahoo.com>:
>

>
> I've got a feeling, that all you have to say, is: "if this is the way
> it is, it means, that this is good, and shouldn't be changed". You are
> unable to explain, why - just the "common belief" etc. is your
> rationale (while it's not that "common" at all, as I wrote). You have
> no arguments against, but: "code it by yourself", "it must be perfect,
> when it works so", and so on.
>
> According to you the development can be stopped at this time, since
> everything is perfect.

Nothing is ever perfect when it comes to software. An error is an error,
sooner or later you will have to deal with it. What you are facing is a
project decision to have the end user deal with the problem sooner
rather than later, one that I agree with by the way. As to why that is
so, experience has shown delaying the solution increases the difficulty
of the fix. I have been reminded of that lately while working on some
data stored in another Open Source database that allows one to run with
the safeties off the foot gun. The original coder went the expedient
route and now that data is a mess. Some of it can be cleaned up, but an
unknown amount is lost or defective. At this point in time there is
insufficient information left to decide on the validity of portions of
the data. The 'common belief' then is that it is better to make the end
user aware of issues from the start and force a solution, for the sake
of data integrity. If you follow the list regularly you will find that
to be the case.

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Problem with aborting entire transactions on error

От

"David Johnston"

Дата:

10 декабря 2012 г., 18:50:03

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Zbigniew
> Sent: Monday, December 10, 2012 9:53 AM
> To: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Problem with aborting entire transactions on error
>
> 2012/12/10, David Johnston <polobo@yahoo.com>:
>
> > Please reply-to the list, not just myself.
>
> Sorry.
>
> > If it requires coding something to provide the user the desired
> > flexibility then whether or not such flexibility is wise or unwise is
> > going to go into said decision.
>
> So?
>
> > Also, since you are begging others to solve your own problems
>
> Now, not just a problem of mine - but of many others as well, as I wrote.
And
> I was "begging" already: "please, read carefully" (I meant:
> with understanding).
>
> > you are in many ways behaving as a child.
>
> When someone doesn't design his/her clothes on his own, doesn't sew
> boots, bake bread etc. - "behaves like a child"?

You are taking the analogy to an absurd extreme.  That said, you are not
generally asking Nike or the local baker to customize their recipes and then
give you the end result for free.

>
> >  Especially with an Open
> > Source project like PostgreSQL adults are welcomed and encouraged to
> > solve their own problems by altering the source code and, in the
> > spirit of community, contributing it back to the project (contribution
> > also means you do not have to maintain your own custom version of the
> software).
>
> If I knew the "innards" of Postgres as good, as its present developers
> - maybe I could alter the code. But since I don't know them - I'm
perfectly
> sure, that the ones, who created the code, are able to introduce such
> improvement during about 1/20 of the time, I had to spend trying to do it
by
> myself.
>
> > Please do not take this as a personal affront;
>
> Yes, I took this as personal affront, because IT IS a personal affront. Do
you
> really think, that your "please, do not take" is changing this?

I can hope.  Given that e-mail is a text-only medium it makes since to at
least attempt to put some comments into a proper context explicitly since
body language and tone-of-voice cannot be used to convey such.

>
> > It is easy to complain but apparently no one feels strongly enough to
> > either code a solution themselves or sponsor someone else to do so.
>
> From what I see, the development is going on - then my conclusion is, that
> there are people "feeling strong enough" - and therefore I wanted to let
> them know, where they got wrong.

You generalize where I am specifically referring to this explicit feature.

>
> > As I have not
> > seen any core coders respond I cannot be certain whether there are
> > underlying technical issues preventing this but there is at the least
> > a resource allocation concern since neither code donors nor those
> > sponsored by clients have made the time to implement this "simple
> > feature".  It may be more productive, not being a core coder yourself,
> > to simply ask why such a feature has not been implemented given the
> > apparent demand instead of asserting (from ignorance) that such an
> > implementation should be very simple to accomplish.  The later
> > approach (as well as your response to me -
> > personally) is much more confrontational and contrary (the direct
> > reply at
> > least) to the posted etiquette for this community.
>
> No idea, what made you so upset with this "direct response" - just clicked
> "Reply", and forgot to check the recipient's address.
> Actually, the sender address should be pgsql-general@postgresql.org
> already.
>
> I've got a feeling, that all you have to say, is: "if this is the way it
is, it means,
> that this is good, and shouldn't be changed". You are unable to explain,
why -
> just the "common belief" etc. is your rationale (while it's not that
"common"
> at all, as I wrote). You have no arguments against, but: "code it by
yourself",
> "it must be perfect, when it works so", and so on.
>
> According to you the development can be stopped at this time, since
> everything is perfect.

No.  But everything is the way it is for some reason.  "This would be a
useful feature" and "why do not things work this way" both are excellent
approaches to instigating change and obtaining the understanding as to the
"why" of how things are.  "This is a bug and the solution is simple - though
I have never actually coded the PostgreSQL" is a less effective way of doing
so.

I've already agreed that your idea has merit (maybe not much, but some).
However, since it is not that way currently (and the problem is not a new
one) there likely exists some underlying reason said feature has not been
implemented.  I offer, via induction and some experience, that said feature
is not in high demand, not as simple as you claim, and in many ways is
undesirable - thus no one has taken the time to implement it.  You either
need a more convincing (and novel) argument or you need to contact someone
like PGExperts, 2ndQuadrant, EnterpriseDB (for example and none of whom I am
affiliated with) and pay them to implement the feature for you (likely much
less expensive than learning C and the codebase and doing it yourself).

Given that we are both ignorant on the technical aspects of said feature,
and our conversation is getting too pointed, I suggest that you simply wait
for a response from someone more qualified and in the meantime (since even
if someone coded this feature today it would not be in production release
until 9.3) code your ETL according to the best-practice I suggested earlier.
If the staging table is too large a burden AND no one volunteers to
implement your feature your options have been put before you.

David J.

Re: Problem with aborting entire transactions on error

От

Thomas Kellerer

Дата:

10 декабря 2012 г., 19:02:29

Chris Angelico, 10.12.2012 16:47:
> The other part of the problem is that you're using transactions for
> something other than transactional integrity. You're batching up
> transactions in order to improve performance. That's certainly
> possible, but you're using the database "oddly", so you can't expect
> it to behave according to your modified set of rules. That's why the
> transaction functions like a transaction, not like a commit-point.


Just as a side-note: Oracle also allows you to commit a "transaction" even if some of the statements failed
e.g. the commit of 20 inserts would only "commit" the ones that were successful.

Regards
Thomas

Re: Problem with aborting entire transactions on error

От

Martin French

Дата:

10 декабря 2012 г., 19:04:40

> if { no error during query } { > do it > } else { > withdraw this one > if { ROLLBACK_ON_ERROR } { > rollback entire transaction > } > }
I fear that this is no-where near as simple as it's been taken for, and without looking at the code, I would imagine that this would touch so many different areas, that a simple implementation is just not possible.

Ask yourself; How do other DBMS systems handle this?

MySQL has "INSERT IGNORE"
MSSQL has PRIMARY KEY WITH IGNORE_DUP_KEY
Oracle has a couple of ways of doing this, one of which logs the data (using a DBMS package) the other is with MERGE.

So, maybe you need to have a look at a postgres sepcific way of doing this, seeing as other DBMS's can't seem to agree? Maybe shorter save points so you lose only partial data?

Also, you might want to read some of the discussions here, as in my opinion, what you're looking for is best solved with MERGE.
http://wiki.postgresql.org/wiki/SQL_MERGE

Cheers
=============================================

Romax Technology Limited
Rutherford House
Nottingham Science & Technology Park
Nottingham,
NG7 2PZ
England

Telephone numbers:
+44 (0)115 951 88 00 (main)

For other office locations see:
http://www.romaxtech.com/Contact
=================================
===============
E-mail: info@romaxtech.com
Website: www.romaxtech.com
=================================

================
Confidentiality Statement
This transmission is for the addressee only and contains information that is confidential and privileged.
Unless you are the named addressee, or authorised to receive it on behalf of the addressee
you may not copy or use it, or disclose it to anyone else.
If you have received this transmission in error please delete from your system and contact the sender. Thank you for your cooperation.
=================================================

Re: Problem with aborting entire transactions on error

От

Scott Marlowe

Дата:

10 декабря 2012 г., 20:45:18

On Sun, Dec 9, 2012 at 8:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
> Hello,
>
> As I read while googling the web, many people complained about this
> before. Couldn't it be made optional (can be even with "default ON")?
> I understand, that there are situations, when it is a must - for
> example, when the rest of queries rely on the result of first ones -
> but there are numerous situations, when just skipping a faulty query
> is all we need.
>
> A simple - but very common - example: I wanted to perform really large
> number of inserts - using transaction, to make it faster - while being
> sure the duplicate entries will be skipped. Of course, this job will
> be done best by server itself, which is keeping an eye on "primary
> key" of the table. Unfortunately: not during a transaction! Any "dupe"
> will trash thousands other (proper) entries immediately.
>
> Why is this? My guess is, there is kind of logic in the code, like this:
>
> if { no error during query } {
>   do it
> } else {
>  withdraw this one
>  rollback entire transaction
> }
>
> Therefore my request - and, as I saw, of many others - would be just
> to introduce a little change:
>
> if { no error during query } {
>   do it
> } else {
>  withdraw this one
>  if { ROLLBACK_ON_ERROR } {
>    rollback entire transaction
>   }
> }
>
> (if there's no ROLLBACK_ON_ERROR - it should carry on with the
> remaining queries)
>
> Is it really so problematic to introduce such code change, allowing
> the users to control this behaviour? Yes, I read about using
> "savepoints" - but I think we agree, it's just cumbersome workaround -
> and not real solution, like my proposal. All we need is either a
> variable to set, or a command, that will allow to modify the present
> functionality in the way described above.

Databases aren't as simple as you imagine.  What you're basically
asking for from the pg engine is for it to enclose every insert into a
subtransaction (i.e. set a savepoint) to check for an error.  The
overhead cost of doing this in pgsql is not cheap.  Now what I would
use for this would be some tool written to interface with pgsql and do
the job for you, like pgloader.  It gives you the simple interface to
do what you're asking.

The idea that everything like this belongs in the database,
particularly coded by the core developers isn't how things generally
get done in pgsql.  We've got a small sharp team of developers working
on BIG things, like covering indexes, HOT updates, query planner
optimizations and so on.  Fixing this problem is core probably doesn't
tickle anyone's itch, and there are so many ways to work around it in
pg that would not work as well in other dbs, like loading to a temp
table and updating in one big query.  That method might run an oracle
db out of rollback space etc, but on pg it's the way it's done.

The thing that often happens is that people learn to do things a
certain way on another db and then think that that experience should
translate to postgresql straight across.  It often does not, because
postgresql is so different in many ways.

Keep in mind with pg transactions you can put almost anything into a
transaction, and the only real limit to the size of a transaction you
can run in it is the size of your hard drives.  This alone lets you
approach problems from a very different perspective than on most other
dbs.

Lastly it's a free (as in beer, as in freedom) database.  People work
on it as much out of love as for money, and if you want to get
traction for a someone else to volunteer their time to make a change,
YOU have to make the case. I'm not a pg hacker.  But I've made a case
several times and gotten someone to make a change a few times for me.
If you think you have a case here you'll have to convince a lot of
people who don't think you do.

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

10 декабря 2012 г., 21:29:56

2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>:

> Databases aren't as simple as you imagine.  What you're basically
> asking for from the pg engine is for it to enclose every insert into a
> subtransaction (i.e. set a savepoint) to check for an error.

No, I'm not.

It's able (I mean Postgres) to detect an error, and it's able to react
on error. "What I'm basically asking for" is an option to change its
reaction; that's all. Just to change a bit the thing already exisiting
- not to add new one.

> The overhead cost of doing this in pgsql is not cheap.  Now what I would
> use for this would be some tool written to interface with pgsql and do
> the job for you, like pgloader.  It gives you the simple interface to
> do what you're asking.

You see? The people had to create sophisticated loaders, just to work
around the problem.

> The idea that everything like this belongs in the database,

But why "everything"? My question was about one specific thing, and I
quite clearly (I think so...) explained, why. Not to "ignore errors",
not to "add savepoints" - just to proceed with this, which it's
already doing (skipping faulty query), but NOT aborting whole
transaction - as an optional behaviour.

> particularly coded by the core developers isn't how things generally
> get done in pgsql.  We've got a small sharp team of developers working
> on BIG things, like covering indexes, HOT updates, query planner
> optimizations and so on.  Fixing this problem is core probably doesn't
> tickle anyone's itch, and there are so many ways to work around it in
> pg that would not work as well in other dbs, like loading to a temp
> table and updating in one big query.  That method might run an oracle
> db out of rollback space etc, but on pg it's the way it's done.

Just try to make a Google search on "postgresql abort transaction" -
you'll see, how much trouble this solution makes to the users. Pay
attention, that I didn't even mind "...on duplicate". How much time
the people are spending 1. To find out, what actually is going on
("such thing was working on X and Y"), 2. To look for solution
(non-existing), and then... 3. ...to work around the problem with
"savepoints" or other way.

All this can be spared to users just by adding an OPTION. This option
can be "off" by default. There can be even a description in manual:
"We discourage...", "...use it only, when you know, what are you
doing...", "...you have been warned". But I bet, many users would
appreciate this. Besides: as you noticed (and the others too), several
other servers allows this. Then maybe really it's not that evil thing,
as you think?

> The thing that often happens is that people learn to do things a
> certain way on another db and then think that that experience should
> translate to postgresql straight across.  It often does not, because
> postgresql is so different in many ways.
>
> Keep in mind with pg transactions you can put almost anything into a
> transaction, and the only real limit to the size of a transaction you
> can run in it is the size of your hard drives.  This alone lets you
> approach problems from a very different perspective than on most other
> dbs.

Yes, and exactly because of this I'm pretty sure, that you understand,
why I don't want to trash such long transaction. Why I would to commit
it - just skipping the dupes.

> Lastly it's a free (as in beer, as in freedom) database.  People work
> on it as much out of love as for money, and if you want to get
> traction for a someone else to volunteer their time to make a change,
> YOU have to make the case. I'm not a pg hacker.  But I've made a case
> several times and gotten someone to make a change a few times for me.
> If you think you have a case here you'll have to convince a lot of
> people who don't think you do.

Yes, I agree: it costs implementation time. But pay attention: few
afternoons (maybe I'm wrong, but I doubt, this could take more to
experienced PG-developers) of your team - and on the other hand, the
time wasted by thousands of people, who are looking how to do the
things that were so simple while using some other server (even Oracle,
as has been mentioned). And in the future THERE STILL WILL BE THE
SAME. And it can be spared to all this people just by adding an
option.

No idea, is it simple or very complicated. But if it is simple - why not?

Maybe someone of the devs team could answer this doubt: is the
proposed change simple to implement?
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Scott Marlowe

Дата:

10 декабря 2012 г., 21:35:40

On Mon, Dec 10, 2012 at 11:29 AM, Zbigniew <zbigniew2011@gmail.com> wrote:
> 2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>:
>
>> Databases aren't as simple as you imagine.  What you're basically
>> asking for from the pg engine is for it to enclose every insert into a
>> subtransaction (i.e. set a savepoint) to check for an error.
>
> No, I'm not.

Whether you know it or not, that IS what you are asking for.  It's how
pg works.  I'm done.

Re: Problem with aborting entire transactions on error

От

Виктор Егоров

Дата:

10 декабря 2012 г., 22:25:22

2012/12/10 Thomas Kellerer <spam_eater@gmx.net>
>
> Zbigniew, 10.12.2012 04:20:
>>
>> Yes, I read about using "savepoints" - but I think we agree,
>> it's just cumbersome workaround - and not real solution,
>
>
> It might be a bit cumbersome, but it *is* a proper solution to the problem - not a workaround.

Writing scripts for ORACLE's sqlplus, I have the option to write
"whenever sqlerror exit sql.sqlcode;" and get the behavior similar to
setting psql's ON_ERROR_STOP variable.

On the other hand, it is possible to write "whenever sqlerror
continue;" and this will make ORACLE to process all the statements
inide the script, ignoring all errors. This is a general feature,
available not only for sqlplus scripts — as mentioned, a series of 100
INSERTs can have 5 failing ones and commit statement will result in 95
new records..

So, in order to get similar behavior using current PostgreSQL
features, one will have to:
1) use savepoints
2) use single-statement transactions;
3) (crazy stuff) use triggers on a fake relation to “test” inserts
into the real one and ignore filing ones;
4) do pre-processing in some external tool, like script or pgloader;
5) more options?

All these options will give significant performance and/or scripting
penalty, while, say ORACLE, does it “for free” — I don't know the
mechanics behind this feature, so looks like “free as in free speech”
to me.

The original OP's post is a feature request, IMHO.

I'm not sure whether it is a good thing to have in general, but I
really had to use it my scripts several times.


--
Victor Y. Yegorov

Re: Problem with aborting entire transactions on error

От

"Kevin Grittner"

Дата:

11 декабря 2012 г., 00:07:22

Zbigniew wrote:

> No idea, is it simple or very complicated. But if it is simple -
> why not?

When I first started using PostgreSQL I was used to a database
product which would roll back an entire statement on failure, but
leave the transaction open for another try. This is handy for
interactive work, which is why psql (an interactive tool) gives you
an easy way to do it, using subtransactions behind the scenes.
Doing it in the general case is risky partly because almost all
data definition language (DDL) statements are allowed in
transactions; so you could have a transaction which inserted all
data from one table into somewhere else and then dropped the source
table. If your insert fails, you probably don't want to drop the
table.

PostgreSQL behavior was a bit of a surprise to me at first, but it
wasn't a hard adjustment for me, and I never thought to suggest a
change because it just seemed safer this way. I've generally found
an intermediate table to be a good idea anyway.

> Maybe someone of the devs team could answer this doubt: is the
> proposed change simple to implement?

I now work on PostgreSQL internals as part of my job. To get a
handle on how much it would take to implement, you might want to
read these two README files which document the sub-systems most
directly involved.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/transam/README;hb=HEAD
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/executor/README;hb=HEAD

For sample queries and diagrams showing how it works a little
more graphically, you might wnat to look at:

http://momjian.us/main/writings/pgsql/mvcc.pdf

The problem is that you might have affected a million rows in the
transaction before the failing statement, and then that statement
might affect another million before it failes. What would you do
with the xmin/xmax values in the affected tuples to get visibility
correct if you haven't executed that failing transaction within a
subtransaction? I don't see how you could do it without something
very like a subtransaction for each statement executed within the
transaction, and that carries significant overhead. Nobody wants to
slow down all their work to cater to this one special case.

Could an option like this be added to the server? Probably. Has
anyone worked on such a feature? Not that I know of. Would such a
feature be accepted if written? I think a better case would need to
be made for its value compared to the added code complexity and
maintenance burden than has been made so far. Since this is
primarily of interest for bulk loads and interactive work, and
there are already solutions for both of those, it would probably be
seen as not worth its weight.

It's only by careful gatekeeping that the source code has been kept
manageable. A quick count shows that there are 1,167,732 lines of
code, counting only *.c and *.h files. Every added line has a
maintenance cost.

I hope this helps.

-Kevin

Re: Problem with aborting entire transactions on error

От

Chris Angelico

Дата:

11 декабря 2012 г., 01:03:19

Caveat: I am not a PostgreSQL hacker, and have not looked into its
internals at all, though I've read a number of excellent articles and
blog posts on some of its features (TOAST, HOT updates, MVCC, etc).
I'm a programmer who has made use of PG from a number of languages,
and formed a strong opinion on the high quality and usability of
Pike's bindings.

On Tue, Dec 11, 2012 at 5:29 AM, Zbigniew <zbigniew2011@gmail.com> wrote:
> 2012/12/10, Scott Marlowe <scott.marlowe@gmail.com>:
>
>> Databases aren't as simple as you imagine.  What you're basically
>> asking for from the pg engine is for it to enclose every insert into a
>> subtransaction (i.e. set a savepoint) to check for an error.
>
> No, I'm not.
>
> It's able (I mean Postgres) to detect an error, and it's able to react
> on error. "What I'm basically asking for" is an option to change its
> reaction; that's all. Just to change a bit the thing already exisiting
> - not to add new one.

You're thinking that it's easy to just "back out" something that
failed. Perhaps this is true in the simple case where one statement
does one insert and that's all, but if this is to be a PostgreSQL
feature, it needs to handle the more complicated cases. In each of
these examples, what should be kept and what should be undone? (Let's
suppose that your language has a query() function that executes an SQL
query while handwaving questions of which database connection to use.)
Row ID 3 already exists, in each case.

query("insert into table (id) values (1),(2),(3)");
query("insert into table (id) select id from other_table where id in (1,2,3)");
query("insert into table (id) values (1); insert into table (id)
values (2); insert into table (id) values (3)");
query("create trigger after update on other_table for each row insert
into table values (new.id); update other_table set dummy=dummy+1 where
id in (1,2,3)");

I'm pretty sure the example trigger would need to be rewritten as a
procedure, but you get the idea. There's a LOT more complexity than
just "if Postgres detects an error, it should (optionally) just not do
that bit".

>> The overhead cost of doing this in pgsql is not cheap.  Now what I would
>> use for this would be some tool written to interface with pgsql and do
>> the job for you, like pgloader.  It gives you the simple interface to
>> do what you're asking.
>
> You see? The people had to create sophisticated loaders, just to work
> around the problem.

Any good project will work in layers. The core PostgreSQL engine
doesn't have support for the special backslash commands that function
at the command-line interface; they're implemented in the psql client.
Nor does PG need any code to handle the peculiarities of network
routing, because that's all managed by the TCP/IP sockets layer. Often
the best thing to do is to bolt something onto the outside, because
that adds zero complexity for everyone who isn't using this feature.

> All this can be spared to users just by adding an OPTION. This option
> can be "off" by default. There can be even a description in manual:
> "We discourage...", "...use it only, when you know, what are you
> doing...", "...you have been warned". But I bet, many users would
> appreciate this. Besides: as you noticed (and the others too), several
> other servers allows this. Then maybe really it's not that evil thing,
> as you think?

Are you a programmer? Are you aware how much complexity each option
adds? Every single combination must be tested and debugged. In this
instance, that means testing every part of Postgres before and after
several types of failure, to make sure everything works correctly in
both cases. That is not cheap. And then there's the user-facing
complexity (documenting the option, explaining when it's useful, etc),
and now everyone has to decide whether or not to use it. Also not
cheap.

> Yes, and exactly because of this I'm pretty sure, that you understand,
> why I don't want to trash such long transaction. Why I would to commit
> it - just skipping the dupes.

The normal way to use transactions is to guarantee atomicity (the
first letter from ACID). The classic example is a money transfer - you
deduct $100 from one account and add $100 to another. Naturally you
want this to be all-or-nothing - it's better to leave the money where
it is than to leave the account-holder (or the bank, if you do them in
the other order) short a hundred bucks.

What you're doing is fiddling with transactions as a means of
improving performance, and then discovering that they aren't exactly
what you want. Have you considered looking into some other methods of
improving performance? You may find that there's a completely
different way to achieve your goal.

> Maybe someone of the devs team could answer this doubt: is the
> proposed change simple to implement?

I'm not a PG dev, but I've fought the battle against complexity in
enough other situations that I know that it's much more usual to
underestimate than overestimate the cost.

ChrisA

Re: Problem with aborting entire transactions on error

От

Andrew Sullivan

Дата:

11 декабря 2012 г., 01:57:55

On Mon, Dec 10, 2012 at 06:29:49PM +0000, Zbigniew wrote:
>
> No, I'm not.
>
> It's able (I mean Postgres) to detect an error, and it's able to react
> on error. "What I'm basically asking for" is an option to change its
> reaction; that's all. Just to change a bit the thing already exisiting
> - not to add new one.

Yes, you are.  By the time Postgres has detected that there is this
thing to which it needs to react, you're already in the error state.
The only way around that is a subtransaction.

Moreover, what you're _really_ asking for is that Postgres do this
only in the right circumstances, and not when there's some other error
that isn't the error you were intending be automatically fixed.  This
is a special case of the Do What I Mean protocol, and lovely as it
would be if our programs all had artificial intelligence and could
read minds, I don't think such software is going to ship any time
soon.

Postgres is general-purpose software, so it has to be programmed for
the general purpose.  This does mean that some special cases that look
easy if you don't know anything about the underlying details require
more work on the part of the person using the general-purpose
software.  That's what optimizing for general cases buys you.

Best,

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: Problem with aborting entire transactions on error

От

Andrew Sullivan

Дата:

11 декабря 2012 г., 02:02:30

On Mon, Dec 10, 2012 at 05:02:37PM +0100, Thomas Kellerer wrote:
> Just as a side-note: Oracle also allows you to commit a "transaction" even if some of the statements failed

True.  I always thought that was a bizarre bug.  Certainly, it would
be the day you did

BEGIN;
INSERT INTO sale $bunch of stuff;
INSERT INTO xact_log $stuff-with-money-deducted;
UPDATE account SET balance = balance-?COST WHERE customer_id = ?ID;
   ---ERROR here for no permission
COMMIT;

Or anyway, that's how I look at it.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

Re: Problem with aborting entire transactions on error

От

Tom Lane

Дата:

11 декабря 2012 г., 04:53:35

"Kevin Grittner" <kgrittn@mail.com> writes:
> Zbigniew wrote:
>> No idea, is it simple or very complicated. But if it is simple -
>> why not?

> When I first started using PostgreSQL I was used to a database
> product which would roll back an entire statement on failure, but
> leave the transaction open for another try. This is handy for
> interactive work, which is why psql (an interactive tool) gives you
> an easy way to do it, using subtransactions behind the scenes.

FWIW, we've generally concluded that this sort of thing should be done
on the client side, ie, exactly like psql does it.  Fooling with
transactional behavior on the server side poses far too much risk of
breaking clients that aren't expecting the new behavior.  We learned an
object lesson on that point back in 7.3, when we attempted to implement
autocommit on/off behavior on server side, using a GUC setting to control
it.  (Autocommit isn't exactly what the OP is asking for, but it's close
enough that the lesson holds.)  Basically, turning the setting to the
non-default state broke just about every nontrivial client program.
The push-back was fierce enough that the setting went away again in 7.4,
and it won't be coming back as long as anybody who remembers that fiasco
is still around the project.  So now we have AUTOCOMMIT (and ON_ERROR_STOP
too) as psql behaviors, where they're available for interactive use but
won't break non-interactive program logic.

The complexity and performance costs that people have mentioned are other
good reasons not to change it; but even if the change were free on the
server side, history says it's not something we ought to mess with.

            regards, tom lane

Re: Problem with aborting entire transactions on error

От

Ondrej Ivanič

Дата:

11 декабря 2012 г., 05:02:45

Hi,

On 11 December 2012 06:25, Виктор Егоров <vyegorov@gmail.com> wrote:
> On the other hand, it is possible to write "whenever sqlerror
> continue;" and this will make ORACLE to process all the statements
> inide the script, ignoring all errors. This is a general feature,
> available not only for sqlplus scripts — as mentioned, a series of 100
> INSERTs can have 5 failing ones and commit statement will result in 95
> new records..

For example, nCluster (and Greenplum) has similar feature for COPY:
page 13 "Error Logging in COPY"
http://www.pgcon.org/2009/schedule/attachments/135_PGCon%202009%20-%20Aster%20v6.pdf.

Around the same time AsterData tried to push some COPY enhancements
(error logging and partitioning):
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00247.php

--
Ondrej Ivanic
(http://www.linkedin.com/in/ondrejivanic)

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

11 декабря 2012 г., 14:26:48

2012/12/10, Kevin Grittner <kgrittn@mail.com>:

> Could an option like this be added to the server? Probably. Has
> anyone worked on such a feature? Not that I know of. Would such a
> feature be accepted if written? I think a better case would need to
> be made for its value compared to the added code complexity and
> maintenance burden than has been made so far. Since this is
> primarily of interest for bulk loads and interactive work, and
> there are already solutions for both of those, it would probably be
> seen as not worth its weight.

Thanks for detailed explanation. From what I see, it's rather not
going to happen.

> It's only by careful gatekeeping that the source code has been kept
> manageable. A quick count shows that there are 1,167,732 lines of
> code, counting only *.c and *.h files. Every added line has a
> maintenance cost.

I agree - but, on the other hand, every added line of code in the
software of the users (programmers using PG) has its maintenance cost
too. And it has to be multiplied by the number of the users, then
again: by the number of the users of their software...
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

11 декабря 2012 г., 14:28:36

2012/12/11, Tom Lane <tgl@sss.pgh.pa.us>:

> The complexity and performance costs that people have mentioned are other
> good reasons not to change it; but even if the change were free on the
> server side, history says it's not something we ought to mess with.

Again: my suggestion was to add an option... with default value "keep
present functionality".
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

11 декабря 2012 г., 14:36:37

2012/12/10, Chris Angelico <rosuav@gmail.com>:

> Are you a programmer? Are you aware how much complexity each option
> adds? Every single combination must be tested and debugged. In this
> instance, that means testing every part of Postgres before and after
> several types of failure, to make sure everything works correctly in
> both cases. That is not cheap. And then there's the user-facing
> complexity (documenting the option, explaining when it's useful, etc),
> and now everyone has to decide whether or not to use it. Also not
> cheap.

It's not that bad, that really "each option adds much complexity", and
has to be tested then in every single combination of
parameters/settings etc. Just one example: introducing an option "save
preferences" doesn't require this.

> I'm not a PG dev, but I've fought the battle against complexity in
> enough other situations that I know that it's much more usual to
> underestimate than overestimate the cost.

There are always TWO sides (at least two): creators/designers - and
the users. Considering how much complexity some kind of modification
adds to your - programmer's - code, and how it'll make your life more
difficult, at the same time try to consider, how much relief could it
mean to many of the users of your software.
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Abel Abraham Camarillo Ojeda

Дата:

11 декабря 2012 г., 14:42:56

I generally do:

DO $$
BEGIN
        INSERT INTO ...;
EXCEPTION
        WHEN UNIQUE_VIOLATION or EXCLUSION_VIOLATION THEN
                NULL; /* ignore this record */
END;
$$;

On Sun, Dec 9, 2012 at 9:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:

Hello,

As I read while googling the web, many people complained about this
before. Couldn't it be made optional (can be even with "default ON")?
I understand, that there are situations, when it is a must - for
example, when the rest of queries rely on the result of first ones -
but there are numerous situations, when just skipping a faulty query
is all we need.

A simple - but very common - example: I wanted to perform really large
number of inserts - using transaction, to make it faster - while being
sure the duplicate entries will be skipped. Of course, this job will
be done best by server itself, which is keeping an eye on "primary
key" of the table. Unfortunately: not during a transaction! Any "dupe"
will trash thousands other (proper) entries immediately.

Why is this? My guess is, there is kind of logic in the code, like this:

if { no error during query } {
do it
} else {
withdraw this one
rollback entire transaction
}

Therefore my request - and, as I saw, of many others - would be just
to introduce a little change:

if { no error during query } {
do it
} else {
withdraw this one
if { ROLLBACK_ON_ERROR } {
rollback entire transaction
}
}

(if there's no ROLLBACK_ON_ERROR - it should carry on with the
remaining queries)

Is it really so problematic to introduce such code change, allowing
the users to control this behaviour? Yes, I read about using
"savepoints" - but I think we agree, it's just cumbersome workaround -
and not real solution, like my proposal. All we need is either a
variable to set, or a command, that will allow to modify the present
functionality in the way described above.
--
regards,
Zbigniew

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Problem with aborting entire transactions on error

От

Abel Abraham Camarillo Ojeda

Дата:

11 декабря 2012 г., 14:44:15

Obviously, it's not fast...

On Tue, Dec 11, 2012 at 5:42 AM, Abel Abraham Camarillo Ojeda <acamari@the00z.org> wrote:

I generally do:

DO $$
BEGIN
        INSERT INTO ...;
EXCEPTION
        WHEN UNIQUE_VIOLATION or EXCLUSION_VIOLATION THEN
                NULL; /* ignore this record */
END;
$$;

On Sun, Dec 9, 2012 at 9:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
Hello,

As I read while googling the web, many people complained about this
before. Couldn't it be made optional (can be even with "default ON")?
I understand, that there are situations, when it is a must - for
example, when the rest of queries rely on the result of first ones -
but there are numerous situations, when just skipping a faulty query
is all we need.

A simple - but very common - example: I wanted to perform really large
number of inserts - using transaction, to make it faster - while being
sure the duplicate entries will be skipped. Of course, this job will
be done best by server itself, which is keeping an eye on "primary
key" of the table. Unfortunately: not during a transaction! Any "dupe"
will trash thousands other (proper) entries immediately.

Why is this? My guess is, there is kind of logic in the code, like this:

if { no error during query } {
do it
} else {
withdraw this one
rollback entire transaction
}

Therefore my request - and, as I saw, of many others - would be just
to introduce a little change:

if { no error during query } {
do it
} else {
withdraw this one
if { ROLLBACK_ON_ERROR } {
rollback entire transaction
}
}

(if there's no ROLLBACK_ON_ERROR - it should carry on with the
remaining queries)

Is it really so problematic to introduce such code change, allowing
the users to control this behaviour? Yes, I read about using
"savepoints" - but I think we agree, it's just cumbersome workaround -
and not real solution, like my proposal. All we need is either a
variable to set, or a command, that will allow to modify the present
functionality in the way described above.
--
regards,
Zbigniew

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Problem with aborting entire transactions on error

От

Chris Angelico

Дата:

11 декабря 2012 г., 14:47:56

On Tue, Dec 11, 2012 at 10:36 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
> There are always TWO sides (at least two): creators/designers - and
> the users. Considering how much complexity some kind of modification
> adds to your - programmer's - code, and how it'll make your life more
> difficult, at the same time try to consider, how much relief could it
> mean to many of the users of your software.

I have considered. It will be beneficial to a small group of users,
including yourself; I predict it will have significant costs
(performance, documentation complexity, etc) for all users. (Of
course, my prediction is decidedly non-authoritative. But I had to
deal with the docs complexity of issues like the autocommit that Tom
Lane mentioned; it's definitely a cost.)

ChrisA

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

11 декабря 2012 г., 14:48:34

2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:

> Obviously, it's not fast...

Well, the problem was exactly speed-related. If not this - I could
simply dispose of the transaction...
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Abel Abraham Camarillo Ojeda

Дата:

11 декабря 2012 г., 14:50:48

It's faster than restarting the big transaction, maybe...

On Tue, Dec 11, 2012 at 5:48 AM, Zbigniew <zbigniew2011@gmail.com> wrote:

2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:

> Obviously, it's not fast...

Well, the problem was exactly speed-related. If not this - I could
simply dispose of the transaction...
--
regards,
Zbigniew

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

11 декабря 2012 г., 14:57:12

2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:

> It's faster than restarting the big transaction, maybe...

Of course, but it's still _much_ slower, than conducting such
transaction without any abort.

Actually, I had to implement savepoint-based solution anyway - but at
a cost of making the relevant part of code much more complex. Pay
attention, that actually every developer in this thread wrote, that
adding such complexity isn't a particularly good idea.
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Zbigniew

Дата:

11 декабря 2012 г., 15:05:06

2012/12/11, Chris Angelico <rosuav@gmail.com>:

> I have considered. It will be beneficial to a small group of users,
> including yourself

No, not that small whatsoever; it's easy to find out using Google, how
many people were looking for some help because of this. You can easily
multiply this group at least by a factor of 100 - to add those, who
didn't post their askings anywhere (or whose postings have been
deleted meanwhile).

The fact, that many posts are several years old, is suggesting to me,
that there will be still such numerous questions in the future about
this.
--
regards,
Zbigniew

Re: Problem with aborting entire transactions on error

От

Craig Ringer

Дата:

11 декабря 2012 г., 16:30:40

On 12/11/2012 08:05 PM, Zbigniew wrote:
> 2012/12/11, Chris Angelico <rosuav@gmail.com>:
>
>> I have considered. It will be beneficial to a small group of users,
>> including yourself
> No, not that small whatsoever; it's easy to find out using Google, how
> many people were looking for some help because of this. You can easily
> multiply this group at least by a factor of 100 - to add those, who
> didn't post their askings anywhere (or whose postings have been
> deleted meanwhile).
>
> The fact, that many posts are several years old, is suggesting to me,
> that there will be still such numerous questions in the future about
> this.
Yes, but huge groups of these users' needs can be satisfied with changes
to a relatively small number of middleware tools (Rails ActiveRecord,
etc) to teach it about subtransactions and how to use them.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Problem with aborting entire transactions on error

От

David Johnston

Дата:

11 декабря 2012 г., 17:15:14

On Dec 11, 2012, at 6:28, Zbigniew <zbigniew2011@gmail.com> wrote:

> 2012/12/11, Tom Lane <tgl@sss.pgh.pa.us>:
>
>> The complexity and performance costs that people have mentioned are other
>> good reasons not to change it; but even if the change were free on the
>> server side, history says it's not something we ought to mess with.
>
> Again: my suggestion was to add an option... with default value "keep
> present functionality".
>

We heard you the first three times ...

All of the responses given so far, including the paragraph you are quoting, are given with that in mind.  You have made
yourpoint clearly and have been given direct responses.   

There is some merit but not enough to convince anyone to change their general belief so now it comes down to providing
aspecific implementation that can be approved or rejected as belonging in core or go with the various alternatives that
havebeen presented. 

David J.

Re: Problem with aborting entire transactions on error

От

David Johnston

Дата:

11 декабря 2012 г., 17:24:53

On Dec 11, 2012, at 6:26, Zbigniew <zbigniew2011@gmail.com> wrote:

> 2012/12/10, Kevin Grittner <kgrittn@mail.com>:
>
>> Could an option like this be added to the server? Probably. Has
>> anyone worked on such a feature? Not that I know of. Would such a
>> feature be accepted if written? I think a better case would need to
>> be made for its value compared to the added code complexity and
>> maintenance burden than has been made so far. Since this is
>> primarily of interest for bulk loads and interactive work, and
>> there are already solutions for both of those, it would probably be
>> seen as not worth its weight.
>
> Thanks for detailed explanation. From what I see, it's rather not
> going to happen.
>
>> It's only by careful gatekeeping that the source code has been kept
>> manageable. A quick count shows that there are 1,167,732 lines of
>> code, counting only *.c and *.h files. Every added line has a
>> maintenance cost.
>
> I agree - but, on the other hand, every added line of code in the
> software of the users (programmers using PG) has its maintenance cost
> too.
> And it has to be multiplied by the number of the users, then
> again: by the number of the users of their software...
> --
>

PostgreSQL itself has a very large installed extended user base and of that group the need for your feature is likely
notthat large.  The sub group of those that can use such a generic feature instead of coding something custom anyway is
likelythe larger portion as well. 

While it is can be rewarding to solve a problem for the masses the ambiguity and apparent difficulty means that likely
onlysomeone paid or experiencing pain directly is going to make the effort.  Ideally a large number of persons would
bandtogether and fund such an initiative (even if just for research) which would both cover development costs and give
thecommunity a better idea of the scope of the problem. 

David J.

Re: Problem with aborting entire transactions on error

От

David Johnston

Дата:

11 декабря 2012 г., 17:33:43

On Dec 11, 2012, at 6:48, Zbigniew <zbigniew2011@gmail.com> wrote:

> 2012/12/11, Abel Abraham Camarillo Ojeda <acamari@the00z.org>:
>
>> Obviously, it's not fast...
>
> Well, the problem was exactly speed-related. If not this - I could
> simply dispose of the transaction...
> --
> regards,
> Zbigniew
>
>

It seems like if you switched to the COPY API and a staging table your initial performance would be maximized and from
thedelta gain you can run the second transformation query.  Your desired feature probably wouldn't work with copy
anywayand high-performance bulk loading should use that API. 

David J.

Re: Problem with aborting entire transactions on error

От

Merlin Moncure

Дата:

11 декабря 2012 г., 18:01:31

On Sun, Dec 9, 2012 at 9:20 PM, Zbigniew <zbigniew2011@gmail.com> wrote:
> Hello,
>
> As I read while googling the web, many people complained about this
> before. Couldn't it be made optional (can be even with "default ON")?
> I understand, that there are situations, when it is a must - for
> example, when the rest of queries rely on the result of first ones -
> but there are numerous situations, when just skipping a faulty query
> is all we need.
>
> A simple - but very common - example: I wanted to perform really large
> number of inserts - using transaction, to make it faster - while being
> sure the duplicate entries will be skipped. Of course, this job will
> be done best by server itself, which is keeping an eye on "primary
> key" of the table. Unfortunately: not during a transaction! Any "dupe"
> will trash thousands other (proper) entries immediately.
>
> Why is this? My guess is, there is kind of logic in the code, like this:
>
> if { no error during query } {
>   do it
> } else {
>  withdraw this one
>  rollback entire transaction
> }
>
> Therefore my request - and, as I saw, of many others - would be just
> to introduce a little change:
>
> if { no error during query } {
>   do it
> } else {
>  withdraw this one
>  if { ROLLBACK_ON_ERROR } {
>    rollback entire transaction
>   }
> }
>
> (if there's no ROLLBACK_ON_ERROR - it should carry on with the
> remaining queries)
>
> Is it really so problematic to introduce such code change, allowing
> the users to control this behaviour? Yes, I read about using
> "savepoints" - but I think we agree, it's just cumbersome workaround -
> and not real solution, like my proposal. All we need is either a
> variable to set, or a command, that will allow to modify the present
> functionality in the way described above.

No.  The point of transactions is to define a 'unit of work' -- all
aspects of the transaction must pass or must all fail. there are
limited mechanisms to work around this (SAVEPOINTs) the allow finer
control but still maintain that basic principle.

Consolidating bulk inserts into a single transaction is widely done
for performance reasons.  I understand your pain point, but you're
much better off in the long run by using various well established
methods such as staging data before inserting it instead of hacking
the transaction rules.

merlin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Problem with aborting entire transactions on error