Обсуждение: What happened to the is_ family of functions proposal?
Hi,<br /><br />Back in 2002 these were proposed, what happened to them?<br /><br /><a href="http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php">http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php</a><br /><br/><br /> Also I note:<br /><br />colin@ruby:~/workspace/eyedb$ psql<br />psql (8.4.4)<br />Type "help" for help.<br/><br />colin=> select to_date('731332', 'YYMMDD');<br /> to_date <br />------------<br /> 1974-02-01<br />(1 row)<br /><br />colin=> <br /><br /><br />The fact that this wraps would seem to me to make the implementation ofis_date() difficult.<br /><br /><br />I'm trying to query character strings for valid dates but can't see how to do thisquickly... but for that discussion I will move to pgsql-general :-)<br /><br />Cheers,<br /><br />Colin<br />
On 09/20/2010 10:29 AM, Colin 't Hart wrote: > Hi, > > Back in 2002 these were proposed, what happened to them? > > http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php 2002 is a looooooooong time ago. > > > Also I note: > > colin@ruby:~/workspace/eyedb$ psql > psql (8.4.4) > Type "help" for help. > > colin=> select to_date('731332', 'YYMMDD'); > to_date > ------------ > 1974-02-01 > (1 row) > > colin=> > > > The fact that this wraps would seem to me to make the implementation > of is_date() difficult. > > > I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.: test_date := date_in(textout(some_text)); In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions. cheers andrew
On 20 September 2010 16:54, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 09/20/2010 10:29 AM, Colin 't Hart wrote: >> >> Hi, >> >> Back in 2002 these were proposed, what happened to them? >> >> http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php > > > 2002 is a looooooooong time ago. <snip> > I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any dataexception. e.g.: > > test_date := date_in(textout(some_text)); > > In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering dataexceptions. So it's not possible using pure SQL unless one writes a function? Are the is_<type> family of functions still desired? Also, where are the to_<type> conversions done? Thanks, Colin
On Mon, Sep 20, 2010 at 11:31 AM, Colin 't Hart <colinthart@gmail.com> wrote: >> I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping anydata exception. e.g.: >> >> test_date := date_in(textout(some_text)); >> >> In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering dataexceptions. > > So it's not possible using pure SQL unless one writes a function? I think that is true. > Are the is_<type> family of functions still desired? I think it would be useful to have a way of testing whether a cast to a given type will succeed. The biggest problem with the exception-catching method is not that it requires writing a function (which, IMHO, is no big deal) but that exception handling is pretty slow and inefficient. You end up doing things like... write a regexp to see whether the data is in approximately the right format and then if it is try the cast inside an exception block. Yuck. (On the other hand, whether the work that was done in 2002 is still relevant to today's code is questionable. Things have changed a lot.) > Also, where are the to_<type> conversions done? I think maybe you are looking for the type input functions? select typname, typinput::regprocedure from pg_type; There are also some functions with names of the form to_<type>. You can get a list of those with the following psql command: \dfS to_* -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > I think it would be useful to have a way of testing whether a cast to > a given type will succeed. The biggest problem with the > exception-catching method is not that it requires writing a function > (which, IMHO, is no big deal) but that exception handling is pretty > slow and inefficient. You end up doing things like... write a regexp > to see whether the data is in approximately the right format and then > if it is try the cast inside an exception block. Yuck. The problem here is that putting the exception handling in C doesn't make things any better: it's still slow and inefficient. And in the general case the only way to be sure that a string will be accepted by the input function is to try it. regards, tom lane
On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I think it would be useful to have a way of testing whether a cast to >> a given type will succeed. The biggest problem with the >> exception-catching method is not that it requires writing a function >> (which, IMHO, is no big deal) but that exception handling is pretty >> slow and inefficient. You end up doing things like... write a regexp >> to see whether the data is in approximately the right format and then >> if it is try the cast inside an exception block. Yuck. > > The problem here is that putting the exception handling in C doesn't > make things any better: it's still slow and inefficient. And in the > general case the only way to be sure that a string will be accepted by > the input function is to try it. Given the current API, that is true. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Excerpts from Robert Haas's message of mar sep 21 11:56:51 -0400 2010: > On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The problem here is that putting the exception handling in C doesn't > > make things any better: it's still slow and inefficient. And in the > > general case the only way to be sure that a string will be accepted by > > the input function is to try it. > > Given the current API, that is true. So we could refactor the input functions so that there's an internal function that returns the accepted datum in the OK case and an ErrorData for the failure case. The regular input function would just throw the error data in the latter case; but this would allow another function to just return whether it worked or not. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: >> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The problem here is that putting the exception handling in C doesn't >>> make things any better: > So we could refactor the input functions so that there's an internal > function that returns the accepted datum in the OK case and an ErrorData > for the failure case. This makes the untenable assumption that there are no elog(ERROR)s in the "internal" input function *or anything it calls*. Short of truly massive restructuring, including uglifying many internal APIs to have error return codes instead of allowing elog within the callee, you will never make this work for anything more complicated than say float8in(). regards, tom lane
On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > So we could refactor the input functions so that there's an internal > function that returns the accepted datum in the OK case and an ErrorData > for the failure case. The regular input function would just throw the > error data in the latter case; but this would allow another function to > just return whether it worked or not. You're assuming the input function won't have any work it has to undo which it would need the savepoint for anyways. For most of the built-in datatypes -- all of the ones intended for holding real data -- that's true. But for things like regclass or regtype it might not be and for user-defined data types.... who knows? Of course all people really want is to test whether something is a valid integer, floating point value, etc. -- greg
On Tue, Sep 21, 2010 at 1:45 PM, Greg Stark <gsstark@mit.edu> wrote: > On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: >> So we could refactor the input functions so that there's an internal >> function that returns the accepted datum in the OK case and an ErrorData >> for the failure case. The regular input function would just throw the >> error data in the latter case; but this would allow another function to >> just return whether it worked or not. > > You're assuming the input function won't have any work it has to undo > which it would need the savepoint for anyways. For most of the > built-in datatypes -- all of the ones intended for holding real data > -- that's true. But for things like regclass or regtype it might not > be and for user-defined data types.... who knows? > > Of course all people really want is to test whether something is a > valid integer, floating point value, etc. Right. Or a date - that's a case that comes up for me pretty frequently. It's not too hard to write a regular expression to test whether something is an integer -- although there is the question of whether it will overflow, which is sometimes relevant -- but a date or timestamp field is a bit harder. I don't understand the argument that we need type input functions to be protected by a savepoint. That seems crazy to me. We're taking a huge performance penalty here to protect against something that seems insane to me in the first instance. Not to mention cutting ourselves off from really important features, like the ability to recover from errors during COPY. I don't understand why we can't just make some rules about what type input functions are allowed to do. And if you break those rules then you get to keep both pieces. Why is this unreasonable? A savepoint can hardly protect you against damage inflicted by the execution of arbitrary code; IOW, we're already relying on the user to follow some rules. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010: > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> The problem here is that putting the exception handling in C doesn't > >>> make things any better: > > > So we could refactor the input functions so that there's an internal > > function that returns the accepted datum in the OK case and an ErrorData > > for the failure case. > > This makes the untenable assumption that there are no elog(ERROR)s in > the "internal" input function *or anything it calls*. Short of truly > massive restructuring, including uglifying many internal APIs to have > error return codes instead of allowing elog within the callee, you will > never make this work for anything more complicated than say float8in(). ... which is what people want anyway. I mean, the day someone requests is_sthcomplex, we could happily tell them that they need to use the expensive workaround involving savepoints. I don't think we really need to support the ones that would require truly expensive refactoring; the simple ones would cover 99% of the use cases. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Robert Haas <robertmhaas@gmail.com> writes: > I don't understand the argument that we need type input functions to > be protected by a savepoint. That seems crazy to me. We're taking a > huge performance penalty here to protect against something that seems > insane to me in the first instance. Not to mention cutting ourselves > off from really important features, like the ability to recover from > errors during COPY. I don't understand why we can't just make some > rules about what type input functions are allowed to do. There are many rules that you could possibly make for type input functions. But "you cannot throw an error" is not one of them --- or at least, not one that you can usefully expect to be followed for anything more than trivial straightline code. The poster child for this is of course domain_in(). But even without that, I don't think you can realistically legislate that no errors be thrown by something of the complexity of, say, the timestamp input functions. Just for starters, what of a palloc() failure? regards, tom lane
On Wednesday 22 September 2010 01:05:39 Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I don't understand the argument that we need type input functions to > > be protected by a savepoint. That seems crazy to me. We're taking a > > huge performance penalty here to protect against something that seems > > insane to me in the first instance. Not to mention cutting ourselves > > off from really important features, like the ability to recover from > > errors during COPY. I don't understand why we can't just make some > > rules about what type input functions are allowed to do. > > There are many rules that you could possibly make for type input > functions. But "you cannot throw an error" is not one of them --- > or at least, not one that you can usefully expect to be followed > for anything more than trivial straightline code. > > The poster child for this is of course domain_in(). But even without > that, I don't think you can realistically legislate that no errors be > thrown by something of the complexity of, say, the timestamp input > functions. Just for starters, what of a palloc() failure? Uhm. Isnt a palloc failure a really, really bad example because it will kill the session anyway? FATAL+ is not relevant in that context, right? Andres
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010: >> ...never make this work for anything more complicated than say float8in(). > ... which is what people want anyway. I mean, the day someone requests > is_sthcomplex, we could happily tell them that they need to use the > expensive workaround involving savepoints. I don't think we really need > to support the ones that would require truly expensive refactoring; the > simple ones would cover 99% of the use cases. Robert was complaining about COPY in particular. It's hard to see how you make any progress on that if you don't have pretty near 100% coverage of datatypes. I don't object if someone puts in is_integer, is_float, etc; but that's nowhere near a general purpose solution. regards, tom lane
Andres Freund <andres@anarazel.de> writes: > On Wednesday 22 September 2010 01:05:39 Tom Lane wrote: >> Just for starters, what of a palloc() failure? > Uhm. Isnt a palloc failure a really, really bad example because it will kill > the session anyway? FATAL+ is not relevant in that context, right? Huh? It's not fatal, just elog(ERROR, "out of memory"). regards, tom lane
On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I don't understand the argument that we need type input functions to >> be protected by a savepoint. That seems crazy to me. We're taking a >> huge performance penalty here to protect against something that seems >> insane to me in the first instance. Not to mention cutting ourselves >> off from really important features, like the ability to recover from >> errors during COPY. I don't understand why we can't just make some >> rules about what type input functions are allowed to do. > > There are many rules that you could possibly make for type input > functions. But "you cannot throw an error" is not one of them --- > or at least, not one that you can usefully expect to be followed > for anything more than trivial straightline code. OK. This is one of the things I don't understand. Why does throwing an error imply that we need to abort the current transaction? Why can't we just catch the longjmp() and trundle onwards? Obviously, that's unsafe if a pretty wide variety of cases, but if you're just scrutinizing the input string (even with a little bit of read-only database access) it's not obvious to me what can go wrong. (I assume there is something, but I don't know what it is...) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas wrote: > On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> I don't understand the argument that we need type input functions to > >> be protected by a savepoint. ?That seems crazy to me. ?We're taking a > >> huge performance penalty here to protect against something that seems > >> insane to me in the first instance. ?Not to mention cutting ourselves > >> off from really important features, like the ability to recover from > >> errors during COPY. ?I don't understand why we can't just make some > >> rules about what type input functions are allowed to do. > > > > There are many rules that you could possibly make for type input > > functions. ?But "you cannot throw an error" is not one of them --- > > or at least, not one that you can usefully expect to be followed > > for anything more than trivial straightline code. > > OK. This is one of the things I don't understand. Why does throwing > an error imply that we need to abort the current transaction? Why > can't we just catch the longjmp() and trundle onwards? Obviously, > that's unsafe if a pretty wide variety of cases, but if you're just > scrutinizing the input string (even with a little bit of read-only > database access) it's not obvious to me what can go wrong. (I assume > there is something, but I don't know what it is...) That would be interesting. You would need to flag that this was not a longjump requiring cleanup, but the harder part would be getting back to where the error occurred. I guess you could rerun the query. :-| -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Tue, 2010-09-21 at 19:55 -0400, Robert Haas wrote: > OK. This is one of the things I don't understand. Why does throwing > an error imply that we need to abort the current transaction? Why > can't we just catch the longjmp() and trundle onwards? Obviously, > that's unsafe if a pretty wide variety of cases, but if you're just > scrutinizing the input string (even with a little bit of read-only > database access) it's not obvious to me what can go wrong. (I assume > there is something, but I don't know what it is...) The worry (from me) would be the "little bit of read-only database access". If you SPI_connect() without an SPI_finish(), that sounds like a potential problem (as would anything else that requires cleanup). Regards,Jeff Davis
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There are many rules that you could possibly make for type input >> functions. �But "you cannot throw an error" is not one of them --- >> or at least, not one that you can usefully expect to be followed >> for anything more than trivial straightline code. > OK. This is one of the things I don't understand. Why does throwing > an error imply that we need to abort the current transaction? Why > can't we just catch the longjmp() and trundle onwards? Obviously, > that's unsafe if a pretty wide variety of cases, but if you're just > scrutinizing the input string (even with a little bit of read-only > database access) it's not obvious to me what can go wrong. The problem is to know that "all you did" was scrutinize the input string. If it's simple straightline code (even with some C library calls) then you can know that, but then you can write such code without including any elog(ERROR) in it in the first place. If you are trapping longjmps then what you'd need to assert is that no error thrown from anywhere in any of the code reachable from that place represents a problem that requires transaction abort to clean up after. This gets unmaintainable remarkably quickly, especially if you invoke anything as complicated as database access. And then there are asynchronous error reasons (query cancel) which you shouldn't trap in any case. regards, tom lane
On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> There are many rules that you could possibly make for type input >>> functions. But "you cannot throw an error" is not one of them --- >>> or at least, not one that you can usefully expect to be followed >>> for anything more than trivial straightline code. > >> OK. This is one of the things I don't understand. Why does throwing >> an error imply that we need to abort the current transaction? Why >> can't we just catch the longjmp() and trundle onwards? Obviously, >> that's unsafe if a pretty wide variety of cases, but if you're just >> scrutinizing the input string (even with a little bit of read-only >> database access) it's not obvious to me what can go wrong. > > The problem is to know that "all you did" was scrutinize the input > string. If it's simple straightline code (even with some C library > calls) then you can know that, but then you can write such code without > including any elog(ERROR) in it in the first place. If you are trapping > longjmps then what you'd need to assert is that no error thrown from > anywhere in any of the code reachable from that place represents a > problem that requires transaction abort to clean up after. This gets > unmaintainable remarkably quickly, especially if you invoke anything > as complicated as database access. And then there are asynchronous > error reasons (query cancel) which you shouldn't trap in any case. Hmm. So the problem is that we don't want to accidentally catch an error that isn't actually safe to catch. We could probably mitigate this problem to a considerable degree by throwing data validation errors using some special flag that say "this is a recoverable error".And if that flag isn't set then we abort the wholetransaction, but if it is then we continue on. It's still possible for the person writing the typinput function to set that flag when they should not, but at least it's less likely to happen by accident. Another alternative would be to create some kind of explicit way for the function to RETURN an error instead of throwing it. But neither of these things is totally bullet-proof, because you could still do something that requires clean-up and then lie about it. To protect against that, you'd presumably need to set some kind of a flag whenever, say, a heap tuple gets modified, and then you could assert said flag false. What, other than writing to the database, requires subtransaction cleanup? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Colin 't Hart wrote: > The fact that this wraps would seem to me to make the implementation of > is_date() difficult. Having separate is_foo() syntax per type is a bad design idea, same as having a different equality test like eq_int() or assignment syntax like assign_str() per type. There should just be a single syntax that works for all types, in the general case, for testing whether a value is a member of that type, or alternately whether a value can be cast to a particular type. For example, one could say "is_type( <value>, <type-name> )" or it could be spelled "isa()" or if you wanted to be more ambitious it could be an infix op, like "<value> isa <type-name>" to test when a value is of a type already. Pg already gets it right in this regard by having a single general syntax for type casting, the "<value>::<type-name>" and value membership of a type should be likewise. Maybe to test if a value can be cast as a type, you can continue the :: mnemonic, say adding a "?" for yes and a "!" for no. For example, "<value>?::<type-name>" tests if the value can be cast as the type and "<value>!::<type-name>" or "not <value>?::<type-name>" tests the opposite. An expression like this results in a boolean. -- Darren Duncan
On 25/09/2010 11:51 AM, Darren Duncan wrote: > Colin 't Hart wrote: >> The fact that this wraps would seem to me to make the implementation >> of is_date() difficult. > > Having separate is_foo() syntax per type is a bad design idea, same as > having a different equality test like eq_int() or assignment syntax like > assign_str() per type. > > There should just be a single syntax that works for all types, in the > general case, for testing whether a value is a member of that type, or > alternately whether a value can be cast to a particular type. Good point. That also permits a general-case implementation that catches any exceptions thrown, with optimized exception-free cases for int/date/string etc behind the scenes. That'd do a good job of protecting the SQL programmer from having to deal with which types had tests and which they had to use exception handling for. It'd have to be documented due to the performance differences, but it'd be great to be able to do this in a general purpose way at the SQL level. > Pg already gets it right in this regard by having a single general > syntax for type casting, the "<value>::<type-name>" and value membership > of a type should be likewise. or the standard: CAST(value AS typename) > Maybe to test if a value can be cast as a type, you can continue the :: > mnemonic, say adding a "?" for yes and a "!" for no. > > For example, "<value>?::<type-name>" tests if the value can be cast as > the type and "<value>!::<type-name>" or "not <value>?::<type-name>" > tests the opposite. An expression like this results in a boolean. Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-) Good shorthand, I guess, but a CAST syntax extension or alternate CAST version would be a bonus for readability. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Craig Ringer wrote: > On 25/09/2010 11:51 AM, Darren Duncan wrote: >> There should just be a single syntax that works for all types, in the >> general case, for testing whether a value is a member of that type, or >> alternately whether a value can be cast to a particular type. <snip> >> Pg already gets it right in this regard by having a single general >> syntax for type casting, the "<value>::<type-name>" and value membership >> of a type should be likewise. > > or the standard: > > CAST(value AS typename) Indeed. The exact syntax doesn't matter to me but the point is that the type name is its own lexical element, conceptually a function argument, rather than being a substring of another one. >> Maybe to test if a value can be cast as a type, you can continue the :: >> mnemonic, say adding a "?" for yes and a "!" for no. >> >> For example, "<value>?::<type-name>" tests if the value can be cast as >> the type and "<value>!::<type-name>" or "not <value>?::<type-name>" >> tests the opposite. An expression like this results in a boolean. > > Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-) I see that someone has been paying attention. Yes, the idea of using ? or ! to derive a boolean expression from some other expression did indeed come from Perl 6. The ? means "is so", ! means "is not". A very useful mnemonic in general. > Good shorthand, I guess, but a CAST syntax extension or alternate CAST > version would be a bonus for readability. Well, sure. But something consistent with cast syntax that Pg provides. -- Darren Duncan
On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Andrew suggested upthread:
<snip>
test_date := date_in(textout(some_text));
In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.
</snip>
In context of COPY, can we check for this SQLSTATE in PG_CATCH() and avoid PG_RE_THROW(). This would require that all type input functions call errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog().
Regards,
-- On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Hmm. So the problem is that we don't want to accidentally catch an
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There are many rules that you could possibly make for type input
>>> functions. But "you cannot throw an error" is not one of them ---
>>> or at least, not one that you can usefully expect to be followed
>>> for anything more than trivial straightline code.
>
>> OK. This is one of the things I don't understand. Why does throwing
>> an error imply that we need to abort the current transaction? Why
>> can't we just catch the longjmp() and trundle onwards? Obviously,
>> that's unsafe if a pretty wide variety of cases, but if you're just
>> scrutinizing the input string (even with a little bit of read-only
>> database access) it's not obvious to me what can go wrong.
>
> The problem is to know that "all you did" was scrutinize the input
> string. If it's simple straightline code (even with some C library
> calls) then you can know that, but then you can write such code without
> including any elog(ERROR) in it in the first place. If you are trapping
> longjmps then what you'd need to assert is that no error thrown from
> anywhere in any of the code reachable from that place represents a
> problem that requires transaction abort to clean up after. This gets
> unmaintainable remarkably quickly, especially if you invoke anything
> as complicated as database access. And then there are asynchronous
> error reasons (query cancel) which you shouldn't trap in any case.
error that isn't actually safe to catch. We could probably mitigate
this problem to a considerable degree by throwing data validation
errors using some special flag that say "this is a recoverable error".
And if that flag isn't set then we abort the whole transaction, but
if it is then we continue on. It's still possible for the person
writing the typinput function to set that flag when they should not,
but at least it's less likely to happen by accident. Another
alternative would be to create some kind of explicit way for the
function to RETURN an error instead of throwing it.
But neither of these things is totally bullet-proof, because you could
still do something that requires clean-up and then lie about it. To
protect against that, you'd presumably need to set some kind of a flag
whenever, say, a heap tuple gets modified, and then you could assert
said flag false. What, other than writing to the database, requires
subtransaction cleanup?
Andrew suggested upthread:
<snip>
test_date := date_in(textout(some_text));
In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.
</snip>
In context of COPY, can we check for this SQLSTATE in PG_CATCH() and avoid PG_RE_THROW(). This would require that all type input functions call errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog().
Regards,
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
Gurjeet Singh <singh.gurjeet@gmail.com> writes: > On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Hmm. So the problem is that we don't want to accidentally catch an >> error that isn't actually safe to catch. We could probably mitigate >> this problem to a considerable degree by throwing data validation >> errors using some special flag that say "this is a recoverable error". > Andrew suggested upthread: > <snip> > test_date := date_in(textout(some_text)); > In plpgsql you'd put that inside a begin/exception/end block that traps > SQLSTATE '22000' which is the class covering data exceptions. > </snip> > In context of COPY, can we check for this SQLSTATE in PG_CATCH() and > avoid PG_RE_THROW(). This would require that all type input functions call > errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog(). This is all pretty much a dead end, because it offers no confidence whatsoever. Suppose that COPY calls type X's input function, which calls function Y, which calls function Z. Z doesn't like what it sees so it throws an error, which it marks "recoverable" since Z hasn't done anything dangerous. Unfortunately, Y *did* do something that requires cleanup. If COPY catches the longjmp and decides that it can skip doing a transaction abort, you're screwed. Again, the problem doesn't really arise as long as you're thinking about simple code. You can fairly easily verify that no unsafe situation arises so long as there's only one or two layers of functions involved. But this approach just doesn't scale to anything complicated. What I'm wondering is whether we can fix this by reducing the overhead of subtransactions, enough so that we can afford to run each row's input function calls within a subxact. In the past that was dismissed because you'd run out of subxact XIDs at 4G rows. But we have "lazy" assignment of XIDs now, so a subxact that didn't actually try to modify the database shouldn't need to consume any permanent resources. Then we're just looking at the time needed to call all the per-module subxact start and subxact cleanup functions, which seems like something that might be optimizable for the typical case where nothing actually needs to be done. regards, tom lane
On Sat, Sep 25, 2010 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > This is all pretty much a dead end, because it offers no confidence > whatsoever. Suppose that COPY calls type X's input function, which > calls function Y, which calls function Z. Z doesn't like what it sees > so it throws an error, which it marks "recoverable" since Z hasn't > done anything dangerous. Unfortunately, Y *did* do something that > requires cleanup. If COPY catches the longjmp and decides that it > can skip doing a transaction abort, you're screwed. Yep. Although it seems a bit pathological for Z to do that, because if Y is doing something like taking an LWLock then Z is some low-level internals function that is not in a good position to judge whether error recovery is feasible. The point is not to recover from as many errors as possible, but to recover specifically from *data validation* errors, which I would not expect to be the sort of thing thrown from someplace deep down in the call stack where we're deep in the middle of things. The toplevel typinput is a pretty good position to know whether it's done anything shady. > What I'm wondering is whether we can fix this by reducing the overhead > of subtransactions, enough so that we can afford to run each row's input > function calls within a subxact. In the past that was dismissed because > you'd run out of subxact XIDs at 4G rows. But we have "lazy" assignment > of XIDs now, so a subxact that didn't actually try to modify the > database shouldn't need to consume any permanent resources. Then we're > just looking at the time needed to call all the per-module subxact start > and subxact cleanup functions, which seems like something that might be > optimizable for the typical case where nothing actually needs to be > done. Well, reducing the overhead of subtransaction cleanup would certainly be VERY nice, as it would benefit a FAR broader set of use cases than just typinput functions. It seems a bit tricky though, because AbortSubTransaction() calls a whole LOT of cleanup functions, and many of them already have fast-paths. Where do you anticipate getting a further large speed-up out of that? The problem seems particularly tricky because those functions are cleaning up different subsystems. Maybe you could group them in some way and figure out some method of skipping entire groups with some kind of super-duper fast path, but it's not obvious to me how to make that work. And I think you'd need a pretty considerable speed-up, too. My gut says that even knocking 50% off, while it might be really nice for other reasons, is not going to be enough to make sticking it inside COPY workable. I bet you need an order-of-magnitude speed-up, maybe more. It seems like a good slice of the problem here comes from the difficulties of being certain what the state is after a longjmp. It seems like you could get around all of these difficulties almost completely if the type input function were empowered to return either (1) a Datum which is the result of the conversion or (2) an SQLSTATE and error message indicating what went wrong. We're already willing to believe that cleanup isn't required when the function returns successfully, so we ought to also believe it when the function returns a failure result (as opposed to throwing an error indicating a failure). The conditions that require cleanup here are probably transient: take an LWLock, do something, release the LWLock. As long as you know that you haven't stopped somewhere in the middle of that sequence, it seems like it should be reasonably safe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company