Обсуждение: What happened to the is_ family of functions proposal?

Поиск
Список
Период
Сортировка

What happened to the is_ family of functions proposal?

От
"Colin 't Hart"
Дата:
Hi,<br /><br />Back in 2002 these were proposed, what happened to them?<br /><br /><a
href="http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php">http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php</a><br
/><br/><br /> Also I note:<br /><br />colin@ruby:~/workspace/eyedb$ psql<br />psql (8.4.4)<br />Type "help" for
help.<br/><br />colin=> select to_date('731332', 'YYMMDD');<br />  to_date   <br />------------<br /> 1974-02-01<br
/>(1 row)<br /><br />colin=> <br /><br /><br />The fact that this wraps would seem to me to make the implementation
ofis_date() difficult.<br /><br /><br />I'm trying to query character strings for valid dates but can't see how to do
thisquickly... but for that discussion I will move to pgsql-general :-)<br /><br />Cheers,<br /><br />Colin<br /> 

Re: What happened to the is_ family of functions proposal?

От
Andrew Dunstan
Дата:

On 09/20/2010 10:29 AM, Colin 't Hart wrote:
> Hi,
>
> Back in 2002 these were proposed, what happened to them?
>
> http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php


2002 is a looooooooong time ago.

>
>
> Also I note:
>
> colin@ruby:~/workspace/eyedb$ psql
> psql (8.4.4)
> Type "help" for help.
>
> colin=> select to_date('731332', 'YYMMDD');
>   to_date
> ------------
>  1974-02-01
> (1 row)
>
> colin=>
>
>
> The fact that this wraps would seem to me to make the implementation 
> of is_date() difficult.
>
>
>

I think to_date is the wrong gadget to use here. You should probably be 
using the date input routine and trapping any data exception. e.g.:
    test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps 
SQLSTATE '22000' which is the class covering data exceptions.

cheers

andrew


Re: What happened to the is_ family of functions proposal?

От
"Colin 't Hart"
Дата:
On 20 September 2010 16:54, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 09/20/2010 10:29 AM, Colin 't Hart wrote:
>>
>> Hi,
>>
>> Back in 2002 these were proposed, what happened to them?
>>
>> http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php
>
>
> 2002 is a looooooooong time ago.

<snip>

> I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any
dataexception. e.g.: 
>
>    test_date := date_in(textout(some_text));
>
> In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering
dataexceptions. 

So it's not possible using pure SQL unless one writes a function?

Are the is_<type> family of functions still desired?

Also, where are the to_<type> conversions done?

Thanks,

Colin


Re: What happened to the is_ family of functions proposal?

От
Robert Haas
Дата:
On Mon, Sep 20, 2010 at 11:31 AM, Colin 't Hart <colinthart@gmail.com> wrote:
>> I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping
anydata exception. e.g.: 
>>
>>    test_date := date_in(textout(some_text));
>>
>> In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering
dataexceptions. 
>
> So it's not possible using pure SQL unless one writes a function?

I think that is true.

> Are the is_<type> family of functions still desired?

I think it would be useful to have a way of testing whether a cast to
a given type will succeed.  The biggest problem with the
exception-catching method is not that it requires writing a function
(which, IMHO, is no big deal) but that exception handling is pretty
slow and inefficient.  You end up doing things like... write a regexp
to see whether the data is in approximately the right format and then
if it is try the cast inside an exception block.  Yuck.

(On the other hand, whether the work that was done in 2002 is still
relevant to today's code is questionable.  Things have changed a lot.)

> Also, where are the to_<type> conversions done?

I think maybe you are looking for the type input functions?

select typname, typinput::regprocedure from pg_type;

There are also some functions with names of the form to_<type>.  You
can get a list of those with the following psql command:

\dfS to_*

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I think it would be useful to have a way of testing whether a cast to
> a given type will succeed.  The biggest problem with the
> exception-catching method is not that it requires writing a function
> (which, IMHO, is no big deal) but that exception handling is pretty
> slow and inefficient.  You end up doing things like... write a regexp
> to see whether the data is in approximately the right format and then
> if it is try the cast inside an exception block.  Yuck.

The problem here is that putting the exception handling in C doesn't
make things any better: it's still slow and inefficient.  And in the
general case the only way to be sure that a string will be accepted by
the input function is to try it.
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Robert Haas
Дата:
On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think it would be useful to have a way of testing whether a cast to
>> a given type will succeed.  The biggest problem with the
>> exception-catching method is not that it requires writing a function
>> (which, IMHO, is no big deal) but that exception handling is pretty
>> slow and inefficient.  You end up doing things like... write a regexp
>> to see whether the data is in approximately the right format and then
>> if it is try the cast inside an exception block.  Yuck.
>
> The problem here is that putting the exception handling in C doesn't
> make things any better: it's still slow and inefficient.  And in the
> general case the only way to be sure that a string will be accepted by
> the input function is to try it.

Given the current API, that is true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: What happened to the is_ family of functions proposal?

От
Alvaro Herrera
Дата:
Excerpts from Robert Haas's message of mar sep 21 11:56:51 -0400 2010:
> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > The problem here is that putting the exception handling in C doesn't
> > make things any better: it's still slow and inefficient.  And in the
> > general case the only way to be sure that a string will be accepted by
> > the input function is to try it.
> 
> Given the current API, that is true.

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case.  The regular input function would just throw the
error data in the latter case; but this would allow another function to
just return whether it worked or not.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
>> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The problem here is that putting the exception handling in C doesn't
>>> make things any better:

> So we could refactor the input functions so that there's an internal
> function that returns the accepted datum in the OK case and an ErrorData
> for the failure case.

This makes the untenable assumption that there are no elog(ERROR)s in
the "internal" input function *or anything it calls*.  Short of truly
massive restructuring, including uglifying many internal APIs to have
error return codes instead of allowing elog within the callee, you will
never make this work for anything more complicated than say float8in().
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Greg Stark
Дата:
On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> So we could refactor the input functions so that there's an internal
> function that returns the accepted datum in the OK case and an ErrorData
> for the failure case.  The regular input function would just throw the
> error data in the latter case; but this would allow another function to
> just return whether it worked or not.

You're assuming the input function won't have any work it has to undo
which it would need the savepoint for anyways. For most of the
built-in datatypes -- all of the ones intended for holding real data
-- that's true. But for things like regclass or regtype it might not
be and for user-defined data types.... who knows?

Of course all people really want is to test whether something is a
valid integer, floating point value, etc.

--
greg


Re: What happened to the is_ family of functions proposal?

От
Robert Haas
Дата:
On Tue, Sep 21, 2010 at 1:45 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> So we could refactor the input functions so that there's an internal
>> function that returns the accepted datum in the OK case and an ErrorData
>> for the failure case.  The regular input function would just throw the
>> error data in the latter case; but this would allow another function to
>> just return whether it worked or not.
>
> You're assuming the input function won't have any work it has to undo
> which it would need the savepoint for anyways. For most of the
> built-in datatypes -- all of the ones intended for holding real data
> -- that's true. But for things like regclass or regtype it might not
> be and for user-defined data types.... who knows?
>
> Of course all people really want is to test whether something is a
> valid integer, floating point value, etc.

Right.  Or a date - that's a case that comes up for me pretty
frequently.  It's not too hard to write a regular expression to test
whether something is an integer -- although there is the question of
whether it will overflow, which is sometimes relevant -- but a date or
timestamp field is a bit harder.

I don't understand the argument that we need type input functions to
be protected by a savepoint.  That seems crazy to me.  We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance.  Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY.  I don't understand why we can't just make some
rules about what type input functions are allowed to do.  And if you
break those rules then you get to keep both pieces.  Why is this
unreasonable?  A savepoint can hardly protect you against damage
inflicted by the execution of arbitrary code; IOW, we're already
relying on the user to follow some rules.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: What happened to the is_ family of functions proposal?

От
Alvaro Herrera
Дата:
Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> The problem here is that putting the exception handling in C doesn't
> >>> make things any better:
> 
> > So we could refactor the input functions so that there's an internal
> > function that returns the accepted datum in the OK case and an ErrorData
> > for the failure case.
> 
> This makes the untenable assumption that there are no elog(ERROR)s in
> the "internal" input function *or anything it calls*.  Short of truly
> massive restructuring, including uglifying many internal APIs to have
> error return codes instead of allowing elog within the callee, you will
> never make this work for anything more complicated than say float8in().

... which is what people want anyway.  I mean, the day someone requests
is_sthcomplex, we could happily tell them that they need to use the
expensive workaround involving savepoints.  I don't think we really need
to support the ones that would require truly expensive refactoring; the
simple ones would cover 99% of the use cases.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> I don't understand the argument that we need type input functions to
> be protected by a savepoint.  That seems crazy to me.  We're taking a
> huge performance penalty here to protect against something that seems
> insane to me in the first instance.  Not to mention cutting ourselves
> off from really important features, like the ability to recover from
> errors during COPY.  I don't understand why we can't just make some
> rules about what type input functions are allowed to do.

There are many rules that you could possibly make for type input
functions.  But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

The poster child for this is of course domain_in().  But even without
that, I don't think you can realistically legislate that no errors be
thrown by something of the complexity of, say, the timestamp input
functions.  Just for starters, what of a palloc() failure?
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Andres Freund
Дата:
On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I don't understand the argument that we need type input functions to
> > be protected by a savepoint.  That seems crazy to me.  We're taking a
> > huge performance penalty here to protect against something that seems
> > insane to me in the first instance.  Not to mention cutting ourselves
> > off from really important features, like the ability to recover from
> > errors during COPY.  I don't understand why we can't just make some
> > rules about what type input functions are allowed to do.
> 
> There are many rules that you could possibly make for type input
> functions.  But "you cannot throw an error" is not one of them ---
> or at least, not one that you can usefully expect to be followed
> for anything more than trivial straightline code.
> 
> The poster child for this is of course domain_in().  But even without
> that, I don't think you can realistically legislate that no errors be
> thrown by something of the complexity of, say, the timestamp input
> functions.  Just for starters, what of a palloc() failure?
Uhm. Isnt a palloc failure a really, really bad example because it will kill 
the session anyway? FATAL+ is not relevant in that context, right?

Andres


Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010:
>> ...never make this work for anything more complicated than say float8in().

> ... which is what people want anyway.  I mean, the day someone requests
> is_sthcomplex, we could happily tell them that they need to use the
> expensive workaround involving savepoints.  I don't think we really need
> to support the ones that would require truly expensive refactoring; the
> simple ones would cover 99% of the use cases.

Robert was complaining about COPY in particular.  It's hard to see how
you make any progress on that if you don't have pretty near 100%
coverage of datatypes.  I don't object if someone puts in is_integer,
is_float, etc; but that's nowhere near a general purpose solution.
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:
>> Just for starters, what of a palloc() failure?

> Uhm. Isnt a palloc failure a really, really bad example because it will kill 
> the session anyway? FATAL+ is not relevant in that context, right?

Huh?  It's not fatal, just elog(ERROR, "out of memory").
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Robert Haas
Дата:
On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I don't understand the argument that we need type input functions to
>> be protected by a savepoint.  That seems crazy to me.  We're taking a
>> huge performance penalty here to protect against something that seems
>> insane to me in the first instance.  Not to mention cutting ourselves
>> off from really important features, like the ability to recover from
>> errors during COPY.  I don't understand why we can't just make some
>> rules about what type input functions are allowed to do.
>
> There are many rules that you could possibly make for type input
> functions.  But "you cannot throw an error" is not one of them ---
> or at least, not one that you can usefully expect to be followed
> for anything more than trivial straightline code.

OK.  This is one of the things I don't understand.  Why does throwing
an error imply that we need to abort the current transaction?  Why
can't we just catch the longjmp() and trundle onwards?  Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong.  (I assume
there is something, but I don't know what it is...)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: What happened to the is_ family of functions proposal?

От
Bruce Momjian
Дата:
Robert Haas wrote:
> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> I don't understand the argument that we need type input functions to
> >> be protected by a savepoint. ?That seems crazy to me. ?We're taking a
> >> huge performance penalty here to protect against something that seems
> >> insane to me in the first instance. ?Not to mention cutting ourselves
> >> off from really important features, like the ability to recover from
> >> errors during COPY. ?I don't understand why we can't just make some
> >> rules about what type input functions are allowed to do.
> >
> > There are many rules that you could possibly make for type input
> > functions. ?But "you cannot throw an error" is not one of them ---
> > or at least, not one that you can usefully expect to be followed
> > for anything more than trivial straightline code.
> 
> OK.  This is one of the things I don't understand.  Why does throwing
> an error imply that we need to abort the current transaction?  Why
> can't we just catch the longjmp() and trundle onwards?  Obviously,
> that's unsafe if a pretty wide variety of cases, but if you're just
> scrutinizing the input string (even with a little bit of read-only
> database access) it's not obvious to me what can go wrong.  (I assume
> there is something, but I don't know what it is...)

That would be interesting.  You would need to flag that this was not a
longjump requiring cleanup, but the harder part would be getting back to
where the error occurred.  I guess you could rerun the query.  :-|

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: What happened to the is_ family of functions proposal?

От
Jeff Davis
Дата:
On Tue, 2010-09-21 at 19:55 -0400, Robert Haas wrote:
> OK.  This is one of the things I don't understand.  Why does throwing
> an error imply that we need to abort the current transaction?  Why
> can't we just catch the longjmp() and trundle onwards?  Obviously,
> that's unsafe if a pretty wide variety of cases, but if you're just
> scrutinizing the input string (even with a little bit of read-only
> database access) it's not obvious to me what can go wrong.  (I assume
> there is something, but I don't know what it is...)

The worry (from me) would be the "little bit of read-only database
access". If you SPI_connect() without an SPI_finish(), that sounds like
a potential problem (as would anything else that requires cleanup).

Regards,Jeff Davis



Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are many rules that you could possibly make for type input
>> functions. �But "you cannot throw an error" is not one of them ---
>> or at least, not one that you can usefully expect to be followed
>> for anything more than trivial straightline code.

> OK.  This is one of the things I don't understand.  Why does throwing
> an error imply that we need to abort the current transaction?  Why
> can't we just catch the longjmp() and trundle onwards?  Obviously,
> that's unsafe if a pretty wide variety of cases, but if you're just
> scrutinizing the input string (even with a little bit of read-only
> database access) it's not obvious to me what can go wrong.

The problem is to know that "all you did" was scrutinize the input
string.  If it's simple straightline code (even with some C library
calls) then you can know that, but then you can write such code without
including any elog(ERROR) in it in the first place.  If you are trapping
longjmps then what you'd need to assert is that no error thrown from
anywhere in any of the code reachable from that place represents a
problem that requires transaction abort to clean up after.  This gets
unmaintainable remarkably quickly, especially if you invoke anything
as complicated as database access.  And then there are asynchronous
error reasons (query cancel) which you shouldn't trap in any case.
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Robert Haas
Дата:
On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There are many rules that you could possibly make for type input
>>> functions.  But "you cannot throw an error" is not one of them ---
>>> or at least, not one that you can usefully expect to be followed
>>> for anything more than trivial straightline code.
>
>> OK.  This is one of the things I don't understand.  Why does throwing
>> an error imply that we need to abort the current transaction?  Why
>> can't we just catch the longjmp() and trundle onwards?  Obviously,
>> that's unsafe if a pretty wide variety of cases, but if you're just
>> scrutinizing the input string (even with a little bit of read-only
>> database access) it's not obvious to me what can go wrong.
>
> The problem is to know that "all you did" was scrutinize the input
> string.  If it's simple straightline code (even with some C library
> calls) then you can know that, but then you can write such code without
> including any elog(ERROR) in it in the first place.  If you are trapping
> longjmps then what you'd need to assert is that no error thrown from
> anywhere in any of the code reachable from that place represents a
> problem that requires transaction abort to clean up after.  This gets
> unmaintainable remarkably quickly, especially if you invoke anything
> as complicated as database access.  And then there are asynchronous
> error reasons (query cancel) which you shouldn't trap in any case.

Hmm.  So the problem is that we don't want to accidentally catch an
error that isn't actually safe to catch.  We could probably mitigate
this problem to a considerable degree by throwing data validation
errors using some special flag that say "this is a recoverable error".And if that flag isn't set then we abort the
wholetransaction, but 
if it is then we continue on.  It's still possible for the person
writing the typinput function to set that flag when they should not,
but at least it's less likely to happen by accident.  Another
alternative would be to create some kind of explicit way for the
function to RETURN an error instead of throwing it.

But neither of these things is totally bullet-proof, because you could
still do something that requires clean-up and then lie about it.  To
protect against that, you'd presumably need to set some kind of a flag
whenever, say, a heap tuple gets modified, and then you could assert
said flag false.  What, other than writing to the database, requires
subtransaction cleanup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: What happened to the is_ family of functions proposal?

От
Darren Duncan
Дата:
Colin 't Hart wrote:
> The fact that this wraps would seem to me to make the implementation of 
> is_date() difficult.

Having separate is_foo() syntax per type is a bad design idea, same as having a 
different equality test like eq_int() or assignment syntax like assign_str() per 
type.

There should just be a single syntax that works for all types, in the general 
case, for testing whether a value is a member of that type, or alternately 
whether a value can be cast to a particular type.

For example, one could say "is_type( <value>, <type-name> )" or it could be 
spelled "isa()" or if you wanted to be more ambitious it could be an infix op, 
like "<value> isa <type-name>" to test when a value is of a type already.

Pg already gets it right in this regard by having a single general syntax for 
type casting, the "<value>::<type-name>" and value membership of a type should 
be likewise.

Maybe to test if a value can be cast as a type, you can continue the :: 
mnemonic, say adding a "?" for yes and a "!" for no.

For example, "<value>?::<type-name>" tests if the value can be cast as the type 
and "<value>!::<type-name>" or "not <value>?::<type-name>" tests the opposite. 
An expression like this results in a boolean.

-- Darren Duncan


Re: What happened to the is_ family of functions proposal?

От
Craig Ringer
Дата:
On 25/09/2010 11:51 AM, Darren Duncan wrote:
> Colin 't Hart wrote:
>> The fact that this wraps would seem to me to make the implementation
>> of is_date() difficult.
>
> Having separate is_foo() syntax per type is a bad design idea, same as
> having a different equality test like eq_int() or assignment syntax like
> assign_str() per type.
>
> There should just be a single syntax that works for all types, in the
> general case, for testing whether a value is a member of that type, or
> alternately whether a value can be cast to a particular type.

Good point. That also permits a general-case implementation that catches 
any exceptions thrown, with optimized exception-free cases for 
int/date/string etc behind the scenes.

That'd do a good job of protecting the SQL programmer from having to 
deal with which types had tests and which they had to use exception 
handling for. It'd have to be documented due to the performance 
differences, but it'd be great to be able to do this in a general 
purpose way at the SQL level.

> Pg already gets it right in this regard by having a single general
> syntax for type casting, the "<value>::<type-name>" and value membership
> of a type should be likewise.

or the standard:

CAST(value AS typename)

> Maybe to test if a value can be cast as a type, you can continue the ::
> mnemonic, say adding a "?" for yes and a "!" for no.
>
> For example, "<value>?::<type-name>" tests if the value can be cast as
> the type and "<value>!::<type-name>" or "not <value>?::<type-name>"
> tests the opposite. An expression like this results in a boolean.

Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)

Good shorthand, I guess, but a CAST syntax extension or alternate CAST 
version would be a bonus for readability.

-- 
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


Re: What happened to the is_ family of functions proposal?

От
Darren Duncan
Дата:
Craig Ringer wrote:
> On 25/09/2010 11:51 AM, Darren Duncan wrote:
>> There should just be a single syntax that works for all types, in the
>> general case, for testing whether a value is a member of that type, or
>> alternately whether a value can be cast to a particular type.
<snip>
>> Pg already gets it right in this regard by having a single general
>> syntax for type casting, the "<value>::<type-name>" and value membership
>> of a type should be likewise.
> 
> or the standard:
> 
> CAST(value AS typename)

Indeed.  The exact syntax doesn't matter to me but the point is that the type 
name is its own lexical element, conceptually a function argument, rather than 
being a substring of another one.

>> Maybe to test if a value can be cast as a type, you can continue the ::
>> mnemonic, say adding a "?" for yes and a "!" for no.
>>
>> For example, "<value>?::<type-name>" tests if the value can be cast as
>> the type and "<value>!::<type-name>" or "not <value>?::<type-name>"
>> tests the opposite. An expression like this results in a boolean.
> 
> Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)

I see that someone has been paying attention.

Yes, the idea of using ? or ! to derive a boolean expression from some other 
expression did indeed come from Perl 6.  The ? means "is so", ! means "is not".  A very useful mnemonic in general.

> Good shorthand, I guess, but a CAST syntax extension or alternate CAST 
> version would be a bonus for readability.

Well, sure.  But something consistent with cast syntax that Pg provides.

-- Darren Duncan


Re: What happened to the is_ family of functions proposal?

От
Gurjeet Singh
Дата:
On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> There are many rules that you could possibly make for type input
>>> functions.  But "you cannot throw an error" is not one of them ---
>>> or at least, not one that you can usefully expect to be followed
>>> for anything more than trivial straightline code.
>
>> OK.  This is one of the things I don't understand.  Why does throwing
>> an error imply that we need to abort the current transaction?  Why
>> can't we just catch the longjmp() and trundle onwards?  Obviously,
>> that's unsafe if a pretty wide variety of cases, but if you're just
>> scrutinizing the input string (even with a little bit of read-only
>> database access) it's not obvious to me what can go wrong.
>
> The problem is to know that "all you did" was scrutinize the input
> string.  If it's simple straightline code (even with some C library
> calls) then you can know that, but then you can write such code without
> including any elog(ERROR) in it in the first place.  If you are trapping
> longjmps then what you'd need to assert is that no error thrown from
> anywhere in any of the code reachable from that place represents a
> problem that requires transaction abort to clean up after.  This gets
> unmaintainable remarkably quickly, especially if you invoke anything
> as complicated as database access.  And then there are asynchronous
> error reasons (query cancel) which you shouldn't trap in any case.

Hmm.  So the problem is that we don't want to accidentally catch an
error that isn't actually safe to catch.  We could probably mitigate
this problem to a considerable degree by throwing data validation
errors using some special flag that say "this is a recoverable error".
 And if that flag isn't set then we abort the whole transaction, but
if it is then we continue on.  It's still possible for the person
writing the typinput function to set that flag when they should not,
but at least it's less likely to happen by accident.  Another
alternative would be to create some kind of explicit way for the
function to RETURN an error instead of throwing it.

But neither of these things is totally bullet-proof, because you could
still do something that requires clean-up and then lie about it.  To
protect against that, you'd presumably need to set some kind of a flag
whenever, say, a heap tuple gets modified, and then you could assert
said flag false.  What, other than writing to the database, requires
subtransaction cleanup?


Andrew suggested upthread:

<snip>
   test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.
  </snip>


    In context of COPY, can we check for this SQLSTATE in PG_CATCH() and avoid PG_RE_THROW(). This would require that all type input functions call errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog().

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: What happened to the is_ family of functions proposal?

От
Tom Lane
Дата:
Gurjeet Singh <singh.gurjeet@gmail.com> writes:
> On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Hmm.  So the problem is that we don't want to accidentally catch an
>> error that isn't actually safe to catch.  We could probably mitigate
>> this problem to a considerable degree by throwing data validation
>> errors using some special flag that say "this is a recoverable error".

> Andrew suggested upthread:
> <snip>
>    test_date := date_in(textout(some_text));
> In plpgsql you'd put that inside a begin/exception/end block that traps
> SQLSTATE '22000' which is the class covering data exceptions.
>   </snip>

>     In context of COPY, can we check for this SQLSTATE in PG_CATCH() and
> avoid PG_RE_THROW(). This would require that all type input functions call
> errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog().

This is all pretty much a dead end, because it offers no confidence
whatsoever.  Suppose that COPY calls type X's input function, which
calls function Y, which calls function Z.  Z doesn't like what it sees
so it throws an error, which it marks "recoverable" since Z hasn't
done anything dangerous.  Unfortunately, Y *did* do something that
requires cleanup.  If COPY catches the longjmp and decides that it
can skip doing a transaction abort, you're screwed.

Again, the problem doesn't really arise as long as you're thinking about
simple code.  You can fairly easily verify that no unsafe situation
arises so long as there's only one or two layers of functions involved.
But this approach just doesn't scale to anything complicated.


What I'm wondering is whether we can fix this by reducing the overhead
of subtransactions, enough so that we can afford to run each row's input
function calls within a subxact.  In the past that was dismissed because
you'd run out of subxact XIDs at 4G rows.  But we have "lazy" assignment
of XIDs now, so a subxact that didn't actually try to modify the
database shouldn't need to consume any permanent resources.  Then we're
just looking at the time needed to call all the per-module subxact start
and subxact cleanup functions, which seems like something that might be
optimizable for the typical case where nothing actually needs to be
done.
        regards, tom lane


Re: What happened to the is_ family of functions proposal?

От
Robert Haas
Дата:
On Sat, Sep 25, 2010 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This is all pretty much a dead end, because it offers no confidence
> whatsoever.  Suppose that COPY calls type X's input function, which
> calls function Y, which calls function Z.  Z doesn't like what it sees
> so it throws an error, which it marks "recoverable" since Z hasn't
> done anything dangerous.  Unfortunately, Y *did* do something that
> requires cleanup.  If COPY catches the longjmp and decides that it
> can skip doing a transaction abort, you're screwed.

Yep.  Although it seems a bit pathological for Z to do that, because
if Y is doing something like taking an LWLock then Z is some low-level
internals function that is not in a good position to judge whether
error recovery is feasible.  The point is not to recover from as many
errors as possible, but to recover specifically from *data validation*
errors, which I would not expect to be the sort of thing thrown from
someplace deep down in the call stack where we're deep in the middle
of things.  The toplevel typinput is a pretty good position to know
whether it's done anything shady.

> What I'm wondering is whether we can fix this by reducing the overhead
> of subtransactions, enough so that we can afford to run each row's input
> function calls within a subxact.  In the past that was dismissed because
> you'd run out of subxact XIDs at 4G rows.  But we have "lazy" assignment
> of XIDs now, so a subxact that didn't actually try to modify the
> database shouldn't need to consume any permanent resources.  Then we're
> just looking at the time needed to call all the per-module subxact start
> and subxact cleanup functions, which seems like something that might be
> optimizable for the typical case where nothing actually needs to be
> done.

Well, reducing the overhead of subtransaction cleanup would certainly
be VERY nice, as it would benefit a FAR broader set of use cases than
just typinput functions.  It seems a bit tricky though, because
AbortSubTransaction() calls a whole LOT of cleanup functions, and many
of them already have fast-paths.  Where do you anticipate getting a
further large speed-up out of that?  The problem seems particularly
tricky because those functions are cleaning up different subsystems.
Maybe you could group them in some way and figure out some method of
skipping entire groups with some kind of super-duper fast path, but
it's not obvious to me how to make that work.  And I think you'd need
a pretty considerable speed-up, too.  My gut says that even knocking
50% off, while it might be really nice for other reasons, is not going
to be enough to make sticking it inside COPY workable.  I bet you need
an order-of-magnitude speed-up, maybe more.

It seems like a good slice of the problem here comes from the
difficulties of being certain what the state is after a longjmp.  It
seems like you could get around all of these difficulties almost
completely if the type input function were empowered to return either
(1) a Datum which is the result of the conversion or (2) an SQLSTATE
and error message indicating what went wrong.  We're already willing
to believe that cleanup isn't required when the function returns
successfully, so we ought to also believe it when the function returns
a failure result (as opposed to throwing an error indicating a
failure).  The conditions that require cleanup here are probably
transient: take an LWLock, do something, release the LWLock.  As long
as you know that you haven't stopped somewhere in the middle of that
sequence, it seems like it should be reasonably safe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company