Обсуждение: BUG #4862: different results in to_date() between 8.3.7 & 8.4.RC1
The following bug has been logged online: Bug reference: 4862 Logged by: Jeremy Ford Email address: jeremford@gmail.com PostgreSQL version: 8.4 RC1 Operating system: Fedora 10 i386 Description: different results in to_date() between 8.3.7 & 8.4.RC1 Details: Running the following query on PG 8.3.7 select to_char(2009,'9999') as year, to_char(3,'09') as month, to_date(to_char(2009,'9999')||to_char(3,'99') ,'YYYYMM') as method1, to_date(to_char(2009,'9999')||'-'||to_char(3,'09') || '-01','YYYY-MM-DD') as method2 results in: year, month, method1, method2 " 2009";" 03";"2009-03-01";"2009-03-01" Running exactly the same query on PG 8.4.RC1 gives: year, month, method1, method2 " 2009";" 03";"0200-09-01";"2009-03-01" PG 8.3.7 âmethod1â = "2009-03-01" PG 8.4.RC1 âmethod1â = "0200-09-01" Both databases were on the same machine â Fedora 10 â 32bit (GCC 4.3.2). Server configured for Australian timezone/usage. I realize I should have slapped a trim() around the to_char(), but thought it worth noting the difference anyway. Cheers, Jeremy.
"Jeremy Ford" <jeremford@gmail.com> writes:
> select
>   to_char(2009,'9999') as year,
>   to_char(3,'09') as month,
>   to_date(to_char(2009,'9999')||to_char(3,'99') ,'YYYYMM') as method1,
>   to_date(to_char(2009,'9999')||'-'||to_char(3,'09') || '-01','YYYY-MM-DD')
> as method2
Or, eliminating the extraneous stuff, the point is that
regression=# select to_date(' 2009 07', 'YYYYMM');
  to_date
------------
 0200-09-01
(1 row)
doesn't do what it used to.  Ordinarily I might say "well, if you want
leading spaces you need to say that in the format", viz
regression=# select to_date(' 2009 07', ' YYYYMM');
  to_date
------------
 2009-07-01
(1 row)
However, that just begs the question --- it seems that leading space is
allowed in MM, just not in YYYY.  Brendan, is that intentional or is it
a bug?
            regards, tom lane
			
		2009/6/19 Tom Lane <tgl@sss.pgh.pa.us>:
> regression=3D# select to_date(' 2009 07', ' YYYYMM');
> =A0to_date
> ------------
> =A02009-07-01
> (1 row)
>
> However, that just begs the question --- it seems that leading space is
> allowed in MM, just not in YYYY. =A0Brendan, is that intentional or is it
> a bug?
>
The reason the space between YYYY and MM is ignored isn't to do with
any special treatment of MM, rather it's to do with special treatment
of the end-of-string.  Observe:
postgres=3D# select to_date('200906 19', 'YYYYMMDD');
  to_date
------------
 2009-06-19
(1 row)
What's going on here is that from_char_parse_int_len() has two
different approaches to capturing characters from the input.  The
normal mode is to pull an exact number of characters out of the
string, as per the format node; for DD we pull 2 characters, for YYYY
we pull 4, and so on.  However, if the FM (fill mode) flag is given,
or if the next node is known to be a non-digit character, we take a
much more tolerant approach and let strtol() grab as many characters
as it cares to. [1]
The reason for this technique is that it allows us to get away with
things like this:
postgres=3D# select to_date('2-6-2009', 'DD-MM-YYYY');
  to_date
------------
 2009-06-02
(1 row)
Or, to put it another way, the presence of separator characters trumps
a strict character-by-character interpretation of the format string.
The code treats the end-of-string as such a separator, so in your '
MM' example, the code lets strtol() off its leash and all the
remaining characters are fed into the month field.
This special treatment of separators was actually in the code long
before I got my hands on it, and I wasn't keen to change it -- I
feared that flipping that little quirk on its head would cause even
more breakage.
I hope that answers your question.  to_date() is by nature a weird
beast with many strange corners in its behaviour, and it's hard to
strike a balance between backwards compatibility and Least
Astonishment.  My personal preference would be for a 100% strict
interpretation of the format pattern, and a pox on anyone who has been
relying on sloppy patterns!  But that's not very practical.  I would
welcome any suggestions for further refinements.
Cheers,
BJ
[1] src/backend/utils/adt/formatting.c:1846
			
		Brendan Jurd <direvus@gmail.com> writes:
> I hope that answers your question.  to_date() is by nature a weird
> beast with many strange corners in its behaviour, and it's hard to
> strike a balance between backwards compatibility and Least
> Astonishment.  My personal preference would be for a 100% strict
> interpretation of the format pattern, and a pox on anyone who has been
> relying on sloppy patterns!  But that's not very practical.  I would
> welcome any suggestions for further refinements.
My feeling about it is that we usually try to match Oracle's behavior
for to_date/to_char, so the $64 question is whether Oracle allows a
leading space in these same cases.  Anyone have it handy to test?
            regards, tom lane
			
		Oracle 9i: YEAR MONTH METHOD1 METHOD2 2009 03 1/03/2009 1/03/2009 Oracle 10g: YEAR MONTH METHOD1 METHOD2 2009 03 1/03/2009 1/03/2009 Regards, Jeremy. On Sat, Jun 20, 2009 at 2:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Brendan Jurd <direvus@gmail.com> writes: > > I hope that answers your question. to_date() is by nature a weird > > beast with many strange corners in its behaviour, and it's hard to > > strike a balance between backwards compatibility and Least > > Astonishment. My personal preference would be for a 100% strict > > interpretation of the format pattern, and a pox on anyone who has been > > relying on sloppy patterns! But that's not very practical. I would > > welcome any suggestions for further refinements. > > My feeling about it is that we usually try to match Oracle's behavior > for to_date/to_char, so the $64 question is whether Oracle allows a > leading space in these same cases. Anyone have it handy to test? > > regards, tom lane >
2009/6/22 Jeremy Ford <jeremford@gmail.com>:
> Oracle 9i:
> =A0YEAR MONTH METHOD1=A0=A0=A0=A0 METHOD2
>
> =A02009 03=A0=A0 1/03/2009=A0=A0 1/03/2009
>
> Oracle 10g:
> YEAR=A0=A0=A0=A0=A0 MONTH=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 ME=
THOD1=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 METHOD2
>
> =A02009=A0=A0=A0=A0=A0 03=A0=A0=A0=A0=A0=A0=A0=A0=A0 1/03/2009=A0=A0=A0=
=A0=A0=A0=A0=A0=A0=A0=A0 1/03/2009
>
Hi Jeremy,
The query you used above might not tell us the whole story, because
you also have calls to to_char() in there (which is where those
leading spaces are coming from), and I have no idea whether Oracle's
to_char() also adds those leading spaces.
It would be very helpful if you'd test the following against Oracle
and let us know what you get back, just to totally rule out any
interference from to_char():
SELECT to_date(' 2009 03', 'YYYYMM') as nospace;
SELECT to_date(' 2009 03', 'YYYY MM') as monthspace;
SELECT to_date(' 2009 03', ' YYYY MM') as bothspaces;
SELECT to_date(' 2009 03', '  YYYY MM') as extraspace;
SELECT to_date('2009 03', ' YYYY MM') as bogusspace;
> On Sat, Jun 20, 2009 at 2:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> My feeling about it is that we usually try to match Oracle's behavior
>> for to_date/to_char, so the $64 question is whether Oracle allows a
>> leading space in these same cases. =A0Anyone have it handy to test?
If it turns out that Oracle does ignore leading spaces, we might want
to just add a special case in the input string scanner to skip over
any whitespace at the beginning of the string before we begin
processing in earnest.
Depending on the results from the test cases I posted above, we might
want to do something similar for the format string also.
Cheers,
BJ
			
		Hi Brendan.
Results as requested - Oracle 10g:
SELECT TO_DATE(' 2009 03', 'YYYYMM') AS nospace FROM dual
NOSPACE
----------------------
1/03/2009
1 row selected
SELECT TO_DATE(' 2009 03', 'YYYY MM') AS monthspace FROM dual
MONTHSPACE
----------------------
1/03/2009
1 row selected
SELECT TO_DATE(' 2009 03', ' YYYY MM') AS bothspaces FROM dual
BOTHSPACES
----------------------
1/03/2009
1 row selected
SELECT TO_DATE(' 2009 03', '  YYYY MM') AS extraspace FROM dual
EXTRASPACE
----------------------
1/03/2009
1 row selected
SELECT TO_DATE('2009 03', ' YYYY MM') AS bogusspace FROM dual
BOGUSSPACE
----------------------
1/03/2009
1 row selected
Hope this helps,
Jeremy.
On Mon, Jun 22, 2009 at 4:39 PM, Brendan Jurd <direvus@gmail.com> wrote:
> 2009/6/22 Jeremy Ford <jeremford@gmail.com>:
> > Oracle 9i:
> >  YEAR MONTH METHOD1     METHOD2
> >
> >  2009 03   1/03/2009   1/03/2009
> >
> > Oracle 10g:
> > YEAR      MONTH                METHOD1           METHOD2
> >
> >  2009      03          1/03/2009            1/03/2009
> >
>
> Hi Jeremy,
>
> The query you used above might not tell us the whole story, because
> you also have calls to to_char() in there (which is where those
> leading spaces are coming from), and I have no idea whether Oracle's
> to_char() also adds those leading spaces.
>
> It would be very helpful if you'd test the following against Oracle
> and let us know what you get back, just to totally rule out any
> interference from to_char():
>
> SELECT to_date(' 2009 03', 'YYYYMM') as nospace;
> SELECT to_date(' 2009 03', 'YYYY MM') as monthspace;
> SELECT to_date(' 2009 03', ' YYYY MM') as bothspaces;
> SELECT to_date(' 2009 03', '  YYYY MM') as extraspace;
> SELECT to_date('2009 03', ' YYYY MM') as bogusspace;
>
> > On Sat, Jun 20, 2009 at 2:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> My feeling about it is that we usually try to match Oracle's behavior
> >> for to_date/to_char, so the $64 question is whether Oracle allows a
> >> leading space in these same cases.  Anyone have it handy to test?
>
> If it turns out that Oracle does ignore leading spaces, we might want
> to just add a special case in the input string scanner to skip over
> any whitespace at the beginning of the string before we begin
> processing in earnest.
>
> Depending on the results from the test cases I posted above, we might
> want to do something similar for the format string also.
>
> Cheers,
> BJ
>
			
		2009/6/22 Jeremy Ford <jeremford@gmail.com>: > Hi Brendan. > > Results as requested - Oracle 10g: Cheers for that Jeremy. I think we can safely conclude from these results that Oracle pays no attention whatsoever to leading spaces in either the input string, or the format string. If we want to copy this behaviour, I think the best way forward is to add some code that deliberately skips over any initial whitespace in either string. Any comments about this approach? I should be able to work up a patch over the next couple days. Cheers, BJ
Brendan Jurd <direvus@gmail.com> writes:
> I think we can safely conclude from these results that Oracle pays no
> attention whatsoever to leading spaces in either the input string, or
> the format string.
> If we want to copy this behaviour, I think the best way forward is to
> add some code that deliberately skips over any initial whitespace in
> either string.  Any comments about this approach?  I should be able to
> work up a patch over the next couple days.
I'm hesitant to fool with this post-RC, and we don't have "a couple of
days" to wait for a fix --- 8.4.0 is scheduled to wrap on Thursday.
However, I see that the first of these cases actually represents a
regression from 8.3 and before; existing releases get 2009-03-01 out
of it while 8.4 fails to.  So it seems we'd better do something.
            regards, tom lane
			
		2009/6/23 Tom Lane <tgl@sss.pgh.pa.us>:
> I'm hesitant to fool with this post-RC, and we don't have "a couple of
> days" to wait for a fix --- 8.4.0 is scheduled to wrap on Thursday.
> However, I see that the first of these cases actually represents a
> regression from 8.3 and before; existing releases get 2009-03-01 out
> of it while 8.4 fails to. =A0So it seems we'd better do something.
Ugh, I just noticed that even something like the following will work in 8.3:
8.3=3D# select to_date(' 2009 03 01', 'YYYYMMDD');
So it's not as simple as just treating whitespace at the *beginning*
of the string specially.  There's something else going on.
[after trawling in the 8.3 code]
I think I've found the regression.  The 8.3 code was using sscanf to
harvest integers, and was then calling a local function strspace_len()
to skip over any whitespace immediately before the integer.
So, any whitespace preceding any integer field was being explicitly
bypassed inside the "no separator" code path.  Although oddly it
doesn't bypass whitespace preceding *text* fields:
8.3=3D# select to_date('2009 Mar02', 'YYYYMonDD');
ERROR:  invalid value for MON/Mon/mon
I should be able to get the same results by snipping an extra
strspace_len() characters in the new code path in
from_char_parse_int_len().  This ought to be a one-line fix that
doesn't clobber the good parts of my work so far.  I'll run some tests
and then post a patch within the hour.  Does that work for you?
Cheers,
BJ
			
		Brendan Jurd <direvus@gmail.com> writes:
> I should be able to get the same results by snipping an extra
> strspace_len() characters in the new code path in
> from_char_parse_int_len().  This ought to be a one-line fix that
> doesn't clobber the good parts of my work so far.  I'll run some tests
> and then post a patch within the hour.  Does that work for you?
+1
            regards, tom lane
			
		2009/6/23 Tom Lane <tgl@sss.pgh.pa.us>: > Brendan Jurd <direvus@gmail.com> writes: >> I should be able to get the same results by snipping an extra >> strspace_len() characters in the new code path in >> from_char_parse_int_len(). This ought to be a one-line fix that >> doesn't clobber the good parts of my work so far. I'll run some tests >> and then post a patch within the hour. Does that work for you? > > +1 > [cross-posting to -hackers] Here's a one-line patch to fix a regression in the new from_char code I introduced into 8.4. Versions <= 8.3 skipped over any whitespace immediately preceding any integer field, and this behaviour was lost in HEAD when my from_char patch was committed back in September '08 [1]. Fortunately, since the code has been refactored, this now only needs to be repaired in one place =) With thanks to Jeremy Ford for sending in his bug report, and an abject mea culpa. Cheers, BJ [1] http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6f09ab2c24491a217f8b88012aababf3b723b902
Вложения
Brendan Jurd <direvus@gmail.com> writes:
> Here's a one-line patch to fix a regression in the new from_char code
> I introduced into 8.4.
> Versions <= 8.3 skipped over any whitespace immediately preceding any
> integer field, and this behaviour was lost in HEAD when my from_char
> patch was committed back in September '08 [1].
Applied along with some regression test additions.  Thanks for the
quick response.
        regards, tom lane
			
		2009/6/24 Jeremy Ford <jeremford@gmail.com>:
> I've just compiled and run the 8.4.RC2 code. For both of the following
> queries I get "0009-03-01"
>
> SELECT to_date(' 2009 03', '  YYYY MM') as extraspace; --returns
> "0009-03-01"
> SELECT to_date('2009 03', ' YYYY MM') as bogusspace; --returns "0009-03-01"
>
> Was it the intention to imitate Oracle behavior for these two cases in this
> release? (8.3.7 returns "0009-03-01" as well)
I think, at this stage (so close to release) we're just trying to keep
up a reasonable compatibility with 8.3 and earlier.  The fact that the
"bogus space" case doesn't match the Oracle behaviour might be fertile
ground for future improvement in the 8.5 cycle.
Thanks for testing!
Cheers,
BJ
			
		I've just compiled and run the 8.4.RC2 code. For both of the following queries I get "0009-03-01"<br /><br /> SELECT
to_date('2009 03', '  YYYY MM') as extraspace; --returns "0009-03-01"<br /> SELECT to_date('2009 03', ' YYYY MM') as
bogusspace;--returns "0009-03-01"<br /><br /> Was it the intention to imitate Oracle behavior for these two cases in
thisrelease? (8.3.7 returns "0009-03-01" as well)<br /><br /> The others in that set of queries all work as expected
("2009-03-01"):<br/> SELECT to_date(' 2009 03', 'YYYYMM') as nospace; --returns "2009-03-01"<br /> SELECT to_date('
200903', 'YYYY MM') as monthspace; --returns "2009-03-01"<br /> SELECT to_date(' 2009 03', ' YYYY MM') as bothspaces;
--returns"2009-03-01"<br /><br />cheers, jeremy.<br /><br /><div class="gmail_quote">On Tue, Jun 23, 2009 at 3:55 AM,
TomLane <span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>></span> wrote:<br /><blockquote
class="gmail_quote"style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:
1ex;"><divclass="im">Brendan Jurd <<a href="mailto:direvus@gmail.com">direvus@gmail.com</a>> writes:<br
/></div><divclass="im">> Here's a one-line patch to fix a regression in the new from_char code<br /> > I
introducedinto 8.4.<br /><br /> > Versions <= 8.3 skipped over any whitespace immediately preceding any<br />
>integer field, and this behaviour was lost in HEAD when my from_char<br /> > patch was committed back in
September'08 [1].<br /><br /></div>Applied along with some regression test additions.  Thanks for the<br /> quick
response.<br/><br />                        regards, tom lane<br /></blockquote></div><br />