Обсуждение: Created feature for to_date() conversion using patterns 'YYYY-WW','YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'

Поиск
Список
Период
Сортировка
Hi,

some days ago I ran into a problem with the to_date() function. I 
originally described it on StackExchange:
https://dba.stackexchange.com/questions/250111/unexpected-behaviour-for-to-date-with-week-number-and-week-day

The problem:

If you want to parse a date string with year, week and day of week, you 
can do this using the ISO week pattern: 'IYYY-IW-ID'. This works as 
expected:

date string |  to_date()
------------+------------
'2019-1-1'  |  2018-12-31  -> Monday of the first week of the year 
(defined as the week that includes the 4th of January)
'2019-1-2'  |  2019-01-01
'2019-1-3'  |  2019-01-02
'2019-1-4'  |  2019-01-03
'2019-1-5'  |  2019-01-04
'2019-1-6'  |  2019-01-05
'2019-1-7'  |  2019-01-06

'2019-2-1'  |  2019-01-07
'2019-2-2'  |  2019-01-08

But if you are trying this with the non-ISO pattern 'YYYY-WW-D', the 
result was not expected:

date string |  to_date()
-------------------------
'2019-1-1'  |  2019-01-01
'2019-1-2'  |  2019-01-01
'2019-1-3'  |  2019-01-01
'2019-1-4'  |  2019-01-01
'2019-1-5'  |  2019-01-01
'2019-1-6'  |  2019-01-01
'2019-1-7'  |  2019-01-01

'2019-2-1'  |  2019-01-08
'2019-2-2'  |  2019-01-08

As you can see, the 'D' part of the pattern doesn't influence the 
resulting date.

The answer of Laurenz Albe pointed to a part of the documentation, I 
missed so far:

"In to_timestamp and to_date, weekday names or numbers (DAY, D, and 
related field types) are accepted but are ignored for purposes of 
computing the result. The same is true for quarter (Q) fields." 
(https://www.postgresql.org/docs/12/functions-formatting.html)

So, I had a look at the relevant code part. I decided to try a patch by 
myself. Now it works as I would expect it:

date string |  to_date()
-------------------------
'2019-1-1'  |  2018-12-30 -> Sunday (!) of the first week of the year 
(the first week is at the first day of year)
'2019-1-2'  |  2018-12-31
'2019-1-3'  |  2019-01-01
'2019-1-4'  |  2019-01-02
'2019-1-5'  |  2019-01-03
'2019-1-6'  |  2019-01-04
'2019-1-7'  |  2019-01-05

'2019-2-1'  |  2019-01-06
'2019-2-2'  |  2019-01-07

Furthermore, if you left the 'D' part, the date would be always set to 
the first day of the corresponding week (in that case it is Sunday, in 
contrast to the ISO week, which starts mondays).

To be consistent, I added similar code for the week of month pattern 
('W'). So, using the pattern 'YYYY-MM-W-D' yields in:

date string   |  to_date()
---------------------------
'2018-12-5-1' |  2018-12-23
'2018-12-6-1' |  2018-12-30
'2019-1-1-1'  |  2018-12-30 -> First day (Su) of the first week of the 
first month of the year
'2019-2-2-1'  |  2019-02-03 -> First day (Su) of the second week of 
February
'2019-10-3-5' |  2019-10-17 -> Fifth day (Th) of the third week of 
October

If you left the 'D', it would be set to 1 as well.

The code can be seen here:
https://github.com/S-Man42/postgres/commit/534e6bd70e23864f385d60ecf187496c7f4387c9

I hope, keeping the code style of the surrounding code (especially the 
ISO code) is ok for you.

Now the questions:
1. Although the ignorance of the 'D' pattern is well documented, does 
the new behaviour might be interesting for you?
2. Does it work as you'd expect it?
3. Because this could be my very first contribution to the PostgreSQL 
code base, I really want you to be as critical as possible. I am not 
quite sure if I didn't miss something important.
4. Currently something like '2019-1-8' does not throw an exception but 
results in the same as '2019-2-1' (8th is the same as the 1st of the 
next week). On the other hand, currently, the ISO week conversion gives 
out the result of '2019-1-7' for every 'D' >= 7. I am not sure if this 
is better. I think a consistent exception handling should be discussed 
separately (date roll over vs. out of range exception vs. ISO week 
behaviour)

So far, I am very curious about your opinions!

Kind regards,
Mark/S-Man42



Hi,

I apologize for the mistake.

For the mailing list correspondence I created this mail account. But I 
forgot to change the sender name. So, the "postgres" name appeared as 
sender name in the mailing list. I changed it.

Kind regards,
Mark/S-Man42

> Hi,
> 
> some days ago I ran into a problem with the to_date() function. I
> originally described it on StackExchange:
> https://dba.stackexchange.com/questions/250111/unexpected-behaviour-for-to-date-with-week-number-and-week-day
> 
> The problem:
> 
> If you want to parse a date string with year, week and day of week,
> you can do this using the ISO week pattern: 'IYYY-IW-ID'. This works
> as expected:
> 
> date string |  to_date()
> ------------+------------
> '2019-1-1'  |  2018-12-31  -> Monday of the first week of the year
> (defined as the week that includes the 4th of January)
> '2019-1-2'  |  2019-01-01
> '2019-1-3'  |  2019-01-02
> '2019-1-4'  |  2019-01-03
> '2019-1-5'  |  2019-01-04
> '2019-1-6'  |  2019-01-05
> '2019-1-7'  |  2019-01-06
> 
> '2019-2-1'  |  2019-01-07
> '2019-2-2'  |  2019-01-08
> 
> But if you are trying this with the non-ISO pattern 'YYYY-WW-D', the
> result was not expected:
> 
> date string |  to_date()
> -------------------------
> '2019-1-1'  |  2019-01-01
> '2019-1-2'  |  2019-01-01
> '2019-1-3'  |  2019-01-01
> '2019-1-4'  |  2019-01-01
> '2019-1-5'  |  2019-01-01
> '2019-1-6'  |  2019-01-01
> '2019-1-7'  |  2019-01-01
> 
> '2019-2-1'  |  2019-01-08
> '2019-2-2'  |  2019-01-08
> 
> As you can see, the 'D' part of the pattern doesn't influence the
> resulting date.
> 
> The answer of Laurenz Albe pointed to a part of the documentation, I
> missed so far:
> 
> "In to_timestamp and to_date, weekday names or numbers (DAY, D, and
> related field types) are accepted but are ignored for purposes of
> computing the result. The same is true for quarter (Q) fields."
> (https://www.postgresql.org/docs/12/functions-formatting.html)
> 
> So, I had a look at the relevant code part. I decided to try a patch
> by myself. Now it works as I would expect it:
> 
> date string |  to_date()
> -------------------------
> '2019-1-1'  |  2018-12-30 -> Sunday (!) of the first week of the year
> (the first week is at the first day of year)
> '2019-1-2'  |  2018-12-31
> '2019-1-3'  |  2019-01-01
> '2019-1-4'  |  2019-01-02
> '2019-1-5'  |  2019-01-03
> '2019-1-6'  |  2019-01-04
> '2019-1-7'  |  2019-01-05
> 
> '2019-2-1'  |  2019-01-06
> '2019-2-2'  |  2019-01-07
> 
> Furthermore, if you left the 'D' part, the date would be always set to
> the first day of the corresponding week (in that case it is Sunday, in
> contrast to the ISO week, which starts mondays).
> 
> To be consistent, I added similar code for the week of month pattern
> ('W'). So, using the pattern 'YYYY-MM-W-D' yields in:
> 
> date string   |  to_date()
> ---------------------------
> '2018-12-5-1' |  2018-12-23
> '2018-12-6-1' |  2018-12-30
> '2019-1-1-1'  |  2018-12-30 -> First day (Su) of the first week of the
> first month of the year
> '2019-2-2-1'  |  2019-02-03 -> First day (Su) of the second week of 
> February
> '2019-10-3-5' |  2019-10-17 -> Fifth day (Th) of the third week of 
> October
> 
> If you left the 'D', it would be set to 1 as well.
> 
> The code can be seen here:
> https://github.com/S-Man42/postgres/commit/534e6bd70e23864f385d60ecf187496c7f4387c9
> 
> I hope, keeping the code style of the surrounding code (especially the
> ISO code) is ok for you.
> 
> Now the questions:
> 1. Although the ignorance of the 'D' pattern is well documented, does
> the new behaviour might be interesting for you?
> 2. Does it work as you'd expect it?
> 3. Because this could be my very first contribution to the PostgreSQL
> code base, I really want you to be as critical as possible. I am not
> quite sure if I didn't miss something important.
> 4. Currently something like '2019-1-8' does not throw an exception but
> results in the same as '2019-2-1' (8th is the same as the 1st of the
> next week). On the other hand, currently, the ISO week conversion
> gives out the result of '2019-1-7' for every 'D' >= 7. I am not sure
> if this is better. I think a consistent exception handling should be
> discussed separately (date roll over vs. out of range exception vs.
> ISO week behaviour)
> 
> So far, I am very curious about your opinions!
> 
> Kind regards,
> Mark/S-Man42



Hi,

while preparing the patch for the Commitfest, I found a bug in the 
to_char() function that is quite correlated with this issue:

SELECT to_char('1997-02-01'::date, 'YYYY-WW-D')

returns: 1997-05-7 -> which is ok, I believe. Feb, 1st was on Saturday, 
so counting from Sundays, it was day 7 of week 5.

SELECT to_char('1997-02-03'::date, 'YYYY-WW-D')

returns: 1997-05-2 -> This cannot be. The input date is two days laters, 
but the result is 5 days earlier. I'd expect 1997-06-2 as result, but 
this occurs another week later:

SELECT to_char('1997-02-10'::date, 'YYYY-WW-D')

This is wrong, because this should be week 7 instead. On the other hand, 
the ISO week formats work very well.

I'll have a look at the code and try to fix it in the patch as well.

Kind regards,
Mark


Am 2019-10-08 17:49, schrieb Mark Lorenz:
> Hi,
> 
> I apologize for the mistake.
> 
> For the mailing list correspondence I created this mail account. But I
> forgot to change the sender name. So, the "postgres" name appeared as
> sender name in the mailing list. I changed it.
> 
> Kind regards,
> Mark/S-Man42
> 
>> Hi,
>> 
>> some days ago I ran into a problem with the to_date() function. I
>> originally described it on StackExchange:
>> https://dba.stackexchange.com/questions/250111/unexpected-behaviour-for-to-date-with-week-number-and-week-day
>> 
>> The problem:
>> 
>> If you want to parse a date string with year, week and day of week,
>> you can do this using the ISO week pattern: 'IYYY-IW-ID'. This works
>> as expected:
>> 
>> date string |  to_date()
>> ------------+------------
>> '2019-1-1'  |  2018-12-31  -> Monday of the first week of the year
>> (defined as the week that includes the 4th of January)
>> '2019-1-2'  |  2019-01-01
>> '2019-1-3'  |  2019-01-02
>> '2019-1-4'  |  2019-01-03
>> '2019-1-5'  |  2019-01-04
>> '2019-1-6'  |  2019-01-05
>> '2019-1-7'  |  2019-01-06
>> 
>> '2019-2-1'  |  2019-01-07
>> '2019-2-2'  |  2019-01-08
>> 
>> But if you are trying this with the non-ISO pattern 'YYYY-WW-D', the
>> result was not expected:
>> 
>> date string |  to_date()
>> -------------------------
>> '2019-1-1'  |  2019-01-01
>> '2019-1-2'  |  2019-01-01
>> '2019-1-3'  |  2019-01-01
>> '2019-1-4'  |  2019-01-01
>> '2019-1-5'  |  2019-01-01
>> '2019-1-6'  |  2019-01-01
>> '2019-1-7'  |  2019-01-01
>> 
>> '2019-2-1'  |  2019-01-08
>> '2019-2-2'  |  2019-01-08
>> 
>> As you can see, the 'D' part of the pattern doesn't influence the
>> resulting date.
>> 
>> The answer of Laurenz Albe pointed to a part of the documentation, I
>> missed so far:
>> 
>> "In to_timestamp and to_date, weekday names or numbers (DAY, D, and
>> related field types) are accepted but are ignored for purposes of
>> computing the result. The same is true for quarter (Q) fields."
>> (https://www.postgresql.org/docs/12/functions-formatting.html)
>> 
>> So, I had a look at the relevant code part. I decided to try a patch
>> by myself. Now it works as I would expect it:
>> 
>> date string |  to_date()
>> -------------------------
>> '2019-1-1'  |  2018-12-30 -> Sunday (!) of the first week of the year
>> (the first week is at the first day of year)
>> '2019-1-2'  |  2018-12-31
>> '2019-1-3'  |  2019-01-01
>> '2019-1-4'  |  2019-01-02
>> '2019-1-5'  |  2019-01-03
>> '2019-1-6'  |  2019-01-04
>> '2019-1-7'  |  2019-01-05
>> 
>> '2019-2-1'  |  2019-01-06
>> '2019-2-2'  |  2019-01-07
>> 
>> Furthermore, if you left the 'D' part, the date would be always set to
>> the first day of the corresponding week (in that case it is Sunday, in
>> contrast to the ISO week, which starts mondays).
>> 
>> To be consistent, I added similar code for the week of month pattern
>> ('W'). So, using the pattern 'YYYY-MM-W-D' yields in:
>> 
>> date string   |  to_date()
>> ---------------------------
>> '2018-12-5-1' |  2018-12-23
>> '2018-12-6-1' |  2018-12-30
>> '2019-1-1-1'  |  2018-12-30 -> First day (Su) of the first week of the
>> first month of the year
>> '2019-2-2-1'  |  2019-02-03 -> First day (Su) of the second week of 
>> February
>> '2019-10-3-5' |  2019-10-17 -> Fifth day (Th) of the third week of 
>> October
>> 
>> If you left the 'D', it would be set to 1 as well.
>> 
>> The code can be seen here:
>> https://github.com/S-Man42/postgres/commit/534e6bd70e23864f385d60ecf187496c7f4387c9
>> 
>> I hope, keeping the code style of the surrounding code (especially the
>> ISO code) is ok for you.
>> 
>> Now the questions:
>> 1. Although the ignorance of the 'D' pattern is well documented, does
>> the new behaviour might be interesting for you?
>> 2. Does it work as you'd expect it?
>> 3. Because this could be my very first contribution to the PostgreSQL
>> code base, I really want you to be as critical as possible. I am not
>> quite sure if I didn't miss something important.
>> 4. Currently something like '2019-1-8' does not throw an exception but
>> results in the same as '2019-2-1' (8th is the same as the 1st of the
>> next week). On the other hand, currently, the ISO week conversion
>> gives out the result of '2019-1-7' for every 'D' >= 7. I am not sure
>> if this is better. I think a consistent exception handling should be
>> discussed separately (date roll over vs. out of range exception vs.
>> ISO week behaviour)
>> 
>> So far, I am very curious about your opinions!
>> 
>> Kind regards,
>> Mark/S-Man42



Hi,

I fixed the described issue in the to char() function.

The output of the current version is:

postgres=# SELECT to_char('1997-02-01'::date, 'YYYY-WW-D');
  to_char
---------
  1997-05-7
(1 row)

postgres=# SELECT to_char('1997-02-03'::date, 'YYYY-WW-D');
  to_char
---------
  1997-05-2
(1 row)

postgres=# SELECT to_char('1997-02-10'::date, 'YYYY-WW-D');
  to_char
---------
  1997-06-2
(1 row)

As you can see, the week day of the Feb 3rd - which is two days AFTER 
Feb 1st - yields in a result which is 5 days BEFORE the earlier date, 
which obviously cannot be. Furthermore, using the Gregorian calendar, 
Feb 3rd is in week 6. So, the Feb 10th cannot be in week 6 as well.

The bug was, that the week day of Jan 1st was not considered in the 
calculation of the week number. So, a possible offset has not been set.

New output:

postgres=# SELECT to_char('1997-02-03'::date, 'YYYY-WW-D');
  to_char
---------
  1997-06-2
(1 row)

postgres=# SELECT to_char('1997-02-01'::date, 'YYYY-WW-D');
  to_char
---------
  1997-05-7
(1 row)

postgres=# SELECT to_char('1997-02-10'::date, 'YYYY-WW-D');
  to_char
---------
  1997-07-2
(1 row)

-------------------

Furthermore I adjusted the to_date() functionality for the WW-D pattern 
as well. As said before in the thread, I know, ignoring the D part is 
known and documented, but I think, if the ISO format recognizes the day 
part, the non-ISO format should as well - especially when the "back" 
operation does as well (meaning to_char()):

Output in the current version:

postgres=# SELECT to_date('2019-1-1', 'YYYY-WW-D');
   to_date
------------
  2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-2', 'YYYY-WW-D');
   to_date
------------
  2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-3', 'YYYY-WW-D');
   to_date
------------
  2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-7', 'YYYY-WW-D');
   to_date
------------
  2019-01-01
(1 row)

postgres=# SELECT to_date('2019-2-1', 'YYYY-WW-D');
   to_date
------------
  2019-01-08
(1 row)

New output:

postgres=# SELECT to_date('2019-1-1', 'YYYY-WW-D');
   to_date
------------
  2018-12-30
(1 row)

postgres=# SELECT to_date('2019-1-2', 'YYYY-WW-D');
   to_date
------------
  2018-12-31
(1 row)

postgres=# SELECT to_date('2019-1-3', 'YYYY-WW-D');
   to_date
------------
  2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-7', 'YYYY-WW-D');
   to_date
------------
  2019-01-05
(1 row)

postgres=# SELECT to_date('2019-2-1', 'YYYY-WW-D');
   to_date
------------
  2019-01-06
(1 row)

I added the patch as plain text attachment. It contains the code and, of 
course, the regression tests. Some existing tests failed, because they 
worked with the old output. I have changed their expected output.

Hope you'll find it helpful.

Best regards,
Mark Lorenz
Вложения
Hi,

I got the advice to split the patches for:
- fixing the to_char() function
- changing the to_date()/to_timestamp() behaviour

So I appended the split patches.

Kind regards,
Mark Lorenz

Вложения
Mark Lorenz <postgres@four-two.de> writes:
> I got the advice to split the patches for:
> - fixing the to_char() function
> - changing the to_date()/to_timestamp() behaviour
> So I appended the split patches.

I'm a bit skeptical of the premise here.  The fine manual says

    In to_timestamp and to_date, weekday names or numbers (DAY, D, and
    related field types) are accepted but are ignored for purposes of
    computing the result. The same is true for quarter (Q) fields.

You appear to be trying to change that, but it's not at all clear
what behavior you're changing it to, or whether the result is going
to be any more sensible than it was before.  In any case, this is
certainly not a "bug fix", because the code is working as documented.
It's a redefinition, and you haven't specified the new definition.

Another point is that these functions are meant to be Oracle-compatible,
so I wonder what Oracle does in not-terribly-well-defined cases like
these.

            regards, tom lane



Mark Lorenz <postgres@four-two.de> writes:
> while preparing the patch for the Commitfest, I found a bug in the
> to_char() function that is quite correlated with this issue:

> SELECT to_char('1997-02-01'::date, 'YYYY-WW-D')

> returns: 1997-05-7 -> which is ok, I believe. Feb, 1st was on Saturday,
> so counting from Sundays, it was day 7 of week 5.

> SELECT to_char('1997-02-03'::date, 'YYYY-WW-D')

> returns: 1997-05-2 -> This cannot be.

Why not?  These format codes are specified as

D    day of the week, Sunday (1) to Saturday (7)
WW    week number of year (1–53) (the first week starts on the first day of the year)

I don't see anything there that says that "D" is correlated with "WW".
We do have a connection between "ID" and "IW", so that ID ought to
specify a day within an IW week, but there's no connection between "D"
and either "W" or "WW" week numbering.  It's a day of the week, as
per the regular calendar.  Trying to define it as something else is
just going to break stuff.

The only way to make "D" as it stands compatible with a week-numbering
system is to ensure that your weeks always start on Sundays, that is,
just as confusing as ISO weeks but slightly differently confusing.

Perhaps it would be worth inventing format codes that do have the
same relationship to "W" and/or "WW" as "ID" does to "IW".  But
repurposing "D" for that is a bad idea.

            regards, tom lane



Hi Tom,

thanks for answering!

I commited two different patches:

-------

The first one is for the strange behaviour of to_char(), which could be 
seen as a bug, I believe. As described earlier, to_char() with the 
'WW-D' pattern could return wrong week numbers.

The non-ISO week number is defined for weeks beginning with Sundays and 
ending with Saturdays. The first week of the year is the week with 
January, 1st.

For example:

postgres=# SELECT to_char('1997-01-01'::date, 'YYYY-WW-D');
   to_char
---------
   1997-01-4
(1 row)

1997-01-01 was a Wednesday. So the first week in 1997 was from Jan 1st 
to Jan 4th (Saturday). Week 2 started on Jan 5th. But to_char() gives 
out week number 1 until Tuesday (!), Jan 7th.

postgres=# SELECT to_char('1997-01-07'::date, 'YYYY-WW-D');
   to_char
---------
   1997-01-3
(1 row)

After that, on Jan 8th, the output switches from 01-3 to 02-4, which 
makes no sense in my personal opinion. The week number should be 
consistent from Sun to Sat and should not switch during any day in the 
week. Furthermore, it is not clear why Jan 7th should return an earlier 
week day (3) than Jan 1st (4).

The bug is, that the calculation of the week number only considers the 
number of days of the current year. But it ignores the first week day, 
which defines an offset. This has been fixed in the first patch.

-------

Second patch:

As you stated correctly, this is not a bug fix, because the current 
behaviour is documented and it works as the documentation states. I 
tried to describe my confusion in the very first post of this thread:

I was wondering why the D part is not recognized in the non-ISO week 
pattern while the ISO day is working very well. Although this is 
documented, there could be a chance that this simply was not implemented 
right now - so I tried.

The main aspect, I believe, is, that to_date() or to_timestamp() is some 
kind of "back" operation of the to_char() function. So, a new definition 
simply should recognize the week day as the to_char() function does, 
instead of setting the day part fix to any number (please see the 
examples in the very first post for that).

-------

Combining both patches, the to_char() fix and the to_date() change, it 
is possible to calculate the non-ISO week pattern in both directions:

SELECT to_date(to_char(anydate, 'YYYY-WW-D'), 'YYYY-WW-D')

would result in "anydate". Currently it does not:

postgres=# SELECT to_date(to_char('1997-01-07'::date, 'YYYY-WW-D'), 
'YYYY-WW-D')
   to_char
---------
   1997-01-01
(1 row)

postgres=# SELECT to_char(to_date('1997-01-07', 'YYYY-WW-D'), 
'YYYY-WW-D')
   to_char
---------
   1997-01-04
(1 row)

On the other hand, the ISO week calculations work as expected, 
especially the there-and-back operation results in the original value:

postgres=# SELECT to_date(to_char('1997-01-07'::date, 'IYYY-IW-ID'), 
'IYYY-IW-ID')
   to_char
---------
   1997-01-07
(1 row)

postgres=# SELECT to_char(to_date('1997-01-07', 'IYYY-IW-ID'), 
'IYYY-IW-ID')
   to_char
---------
   1997-01-7
(1 row)

The only difference between ISO and non-ISO weeks is the beginning on 
Mondays and the definition of the first week. But this cannot be the 
reason why one operation results in right values (comparing with a 
calendar) and the other one does not.

Does this explanation make it clearer?



>> while preparing the patch for the Commitfest, I found a bug in the
>> to_char() function that is quite correlated with this issue:
> 
>> SELECT to_char('1997-02-01'::date, 'YYYY-WW-D')
> 
>> returns: 1997-05-7 -> which is ok, I believe. Feb, 1st was on 
>> Saturday,
>> so counting from Sundays, it was day 7 of week 5.
> 
>> SELECT to_char('1997-02-03'::date, 'YYYY-WW-D')
> 
>> returns: 1997-05-2 -> This cannot be.
> 
> Why not?  These format codes are specified as
> 
> D    day of the week, Sunday (1) to Saturday (7)
> WW    week number of year (1–53) (the first week starts on the first day
> of the year)
> 

Because 1997-05-2 is earlier than 1997-05-7. But 1997-02-03 is later 
than 1997-02-01. From my point of view, this is confusing.

> I don't see anything there that says that "D" is correlated with "WW".
> We do have a connection between "ID" and "IW", so that ID ought to
> specify a day within an IW week, but there's no connection between "D"
> and either "W" or "WW" week numbering.  It's a day of the week, as
> per the regular calendar.  Trying to define it as something else is
> just going to break stuff.
> 
> The only way to make "D" as it stands compatible with a week-numbering
> system is to ensure that your weeks always start on Sundays, that is,
> just as confusing as ISO weeks but slightly differently confusing.
> 
> Perhaps it would be worth inventing format codes that do have the
> same relationship to "W" and/or "WW" as "ID" does to "IW".  But
> repurposing "D" for that is a bad idea.
> 
>             regards, tom lane

I don't want to create any connection here. The day is calculated 
correctly. But the week number is wrong. 1997-02-03 was in week number 
6, as well as 1997-02-04. But Postgres returns 5. The problem with 
to_char() is, that the week number is considering only the nmber of days 
in the year and divides them by 7. So, there is no diffence whether the 
year starts on Sunday or any other week day. So, an offset is missing, 
which yields in wrong week numbers, as I can see...



Mark Lorenz <postgres@four-two.de> writes:
>> Why not?  These format codes are specified as
>> D    day of the week, Sunday (1) to Saturday (7)
>> WW    week number of year (1–53) (the first week starts on the first day
>> of the year)

> I don't want to create any connection here. The day is calculated 
> correctly. But the week number is wrong. 1997-02-03 was in week number 
> 6, as well as 1997-02-04. But Postgres returns 5.

The week number is only wrong if you persist in ignoring the very clear
definition given in the manual.  According to the stated definition of WW,
"week 1" consists of Jan 1 to Jan 7, "week 2" to Jan 8-14, etc.  So it's
correct for both of those dates to be in "week 5".  There are other
possible definitions of "week" of course, such as the ISO week, under
which both those dates would be in week 6 (of 1997 anyway, not all other
years).  But if you want ISO week you should ask for it with "IW", not
expect that we'll change the longstanding behavior of "WW" to match.

As far as I can see, the only way to make a week definition that
gives sensible results in combination with "D" is to do something
like what ISO does, but with Sunday as the start day instead of Monday.
But having three different week definitions seems more likely to
confuse people (even more) than to be helpful.  Plus you'd also need
analogs of IYYY, IDDD, etc.

Why not just use IYYY-IW-ID, instead?  You'd have to adapt to
week-starts-on-Monday, but you'd be using a notation that a lot
of people are already familiar with, instead of inventing your own.

Another possibility, perhaps, is to use WW in combination with
some new field that counts 1-7, 1-7, 1-7, ... starting on Jan 1.
But then that wouldn't have any easy mapping to day names, so
there's no free lunch.

Throwing MM into the mix makes it even more exciting, as month
boundaries don't correspond with week boundaries either.  I don't
see any rational way to make YYYY-MM-W or YYYY-MM-W-D patterns
that behave in a numerically consistent fashion.  (Note that ISO
didn't try --- there is no "ISO month".)

The bottom line is that these various definitions aren't mutually
consistent, and that's just a fact of life, not something that can
be fixed.

In any case, backwards compatibility alone would be a sufficient
reason to reject a patch that changes the established behavior
of the existing format codes.  Whether you think they're buggy or
not, other people are relying on the existing documented behavior.

Perhaps we'd consider a patch that adds some new format codes with
new behavior.  But personally I'd vote against implementing new
format codes unless you can point to well-established standards
supporting their definitions.  to_char/to_date are impossibly
complex and unmaintainable already; we don't need to add more
features with narrow use-cases to them.

            regards, tom lane



Updated the chg_to_date_yyyywwd.patch with additional tests (because it 
works not only for 'D' pattern but also for all day patterns like 'Day' 
or 'DY'). Added the necessary documentation change.

(The fix_to_char_yyyywwd.patch from 
f4e740a8de3ad1e762a28f6ff253ea4f%40four-two.de is still up-to-date)
Вложения
Em sex., 10 de jan. de 2020 às 09:22, Mark Lorenz <postgres@four-two.de> escreveu:
Updated the chg_to_date_yyyywwd.patch with additional tests (because it
works not only for 'D' pattern but also for all day patterns like 'Day'
or 'DY'). Added the necessary documentation change.

(The fix_to_char_yyyywwd.patch from
f4e740a8de3ad1e762a28f6ff253ea4f%40four-two.de is still up-to-date)


The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            not tested

Hi Mark,

this is a review of the patch: chg_to_date_yyyywwd.patch

There hasn't been any problem, at least that I've been able to find.

This one applies cleanly.

The entire compilation went without error as well.

############# Without patch #############

postgres=# SELECT to_date('2019-1-1', 'YYYY-WW-D');
  to_date  
------------
 2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-2', 'YYYY-WW-D');
  to_date  
------------
 2019-01-01
(1 row)

postgres=# SELECT to_date('2019-1-9', 'YYYY-WW-D');
  to_date  
------------
 2019-01-01
(1 row)


############# With patch #############

postgres=# SELECT to_date('2019-1-1', 'YYYY-WW-D');
  to_date  
------------
 2018-12-30
(1 row)

postgres=# SELECT to_date('2019-1-2', 'YYYY-WW-D');
  to_date  
------------
 2018-12-31
(1 row)

postgres=# SELECT to_date('2019-1-9', 'YYYY-WW-D');
  to_date  
------------
 2019-01-07
(1 row)

+1 for committer review

--
Cleysson Lima

Cleysson Lima <cleyssondba@gmail.com> writes:
> this is a review of the patch: chg_to_date_yyyywwd.patch
> There hasn't been any problem, at least that I've been able to find.

AFAICS, the point of this patch is to make to_date symmetrical
with the definition of WW that the other patch wants for to_char.
But the other patch is wrong, for the reasons I explained upthread,
so I doubt that we want this one either.

I still think that it'd be necessary to invent at least one new
format field code in order to get to a sane version of this feature.
As they stand, 'WW' and 'D' do not agree on what a week is, and
changing the behavior of either one in order to make them agree
is just not going to happen.

BTW, I went to check on what Oracle thinks about this, since these
functions are allegedly Oracle-compatible.  On PG, I get this
for the WW and D values for the next few days:

select to_char(current_date+n, 'YYYY-MM-DD YYYY-WW-D Day')
from generate_series(0,10) n;
            to_char             
--------------------------------
 2020-01-31 2020-05-6 Friday   
 2020-02-01 2020-05-7 Saturday 
 2020-02-02 2020-05-1 Sunday   
 2020-02-03 2020-05-2 Monday   
 2020-02-04 2020-05-3 Tuesday  
 2020-02-05 2020-06-4 Wednesday
 2020-02-06 2020-06-5 Thursday 
 2020-02-07 2020-06-6 Friday   
 2020-02-08 2020-06-7 Saturday 
 2020-02-09 2020-06-1 Sunday   
 2020-02-10 2020-06-2 Monday   
(11 rows)

I did the same calculations using Oracle 11g R2 on sqlfiddle.com
and got the same results.  Interestingly, though, I also tried it on

https://rextester.com/l/oracle_online_compiler

and here's what I get there:

2020-01-31 2020-05-5 Freitag
2020-02-01 2020-05-6 Samstag
2020-02-02 2020-05-7 Sonntag
2020-02-03 2020-05-1 Montag
2020-02-04 2020-05-2 Dienstag
2020-02-05 2020-06-3 Mittwoch
2020-02-06 2020-06-4 Donnerstag
2020-02-07 2020-06-5 Freitag
2020-02-08 2020-06-6 Samstag
2020-02-09 2020-06-7 Sonntag
2020-02-10 2020-06-1 Montag

(I don't know how to switch locales on these sites, so I don't have
any way to check what happens in other locales.)

So we agree with Oracle on what WW means, but they count D as 1-7
starting on either Sunday or Monday according to locale.  I wonder
whether we should change to match that?  Maybe "TMD" should act that
way?  It's already the case that their "Day" acts like our "TMDay",
evidently.

Either way, though, the WW weeks don't line up with the D weeks,
and we're not likely to make them do so.

So I think an acceptable version of this feature has to involve
defining at least one new format code and maybe as many as three,
to produce year, week and day values that agree on whichever
definition of "a week" you want to use, and then to_date has to
enforce that input uses matching year/week/day field types,
very much like it already does for ISO versus Gregorian dates.

I also notice that neither patch touches the documentation.
A minimum requirement here is defining what you think the underlying
"week" is, if it's neither ISO nor the existing WW definition.
As I said before, it'd also be a good idea to provide some
evidence that there are other people using that same week definition.

            regards, tom lane



I wrote:
> Either way, though, the WW weeks don't line up with the D weeks,
> and we're not likely to make them do so.
> So I think an acceptable version of this feature has to involve
> defining at least one new format code and maybe as many as three,
> to produce year, week and day values that agree on whichever
> definition of "a week" you want to use, and then to_date has to
> enforce that input uses matching year/week/day field types,
> very much like it already does for ISO versus Gregorian dates.

A different line of thought could be to accept the current to_char()
behavior for WW and D, and go ahead and teach to_date() to invert that.
That is, take YYYY plus WW as specifying a seven-day interval, and then
D chooses the matching day within that interval.  This would still have
the property you complained about originally that WW-plus-D don't form
a monotonically increasing sequence, but I think that ship has sailed.

            regards, tom lane



I just noticed that this patch has been classified under "bug fixes",
but per Tom's comments, this is not a bug fix -- it seems we would need
a new format code to implement some different week numbering mechanism.
That seems a new feature, not a bug fix.

Therefore I propose to move this in Commitfest from "Bug fixes" to
"Server features".  This has implications such as not automatically
moving to next commitfest if no update appears during this one.


I've never personally had to write calendaring applications, so I don't
have an opinion on whether this is useful.  Why isn't it sufficient to
rely on ISO week/day numbering (IW/ID), which appears to be more
consistent?  I think we should consider adding more codes only if
real-world use cases exist for them.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Hi Tom,

with a bit space to this issue, I re-read your comments. I am beginning 
to understand what you mean or - better - what's wrong with my thoughts. 
When I understand you correctly, you say, the WW can start at any 
weekday, and is not fixed to Sunday, right? In your opinion the WW 
starts with the weekday of Jan, 1st? That's what could be my problem: I 
always thought (maybe triggered through the D pattern), that WW has to 
start sundays. But, now I agree with you, the docs fit better to your 
interpretation:

"the first week starts on the first day of the year"

I interpreted it with: It starts on the week, which includes the first 
of the year, but the Sunday before.

Did I understand you correctly? In that case, I accept, that my patch is 
no bugfix (I think, it would be one, if my interpretion would be the 
expected behaviour.).

But, nevertheless, what about adding the function to accept the DAY, D 
(and maybe the Q) patterns for to_date() - in this case, of course, in 
the uncorrelated version? to_char() handles them properly. And, from my 
point of view, there is no reason why they should give "1" instead the 
real day number. What do you think?



Mark Lorenz <postgres@four-two.de> writes:
> with a bit space to this issue, I re-read your comments. I am beginning 
> to understand what you mean or - better - what's wrong with my thoughts. 
> When I understand you correctly, you say, the WW can start at any 
> weekday, and is not fixed to Sunday, right? In your opinion the WW 
> starts with the weekday of Jan, 1st? That's what could be my problem: I 
> always thought (maybe triggered through the D pattern), that WW has to 
> start sundays. But, now I agree with you, the docs fit better to your 
> interpretation:
> "the first week starts on the first day of the year"

Yes, that's clearly what our code, and what Oracle's does too, given
the tests I showed upthread.

> But, nevertheless, what about adding the function to accept the DAY, D 
> (and maybe the Q) patterns for to_date() - in this case, of course, in 
> the uncorrelated version? to_char() handles them properly. And, from my 
> point of view, there is no reason why they should give "1" instead the 
> real day number. What do you think?

The trick is to produce something sane.  I think that a reasonable
precedent for this would be what to_date does with ISO-week fields:
you can ask it to parse IYYY-IW-ID but you can't mix that with regular
month/day/year fields.  So for example, it seems like it'd be possible
to reconstruct a date from YYYY-WW-D, because that's enough to uniquely
identify a day.  The D field isn't monotonically increasing within a
week, but nonetheless there's exactly one day in each YYYY-WW week that
has a particular D value.  However you probably don't want to allow
inconsistent mixtures like YYYY-WW-ID, because that's just a mess (and
more than likely, it's a mistake).  And I would not be in favor of
allowing YYYY-Q either, because that would not be enough to uniquely
identify a day, so there's really no point in allowing Q to enter into
to_date's considerations at all.

Whether there is actually any field demand for such a feature is
not clear to me.  AFAICT Oracle doesn't support it.

            regards, tom lane



I wrote:
> Mark Lorenz <postgres@four-two.de> writes:
>> But, nevertheless, what about adding the function to accept the DAY, D 
>> (and maybe the Q) patterns for to_date() - in this case, of course, in 
>> the uncorrelated version? to_char() handles them properly. And, from my 
>> point of view, there is no reason why they should give "1" instead the 
>> real day number. What do you think?

> The trick is to produce something sane.  I think that a reasonable
> precedent for this would be what to_date does with ISO-week fields:
> you can ask it to parse IYYY-IW-ID but you can't mix that with regular
> month/day/year fields.  So for example, it seems like it'd be possible
> to reconstruct a date from YYYY-WW-D, because that's enough to uniquely
> identify a day.  The D field isn't monotonically increasing within a
> week, but nonetheless there's exactly one day in each YYYY-WW week that
> has a particular D value.  However you probably don't want to allow
> inconsistent mixtures like YYYY-WW-ID, because that's just a mess (and
> more than likely, it's a mistake).  And I would not be in favor of
> allowing YYYY-Q either, because that would not be enough to uniquely
> identify a day, so there's really no point in allowing Q to enter into
> to_date's considerations at all.
> Whether there is actually any field demand for such a feature is
> not clear to me.  AFAICT Oracle doesn't support it.

Since we're certainly not going to commit these patches as-presented,
and nothing has happened on this thread since early April, I've marked
both the CF entries as Returned With Feedback.  If you do write a patch
to make to_date work as above, please file a new CF entry.

(BTW, having two CF entries pointing at the same email thread is
pretty confusing to our not-that-bright tools.  It's probably
better to have just one entry per thread in the future.)

            regards, tom lane