Обсуждение: Re: [PATCHES] to_date() validation
On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <badalex@gmail.com> wrote: > Im just following this: > http://wiki.postgresql.org/wiki/Reviewing_a_Patch so lets get started. > Hi Alex. Thanks for taking the time to review my patch. > Feature test: Everything seems to work as advertised. However before > via sscanf() most limited the max length of the input. Before they > were just silently ignored now they get added to the result: > > patched: > # SELECT to_timestamp('11111', 'HHMM'); > to_timestamp > ------------------------------ > 0009-03-19 11:00:00-06:59:56 > > 8.3.3: > # SELECT to_timestamp('11111', 'HHMM'); > to_timestamp > --------------------------- > 0001-11-01 11:00:00-07 BC > It's an interesting point. I had considered what would happen if the input string was too short, but hadn't given much thought to what would happen if it was too long. In your example case, it isn't clear whether the user wanted to specify 11 hours and 11 months (and the final '1' is just junk), or if they really wanted to specify 11 hours and 111 months. But consider a case like the following: # SELECT to_date('07-09-20008', 'DD-MM-YYYY'); The documentation says that 'YYYY' is "4 and more digits", so you have to assume that the user meant to say the year 20,008. That's why the code in the patch parses all the remaining digits in the input string on the final node. HEAD actually gets this one wrong; in defiance of the documentation it returns 2000-09-07. So, it seems to me that the patch shifts the behaviour in the right direction. Barring actually teaching the code that some nodes (like YYYY) can take an open-ended number of characters, while others (like MM) must take an exact number of characters, I'm not sure what can be done to improve this. Perhaps something for a later patch? > Code review: one minor nit > *** a/src/backend/utils/adt/formatting.c > --- b/src/backend/utils/adt/formatting.c > *************** > *** 781,787 **** static const KeyWord DCH_keywords[] = { > {"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN}, > > /* last */ > ! {NULL, 0, 0, 0} > }; > > /* ---------- > --- 781,787 ---- > {"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN}, > > /* last */ > ! {NULL, 0, 0, 0, 0} > }; Ah, thank you for picking that up. I will correct this and send in a new patch version in the next 24 hours. Cheers, BJ
On Mon, Sep 8, 2008 at 6:24 PM, Brendan Jurd <direvus@gmail.com> wrote: > On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <badalex@gmail.com> wrote: >> Code review: one minor nit >> *** a/src/backend/utils/adt/formatting.c >> --- b/src/backend/utils/adt/formatting.c >> *************** >> *** 781,787 **** static const KeyWord DCH_keywords[] = { >> {"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN}, >> >> /* last */ >> ! {NULL, 0, 0, 0} >> }; >> >> /* ---------- >> --- 781,787 ---- >> {"y", 1, DCH_Y, TRUE, FROM_CHAR_DATE_GREGORIAN}, >> >> /* last */ >> ! {NULL, 0, 0, 0, 0} >> }; > > Ah, thank you for picking that up. I will correct this and send in a > new patch version in the next 24 hours. > New version attached (and link added to wiki). Cheers, BJ
Вложения
On Mon, Sep 08, 2008 at 06:24:14PM +1000, Brendan Jurd wrote: > On Sun, Sep 7, 2008 at 5:58 AM, Alex Hunsaker <badalex@gmail.com> wrote: > > Im just following this: > > http://wiki.postgresql.org/wiki/Reviewing_a_Patch so lets get started. > > > > Hi Alex. Thanks for taking the time to review my patch. I actually had a look at this patch also, though not as thoroughly as Alex. There was one part that I had some thoughts about in from_char_parse_int_len: ! /* ! * We need to pull exactly the number of characters given in 'len' out ! * of the string, and convert those. ! */ <snip> ! first = palloc(len + 1); ! strncpy(first, init, len); ! first[len] = '\0'; The use of palloc/pfree in this routine seems excessive. Does len have upper bound? If so a simple array will do it. ! if (strlen(first) < len) Here you check the length of the remaining string and complain if it's too short, but: <snip> ! result = strtol(first, &last, 10); ! *src += (last - first); Here you do not note if we didn't convert the entire string. So it seems you are allowed to provide too few characters, as long as it's not the last field? Other than that it looks like a good patch. Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > I actually had a look at this patch also, though not as thoroughly as > Alex. There was one part that I had some thoughts about in from_char_parse_int_len: > Hi Martijn. Thanks for your comments. > The use of palloc/pfree in this routine seems excessive. Does len have > upper bound? If so a simple array will do it. > There isn't any *theoretical* upper bound on len. However, in practice, with the current set of formatting nodes, the largest len that will ever be passed to rom_char_parse_int_len() is 6 (for the microsecond 'US' node). I suppose I could define a constant FORMATNODE_MAX_LEN, make it something like 10 and just use that for copying the string, rather than palloc(). I'll give it a try. > ! if (strlen(first) < len) > > Here you check the length of the remaining string and complain if it's > too short, but: > > <snip> > ! result = strtol(first, &last, 10); > ! *src += (last - first); > > Here you do not note if we didn't convert the entire string. So it > seems you are allowed to provide too few characters, as long as it's > not the last field? That's true. The only way to hit that condition would be to provide a semi-bogus value in a string with no separators between the numbers. For example: postgres=# SELECT to_date('20081o13', 'YYYYMMDD'); ERROR: invalid value for "DD" in source string DETAIL: Value must be an integer. What happens here is that strtol() happily consumes the '1' for the format node MM, and as you point out it does not complain that it expected to consume two characters and only got one. On the next node (DD), the function tries to start parsing an integer, but the first character it encounters is 'o', so it freaks out. Certainly the error message here should be more apropos, and tell the user that the problem is in the MM node. Blaming the DD node is pure red herring. However, if the mistake is at the end of the string, there is no error at all: postgres=# SELECT to_date('2008101!', 'YYYYMMDD'); to_date ------------2008-10-01 This is because the end of the string counts as a "separator", so we just run an unbounded strtol() on whatever characters remain in the string. As discussed in my response to Alex's review, I made the end of the string a separator so that things like 'DD-MM-YYYY' would actually work for years with more than four digits. Now I'm wondering if that was the wrong way to go. The case for years with more than four digits is extremely narrow, and if somebody really wanted to parse '01-01-20008' as 1 Jan 20,008 they could always use the 'FM' modifier to get the behaviour they want ('DD-MM-FMYYYY'). I'll send in a new version which addresses these issues. Cheers, BJ
On Tue, Sep 9, 2008 at 9:04 PM, Brendan Jurd <direvus@gmail.com> wrote: > On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout > <kleptog@svana.org> wrote: >> The use of palloc/pfree in this routine seems excessive. Does len have >> upper bound? If so a simple array will do it. >> > > I suppose I could define a constant FORMATNODE_MAX_LEN, make it > something like 10 and just use that for copying the string, rather > than palloc(). I'll give it a try. > Turns out there was already a relevant constant defined in formatting.c: DCH_MAX_ITEM_SIZ, set to 9. So I just used that, +1 for the trailing null. >> >> Here you do not note if we didn't convert the entire string. So it >> seems you are allowed to provide too few characters, as long as it's >> not the last field? > > That's true. The only way to hit that condition would be to provide a > semi-bogus value in a string with no separators between the numbers. I've now plugged this hole. I added the following error for fixed-width inputs that are too short: postgres=# SELECT to_date('200%1010', 'YYYYMMDD'); ERROR: invalid value for "YYYY" in source string DETAIL: Field requires 4 characters, but only 3 could be parsed. HINT: If your source string is not fixed-width, try using the "FM" modifier. I've attached a new version of the patch (version 3), as well as an incremental patch to show the differences between versions 2 and 3. Cheers, BJ
Вложения
On Mon, Sep 8, 2008 at 2:24 AM, Brendan Jurd <direvus@gmail.com> wrote: > HEAD actually gets this one wrong; in defiance of the documentation it > returns 2000-09-07. So, it seems to me that the patch shifts the > behaviour in the right direction. > > Barring actually teaching the code that some nodes (like YYYY) can > take an open-ended number of characters, while others (like MM) must > take an exact number of characters, I'm not sure what can be done to > improve this. Perhaps something for a later patch? Sound good to me and I would probably argue that things like MM should not be hard coded to take only 2 chars... But then again to play devils advocate I can just as easily do things like to_char(...) + '30 months'::interval;
On Tue, Sep 9, 2008 at 6:46 AM, Brendan Jurd <direvus@gmail.com> wrote: > On Tue, Sep 9, 2008 at 9:04 PM, Brendan Jurd <direvus@gmail.com> wrote: >> On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout >> <kleptog@svana.org> wrote: >>> The use of palloc/pfree in this routine seems excessive. Does len have >>> upper bound? If so a simple array will do it. >>> >> Good catch Martijn! >> I suppose I could define a constant FORMATNODE_MAX_LEN, make it >> something like 10 and just use that for copying the string, rather >> than palloc(). I'll give it a try. >> > > Turns out there was already a relevant constant defined in > formatting.c: DCH_MAX_ITEM_SIZ, set to 9. So I just used that, +1 for > the trailing null. Cool. >>> >>> Here you do not note if we didn't convert the entire string. So it >>> seems you are allowed to provide too few characters, as long as it's >>> not the last field? >> >> That's true. The only way to hit that condition would be to provide a >> semi-bogus value in a string with no separators between the numbers. > > I've now plugged this hole. I added the following error for > fixed-width inputs that are too short: > > postgres=# SELECT to_date('200%1010', 'YYYYMMDD'); > ERROR: invalid value for "YYYY" in source string > DETAIL: Field requires 4 characters, but only 3 could be parsed. > HINT: If your source string is not fixed-width, try using the "FM" modifier. I think thats a big improvement. > I've attached a new version of the patch (version 3), as well as an > incremental patch to show the differences between versions 2 and 3. I looked it over, looks good to me! > Cheers, > BJ >
"Brendan Jurd" <direvus@gmail.com> writes: > I've attached a new version of the patch (version 3), as well as an > incremental patch to show the differences between versions 2 and 3. Applied with minimal modifications. I changed a couple of error messages that didn't seem to meet the style guidelines, and I moved all of the to_timestamp tests into horology.sql, rather than having essentially duplicate tests in timestamp.sql and timestamptz.sql. regards, tom lane
On Fri, Sep 12, 2008 at 3:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Applied with minimal modifications. I changed a couple of error > messages that didn't seem to meet the style guidelines, Great, thanks Tom. And thanks again to Alex and Martijn for helping me refine the patch. > and I moved all > of the to_timestamp tests into horology.sql, rather than having > essentially duplicate tests in timestamp.sql and timestamptz.sql. > Nice. That will make maintaining/extending the tests easier in future. Cheers, BJ