Dear Horiguchi-san, Fujii-san,
Perfect work! Thank you for replying and for the analysis.
> A. "^-?[0-9]+.*" : returns valid padding. p goes after the last digit.
> B. "^[^0-9-].*" : padding = 0, p doesn't advance.
> C. "^-[^0-9].*" : padding = 0, p advances by 1 byte.
> D. "^-" : padding = 0, p advances by 1 byte.
> (if *p == 0 then breaks)
I confirmed them and your patterns are correct.
> If we want to make the behaviors of C and D the same as the current ones, the
> else clause should be like the following, but I don't think we need to
> do that.
> else
> {
> padding = 0;
> if (*p == '-')
> p++;
> }
This treatment is not complex, so I would like to add it if possible.
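For reference, here is a minimal sketch of a strtol()-based padding parser with the else clause above folded in, checked against patterns A to D. The function name parse_padding_sketch() is hypothetical and this is not the code of the actual patch. It also does not check for overflow of very long digit strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical sketch, not the actual patch: parse an optional
 * "-?[0-9]+" padding specifier at p with strtol(), keeping the
 * behavior of patterns A to D.  Stores the padding in *padding and
 * returns the position where parsing stopped.
 */
const char *
parse_padding_sketch(const char *p, int *padding)
{
	assert(p != NULL && padding != NULL);

	/*
	 * Call strtol() only when the text starts with a digit, or with '-'
	 * followed by a digit; strtol() by itself would also skip leading
	 * whitespace and accept a '+' sign.
	 */
	if (isdigit((unsigned char) p[0]) ||
		(p[0] == '-' && isdigit((unsigned char) p[1])))
	{
		char	   *endptr;

		/* No overflow check: very long digit strings are not handled. */
		*padding = (int) strtol(p, &endptr, 10);
		return endptr;			/* pattern A: just after the last digit */
	}

	/* The else clause quoted above. */
	*padding = 0;
	if (*p == '-')
		p++;					/* patterns C and D: step over the lone '-' */
	return p;					/* for pattern B, p is unchanged */
}
```

With pattern D the returned pointer lands on the terminating '\0', so the caller's (*p == 0) check still breaks out of the loop as before.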
> One possible cause of a difference in behavior is character class
> handling including multibyte characters of isdigit and strtol. If
> isdigit accepts '一' as a digit (some platforms might do this), and
> strtol doesn't (I believe it is universal behavior), '%一0p' is
> converted to '%' and the pointer moves onto '一'. But I don't think we
> need to do something for such a crazy specification.
Does isdigit() handle multibyte characters correctly? Its argument is
an int whose value must be representable as an unsigned char (or be EOF),
i.e. a single byte. Hence I thought that it cannot recognize '一' as a whole character.
Actually, I was thinking about another case. isdigit() may simply check
whether the value of its argument is between 48 and 57 ('0' to '9'), which means that
a byte belonging to some multibyte characters might be accepted as a digit in some locales.
But of course, I agree this is a crazy case.
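To illustrate the byte-by-byte point, here is a small sketch assuming the default "C" locale and a UTF-8 encoded '一' (U+4E00), which is the three bytes 0xE4 0xB8 0x80. Both helper names are hypothetical. isdigit() is applied to each byte separately, and none of these bytes is in the range '0' to '9', so in that encoding neither isdigit() nor strtol() treats '一' as part of a number. Whether some platform's isdigit() returns true for other byte values in some other locale is not tested here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: does isdigit() accept every byte of s[0..len)? */
bool
all_bytes_are_digits(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		/*
		 * isdigit() takes an int holding an unsigned char value (or EOF),
		 * so a multibyte character can only be examined one byte at a time.
		 */
		if (!isdigit((unsigned char) s[i]))
			return false;
	}
	return true;
}

/* Hypothetical helper: number of bytes strtol() consumes from s. */
size_t
strtol_consumed(const char *s)
{
	char	   *endptr;

	(void) strtol(s, &endptr, 10);
	return (size_t) (endptr - s);
}
```

For example, strtol_consumed("\xE4\xB8\x80" "0p") is 0, agreeing with isdigit() that the input does not start with a digit. The literal is split in two so that the "0" is not read as part of the \x80 escape.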
Best Regards,
Hayato Kuroda
FUJITSU LIMITED