Re: custom function for converting human readable sizes to bytes

From: Pavel Stehule
Subject: Re: custom function for converting human readable sizes to bytes
Msg-id: CAFj8pRA5_JQ+2ytwkFefeenWG8v=1+jo_MmYc86BUr_t3vwtmA@mail.gmail.com
In reply to: Re: custom function for converting human readable sizes to bytes  (Vitaly Burovoy <vitaly.burovoy@gmail.com>)
Responses: Re: custom function for converting human readable sizes to bytes  (Vitaly Burovoy <vitaly.burovoy@gmail.com>)
           Re: custom function for converting human readable sizes to bytes  (Vitaly Burovoy <vitaly.burovoy@gmail.com>)
List: pgsql-hackers


2016-01-19 4:45 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/4/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> :
>> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >>
>> >> I'm also inclined on dropping that explicit check for empty string
>> >> below and let numeric_in() error out on that.  Does this look OK, or can
>> >> it confuse someone:
>> >> postgres=# select pg_size_bytes('');
>> >> ERROR:  invalid input syntax for type numeric: ""
>> >
>> > both fixed
>>
>> Hm...
>>
>> > + switch (*strptr)
>> > + {
>> > + /* ignore plus symbol */
>> > + case '+':
>> > + case '-':
>> > + *bufptr++ = *strptr++;
>> > + break;
>> > + }
>>
>> Well, to that amount you don't need any special checks, I'm just not sure
>> if reported error message is not misleading if we let numeric_in() handle
>> all the errors.  At least it can cope with the leading spaces, +/- and
>> empty input quite well.
>>
>
> I don't want to catch an exception from numeric_in, so I try to handle some
> simple situations where I can raise a better error message myself.

There are several cases where your behavior gives strange errors (see below).

Next batch of notes:

src/include/catalog/pg_proc.h:
---
+ DATA(insert OID = 3317 ( pg_size_bytes...
now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free

fixed
 

---
+ DESCR("convert a human readable text with size units to big int bytes");
Maybe the best way is to copy the first sentence from the doc?
("convert a size in human-readable format with size units into bytes")

fixed


====
src/backend/utils/adt/dbsize.c:
+ text             *arg = PG_GETARG_TEXT_PP(0);
+ char             *str = text_to_cstring(arg);
...
+       /* working buffer cannot be longer than original string */
+       buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
Is there any reason to get TEXT for only converting it to cstring and
call VARSIZE_ANY_EXHDR instead of strlen?

The reason was performance, but these strings should be short, so I can use strlen.

fixed
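
For illustration only, sizing the working buffer from strlen can be sketched outside the backend like this. It is not the code from the patch: malloc stands in for palloc, and copy_number_chars is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone illustration (not the patch code): copy the
 * leading characters that may belong to a number into a working buffer.
 * The buffer is sized from strlen(), so it cannot be too small for what
 * is copied into it.  malloc() stands in for palloc(); the caller frees
 * the result.
 */
static char *
copy_number_chars(const char *str)
{
    char   *buffer;
    char   *bufptr;

    assert(str != NULL);

    /* working buffer cannot be longer than original string */
    buffer = malloc(strlen(str) + 1);
    if (buffer == NULL)
        return NULL;

    bufptr = buffer;
    while ((*str >= '0' && *str <= '9') ||
           *str == '.' || *str == '+' || *str == '-')
        *bufptr++ = *str++;
    *bufptr = '\0';

    return buffer;
}
```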
 

---
+       text               *arg = PG_GETARG_TEXT_PP(0);
+       char               *str = text_to_cstring(arg);
+       char    *strptr = str;
+       char               *buffer;
The variable names are not aligned consistently after their types
(throughout the body of the "pg_size_bytes" function).
See the variable declarations in nearby functions:
  "make the new code look like the existing code around it"
  http://www.postgresql.org/docs/devel/static/source-format.html


fixed
 
---
+                                        errmsg("\"%s\" is not number", str)));
s/is not number/is not a number/
(the second version can be found in a couple places besides translations)

fixed

But this message can be a little unintuitive; a better text is "is not a valid number".
 

---
+       if (*strptr != '\0')
...
+               while (*strptr && !isspace(*strptr))
Sometimes it explicitly compares to '\0', sometimes implicitly.
Common use is explicit comparison and it is preferred due to different
compilers (their conversions to boolean).

fixed

---
+       /* Skip leading spaces */
...
+               /* ignore plus symbol */
...
+       /* copy digits to working buffer */
...
+       /* allow whitespace between integer and unit */
I'm also inclined on dropping that explicit skipping spaces, checking
for +/- symbols, but copying all digits, spaces, dots and '+-' symbols
and let numeric_in() error out on that.

This is difficult: you have to divide the string into two parts, where the first word is the number and the second word is the unit.

For example, "+912+ kB" is the correct number +912 followed by the broken unit "+ kB".


That would give correct error messages for input like:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: ".+912"
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid input syntax for type numeric: "+912+ "
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "++123 "

instead of current:
postgres=# select pg_size_bytes('.+912');
ERROR:  invalid input syntax for type numeric: "."
postgres=# select pg_size_bytes('+912+ kB');
ERROR:  invalid unit: "+ kB"
postgres=# select pg_size_bytes('++123 kB');
ERROR:  invalid input syntax for type numeric: "+"


I redesigned this check. Now it is:

postgres=# select pg_size_bytes('.+912');
ERROR:  22023: ".+912" is not a valid number
postgres=# select pg_size_bytes('++123 kB');
ERROR:  22023: "++123 kB" is not a valid number
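
A minimal sketch of the "number first, unit second" split described above, under my own assumptions: split_size is a hypothetical helper written as stand-alone C, not the function from the attached patch. It accepts at most one sign, digits with an optional decimal point, and then treats everything after the optional whitespace as the unit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: find where the numeric
 * part of a size string ends and where the unit begins.
 *
 * On success returns 1 and sets *num (start of the number), *num_len
 * (its length) and *unit (start of the unit, possibly the terminating
 * '\0').  Returns 0 when the string does not start with a valid number.
 */
static int
split_size(const char *str, const char **num, size_t *num_len,
           const char **unit)
{
    const char *p = str;
    int         ndigits = 0;

    assert(str != NULL);

    /* skip leading whitespace */
    while (isspace((unsigned char) *p))
        p++;
    *num = p;

    /* at most one sign */
    if (*p == '+' || *p == '-')
        p++;

    /* integer digits */
    while (isdigit((unsigned char) *p))
    {
        p++;
        ndigits++;
    }

    /* optional fractional part */
    if (*p == '.')
    {
        p++;
        while (isdigit((unsigned char) *p))
        {
            p++;
            ndigits++;
        }
    }

    /* a sign or a dot without any digit is not a number */
    if (ndigits == 0)
        return 0;

    *num_len = (size_t) (p - *num);

    /* allow whitespace between number and unit */
    while (isspace((unsigned char) *p))
        p++;
    *unit = p;

    return 1;
}
```

With this split, '.+912' and '++123 kB' are rejected as a whole before any unit handling, while '+912+ kB' yields the number "+912" and the unit "+ kB", which a later unit check would reject.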

 
---
+       while (isspace((unsigned char) *strptr))
...
+       while (isspace(*strptr))
...
+               while (*strptr && !isspace(*strptr))
...
+               while (isspace(*strptr))
The first occurrence of isspace casts its parameter to "unsigned
char" whereas the others do not.
Note:
"The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF"

Proof:
http://en.cppreference.com/w/c/string/byte/isspace

fixed
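
As a small illustration of the cast (skip_spaces is a hypothetical helper, not a function from the patch): casting to unsigned char keeps the argument of isspace in the valid range even when plain char is signed and the byte is above 0x7F.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: advance past leading whitespace.  The cast to
 * unsigned char avoids passing a negative value to isspace() on platforms
 * where plain char is signed.
 */
static const char *
skip_spaces(const char *p)
{
    assert(p != NULL);

    while (isspace((unsigned char) *p))
        p++;

    return p;
}
```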

 


---
+       pfree(buffer);
+       pfree(str);
pfree-s here are not necessary. See:
http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)

Automatic memory deallocation doesn't cover all situations in which the function can be used (for example, DirectFunctionCall), so explicit deallocation can decrease memory requirements when you call these functions from C.

New version is attached

Regards

Pavel
 

--
Best regards,
Vitaly Burovoy
