> Now, in the multibyte case, again in textlen(), I see:
>
> /* optimization for single byte encoding */
> if (pg_database_encoding_max_length() <= 1)
>     PG_RETURN_INT32(VARSIZE(t) - VARHDRSZ);
>
> PG_RETURN_INT32(
>     pg_mbstrlen_with_len(VARDATA(t), VARSIZE(t) - VARHDRSZ));
>
> Three questions here.
> 1) In the case of encoding max length == 1, can we treat it the same as
> the non-multibyte case (I presume they are exactly the same)?
Yes.
> 2) Can encoding max length ever be < 1? Doesn't make sense to me.
No. It seems to be just defensive coding.
> 3) In the case of encoding max length > 1, if I understand correctly,
> each encoded character can be one *or more* bytes, up to and including
> encoding max length bytes.
Right.
> So the *only* way presently to get the length
> of the original character string is to loop through the entire string
> checking the length of each individual character (that's what
> pg_mbstrlen_with_len() does it seems)?
Yes.
> Finally, if 3) is true, then there is no way to avoid the retrieval and
> decompression of the datum just to find out its length. For large
> datums, detoasting plus the looping through each character would add a
> huge amount of overhead just to get at the length of the original
> string. I don't know if we need to be able to get *just* the length
> often enough to really care, but if we do, I had an idea for some future
> release (I wouldn't propose doing this for 7.3):
>
> - add a new EXTENDED state to va_external for MULTIBYTE
> - any string with max encoding length > 1 would be EXTENDED even if it
> is not EXTERNAL and not COMPRESSED.
> - to each of the structs in the union, add va_strlen
> - populate va_strlen on INSERT and maintain it on UPDATE.
>
> Now a new function similar to toast_raw_datum_size(), maybe
> toast_raw_datum_strlen() could be used to get the original string
> length, whether MB or not, without needing to retrieve and decompress
> the entire datum.
>
> I understand we would either: have to steal another bit from the VARHDR
> which would reduce the effective size of a varlena from 1GB down to .5GB;
> or we would need to add a byte or two to the VARHDR which is extra
> per-datum overhead. I'm not sure we would want to do either. But I
> wanted to toss out the idea while it was fresh on my mind.
Interesting idea. I was also thinking about adding some extra
information to text data types, such as character set, collation,
etc., for 7.4 or later.
--
Tatsuo Ishii