Re: Bug in UTF8-Validation Code?

From	Andrew Dunstan
Subject	Re: Bug in UTF8-Validation Code?
Date	17 March 2007 17:29:02
Msg-id	45FC4F85.7090804@dunslane.net
In reply to	Re: Bug in UTF8-Validation Code? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Bug in UTF8-Validation Code?
List	pgsql-hackers


Tom Lane wrote:
> I wrote:
>   
>> Actually, I have to take back that objection: on closer look, COPY
>> validates the data only once and does so before applying its own
>> backslash-escaping rules.  So there is a risk in that path too.
>>     
>
>   
>> It's still pretty annoying to be validating the data twice in the
>> common case where no backslash reduction occurred, but I'm not sure
>> I see any good way to avoid it.
>>     
>
> Further thought here: if we put encoding verification into textin()
> and related functions, could we *remove* it from COPY IN, in the common
> case where client and server encodings are the same?  Currently, copy.c
> forces a trip through pg_client_to_server for multibyte encodings
> even when the encodings are the same, so as to perform validation.
> But I'm wondering whether we'd still need that.  There's no risk of
> SQL injection in COPY data.  Bogus input encoding could possibly
> make for confusion about where the field boundaries are, but bad
> data is bad data in any case.
>
>             regards, tom lane
>
>   
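The ordering problem Tom describes (validate first, then reduce backslashes) can be illustrated with a small C sketch. This is a hypothetical, simplified model, not PostgreSQL's actual code: `utf8_valid()` is a generic well-formedness check along the lines of RFC 3629, and `reduce_backslashes()` handles only COPY-style octal escapes. A line made entirely of ASCII passes validation, yet `\377` de-escapes to the byte 0xFF, which can never appear in valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the UTF-8 sequence starting at s (at most avail bytes remain),
 * or 0 if it is malformed: bad lead byte, bad continuation byte, truncated,
 * overlong, a UTF-16 surrogate, or above U+10FFFF. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0];
    size_t len;
    unsigned int cp;

    if (c < 0x80)
        return 1;                                   /* ASCII */
    else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else
        return 0;           /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

    if (len > avail)
        return 0;                                   /* truncated */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000))
        return 0;                                   /* overlong encoding */
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;                                   /* surrogate or out of range */
    return len;
}

/* True if the n bytes at s are well-formed UTF-8. */
static bool utf8_valid(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; ) {
        size_t len = utf8_seq_len(s + i, n - i);
        if (len == 0)
            return false;
        i += len;
    }
    return true;
}

/* Hypothetical, simplified COPY-style de-escaping: a backslash followed by
 * 1-3 octal digits becomes one byte with that value (low 8 bits kept); any
 * other escaped character is copied without its backslash.  Real COPY also
 * maps \n, \t, \x.. and others, which this sketch omits.  out must have room
 * for n bytes; returns the number of bytes written. */
static size_t reduce_backslashes(const unsigned char *in, size_t n,
                                 unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] != '\\' || i + 1 == n) {
            out[o++] = in[i];                       /* ordinary byte */
            continue;
        }
        i++;                                        /* skip the backslash */
        if (in[i] >= '0' && in[i] <= '7') {
            unsigned int v = 0;
            for (int k = 0; k < 3 && i < n && in[i] >= '0' && in[i] <= '7'; k++, i++)
                v = v * 8 + (unsigned int) (in[i] - '0');
            i--;                 /* the outer loop's i++ steps past the last digit */
            out[o++] = (unsigned char) (v & 0xFF);
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```

Calling `utf8_valid()` on the raw bytes `abc\377` accepts them. Calling it again on the output of `reduce_backslashes()` rejects the resulting 0xFF byte, so a check made only on the raw input misses it.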


Here are some timing tests on 1m rows of random UTF-8-encoded 100-char
data. It doesn't look to me like the saving you're suggesting is worth
the trouble.

baseline:

Time: 28228.325 ms
Time: 25987.740 ms
Time: 25950.707 ms
Time: 25756.371 ms
Time: 27589.719 ms
Time: 25774.417 ms


after adding the suggested extra test to textin():


Time: 26722.376 ms
Time: 28343.226 ms
Time: 26529.364 ms
Time: 28020.140 ms
Time: 24836.853 ms
Time: 24860.530 ms

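To check the numbers above, the following C sketch averages each set of six runs and measures the gap between the fastest and slowest run. The helper names `mean()` and `spread()` are illustrative and do not come from the post. Both means work out to about 26.55 s and differ by roughly 4 ms. The fastest-to-slowest spread is about 2.5 s for the baseline and 3.5 s with the extra test, so the difference between the two setups is well inside run-to-run noise.

```c
#include <assert.h>
#include <stddef.h>

/* Timings (ms) of the six COPY runs reported above. */
static const double baseline_ms[] = {
    28228.325, 25987.740, 25950.707, 25756.371, 27589.719, 25774.417
};
static const double textin_check_ms[] = {
    26722.376, 28343.226, 26529.364, 28020.140, 24836.853, 24860.530
};
#define NRUNS (sizeof baseline_ms / sizeof baseline_ms[0])

/* Arithmetic mean of n samples. */
static double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double) n;
}

/* Difference between the slowest and the fastest run. */
static double spread(const double *x, size_t n)
{
    assert(n > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    return hi - lo;
}
```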

Script is:

\timing
create table xyz (x text);
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
truncate xyz;
copy xyz from '/tmp/utf8.data';
drop table xyz;
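The post does not say how `/tmp/utf8.data` was produced. Assuming one random 100-character UTF-8 string per line, a file of that shape could be generated with something like the following C sketch. `encode_utf8()`, `random_code_point()` and `write_random_line()` are hypothetical helpers. Code points are drawn from U+0020 to U+D7FF. This range excludes control characters such as tab and newline, which are COPY's default column delimiter and row terminator, and it also excludes UTF-16 surrogates. Backslash and DEL are skipped as well, so COPY does not reinterpret the data.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Encode code point cp as UTF-8 into buf; returns the byte count (1-4). */
static int encode_utf8(unsigned int cp, unsigned char *buf)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char) (0xF0 | (cp >> 18));
    buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Random code point in U+0020..U+D7FF, excluding backslash and DEL.
 * Two rand() calls are combined because RAND_MAX may be as small as 32767. */
static unsigned int random_code_point(void)
{
    for (;;) {
        unsigned long r = ((unsigned long) rand() << 15) ^ (unsigned long) rand();
        unsigned int cp = 0x20 + (unsigned int) (r % (0xD800 - 0x20));
        if (cp != '\\' && cp != 0x7F)
            return cp;
    }
}

/* Write one line of nchars random characters, terminated by '\n'. */
static void write_random_line(FILE *out, int nchars)
{
    unsigned char buf[4];
    for (int i = 0; i < nchars; i++) {
        int len = encode_utf8(random_code_point(), buf);
        fwrite(buf, 1, (size_t) len, out);
    }
    fputc('\n', out);
}
```

For example, you could open `/tmp/utf8.data` with `fopen(..., "w")` and call `write_random_line(f, 100)` one million times. That would give a file with the row count and row width described above.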


Test platform: FC6, Athlon64.


cheers

andrew
