Re: Allow to_date() and to_timestamp() to accept localized names

From: Juan José Santamaría Flecha
Subject: Re: Allow to_date() and to_timestamp() to accept localized names
Date:
Msg-id: CAC+AXB1fQtg-iWH=4qZnV0xYm08WR6W7AhoApKA3tkFsWgf90A@mail.gmail.com
In response to: Re: Allow to_date() and to_timestamp() to accept localized names  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Allow to_date() and to_timestamp() to accept localized names
Re: Allow to_date() and to_timestamp() to accept localized names
List: pgsql-hackers


On Thu, Jan 23, 2020 at 11:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thank you for your time looking into this.

> Here's a v7 patch, rebased over my recent hacking, and addressing
> most of the complaints I raised in <31691.1579648824@sss.pgh.pa.us>.
> However, I've got some new complaints now that I've looked harder
> at the code:
>
> * It's not exactly apparent to me why this code should be concerned
> about non-normalized characters when noplace else in the backend is.
> I think we should either rip that out entirely, or move the logic
> into str_tolower (and hence also str_toupper etc).  I'd tend to favor
> the former, but of course I don't speak any languages where this would
> be an issue.  Perhaps a better idea is to invent a new SQL function
> that normalizes a given string, and expect users to call that first
> if they'd like to_date() to match unnormalized text.


There is an open patch that will make the normalization functionality user-visible [1]. So, if a user can call to_date(normalize('01 ŞUB 2010'), 'DD TMMON YYYY'), I would vote for dropping the normalization logic from this patch altogether.

> * I have no faith in this calculation that decides how long the match
> length was:
>
>                 *len = element_len + name_len - norm_len;
>
> I seriously doubt that this does the right thing if either the
> normalization or the downcasing changed the byte length of the
> string.  I'm not actually sure how we can do that correctly.
> There's no good way to know whether the changes happened within
> the part of the "name" string that we matched, or the part beyond
> what we matched, but we only want to count the former.  So this
> seems like a pretty hard problem, and even if this logic is somehow
> correct as it stands, it needs a comment explaining why.


The proper logic would come from do_to_timestamp() receiving a normalized "date_txt" input, so that we do not operate on unnormalized and normalized strings at the same time.
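
Just to illustrate the point, here is a minimal standalone sketch (this is not the patch code: it uses a toy UTF-8 walker with ASCII-only case folding instead of the backend's pg_mblen() and str_tolower(), and it assumes both strings are already downcased and in the same normalization form), counting the matched length on the input side only:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Byte length of the UTF-8 character starting at *s (assumes valid UTF-8). */
static int
utf8_char_len(const unsigned char *s)
{
	if (*s < 0x80)
		return 1;
	if ((*s & 0xE0) == 0xC0)
		return 2;
	if ((*s & 0xF0) == 0xE0)
		return 3;
	return 4;
}

/*
 * Match "name" as a prefix of "input" and return the number of *input*
 * bytes that were matched, or -1 if there is no match.  The byte count is
 * taken while walking the input itself, character by character, instead of
 * being derived from the lengths of transformed copies.
 */
static int
match_prefix_bytes(const char *input, const char *name)
{
	const unsigned char *in = (const unsigned char *) input;
	const unsigned char *nm = (const unsigned char *) name;
	int			consumed = 0;

	while (*nm)
	{
		int			in_len;
		int			nm_len;

		if (*in == '\0')
			return -1;			/* input ended before the name did */

		in_len = utf8_char_len(in);
		nm_len = utf8_char_len(nm);

		if (in_len == 1 && nm_len == 1)
		{
			/* ASCII-only folding; the backend would fold full Unicode */
			if (tolower(*in) != tolower(*nm))
				return -1;
		}
		else if (in_len != nm_len || memcmp(in, nm, in_len) != 0)
			return -1;

		consumed += in_len;		/* count bytes on the input side only */
		in += in_len;
		nm += nm_len;
	}
	return consumed;
}

int
main(void)
{
	/* both already downcased and in the same normalization form */
	const char *input = "şubat 2010";	/* "February" in Turkish */
	const char *name = "şubat";

	printf("consumed = %d bytes\n", match_prefix_bytes(input, name));	/* 6 */
	return 0;
}

Because the count is taken on the input string itself, it stays correct even when a transformed copy of the name ends up with a different byte length.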
 
> * I'm still concerned about the risk of integer overflow in the
> string allocations in seq_search_localized().  Those need to be
> adjusted to be more paranoid, similar to the code in e.g. str_tolower.

Please find attached a patch with the normalization logic removed, so there are no longer any direct allocations in seq_search_localized().
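
For reference, the check in str_tolower() that Tom points to has roughly this shape; the standalone paraphrase below uses plain malloc()/exit() instead of palloc()/ereport(), so treat it as illustrative rather than the backend code:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Overflow paranoia: before a length-times-element-size allocation, make
 * sure the multiplication cannot exceed the range we are willing to
 * allocate.
 */
static wchar_t *
alloc_wide_workspace(size_t nbytes)
{
	if ((nbytes + 1) > (INT_MAX / sizeof(wchar_t)))
	{
		fprintf(stderr, "requested size too large\n");
		exit(EXIT_FAILURE);
	}
	return malloc((nbytes + 1) * sizeof(wchar_t));
}

int
main(void)
{
	wchar_t    *workspace = alloc_wide_workspace(128);

	printf("allocated %zu bytes\n", (size_t) (128 + 1) * sizeof(wchar_t));
	free(workspace);
	return 0;
}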

I would like to raise a couple of questions myself:

* When the code is compiled with DEBUG_TO_FROM_CHAR, there is a "‘dump_node’ defined but not used" warning. Should we drop this function or uncomment its usage?

* Would it be worth moving str_tolower(localized_name) from seq_search_localized() into cache_locale_time()?
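
To make that second question concrete, here is a toy standalone sketch of the caching idea; everything in it (the array name, the ASCII-only lower-casing helper) is invented for the example, and the real code would go through cache_locale_time() and str_tolower() instead:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_MONTHS 12

/* hypothetical cache, filled once and reused by every search */
static char *month_names_lower[NUM_MONTHS];

static char *
ascii_tolower_copy(const char *s)
{
	char	   *copy = strdup(s);

	for (char *p = copy; *p; p++)
		*p = (char) tolower((unsigned char) *p);
	return copy;
}

/* fold the localized names once, when the cache is (re)built */
static void
cache_month_names(const char *const names[NUM_MONTHS])
{
	for (int i = 0; i < NUM_MONTHS; i++)
		month_names_lower[i] = ascii_tolower_copy(names[i]);
}

int
main(void)
{
	const char *const names[NUM_MONTHS] = {
		"January", "February", "March", "April", "May", "June",
		"July", "August", "September", "October", "November", "December"
	};

	cache_month_names(names);

	/* later lookups compare against the pre-folded names directly */
	printf("%s\n", month_names_lower[1]);	/* prints "february" */
	return 0;
}

The benefit would simply be that the lower-casing happens once per locale cache rebuild instead of once per seq_search_localized() call.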


Regards,

Juan José Santamaría Flecha

Attachments
