EBCDIC sorting as a use case for ICU rules

From: Daniel Verite
Subject: EBCDIC sorting as a use case for ICU rules
Date:
Msg-id: 35cc1684-e516-4a01-a256-351632d47066@manitou-mail.org
Replies: Re: EBCDIC sorting as a use case for ICU rules  (Joe Conway <mail@joeconway.com>)
Re: EBCDIC sorting as a use case for ICU rules  (Jeff Davis <pgsql@j-davis.com>)
Re: EBCDIC sorting as a use case for ICU rules  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Re: EBCDIC sorting as a use case for ICU rules  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List: pgsql-hackers
Hi,

In the "Order changes in PG16 since ICU introduction" discussion, one
sub-thread [1] was about having a credible use case for tailoring collations
with custom rules, a new feature in v16.

At a conference this week I was asked whether ICU could
sort like EBCDIC [2]. It turns out this had already been asked on
-general a few years ago [3], with no satisfactory answer at the time,
and that it can be implemented with rules in v16.

A collation like the following seems to work (the rule simply enumerates
the US-ASCII characters in EBCDIC order, with adequate quoting):

CREATE COLLATION ebcdic (provider='icu', locale='und',
rules=$$&' '<'.'<'<'<'('<'+'<\|<'&'<'!'<'$'<'*'<')'<';'<'-'<'/'<','<'%'<'_'<'>'<'?'<'`'<':'<'#'<'@'<\'<'='<'"'<a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<'~'<s<t<u<v<w<x<y<z<'['<'^'<']'<'{'<A<B<C<D<E<F<G<H<I<'}'<J<K<L<M<N<O<P<Q<R<'\'<S<T<U<V<W<X<Y<Z<0<1<2<3<4<5<6<7<8<9$$);
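As a quick sanity check, something along these lines could be run once the collation exists. The three sample values are arbitrary picks for illustration, not taken from any real data set. Going by the rule above, the first query should list the lowercase letter first, then the uppercase letter, then the digit, whereas the "C" collation used for comparison sorts by code point (digit, uppercase, lowercase).

```sql
-- Assumes the "ebcdic" collation defined above has been created
-- (PostgreSQL 16 or later, built with ICU support).
-- Sample values are arbitrary: one digit, one uppercase, one lowercase letter.
SELECT v
FROM (VALUES ('1'), ('A'), ('a')) AS t(v)
ORDER BY v COLLATE ebcdic;

-- Same values in code point order, for comparison.
SELECT v
FROM (VALUES ('1'), ('A'), ('a')) AS t(v)
ORDER BY v COLLATE "C";
```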

This can be useful for people who migrate from mainframes to Postgres
and need their migration tests to produce the same sorted results as the
original system.
Since rules can be defined at the database level with the icu_rules option,
they don't even need to tweak their queries to add COLLATE clauses,
which is surely welcome in that kind of project.
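A minimal sketch of the database-level variant, assuming PostgreSQL 16 or later built with ICU support and an ICU-compatible encoding such as UTF8. The database name mainframe_copy is hypothetical, and the rule string is the same one as in the CREATE COLLATION example.

```sql
-- Hypothetical database name, for illustration only.
-- template0 is used because the locale settings differ from the default template's.
CREATE DATABASE mainframe_copy
  TEMPLATE template0
  LOCALE_PROVIDER icu
  ICU_LOCALE 'und'
  ICU_RULES $$&' '<'.'<'<'<'('<'+'<\|<'&'<'!'<'$'<'*'<')'<';'<'-'<'/'<','<'%'<'_'<'>'<'?'<'`'<':'<'#'<'@'<\'<'='<'"'<a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<'~'<s<t<u<v<w<x<y<z<'['<'^'<']'<'{'<A<B<C<D<E<F<G<H<I<'}'<J<K<L<M<N<O<P<Q<R<'\'<S<T<U<V<W<X<Y<Z<0<1<2<3<4<5<6<7<8<9$$;
```

Text columns created in such a database then pick up the tailored order as their default collation, so an ORDER BY without a COLLATE clause should sort the same way as the ebcdic collation.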

US-ASCII when sorted in EBCDIC order comes out like this:

.<(+|&!$*);-/,%_>?`:#@'="abcdefghijklmnopqr~stuvwxyz[^]{ABCDEFGHI}JKLMNOPQR\STUVWXYZ0123456789

Maybe this example could be added to the documentation, except for
the problem that the rule is very long and a dollar-quoted string cannot
be split across several lines. Literals enclosed in single quotes can be
split that way, but that would require escaping the single quotes in the
rule, which would lead to scary-looking, over-quoted contents.
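To give an idea of what the single-quoted form would look like, here is a sketch covering only the first few characters of the rule, split across two adjacent literals (which SQL concatenates when they are separated by a newline). The collation name ebcdic_fragment is made up for illustration, and the shortened rule is not a complete EBCDIC ordering.

```sql
-- Illustration only: a fragment of the rule, equivalent to
--   &' '<'.'<'<'<'('<'+'<\|<'&'<'!'<'$'
-- Every single quote of the rule has to be doubled inside the SQL literals.
-- Assumes standard_conforming_strings = on, so the backslash is taken literally.
CREATE COLLATION ebcdic_fragment (provider='icu', locale='und',
rules='&'' ''<''.''<''<''<''(''<''+''<\|'
      '<''&''<''!''<''$''');
```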

I'm open to suggestions on whether this EBCDIC example is worth including in
the docs in some form, or whether putting it in the wiki would be good enough.



[1]
https://www.postgresql.org/message-id/flat/a28aba5fa6bf1abfff96e40b6d6acff8412edb15.camel%40j-davis.com

[2] https://en.wikipedia.org/wiki/EBCDIC

[3]
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F84A7AD%40G01JPEXMBYT05


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


