[COMMITTERS] pgsql: Use radix tree for character encoding conversions.

From: Heikki Linnakangas
Subject: [COMMITTERS] pgsql: Use radix tree for character encoding conversions.
Msg-id: E1cnV07-0007li-6D@gemulon.postgresql.org
List: pgsql-committers
Use radix tree for character encoding conversions.

Replace the mapping tables used to convert between UTF-8 and other
character encodings with new radix tree-based maps. Looking up an entry in
a radix tree is much faster than a binary search in the old maps. As a
bonus, the radix tree representation is also more compact, making the
binaries slightly smaller.
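
To illustrate the difference between the two lookup styles, here is a
minimal sketch in Python. It is not the code from this commit: the node
layout, the function names (new_node, radix_insert, radix_lookup,
binary_search_lookup) and the sample keys and values are all assumptions
made up for the example, and every key is assumed to be non-empty and of
the same length. The point is only that a radix tree lookup does one array
index per key byte, while the old approach does a binary search over a
sorted table.

```python
from bisect import bisect_left

FANOUT = 256  # one slot per possible byte value


def new_node():
    """Return an empty node: one slot per byte value, None means 'no entry'."""
    return [None] * FANOUT


def radix_insert(root, key: bytes, value: int) -> None:
    """Store value under a non-empty key, creating inner nodes as needed."""
    node = root
    for b in key[:-1]:
        if node[b] is None:
            node[b] = new_node()
        node = node[b]
    node[key[-1]] = value


def radix_lookup(root, key: bytes):
    """Follow one array index per key byte; return None if the key is absent."""
    node = root
    for b in key[:-1]:
        node = node[b]
        if node is None:
            return None
    return node[key[-1]]


def binary_search_lookup(sorted_keys, values, key: bytes):
    """Older style: binary search over keys kept in sorted order."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return values[i]
    return None


# Made-up two-byte keys and values, for illustration only.
TOY_PAIRS = sorted([
    (b"\x10\x01", 0x0100),
    (b"\x10\x02", 0x0200),
    (b"\x20\x01", 0x0300),
])
TOY_KEYS = [k for k, _ in TOY_PAIRS]
TOY_VALUES = [v for _, v in TOY_PAIRS]

TOY_ROOT = new_node()
for toy_key, toy_value in TOY_PAIRS:
    radix_insert(TOY_ROOT, toy_key, toy_value)

if __name__ == "__main__":
    # Both structures return the same answers; only the access pattern differs.
    for probe in (b"\x10\x02", b"\x30\x01"):
        print(probe, radix_lookup(TOY_ROOT, probe),
              binary_search_lookup(TOY_KEYS, TOY_VALUES, probe))
```

The number of steps in the radix lookup depends only on the key length,
whereas the binary search needs roughly log2(n) comparisons for a table of
n entries. This sketch spends a full 256-slot list on every node, so it
says nothing about the size savings mentioned above, which would depend on
a more compact layout.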

The "combined" maps work the same as before, with binary search. They are
much smaller than the main tables, so the slower lookup matters little.
However, the "combined" maps are now stored in the same .map files as the
main tables. This seems clearer, since they are always used together and
are generated from the same source files.
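
As a rough sketch of what a binary-search lookup in a small "combined"
table can look like, the Python below assumes that each entry maps a pair
of Unicode code points to a single code in the other encoding. That entry
shape, the names COMBINED and combined_lookup, and all the values are
assumptions for illustration; none of them are taken from the .map files.

```python
from bisect import bisect_left

# Hypothetical entries: ((first_code_point, second_code_point), local_code).
# All values are made up; they are not real mappings.
COMBINED = sorted([
    ((0x0041, 0x0301), 0xA101),
    ((0x0041, 0x0302), 0xA102),
    ((0x0045, 0x0301), 0xA103),
])
COMBINED_KEYS = [pair for pair, _ in COMBINED]


def combined_lookup(cp1: int, cp2: int):
    """Binary-search the sorted table for the pair (cp1, cp2)."""
    i = bisect_left(COMBINED_KEYS, (cp1, cp2))
    if i < len(COMBINED_KEYS) and COMBINED_KEYS[i] == (cp1, cp2):
        return COMBINED[i][1]
    return None  # no combined entry; a caller would fall back to the main map


if __name__ == "__main__":
    print(combined_lookup(0x0041, 0x0302))  # present in the toy table
    print(combined_lookup(0x0041, 0x0303))  # absent
```

With only a handful of entries, a binary search finishes in a few
comparisons, which fits the reasoning that these tables did not need to be
converted.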

Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages.
Reviewed by Michael Paquier and Daniel Gustafsson.

Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/aeed17d00037950a16cc5ebad5b5592e5fa1ad0f

Modified Files
--------------
src/backend/utils/mb/Unicode/Makefile              |    10 +-
src/backend/utils/mb/Unicode/UCS_to_BIG5.pl        |    12 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl      |    10 +-
.../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl        |    22 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl      |   189 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl      |    14 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl      |    10 +-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl     |    10 +-
src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl       |    12 +-
.../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl      |    21 +-
src/backend/utils/mb/Unicode/UCS_to_SJIS.pl        |    32 +-
src/backend/utils/mb/Unicode/UCS_to_UHC.pl         |    12 +-
src/backend/utils/mb/Unicode/UCS_to_most.pl        |     6 +-
src/backend/utils/mb/Unicode/big5_to_utf8.map      | 18321 ++------
src/backend/utils/mb/Unicode/convutils.pm          |   806 +-
src/backend/utils/mb/Unicode/euc_cn_to_utf8.map    |  9723 +----
.../utils/mb/Unicode/euc_jis_2004_to_utf8.map      | 14744 ++-----
.../mb/Unicode/euc_jis_2004_to_utf8_combined.map   |    29 -
src/backend/utils/mb/Unicode/euc_jp_to_utf8.map    | 17337 ++------
src/backend/utils/mb/Unicode/euc_kr_to_utf8.map    | 10723 ++---
src/backend/utils/mb/Unicode/euc_tw_to_utf8.map    | 31407 ++++----------
src/backend/utils/mb/Unicode/gb18030_to_utf8.map   | 41882 +++++--------------
src/backend/utils/mb/Unicode/gbk_to_utf8.map       | 28344 +++----------
.../utils/mb/Unicode/iso8859_10_to_utf8.map        |   237 +-
.../utils/mb/Unicode/iso8859_13_to_utf8.map        |   237 +-
.../utils/mb/Unicode/iso8859_14_to_utf8.map        |   237 +-
.../utils/mb/Unicode/iso8859_15_to_utf8.map        |   237 +-
.../utils/mb/Unicode/iso8859_16_to_utf8.map        |   237 +-
src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map |   205 +-
src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map |   198 +-
src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map |   205 +-
src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map |   237 +-
src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map |   158 +-
src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map |   234 +-
src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map |   201 +-
src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map |   205 +-
src/backend/utils/mb/Unicode/johab_to_utf8.map     | 23327 +++--------
src/backend/utils/mb/Unicode/koi8r_to_utf8.map     |   237 +-
src/backend/utils/mb/Unicode/koi8u_to_utf8.map     |   237 +-
.../utils/mb/Unicode/shift_jis_2004_to_utf8.map    | 14503 ++-----
.../mb/Unicode/shift_jis_2004_to_utf8_combined.map |    29 -
src/backend/utils/mb/Unicode/sjis_to_utf8.map      | 10202 ++---
src/backend/utils/mb/Unicode/uhc_to_utf8.map       | 23788 +++--------
src/backend/utils/mb/Unicode/utf8_to_big5.map      | 17809 ++------
src/backend/utils/mb/Unicode/utf8_to_euc_cn.map    | 11487 ++---
.../utils/mb/Unicode/utf8_to_euc_jis_2004.map      | 23868 ++++++-----
.../mb/Unicode/utf8_to_euc_jis_2004_combined.map   |    29 -
src/backend/utils/mb/Unicode/utf8_to_euc_jp.map    | 20314 ++++-----
src/backend/utils/mb/Unicode/utf8_to_euc_kr.map    | 14617 +++----
src/backend/utils/mb/Unicode/utf8_to_euc_tw.map    | 24574 +++--------
src/backend/utils/mb/Unicode/utf8_to_gb18030.map   | 40292 +++++-------------
src/backend/utils/mb/Unicode/utf8_to_gbk.map       | 26061 ++----------
.../utils/mb/Unicode/utf8_to_iso8859_10.map        |   240 +-
.../utils/mb/Unicode/utf8_to_iso8859_13.map        |   239 +-
.../utils/mb/Unicode/utf8_to_iso8859_14.map        |   272 +-
.../utils/mb/Unicode/utf8_to_iso8859_15.map        |   227 +-
.../utils/mb/Unicode/utf8_to_iso8859_16.map        |   257 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_2.map |   240 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_3.map |   232 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_4.map |   240 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_5.map |   229 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_6.map |   171 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_7.map |   248 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_8.map |   194 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_9.map |   226 +-
src/backend/utils/mb/Unicode/utf8_to_johab.map     | 23380 +++--------
src/backend/utils/mb/Unicode/utf8_to_koi8r.map     |   301 +-
src/backend/utils/mb/Unicode/utf8_to_koi8u.map     |   312 +-
.../utils/mb/Unicode/utf8_to_shift_jis_2004.map    | 18954 ++++-----
.../mb/Unicode/utf8_to_shift_jis_2004_combined.map |    29 -
src/backend/utils/mb/Unicode/utf8_to_sjis.map      | 11648 ++----
src/backend/utils/mb/Unicode/utf8_to_uhc.map       | 23612 +++--------
src/backend/utils/mb/Unicode/utf8_to_win1250.map   |   266 +-
src/backend/utils/mb/Unicode/utf8_to_win1251.map   |   259 +-
src/backend/utils/mb/Unicode/utf8_to_win1252.map   |   267 +-
src/backend/utils/mb/Unicode/utf8_to_win1253.map   |   244 +-
src/backend/utils/mb/Unicode/utf8_to_win1254.map   |   276 +-
src/backend/utils/mb/Unicode/utf8_to_win1255.map   |   260 +-
src/backend/utils/mb/Unicode/utf8_to_win1256.map   |   320 +-
src/backend/utils/mb/Unicode/utf8_to_win1257.map   |   259 +-
src/backend/utils/mb/Unicode/utf8_to_win1258.map   |   284 +-
src/backend/utils/mb/Unicode/utf8_to_win866.map    |   280 +-
src/backend/utils/mb/Unicode/utf8_to_win874.map    |   225 +-
src/backend/utils/mb/Unicode/win1250_to_utf8.map   |   232 +-
src/backend/utils/mb/Unicode/win1251_to_utf8.map   |   236 +-
src/backend/utils/mb/Unicode/win1252_to_utf8.map   |   232 +-
src/backend/utils/mb/Unicode/win1253_to_utf8.map   |   220 +-
src/backend/utils/mb/Unicode/win1254_to_utf8.map   |   230 +-
src/backend/utils/mb/Unicode/win1255_to_utf8.map   |   214 +-
src/backend/utils/mb/Unicode/win1256_to_utf8.map   |   237 +-
src/backend/utils/mb/Unicode/win1257_to_utf8.map   |   225 +-
src/backend/utils/mb/Unicode/win1258_to_utf8.map   |   228 +-
src/backend/utils/mb/Unicode/win866_to_utf8.map    |   237 +-
src/backend/utils/mb/Unicode/win874_to_utf8.map    |   204 +-
src/backend/utils/mb/conv.c                        |   251 +-
.../conversion_procs/utf8_and_big5/utf8_and_big5.c |     4 +-
.../utf8_and_cyrillic/utf8_and_cyrillic.c          |     8 +-
.../utf8_and_euc2004/utf8_and_euc2004.c            |     6 +-
.../utf8_and_euc_cn/utf8_and_euc_cn.c              |     4 +-
.../utf8_and_euc_jp/utf8_and_euc_jp.c              |     4 +-
.../utf8_and_euc_kr/utf8_and_euc_kr.c              |     4 +-
.../utf8_and_euc_tw/utf8_and_euc_tw.c              |     4 +-
.../utf8_and_gb18030/utf8_and_gb18030.c            |     4 +-
.../conversion_procs/utf8_and_gbk/utf8_and_gbk.c   |     4 +-
.../utf8_and_iso8859/utf8_and_iso8859.c            |    75 +-
.../utf8_and_johab/utf8_and_johab.c                |     4 +-
.../conversion_procs/utf8_and_sjis/utf8_and_sjis.c |     4 +-
.../utf8_and_sjis2004/utf8_and_sjis2004.c          |     6 +-
.../conversion_procs/utf8_and_uhc/utf8_and_uhc.c   |     4 +-
.../conversion_procs/utf8_and_win/utf8_and_win.c   |    54 +-
src/include/mb/pg_wchar.h                          |    84 +-
111 files changed, 147742 insertions(+), 367346 deletions(-)

