Re: confusing / inefficient "need_transcoding" handling in copy

From: Sutou Kouhei
Subject: Re: confusing / inefficient "need_transcoding" handling in copy
Msg-id: 20240214.114608.2091541942684063981.kou@clear-code.com
In reply to: Re: confusing / inefficient "need_transcoding" handling in copy (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
Hi,

In <ZcvlgMEjt3qY8eiL@paquier.xyz>
  "Re: confusing / inefficient "need_transcoding" handling in copy" on Wed, 14 Feb 2024 06:56:16 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:

> We have a couple of non-ASCII characters in the tests, but I suspect
> that this one will not be digested correctly everywhere, even if
> EUC_JP should be OK to use for the check.  How about writing an
> arbitrary sequence of bytes into a temporary file that gets used for 
> the COPY FROM instead?  See for example how we do that with
> abs_builddir in copy.sql.

That makes sense. How about the attached patch?


Thanks,
-- 
kou
From 6eb9669f97c54f8b85fac63db40ad80664692d12 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 14 Feb 2024 11:44:13 +0900
Subject: [PATCH v2] Add a test for invalid encoding for COPY FROM

The test data contain a UTF-8 encoded character (U+3042 HIRAGANA LETTER A),
but the test reads them with ENCODING 'EUC_JP', so the data are invalid.
---
 src/test/regress/expected/copyencoding.out | 13 +++++++++++++
 src/test/regress/parallel_schedule         |  2 +-
 src/test/regress/sql/copyencoding.sql      | 15 +++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/copyencoding.out
 create mode 100644 src/test/regress/sql/copyencoding.sql

diff --git a/src/test/regress/expected/copyencoding.out b/src/test/regress/expected/copyencoding.out
new file mode 100644
index 0000000000..32a9d918fa
--- /dev/null
+++ b/src/test/regress/expected/copyencoding.out
@@ -0,0 +1,13 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+CREATE TABLE test (t text);
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..238cef28c4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -36,7 +36,7 @@ test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comment
 # execute two copy tests in parallel, to check that copy itself
 # is concurrent safe.
 # ----------
-test: copy copyselect copydml insert insert_conflict
+test: copy copyselect copydml copyencoding insert insert_conflict
 
 # ----------
 # More groups of parallel tests
diff --git a/src/test/regress/sql/copyencoding.sql b/src/test/regress/sql/copyencoding.sql
new file mode 100644
index 0000000000..89e2124996
--- /dev/null
+++ b/src/test/regress/sql/copyencoding.sql
@@ -0,0 +1,15 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+CREATE TABLE test (t text);
+
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+
+DROP TABLE test;
-- 
2.43.0
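Not part of the patch: a minimal Python sketch of why the expected output reports `0xe3 0x81`. It assumes the usual EUC-JP rules (ASCII is one byte; a JIS X 0208 character is two bytes, each in 0xA1..0xFE; 0x8E and 0x8F are single-shift prefixes). It also assumes the error message lists the lead byte plus the bytes the character is expected to occupy. The helper names (`first_invalid_eucjp`, `expected_length`, and so on) are hypothetical, and this is not PostgreSQL's C verifier. UTF-8 encodes U+3042 as `0xe3 0x81 0x82`. In EUC-JP, `0xe3` starts a two-byte character, but `0x81` is outside the allowed trail range, so the first two bytes are rejected.

```python
# Sketch only: EUC_JP byte-sequence checks written from the general EUC-JP
# rules. This is NOT PostgreSQL's verifier, just an illustration.

SS2 = 0x8E  # single shift 2: one half-width katakana byte follows
SS3 = 0x8F  # single shift 3: a two-byte JIS X 0212 character follows


def in_euc_range(b: int) -> bool:
    """Bytes of JIS X 0208 / JIS X 0212 characters lie in 0xA1..0xFE."""
    return 0xA1 <= b <= 0xFE


def expected_length(lead: int) -> int:
    """Number of bytes a character starting with `lead` should occupy."""
    if lead < 0x80:
        return 1          # ASCII
    if lead == SS3:
        return 3          # SS3 + two bytes
    return 2              # SS2 + one byte, or a JIS X 0208 pair


def is_valid_char(seq: bytes) -> bool:
    """Check one candidate character (already sliced to expected_length)."""
    lead = seq[0]
    if lead < 0x80:
        return True
    if len(seq) < expected_length(lead):
        return False      # input ends in the middle of a character
    if lead == SS2:
        return 0xA1 <= seq[1] <= 0xDF
    if lead == SS3:
        return in_euc_range(seq[1]) and in_euc_range(seq[2])
    return in_euc_range(lead) and in_euc_range(seq[1])


def first_invalid_eucjp(data: bytes):
    """Return the bytes of the first invalid character, or None if all valid."""
    i = 0
    while i < len(data):
        length = expected_length(data[i])
        seq = data[i:i + length]
        if not is_valid_char(seq):
            return seq
        i += length
    return None


def describe(seq: bytes) -> str:
    """Format bytes in the same style as the error in copyencoding.out."""
    return " ".join(f"0x{b:02x}" for b in seq)


# U+3042 HIRAGANA LETTER A, written out as UTF-8 (what COPY TO produces).
utf8_bytes = "\u3042".encode("utf-8")
print(describe(utf8_bytes))                       # 0xe3 0x81 0x82
print(describe(first_invalid_eucjp(utf8_bytes)))  # 0xe3 0x81

# Python's own euc_jp codec rejects the same bytes.
try:
    utf8_bytes.decode("euc_jp")
    codec_rejects = False
except UnicodeDecodeError:
    codec_rejects = True
print(codec_rejects)                              # True
```

The same character encoded as EUC-JP (`0xa4 0xa2`) passes the check, which is why the test has to mix the two encodings to provoke the error.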

