Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"
Дата
Msg-id 093b07ad-280c-4741-4519-f8db72420ffb@iki.fi
обсуждение исходный текст
Ответ на BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"  (PG Bug reporting form <noreply@postgresql.org>)
Список pgsql-bugs
Sounds a lot like a bug in commit 
f82de5c46bdf8cd65812a7b04c9509c218e1545d. Thanks for the report, I'll 
investigate!

- Heikki

On 28/05/2022 23:57, Vitaly V. Voronov wrote:
> Hello,
> Right commands:
> # Imported without errors
> for i in $(seq 1 207); do echo
> "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋外工 
> 事につきましては具体的な工事日案内の準備が整い次第、こちらからご連絡いた 
> します。※詳細はこちら【工事について】https://www.test.jp/1234 
> /5678.html&id=12211 <https://www.test.jp/1234/5678.html&id=12211>" >> 
> /tmp/test_pass.csv; done;
> # Imported with errors
> for i in $(seq 1 5722); do echo "NURO光です。明日の宅内工事お立合いよろ 
> しくお願い致します。2回目の屋外工事につきましては具体的な工事日案内の準 
> 備が整い次第、こちらからご連絡いたします。※詳細はこちら【工事について】 
> https://www.test.jp/1234/5678.html&id=12211" >> /tmp/test_fail.csv; done;
> 
> 
> 28.05.2022, 23:53, "PG Bug reporting form" <noreply@postgresql.org>:
> 
>     The following bug has been logged on the website:
> 
>     Bug reference: 17501
>     Logged by: Vitaly Voronov
>     Email address: wizard_1024@tut.by <mailto:wizard_1024@tut.by>
>     PostgreSQL version: 14.3
>     Operating system: CentOS Linux release 7.9.2009 (Core)
>     Description:
> 
>     Hello,
> 
>     We've seen a such bug: COPY command shows error "ERROR: invalid byte
>     sequence for encoding "UTF8": 0xe5" on file.
>     The same file with small amount of lines is imported without any errors.
> 
>     How to reproduce bug:
>     # create database
>     # create database with
>     # SQL_ASCII, C, C
>     createdb --encoding=SQL_ASCII --lc-collate=C --lc-ctype=C
>     --template=template0 test
> 
>     # connect to the database
>     psql test
> 
>     # Create table
>     CREATE TABLE test_data (
>          test_data text
>     );
> 
>     # Import without error
>     truncate table test_data;
>     COPY test_data (test_data) FROM '/tmp/test_pass.csv' WITH DELIMITER
>     AS ','
>     CSV QUOTE AS '"';
> 
>     COPY 207
> 
>     # Import with error
>     truncate table test_data;
>     COPY test_data (test_data) FROM '/tmp/test_fail.csv' WITH DELIMITER
>     AS ','
>     CSV QUOTE AS '"';
> 
>     ERROR: invalid byte sequence for encoding "UTF8": 0xe5
>     CONTEXT: COPY test_data, line 627
> 
>     # both files contains the same rows, but test_fail contains more rows
>     # seems that the file more than 65K size cannot be imported
>     # if create DB with UTF8 encoding instead of SQL_ASCII - both tests
>     will be
>     passed
> 
>     # How to generate files:
>     # Imported without errors
>     for i in $(seq 1 207); do echo
>     "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋
>     外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご
>     連絡いたします。※詳細はこちら【工事について】https://www.test.jp
>     /1234/5678.html&id=12211 <https://www.test.jp/1234/5678.html&id=12211>"
> 
>               /tmp/test_pass.csv; done;
> 
>     # Imported with errors
>     for i in $(seq 1 5722); do echo
>     "NURO光です。明日の宅内工事お立合いよろしくお願い致します。2回目の屋
>     外工事につきましては具体的な工事日案内の準備が整い次第、こちらからご
>     連絡いたします。※詳細はこちら【工事について】https://www.test.jp
>     /1234/5678.html&id=12211 <https://www.test.jp/1234/5678.html&id=12211>"
> 
>               /tmp/test_fail.csv; done;
> 
> 
>     # Both files can be imported without any problem to PostgreSQL 11.
> 




В списке pgsql-bugs по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: Extension pg_trgm, permissions and pg_dump order
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: BUG #17501: COPY is failing with "ERROR: invalid byte sequence for encoding "UTF8": 0xe5"