Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3

Поиск
Список
Период
Сортировка
От Nanina Tron
Тема Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3
Дата
Msg-id deac5cf9-cb88-482d-94ee-f68ef345316e@me.com
обсуждение исходный текст
Ответы Re: Import/Export failure due to UTF-8 error in pgAdmin4 but not in pgAdmin3  (Dave Page <dpage@pgadmin.org>)
Список pgadmin-support

Hi,

yes thank you I will scan the table so at least I will see the 'bad' characters. We will see how many there are.
Sorry, i forgot to append the data.

Nanina

Am 08. Januar 2019 um 15:47 schrieb richard coleman <rcoleman.ascentgl@gmail.com>:

Nanina, 

I am glad to hear that you are finding the discussion illuminating. You might want to send a copy of your backup file (assuming it isn't sensitive data) to Dave so that he can test pgAdmin4 against it.  You might also want to provide your OS and postgreSQL information as well.  I've noticed that your collation is DE (German) perhaps the issue is accented characters?  You can run the is_UTF8() function I've provided, it is non destructive and just returns a Boolean that indicates whether or not there are any characters in the selected column that aren't valid UTF8.  For example in the table table1 to check the text field notes for invalid characters you could run:

SELECT id, is_UTF8(notes) from table1 WHERE is_UTF8(notes) = false;

This would return the list of id's of records that had invalid UTF8 characters in the notes field. You should be able to use that to narrow down the issue.

I hope that helps, 

rik.

On Tue, Jan 8, 2019 at 9:21 AM Nanina Tron <nanina.tron@icloud.com> wrote:
Hi,

thanks for the multiple responses and interesting discussion I am struggeling to keep up with all the new information.
I double checked, all settings I could find and the .csv files I want to import. Everything is manually, or at creation of the db set to UTF8. I tried to copy one of the tables via backup/restore what worked, but of course I migrated the problem with it.
I just don't see where I did something wrong. As you referred to the documentation https://www.postgresql.org/docs/current/multibyte.html, this was the information why I created the db in UTF8 in the first place.
This is the SQL code of the db (from pgAdmin4 tab)
   CREATE DATABASE testdb
    WITH
    OWNER = postgres
    ENCODING = 'UTF8'
    LC_COLLATE = 'de_DE.UTF-8'
    LC_CTYPE = 'de_DE.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;

So what I understand from your explanations is that I imported kind of invalid data with pgAdmin3 to begin with and now can't copy these tables in pgAdmin4 because psql has a problem with it? If so why can't I import a new .csv file in UTF8 into an new created table with pgAdmin4? I just have one option to generate the .csv in UTF8 with Excel and this isn't working.

I did not try to eliminate the problematic characters with your code ric, as I am not sure if there are characters we really need, so I would prefere not to alter the data.

I am not sure if that helps, but I just append the file I am not able to Import.

Thanks a lot so far,
Nanina

_________________________________________________


Am 08. Januar 2019 um 15:02 schrieb richard coleman <rcoleman.ascentgl@gmail.com>:

Dave, 

I would imagine Nanina would be in a better position to provide you with problematic import/export data in the short term.  I don't tend to import/export that often these days, preferring to use SQL statements for most things short of a full backup/restore (in my case I've found it to be much less picky). As mentioned previously, in my experience the characters that tend to trip up pgAdmin4 are Windows special characters.  I would imagine the upper Windows-1252 character set as being particularly problematic for pgAdmin4 if it is expecting proper UTF-8 (i.e. ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™).  This would explain why Windows ODBC, .Net, and pSQL have no problems dealing with the data.  I would imagine if it the database was set up with  ENCODING =  'WIN1252' then postgreSQL would do the translation into UTF-8 for pgAdmin4, but since it isn't postgreSQL can't provide pgAdmin4 with any help.  It's up to pgAdmin4 to deal with the otherwise valid data appropriately.

I hope your workaround pans out, until then I will spend my time at the psql prompt, or if the data is needed elsewhere run the two functions I had included previously to identify and remove the offensive characters.

Here's the create database script for one of my databases, perhaps it can shed some light (it was originally an 8.3 postgreSQL database {long before my time here, currently running under postgreSQL 10.x} and apparently back then it defaulted to creating SQL_ASCII encoded databases on Windows).

CREATE DATABASE tms_production
    WITH 
    OWNER = local_user
    ENCODING = 'SQL_ASCII'
    LC_COLLATE = 'English_United States.1252'
    LC_CTYPE = 'English_United States.1252'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;
ALTER DATABASE tms_production
    SET default_transaction_read_only TO off;
ALTER DATABASE tms_production
    SET client_encoding TO sql_ascii;
ALTER DATABASE tms_production
    SET standard_conforming_strings TO off;  

rik. 

On Tue, Jan 8, 2019 at 8:37 AM Dave Page <dpage@pgadmin.org> wrote:
Hi Rik

On Tue, Jan 8, 2019 at 6:53 PM richard coleman
<rcoleman.ascentgl@gmail.com> wrote:
>
> Dave,
>
> Thanks for continuing this discussion, but I think you misunderstand the situation.  I am storing valid non-UTF8 data in a SQL_ASCII encoded postgreSQL database (please re-read what I had previously written). This is why psql has NO problem dealing with it.  This is also why Windows ODBC and .Net applications have NO problem dealing with it.  In fact the most common character that pgAdmin4 crashes on is the Windows smart quote.  So to reiterate, I am using valid non-UTF8 characters in a SQL_ASCII database.  This is a supported configuration for postgreSQL.  The issue seems to be that pgAdmin4 is assuming  UTF8 data and crashing/failing/throwing errors when it encounters invalid UTF8 characters.
>
> I hope I have made the situation a little bit clearer.

Well psql is failing to deal with it *in this case*, as that's what is
doing the \copy in the import/export tool.

In other cases (i.e. the ones where pgAdmin sees the data, such as
results in the query tool), the issue arises because Python and/or
Javascript (and by extension pgAdmin) may barf on data encoded in a
way they don't recognise. That's why the PostgreSQL docs say to only
use ASCII data in SQL_ASCII databases - the behaviour is undefined,
and as a result may either not render properly or may crash or error
on non-ASCII data.

Anyhoo, I expect to have a little time after dinner shortly so I'll
try out the workaround I thought of earlier to see if it helps (I
doubt it'll be a panacea, but it may help in some cases).

By any chance do you have a test case you can share with me that
refuses to export from pgAdmin (using the Import/Export tool)? If so,
I'd appreciate a copy of it to play with.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

В списке pgadmin-support по дате отправления:

Предыдущее
От: Steve Hawes
Дата:
Сообщение: Change browser in MacOS
Следующее
От: richard coleman
Дата:
Сообщение: Re: Change browser in MacOS