Обсуждение: importing data

Поиск
Список
Период
Сортировка

importing data

От
Mental
Дата:
Hello, perhaps you may have some advice.

The postgresql documentation for COPY FROM INFILE suggests that high-ascii
characters be encoded to a backslash followed by the octal value for the
character.

In addition to this change, to insert a backslash, a double-bacckslash must be
emitted.

After we discovered this we wrote a short filter program to pipe input data
through to encode high-ascii characters correctly as well as escape slashes.
This program can be seen here:
http://www.neverlight.com/~mental/pginput-filter.c


What we're wondering about is, is there perhaps a better or easier way to
handle data like this? Granted we didnt spend a ton of time on google, but
we did search the docs a little before settling on a filter for the sake
of expediency.

--
Mental (Mental@NeverLight.com)

I've been told that I need to warn people about inappropriate content.
So if anything I say or post is inappropriate, dont look at it.

GPG public key: http://www.neverlight.com/pas/Mental.asc


Re: importing data

От
Tom Lane
Дата:
Mental <Mental@NeverLight.com> writes:
> The postgresql documentation for COPY FROM INFILE suggests that high-ascii
> characters be encoded to a backslash followed by the octal value for the
> character.

While it's certainly possible to do that, I don't see anyplace in the
current documentation that recommends it.  What did you conclude that
from?

            regards, tom lane

Re: importing data

От
Mental
Дата:
On Sat, Jan 17, 2004 at 10:27:09PM -0500, Tom Lane wrote:
> Mental <Mental@NeverLight.com> writes:
> > The postgresql documentation for COPY FROM INFILE suggests that high-ascii
> > characters be encoded to a backslash followed by the octal value for the
> > character.
>
> While it's certainly possible to do that, I don't see anyplace in the
> current documentation that recommends it.  What did you conclude that
> from?
>

http://www.postgresql.org/docs/7.3/static/sql-copy.html indicated:

\digits    , Backslash followed by one to three octal digits specifies the
character with that numeric code

We were having trouble with characters that were high ascii encoded.
Perhaps it was how we were connecting to do the import, but we found that
escaping them as so helped. After filtering, data is copied into the
tables like so:

ENCODING='SQL_ASCII'
$FILTER_DATA
psql -U $USER  -c "SET CLIENT_ENCODING TO '$ENCODING'; copy $i from
'$DATA_DIR/$i-noheader.tab' NULL as '' " $DB

--
Mental (Mental@NeverLight.com)

I've been told that I need to warn people about inappropriate content.
So if anything I say or post is inappropriate, dont look at it.

GPG public key: http://www.neverlight.com/pas/Mental.asc


Re: importing data

От
Tom Lane
Дата:
Mental <Mental@NeverLight.com> writes:
> We were having trouble with characters that were high ascii encoded.

You probably need to pay attention to your client_encoding setting,
and perhaps also reconsider what database encoding you are using.
If either of these is not SQL_ASCII then it had better be an accurate
description of the character set you are using, else you're in for a
world of hurt :-(.  Also, setting client_encoding to SQL_ASCII when the
database encoding is something else does not get you out of having to
respect the encoding setting --- it just prevents any automatic
conversion from happening during I/O.  The data you ship had better be
in the database encoding in this case.

            regards, tom lane