Обсуждение: importing data
Hello, perhaps you may have some advice. The postgresql documentation for COPY FROM INFILE suggests that high-ascii characters be encoded to a backslash followed by the octal value for the character. In addition to this change, to insert a backslash, a double-bacckslash must be emitted. After we discovered this we wrote a short filter program to pipe input data through to encode high-ascii characters correctly as well as escape slashes. This program can be seen here: http://www.neverlight.com/~mental/pginput-filter.c What we're wondering about is, is there perhaps a better or easier way to handle data like this? Granted we didnt spend a ton of time on google, but we did search the docs a little before settling on a filter for the sake of expediency. -- Mental (Mental@NeverLight.com) I've been told that I need to warn people about inappropriate content. So if anything I say or post is inappropriate, dont look at it. GPG public key: http://www.neverlight.com/pas/Mental.asc
Mental <Mental@NeverLight.com> writes: > The postgresql documentation for COPY FROM INFILE suggests that high-ascii > characters be encoded to a backslash followed by the octal value for the > character. While it's certainly possible to do that, I don't see anyplace in the current documentation that recommends it. What did you conclude that from? regards, tom lane
On Sat, Jan 17, 2004 at 10:27:09PM -0500, Tom Lane wrote: > Mental <Mental@NeverLight.com> writes: > > The postgresql documentation for COPY FROM INFILE suggests that high-ascii > > characters be encoded to a backslash followed by the octal value for the > > character. > > While it's certainly possible to do that, I don't see anyplace in the > current documentation that recommends it. What did you conclude that > from? > http://www.postgresql.org/docs/7.3/static/sql-copy.html indicated: \digits , Backslash followed by one to three octal digits specifies the character with that numeric code We were having trouble with characters that were high ascii encoded. Perhaps it was how we were connecting to do the import, but we found that escaping them as so helped. After filtering, data is copied into the tables like so: ENCODING='SQL_ASCII' $FILTER_DATA psql -U $USER -c "SET CLIENT_ENCODING TO '$ENCODING'; copy $i from '$DATA_DIR/$i-noheader.tab' NULL as '' " $DB -- Mental (Mental@NeverLight.com) I've been told that I need to warn people about inappropriate content. So if anything I say or post is inappropriate, dont look at it. GPG public key: http://www.neverlight.com/pas/Mental.asc
Mental <Mental@NeverLight.com> writes: > We were having trouble with characters that were high ascii encoded. You probably need to pay attention to your client_encoding setting, and perhaps also reconsider what database encoding you are using. If either of these is not SQL_ASCII then it had better be an accurate description of the character set you are using, else you're in for a world of hurt :-(. Also, setting client_encoding to SQL_ASCII when the database encoding is something else does not get you out of having to respect the encoding setting --- it just prevents any automatic conversion from happening during I/O. The data you ship had better be in the database encoding in this case. regards, tom lane