Обсуждение: Question on inserting non-ascii strings
Using Postgres 8.3 with DBI 1.607, DBD::Pg 2.12.0, perl v5.10.0, I am trying to insert the medline database contents, which include non-ascii char's, hopefully using a prepared query. Playing with the locale, encoding, client_encoding, standard_conforming_strings, leaves me able to insert values using one-off querys, but I need to find out the correct way to handle these using prepared querys (currently using C, SQL_ASCII, SQL_ASCII, off). For example, given: create table foo ( bar varchar(255) ); and the author's name "P\x8FAZEK" Setting the client_encoding to "SQL_ASCII" does not help: the values can be inserted via E'P\x8FAZEK' but E'$1' will simply insert the "$1" literal into the table. Using convert_from( $1, 'SQL_ASCII' ) gets gets the values input, but with a warning. Q: Is there any combination of locale, encoding, client_encoding or functions that will allow me to insert values with these escape sequences without getting the warnings? Trying this in psql with various combinations of prepares statements leaves me unable to use convert_from with a varchar argument (requires bytea). If there is an example in the doc's I'd appreciate a link to it. thanks -- Steven Lembark 85-09 90th St. Workhorse Computing Woodhaven, NY, 11421 lembark@wrkhors.com +1 888 359 3508
On Thu, May 14, 2009 at 11:25:33AM -0400, Steven Lembark wrote: > Q: Is there any combination of locale, encoding, > client_encoding or functions that will allow > me to insert values with these escape sequences > without getting the warnings? > > Trying this in psql with various combinations of > prepares statements leaves me unable to use > convert_from with a varchar argument (requires > bytea). I think you're confused about where the escaping happens; in SQL, escape sequences are expanded as it is being parsed (actually, it's the step before known as "lex"ing, but the two stages are normally rolled together unless precision is needed). Parameters are passed after the SQL has been parsed and hence no expansion of escape sequences in your values is done. You want to be using whatever language you're generating the parameter from (Perl) to handle the expansion of escape sequences for you. This will cause the expanded string (i.e. the escapes have been interpreted) to be sent to Postgres and everything should just work. Unfortunately I don't use Perl much, so can't give much in the way of a demo--hopefully others will. -- Sam http://samason.me.uk/
On Thu, 14 May 2009 18:44:57 +0100 Sam Mason <sam@samason.me.uk> wrote: > You want to be using whatever language you're generating the parameter > from (Perl) to handle the expansion of escape sequences for you. This > will cause the expanded string (i.e. the escapes have been interpreted) > to be sent to Postgres and everything should just work. Unfortunately I > don't use Perl much, so can't give much in the way of a demo--hopefully > others will. That is what I thought should happen, but using a database with encoding of UTF8 and client_encoding of UTF8 still gave me the warnings. thanx -- Steven Lembark 85-09 90th St. Workhorse Computing Woodhaven, NY, 11421 lembark@wrkhors.com +1 888 359 3508
On Thu, May 14, 2009 at 01:57:04PM -0400, Steven Lembark wrote: > On Thu, 14 May 2009 18:44:57 +0100 Sam Mason <sam@samason.me.uk> wrote: > > You want to be using whatever language you're generating the parameter > > from (Perl) to handle the expansion of escape sequences for you. This > > will cause the expanded string (i.e. the escapes have been interpreted) > > to be sent to Postgres and everything should just work. Unfortunately I > > don't use Perl much, so can't give much in the way of a demo--hopefully > > others will. > > That is what I thought should happen, but > using a database with encoding of UTF8 and > client_encoding of UTF8 still gave me the > warnings. Which "warnings" are you talking about? I thought you said you had plain SQL working OK, but were struggling to pass parameters containing UTF-8 encoded characters. Are you sure that your Perl code is passing the string encoded as UTF8? -- Sam http://samason.me.uk/
> Which "warnings" are you talking about? I thought you said you had > plain SQL working OK, but were struggling to pass parameters containing > UTF-8 encoded characters. > > Are you sure that your Perl code is passing the string encoded as UTF8? Excellent point: Perl will only pass through the converted UTF8, if I remember to convert it from unicode! The earlier version of the database used SQL_ASCII and took the unicode byte as-is with a nastygram about trans-ascii byte -- which led me down the path of E'xxx'. While trying to make that work I'd accidentally fat-figered out the utf8 conversion attempting to comment it out. Character Unicode Code Unicode Name UTF Encoding (from http://www.nlm.nih.gov/databases/dtd/medline_character_database.html#notes) use utf8; x $c = "\x{F8}" x utf8::encode $c; x $c 0 'ø' <-- UTF8 for a slashed o. $sth->execute( $c ) x $d = $dbh->selectall_arrayref( 'select * from foo' ); 0 ARRAY(0x18ef0d0) 0 ARRAY(0x18cc1e8) 0 'ø' <-- bytes in correct order x utf8::decode $d->[0][0] DB<106> x $d 0 ARRAY(0x18ef0d0) 0 ARRAY(0x18cc1e8) I'm still not sure whether using UTF8 or unicode is the best way going forward, but will probably stick with UTF8 in case I have to deal with any offball character sets. thanx
On Thu, May 14, 2009 at 07:39:41PM -0400, Steven Lembark wrote: > > > Which "warnings" are you talking about? I thought you said you had > > plain SQL working OK, but were struggling to pass parameters containing > > UTF-8 encoded characters. > > > > Are you sure that your Perl code is passing the string encoded as UTF8? > > Excellent point: Perl will only pass through the > converted UTF8, if I remember to convert it from > unicode! Probably a minor point; but Unicode doesn't mean much to me here, Unicode defines a set of code-points and their mapping to "characters". These code-points are always encoded in some form whether that be UTF-8, UTF-16 or something else. Strings may be handled in Perl as Unicode strings of unspecified encoding (I don't know) and hence you need to specify what encoding you want them in if you ask for an array of bytes. > The earlier version of the database used SQL_ASCII > and took the unicode byte as-is with a nastygram > about trans-ascii byte -- which led me down the > path of E'xxx'. While trying to make that work I'd > accidentally fat-figered out the utf8 conversion > attempting to comment it out. Doh, easy to do. One things that I'd be tempted to do to make sure your code is doing the right thing would be to test that the characters are correct using another client--psql would be a reasonable choice. That way you can have some more confidence that if you have to get other code to talk to the database it's actually doing what you want it to be doing and it's not just Perl being consistent with itself, as it should be. I know I've saved myself a few times doing similar things like this. -- Sam http://samason.me.uk/