Discussion: Character encodings...
I am trying to fill up a database using psql program. A file I have prepared
contains Russian in KOI8-R encoding. When I try to process this file using
`psql -f file db', it fails: no diagnostics, nothing; it just shows that EOF is
reached. When I replace Russian letters with something in ASCII, it works just
fine. The main problem is that my second file gets processed just fine.

Where to look to? What additional information is needed? :)

Thanks,
--
Mike
On Thu, 13 Apr 2000, Michael Sobolev wrote:
> I am trying to fill up a database using psql program. A file I have prepared
> contains Russian in KOI8-R encoding. When I try to process this file using
> `psql -f file db', it fails: no diagnostics, nothing; it just shows that EOF is
> reached. When I replace Russian letters with something in ASCII, it works just
> fine. The main problem is that my second file gets processed just fine.
>
> Where to look to? What additional information is needed? :)

OS, locale, Postgres version, whether Postgres was compiled with locale,
multibyte...

Oleg.
----
    Oleg Broytmann     http://members.xoom.com/phd2.1/     phd2@earthling.net
    Programmers don't die, they just GOSUB without RETURN.
On Thu, Apr 13, 2000 at 10:20:39AM +0000, Oleg Broytmann wrote:
> OS, locale, Postgres version, whether Postgres was compiled with locale,
> multibyte...

Debian GNU/Linux (frozen), 6.5.3-17 (-17 -- debian revision), yes, =UNICODE.

--
Mike
Michael Sobolev wrote:
>On Thu, Apr 13, 2000 at 10:20:39AM +0000, Oleg Broytmann wrote:
>> OS, locale, Postgres version, whether Postgres was compiled with locale,
>> multibyte...
>Debian GNU/Linux (frozen), 6.5.3-17 (-17 -- debian revision), yes, =UNICODE.

Turn on logging in the backend (edit /etc/postgresql/postmaster.init) and
restart the postmaster (/etc/init.d/postgresql restart). See what you get
in the log.

--
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"I sought the LORD, and he heard me, and delivered me from all my fears."
                                          Psalms 34:41
On Thu, 13 Apr 2000, Michael Sobolev wrote:
> > OS, locale, Postgres version, whether Postgres was compiled with locale,
> > multibyte...
> Debian GNU/Linux (frozen), 6.5.3-17 (-17 -- debian revision), yes, =UNICODE.

Not sure how well Postgres works with UNICODE. It works pretty well with
KOI8-R and Windows-1251 encodings...

Oleg.
On Thu, Apr 13, 2000 at 11:54:17AM +0100, Oliver Elphick wrote:
> Turn on logging in the backend (edit /etc/postgresql/postmaster.init) and
> restart the postmaster (/etc/init.d/postgresql restart). See what you get
> in the log.

What level of debug should be sufficient?

I've got an impression that it's psql that does not process correctly the
stuff.

I have a very simple statement:

    insert into news values ('2000-04-13', NULL, '');

This works just fine. Now I replace '' with 'A' (A -- 65). It still works
just fine. Now I replace this latin A with Russian A. And psql shows:

$ psql -f test.sql stuff
insert into news values ('2000-04-12', NULL, 'А');
EOF

--
Mike
Michael Sobolev wrote:
>On Thu, Apr 13, 2000 at 11:54:17AM +0100, Oliver Elphick wrote:
>> Turn on logging in the backend (edit /etc/postgresql/postmaster.init) and
>> restart the postmaster (/etc/init.d/postgresql restart). See what you get
>> in the log.
>What level of debug should be sufficient?

2

Also set PGECHO in postmaster.init, so that queries are echoed in the log.

>I've got an impression that it's psql that does not process correctly the
>stuff.
>
>I have a very simple statement:
>
>    insert into news values ('2000-04-13', NULL, '');
>
>This works just fine. Now I replace '' with 'A' (A -- 65). It still works
>just fine. Now I replace this latin A with Russian A. And psql shows:
>
>$ psql -f test.sql stuff
>insert into news values ('2000-04-12', NULL, 'á');
>EOF

The trouble is, I don't know how to test this. How do I produce Russian
characters on an English keyboard?

--
Oliver Elphick
On Thu, Apr 13, 2000 at 02:52:13PM +0100, Oliver Elphick wrote:
> >What level of debug should be sufficient?
>
> 2
>
> Also set PGECHO in postmaster.init, so that queries are echoed in the log.

OK. I'll try.

> The trouble is, I don't know how to test this. How do I produce Russian
> characters on an English keyboard?

I am almost sure that this may fail if it's just a character from the upper
half of 256. In vim: ^V240 :)

--
Mike
On Thu, Apr 13, 2000 at 02:52:13PM +0100, Oliver Elphick wrote:
> 2
>
> Also set PGECHO in postmaster.init, so that queries are echoed in the log.

Here it goes. I would not say it's very useful... Russian a has code 225
(decimal).

--
Mike

binding ShmemCreate(key=52e2c1, size=2006016)
/usr/lib/postgresql/bin/postmaster: ServerLoop: handling reading 4
/usr/lib/postgresql/bin/postmaster: ServerLoop: handling reading 4
/usr/lib/postgresql/bin/postmaster: ServerLoop: handling writing 4
/usr/lib/postgresql/bin/postmaster: BackendStartup: pid 30613 user mss db stuff socket 4
/usr/lib/postgresql/bin/postmaster child[30613]: starting with (/usr/lib/postgresql/bin/postgres -d2 -B 128 -E -v131072 -pstuff )
FindExec: found "/usr/lib/postgresql/bin/postgres" using argv[0]
debug info:
        User = mss
        RemoteHost = localhost
        RemotePort = 0
        DatabaseName = stuff
        Verbose = 2
        Noversion = f
        timings = f
        dates = European
        bufsize = 128
        sortmem = 512
        query echo = t
InitPostgres
reset_client_encoding()..
reset_client_encoding() done.
StartTransactionCommand
query: select getdatabaseencoding()
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
query: SET client_encoding = 'UNICODE'
ProcessUtility: SET client_encoding = 'UNICODE'
CommitTransactionCommand
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/lib/postgresql/bin/postmaster: reaping dead processes...
/usr/lib/postgresql/bin/postmaster: CleanupProc: pid 30613 exited with status 0
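A side note on the byte value 225 mentioned above. The Python sketch below is not part of the original thread; it only shows how that one byte reads under three encodings, using Python's standard codecs. The helper name `utf8_sequence_length` is hypothetical and introduced purely for illustration.

```python
# Illustration only: how byte 225 (0xE1) is interpreted under different encodings.
raw = bytes([225])

as_koi8r = raw.decode("koi8_r")     # Cyrillic capital A in KOI8-R
as_latin1 = raw.decode("latin_1")   # a with acute accent in ISO-8859-1


def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length of the UTF-8 sequence announced by a lead byte.

    Returns 0 for bytes that cannot start a sequence (continuation bytes
    and lead values that are never valid in UTF-8).
    """
    if lead_byte < 0x80:
        return 1
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    return 0


print(repr(as_koi8r), repr(as_latin1), utf8_sequence_length(225))
```

Read as UTF-8, 0xE1 announces a three-byte sequence, so a multibyte-aware reader could treat the two bytes that follow it (in the test statement, the closing quote and parenthesis) as part of the character. Whether psql 6.5.3 actually behaves this way is an assumption; the thread does not confirm it.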
On Thu, 13 Apr 2000, Michael Sobolev wrote:
> Here it goes. I would not say it's very useful... Russian a has code 225
> (decimal).

> StartTransactionCommand
> query: SET client_encoding = 'UNICODE'
> ProcessUtility: SET client_encoding = 'UNICODE'
> CommitTransactionCommand
> proc_exit(0) [#0]
> shmem_exit(0) [#0]
> exit(0)
> /usr/lib/postgresql/bin/postmaster: reaping dead processes...
> /usr/lib/postgresql/bin/postmaster: CleanupProc: pid 30613 exited with status 0

That looks like the query never got to the backend. This is either a bug
in psql or the multibyte suite. I seem to recall that Unicode isn't fully
supported, so I'd go for the latter. Can Tatsuo comment?

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
> On Thu, 13 Apr 2000, Michael Sobolev wrote:
>
> > Here it goes. I would not say it's very useful... Russian a has code 225
> > (decimal).
>
> > StartTransactionCommand
> > query: SET client_encoding = 'UNICODE'
> > ProcessUtility: SET client_encoding = 'UNICODE'
> > CommitTransactionCommand
> > proc_exit(0) [#0]
> > shmem_exit(0) [#0]
> > exit(0)
> > /usr/lib/postgresql/bin/postmaster: reaping dead processes...
> > /usr/lib/postgresql/bin/postmaster: CleanupProc: pid 30613 exited with status 0
>
> That looks like the query never got to the backend. This is either a bug
> in psql or the multibyte suite. I seem to recall that Unicode isn't fully
> supported, so I'd go for the latter. Can Tatsuo comment?

Oh, he is using the multibyte support and expects an automatic code
conversion between KOI8-R and UNICODE, which is not supported yet. What he
needs to do is create a database with encoding KOI8-R or ISO-8859-5.

# make a KOI8-R database
$ createdb -E KOI8

or

# make an ISO-8859-5 database
$ createdb -E LATIN5

In the latter case, he might want to set the PGCLIENTENCODING environment
variable so that a conversion between KOI8-R and ISO-8859-5 is performed
automatically.

# if you want to use KOI8-R on your client.
$ export PGCLIENTENCODING=KOI8

or

% setenv PGCLIENTENCODING KOI8

--
Tatsuo Ishii
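To make the KOI8-R / ISO-8859-5 conversion mentioned above more concrete, here is a small Python sketch. It is not PostgreSQL's implementation; it relies on Python's own codecs, and the function names `koi8r_to_iso8859_5` and `iso8859_5_to_koi8r` are hypothetical. Both encodings are single-byte, so a conversion amounts to remapping each byte.

```python
def koi8r_to_iso8859_5(data: bytes) -> bytes:
    """Re-encode KOI8-R bytes as ISO-8859-5 by going through Unicode."""
    return data.decode("koi8_r").encode("iso8859_5")


def iso8859_5_to_koi8r(data: bytes) -> bytes:
    """Re-encode ISO-8859-5 bytes as KOI8-R by going through Unicode."""
    return data.decode("iso8859_5").encode("koi8_r")


# Byte 225 is the Cyrillic capital A in KOI8-R.
koi8_bytes = bytes([225])
iso_bytes = koi8r_to_iso8859_5(koi8_bytes)

# The same letter sits at a different byte value in ISO-8859-5,
# and converting back restores the original byte.
print(koi8_bytes.hex(), iso_bytes.hex())
round_trip = iso8859_5_to_koi8r(iso_bytes)
```

One caveat of this sketch: KOI8-R also contains box-drawing characters that ISO-8859-5 lacks, so `encode("iso8859_5")` raises `UnicodeEncodeError` for such input. How the server handles that case is not covered in the thread.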
On Fri, Apr 14, 2000 at 03:44:09PM +0900, Tatsuo Ishii wrote:
> Oh, he is using the multibyte support and expects an automatic code
> conversion between KOI8-R and UNICODE that is not supported yet.

Not exactly. If you had a look at my first message, you would see that the
problem is that the behaviour is not consistent. Sometimes this data gets
through, and sometimes it does not. I'd say that arbitrary text in KOI8-R
can hardly be something reasonable in UTF-8, so I'd expect that all (yes, ALL)
my requests would fail (and preferably with correct diagnostics).

> # make a KOI8-R database
> $ createdb -E KOI8

Thanks. I was looking for something like this in the man page, but
unfortunately it does not have this information.

> In the latter case, he might want to set the PGCLIENTENCODING environment
> variable so that a conversion between KOI8-R and ISO-8859-5 is performed
> automatically.

What are the requirements for this to work?

Thanks,
--
Mike
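The claim that KOI8-R text is rarely valid UTF-8 can be checked directly. The following Python sketch is an illustration only; `is_valid_utf8` is a hypothetical helper, and the two statements are modelled on the test statement quoted earlier in the thread.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf_8")
    except UnicodeDecodeError:
        return False
    return True


# The same statement with a Latin A and with a Cyrillic A, both stored as KOI8-R bytes.
ascii_stmt = "insert into news values ('2000-04-13', NULL, 'A');".encode("koi8_r")
russian_stmt = "insert into news values ('2000-04-13', NULL, '\u0410');".encode("koi8_r")

ascii_ok = is_valid_utf8(ascii_stmt)
russian_ok = is_valid_utf8(russian_stmt)
print(ascii_ok, russian_ok)
```

Running the same check on the raw bytes of each input file, for example `is_valid_utf8(open("test.sql", "rb").read())`, would show which files contain byte sequences that are not well-formed UTF-8. It does not by itself explain why one file was accepted and another was not.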
> On Fri, Apr 14, 2000 at 03:44:09PM +0900, Tatsuo Ishii wrote:
> > Oh, he is using the multibyte support and expects an automatic code
> > conversion between KOI8-R and UNICODE that is not supported yet.
> Not exactly. If you had a look at my first message, you would see that the
> problem is that the behaviour is not consistent. Sometimes this data gets
> through, and sometimes it does not. I'd say that arbitrary text in KOI8-R
> can hardly be something reasonable in UTF-8, so I'd expect that all (yes, ALL)
> my requests would fail (and preferably with correct diagnostics).

Sorry. I don't understand your point. What I wanted to say was that KOI8-R
and UTF-8 are totally different encodings (except for the ASCII part).

> > # make a KOI8-R database
> > $ createdb -E KOI8
> Thanks. I was looking for something like this in the man page, but
> unfortunately it does not have this information.

Please look at doc/README.mb.

> > In the latter case, he might want to set the PGCLIENTENCODING environment
> > variable so that a conversion between KOI8-R and ISO-8859-5 is performed
> > automatically.
> What are the requirements for this to work?

Please explain your background. If you need KOI8-R only, you could forget
about ISO-8859-5.

--
Tatsuo Ishii