Обсуждение: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
"WildWezyr"
Дата:
The following bug has been logged online: Bug reference: 5290 Logged by: WildWezyr Email address: wi.ld.we.zy.r@gmail.com PostgreSQL version: 8.4.2 Operating system: Windows Vista Description: Simple loop with insert into and check to avoid duplicate values fails Details: This is simplified version of BUG #5289. Given this one table: create sequence spb_word_seq; create table spb_word ( id bigint not null primary key default nextval('spb_word_seq'), word varchar(410) not null unique ); and these functions (first simulates generating words, seconds performs main loop): create sequence spb_wordnum_seq; create or replace function spb_getWord() returns text as $$ declare rn int; letters varchar(255) := 'Ä ÄÄÅÅóÅźżjklmnopqrstuvwxyz'; llen int := length(letters); res text := ''; wordnum int; begin select nextval('spb_wordnum_seq') into wordnum; rn := 3 * (wordnum + llen * llen * llen); rn := (rn + llen) / (rn % llen + 1); rn := rn % (rn / 2 + 10); loop res := res || substring(letters, rn % llen, 1); rn := floor(rn / llen); exit when rn = 0; end loop; return res; end; $$ language plpgsql; create or replace function spb_runmeSimple2(cnt int) returns void as $$ declare w varchar(410); wordId int; begin perform setval('spb_wordnum_seq', 1, false); truncate table spb_word cascade; for i in 1 .. cnt loop if i % 100 = 0 then raise notice 'i = %', i; end if; select spb_getWord() into w; select id into wordId from spb_word where word = w; if wordId is null then insert into spb_word (word) values (w); end if; end loop; end; $$ language plpgsql; while executing select spb_runmeSimple2(10000000) I run into sql error: ERROR: duplicate key value violates unique constraint "spb_word_word_key" CONTEXT: SQL statement "insert into spb_word (word) values ( $1 )" PL/pgSQL function "spb_runmesimple2" line 15 at SQL statement after unpredictable number of iterations - iteration number for which execution will fail changes every time. If I eliminate polish national chars from function spb_getWord i.e. it will generate words with plain ascii chars there is no error and everything works fine.
"WildWezyr" <wi.ld.we.zy.r@gmail.com> writes: > If I eliminate polish national chars from function spb_getWord i.e. it will > generate words with plain ascii chars there is no error and everything works > fine. What that sounds like is a locale/encoding conflict. Postgres depends on strcoll() or local equivalent to produce consistent sort results, and sometimes if strcoll is presented with data that it thinks is invalidly encoded, it doesn't behave sanely. What locale setting and database encoding are you using? regards, tom lane
2010/1/19 Tom Lane <tgl@sss.pgh.pa.us> > > What that sounds like is a locale/encoding conflict. Postgres depends > on strcoll() or local equivalent to produce consistent sort results, > and sometimes if strcoll is presented with data that it thinks is > invalidly encoded, it doesn't behave sanely. > > What locale setting and database encoding are you using? > > regards, tom lane > Here are my settings: CREATE DATABASE spb WITH OWNER = wwspb ENCODING = 'UTF8' LC_COLLATE = 'Polish_Poland.1250' LC_CTYPE = 'Polish_Poland.1250' CONNECTION LIMIT = -1; Is anything wrong with them? And if so, how should I fix it? Keep in mind that I must deal with words containing different national characters from many languages (not only polish). (WW)
Wild Wezyr <wildwezyr@gmail.com> writes: > 2010/1/19 Tom Lane <tgl@sss.pgh.pa.us> >> What locale setting and database encoding are you using? > Here are my settings: > CREATE DATABASE spb > WITH OWNER = wwspb > ENCODING = 'UTF8' > LC_COLLATE = 'Polish_Poland.1250' > LC_CTYPE = 'Polish_Poland.1250' > CONNECTION LIMIT = -1; I tried the test case on Linux (Fedora 11) with locale pl_PL.utf8, and unsurprisingly failed to reproduce the problem. So it's something specific to Windows. Can anyone else reproduce it? regards, tom lane
2010/1/20 Tom Lane <tgl@sss.pgh.pa.us> > > I tried the test case on Linux (Fedora 11) with locale pl_PL.utf8, > and unsurprisingly failed to reproduce the problem. So it's something > specific to Windows. Can anyone else reproduce it? > > regards, tom lane > If it is of any help: changing LC_COLLATE and LC_CTYPE to 'C' eliminates the error. But when DB is created with LC_COLLATE and LC_CTYPE set to 'Polish_Poland.1250' I get errors (sometimes fast, sometimes I have to wait for several minutes). This corellation to locale 'C' / 'Polish_Poland.1250' was tested on two Windows Vista boxes. Elimination of polish national characters from generated words for locale 'Polish_Poland.1250' leads to same behavior - no errors. (WW)
Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
Heikki Linnakangas
Дата:
Tom Lane wrote: > Wild Wezyr <wildwezyr@gmail.com> writes: >> 2010/1/19 Tom Lane <tgl@sss.pgh.pa.us> >>> What locale setting and database encoding are you using? > >> Here are my settings: > >> CREATE DATABASE spb >> WITH OWNER = wwspb >> ENCODING = 'UTF8' >> LC_COLLATE = 'Polish_Poland.1250' >> LC_CTYPE = 'Polish_Poland.1250' >> CONNECTION LIMIT = -1; > > I tried the test case on Linux (Fedora 11) with locale pl_PL.utf8, > and unsurprisingly failed to reproduce the problem. So it's something > specific to Windows. Can anyone else reproduce it? I happen to have access to a Win32 virtual machine just now. CVS HEAD, built from sources on the VM. Seems to work fine. The test case runs for ages, I'm at about 1/3 through it, and no errors this far. I'm going to have to kill it now. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
Heikki Linnakangas
Дата:
Wild Wezyr wrote: > 2010/1/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> > >> I happen to have access to a Win32 virtual machine just now. CVS HEAD, >> built from sources on the VM. >> >> Seems to work fine. The test case runs for ages, I'm at about 1/3 >> through it, and no errors this far. I'm going to have to kill it now. > > What OS you have on this Win32 box? Is it Windows Vista (like mine)? The login screen says: Windows Server 2003 R2, Datacenter x64 Edition > What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine? A fresh checkout from CVS HEAD. > What encoding/collation was your DB created with? Was it UTF8 / > 'Polish_Poland.1250' ? Yes. I used: bin\initdb -D data3 -E UTF-8 --locale=Polish_Poland.1250 and CREATE DATABASE spb WITH OWNER = "Administrator" ENCODING = 'UTF8' LC_COLLATE = 'Polish_Poland.1250' LC_CTYPE = 'Polish_Poland.1250' CONNECTION LIMIT = -1; -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
"Milen A. Radev"
Дата:
Heikki Linnakangas напиÑа: > Wild Wezyr wrote: >> 2010/1/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> >> >>> I happen to have access to a Win32 virtual machine just now. CVS HEAD, >>> built from sources on the VM. >>> >>> Seems to work fine. The test case runs for ages, I'm at about 1/3 >>> through it, and no errors this far. I'm going to have to kill it now. >> What OS you have on this Win32 box? Is it Windows Vista (like mine)? > > The login screen says: > > Windows Server 2003 R2, Datacenter x64 Edition > >> What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine? > > A fresh checkout from CVS HEAD. > >> What encoding/collation was your DB created with? Was it UTF8 / >> 'Polish_Poland.1250' ? > > Yes. I used: > > bin\initdb -D data3 -E UTF-8 --locale=Polish_Poland.1250 > > and > > CREATE DATABASE spb > WITH OWNER = "Administrator" > ENCODING = 'UTF8' > LC_COLLATE = 'Polish_Poland.1250' > LC_CTYPE = 'Polish_Poland.1250' > CONNECTION LIMIT = -1; > I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2 (installed by using the One-click installer from EnterpriseDB): ... NOTICE: i = 642200 NOTICE: i = 642300 NOTICE: i = 642400 NOTICE: i = 642500 ERROR: duplicate key value violates unique constraint "spb_word_word_key" CONTEXT: SQL statement "insert into spb_word (word) values ( $1 )" PL/pgSQL function "spb_runmesimple2" line 15 at SQL statement -- Milen A. Radev
2010/1/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> > > I happen to have access to a Win32 virtual machine just now. CVS HEAD, > built from sources on the VM. > > Seems to work fine. The test case runs for ages, I'm at about 1/3 > through it, and no errors this far. I'm going to have to kill it now. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > What OS you have on this Win32 box? Is it Windows Vista (like mine)? What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine? What encoding/collation was your DB created with? Was it UTF8 / 'Polish_Poland.1250' ? As I posted - standard 'C' collation gives no errors, so it is important to use 'Polish_Poland.1250' (this is one of three choices I have on my box: 'C' / 'Polish_Poland.1250' / 'Posix'). If my code runs for long with no error - it is best to interrupt and than run it again - it gives errors in unpredictable manner (but only for 'Polish_Poland.1250' collation)... (WW)
2010/1/21 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> > The login screen says: > > Windows Server 2003 R2, Datacenter x64 Edition > > > What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine? > > A fresh checkout from CVS HEAD. > > > What encoding/collation was your DB created with? Was it UTF8 / > > 'Polish_Poland.1250' ? > > Yes. I used: > > bin\initdb -D data3 -E UTF-8 --locale=Polish_Poland.1250 > > and > > CREATE DATABASE spb > WITH OWNER = "Administrator" > ENCODING = 'UTF8' > LC_COLLATE = 'Polish_Poland.1250' > LC_CTYPE = 'Polish_Poland.1250' > CONNECTION LIMIT = -1; > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > So - it seems like a problem with exactly versions 8.3.3 and 8.4.2 (I don't know what version exactly you have after CSV HEAD) and Windows Vista. And - what now? Is it over - because you cannot reproduce my error or I may count on fix for this error? What else can I do - upgrade to newer version of Postgres? Obviously - resolution of my problem is quite urgent to me... (WW)
"Milen A. Radev" <milen@radev.net> writes: > I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2 > (installed by using the One-click installer from EnterpriseDB): OK, so that means WildWezyr isn't just seeing things. Possibilities that occur to me now: 1. Maybe it's specific to Vista and/or Win64? 2. I wonder whether setting LC_COLLATE/LC_CTYPE during database creation is really sufficient to establish the locale fully on Windows. If wcscoll() is partially dependent on some other bit of state, maybe that would explain the problem. What is the "surrounding" locale in your test case? regards, tom lane
Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
"Milen A. Radev"
Дата:
Tom Lane напиÑа: > "Milen A. Radev" <milen@radev.net> writes: >> I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2 >> (installed by using the One-click installer from EnterpriseDB): > > OK, so that means WildWezyr isn't just seeing things. Possibilities > that occur to me now: > > 1. Maybe it's specific to Vista and/or Win64? > > 2. I wonder whether setting LC_COLLATE/LC_CTYPE during database creation > is really sufficient to establish the locale fully on Windows. If > wcscoll() is partially dependent on some other bit of state, maybe that > would explain the problem. What is the "surrounding" locale in your > test case? [...] For the test I changed all regional settings to "Polish (Poland)", restarted the machine, created a new cluster and a new database (using the commands from the last mail from Heikki). A side-note: tested exactly the same function but on "Bulgarian (Bulgaria)" system locale and cluster and database created with "Bulgarian_Bulgaria.1251" locale - no errors. -- Milen A. Radev
On Jan 21, 6:17=C2=A0pm, mi...@radev.net ("Milen A. Radev") wrote: > > [...] > > For the test I changed all regional settings to "Polish (Poland)", > restarted the machine, created a new cluster and a new database (using > the commands from the last mail from Heikki). > > A side-note: tested exactly the same function but on "Bulgarian > (Bulgaria)" system locale and cluster and database created with > "Bulgarian_Bulgaria.1251" locale - no errors. > > -- > Milen A. Radev > No error for Bulgarian locale might be related to absence of Bulgarian national characters in words generated by my test code. I've put only polish characters (=C4=85=C4=87=C4=99=C5=82=C5=84=C3=B3=C5=9B=C5=BA=C5=BC) = - and it fails with polish collation, but after removing polish characters - i get no error with polish collation. I thing it may behave exactly the same for Bulgarian - to produce errors you must modify letters to include Bulgarian chars (line: letters varchar(255) :=3D '=C4=85=C4=87=C4=99=C5=82=C5=84=C3=B3=C5= =9B=C5=BA=C5=BCjklmnopqrstuvwxyz'; --<-- put these letters here). Have you tried this?
Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
"Milen A. Radev"
Дата:
WildWezyr напиÑа: > On Jan 21, 6:17 pm, mi...@radev.net ("Milen A. Radev") wrote: >> [...] >> >> For the test I changed all regional settings to "Polish (Poland)", >> restarted the machine, created a new cluster and a new database (using >> the commands from the last mail from Heikki). >> >> A side-note: tested exactly the same function but on "Bulgarian >> (Bulgaria)" system locale and cluster and database created with >> "Bulgarian_Bulgaria.1251" locale - no errors. >> >> -- >> Milen A. Radev >> > > No error for Bulgarian locale might be related to absence of Bulgarian > national characters in words generated by my test code. I've put only > polish characters (Ä ÄÄÅÅóÅźż) - and it fails with polish collation, > but after removing polish characters - i get no error with polish > collation. I thing it may behave exactly the same for Bulgarian - to > produce errors you must modify letters to include Bulgarian chars > (line: letters varchar(255) := 'Ä ÄÄÅÅóÅźżjklmnopqrstuvwxyz'; --<-- > put these letters here). > > Have you tried this? > Yes, actually I tried it yesterday but just to be sure I tried it again today - no errors: .... NOTICE: i = 9999800 NOTICE: i = 9999900 NOTICE: i = 10000000 Total query runtime: 2231947 ms. 1 row retrieved. The "letters" variable used in the test (in case you have the time to try it yourself): letters varchar(255) := 'ÑÑÑÑÑÑÑÑÑjklmnopqrstuvwxyz'; And the database definition: CREATE DATABASE spb WITH OWNER = postgres ENCODING = 'UTF8' LC_COLLATE = 'Bulgarian_Bulgaria.1251' LC_CTYPE = 'Bulgarian_Bulgaria.1251' CONNECTION LIMIT = -1; The system locale is "Bulgarian (Bulgaria)". -- Milen A. Radev
On Jan 21, 6:17=A0pm, mi...@radev.net ("Milen A. Radev") wrote: > [...] > > For the test I changed all regional settings to "Polish (Poland)", > restarted the machine, created a new cluster and a new database (using > the commands from the last mail from Heikki). > > A side-note: tested exactly the same function but on "Bulgarian > (Bulgaria)" system locale and cluster and database created with > "Bulgarian_Bulgaria.1251" locale - no errors. > > -- > Milen A. Radev Strange (?), but when I substituted equality to like in this line: select id into wordId from spb_word where word like w; -- was: ... word =3D w; there is no error with polish locale. (WW)
Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
От
"Milen A. Radev"
Дата:
Tom Lane напиÑа: > "Milen A. Radev" <milen@radev.net> writes: >> I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2 [...] > 1. Maybe it's specific to Vista and/or Win64? [...] Also reproduced it on 32-bit Vista SP2 (with Postgres 8.4.2). Couldn't reproduced it on neither Windows 7 32-bit RC nor Windows XP SP3 32-bit. I have no access to 64-bit Windows XP or 64-bit Windows 7. -- Milen A. Radev