Обсуждение: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

Поиск
Список
Период
Сортировка

BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
"WildWezyr"
Дата:
The following bug has been logged online:

Bug reference:      5290
Logged by:          WildWezyr
Email address:      wi.ld.we.zy.r@gmail.com
PostgreSQL version: 8.4.2
Operating system:   Windows Vista
Description:        Simple loop with insert into and check to avoid
duplicate values fails
Details:

This is simplified version of BUG #5289.

Given this one table:

create sequence spb_word_seq;

create table spb_word (
  id bigint not null primary key default nextval('spb_word_seq'),
  word varchar(410) not null unique
);

and these functions (first simulates generating words, seconds performs main
loop):

    create sequence spb_wordnum_seq;

    create or replace function spb_getWord() returns text as $$
    declare
      rn int;
      letters varchar(255) :=   'ąćęłńóśźżjklmnopqrstuvwxyz';
      llen int := length(letters);
      res text := '';
      wordnum int;
    begin
      select nextval('spb_wordnum_seq') into wordnum;

      rn := 3 * (wordnum + llen * llen * llen);
      rn := (rn + llen) / (rn % llen + 1);
      rn := rn % (rn / 2 + 10);

      loop
        res := res || substring(letters, rn % llen, 1);
        rn := floor(rn / llen);
        exit when rn = 0;
      end loop;

      return res;
    end;
    $$ language plpgsql;

    create or replace function spb_runmeSimple2(cnt int) returns void as $$
    declare
      w varchar(410);
      wordId int;
    begin
      perform setval('spb_wordnum_seq', 1, false);
      truncate table spb_word cascade;

      for i in 1 .. cnt loop

        if i % 100 = 0 then raise notice 'i = %', i; end if;

        select spb_getWord() into w;
        select id into wordId from spb_word where word = w;
        if wordId is null then
          insert into spb_word (word) values (w);
        end if;

      end loop;
    end;
    $$ language plpgsql;

while executing select spb_runmeSimple2(10000000)
I run into sql error:

ERROR:  duplicate key value violates unique constraint "spb_word_word_key"
CONTEXT:  SQL statement "insert into spb_word (word) values ( $1 )"
PL/pgSQL function "spb_runmesimple2" line 15 at SQL statement

after unpredictable number of iterations - iteration number for which
execution will fail changes every time.

If I eliminate polish national chars from function spb_getWord i.e. it will
generate words with plain ascii chars there is no error and everything works
fine.

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Tom Lane
Дата:
"WildWezyr" <wi.ld.we.zy.r@gmail.com> writes:
> If I eliminate polish national chars from function spb_getWord i.e. it will
> generate words with plain ascii chars there is no error and everything works
> fine.

What that sounds like is a locale/encoding conflict.  Postgres depends
on strcoll() or local equivalent to produce consistent sort results,
and sometimes if strcoll is presented with data that it thinks is
invalidly encoded, it doesn't behave sanely.

What locale setting and database encoding are you using?

            regards, tom lane

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Wild Wezyr
Дата:
2010/1/19 Tom Lane <tgl@sss.pgh.pa.us>

>
> What that sounds like is a locale/encoding conflict.  Postgres depends
> on strcoll() or local equivalent to produce consistent sort results,
> and sometimes if strcoll is presented with data that it thinks is
> invalidly encoded, it doesn't behave sanely.
>
> What locale setting and database encoding are you using?
>
>                        regards, tom lane
>

Here are my settings:

CREATE DATABASE spb
  WITH OWNER = wwspb
       ENCODING = 'UTF8'
       LC_COLLATE = 'Polish_Poland.1250'
       LC_CTYPE = 'Polish_Poland.1250'
       CONNECTION LIMIT = -1;

Is anything wrong with them? And if so, how should I fix it? Keep in mind
that I must deal with words containing different national characters from
many languages (not only polish).

  (WW)

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Tom Lane
Дата:
Wild Wezyr <wildwezyr@gmail.com> writes:
> 2010/1/19 Tom Lane <tgl@sss.pgh.pa.us>
>> What locale setting and database encoding are you using?

> Here are my settings:

> CREATE DATABASE spb
>   WITH OWNER = wwspb
>        ENCODING = 'UTF8'
>        LC_COLLATE = 'Polish_Poland.1250'
>        LC_CTYPE = 'Polish_Poland.1250'
>        CONNECTION LIMIT = -1;

I tried the test case on Linux (Fedora 11) with locale pl_PL.utf8,
and unsurprisingly failed to reproduce the problem.  So it's something
specific to Windows.  Can anyone else reproduce it?

            regards, tom lane

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Wild Wezyr
Дата:
2010/1/20 Tom Lane <tgl@sss.pgh.pa.us>

>
> I tried the test case on Linux (Fedora 11) with locale pl_PL.utf8,
> and unsurprisingly failed to reproduce the problem.  So it's something
> specific to Windows.  Can anyone else reproduce it?
>
>                        regards, tom lane
>


If it is of any help: changing LC_COLLATE and LC_CTYPE to 'C' eliminates the
error.

But when DB is created with LC_COLLATE and LC_CTYPE set to
'Polish_Poland.1250' I get errors (sometimes fast, sometimes I have to wait
for several minutes).

This corellation to locale 'C' / 'Polish_Poland.1250' was tested on two
Windows Vista boxes. Elimination of polish national characters from
generated words for locale 'Polish_Poland.1250' leads to same behavior - no
errors.

  (WW)

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Heikki Linnakangas
Дата:
Tom Lane wrote:
> Wild Wezyr <wildwezyr@gmail.com> writes:
>> 2010/1/19 Tom Lane <tgl@sss.pgh.pa.us>
>>> What locale setting and database encoding are you using?
>
>> Here are my settings:
>
>> CREATE DATABASE spb
>>   WITH OWNER = wwspb
>>        ENCODING = 'UTF8'
>>        LC_COLLATE = 'Polish_Poland.1250'
>>        LC_CTYPE = 'Polish_Poland.1250'
>>        CONNECTION LIMIT = -1;
>
> I tried the test case on Linux (Fedora 11) with locale pl_PL.utf8,
> and unsurprisingly failed to reproduce the problem.  So it's something
> specific to Windows.  Can anyone else reproduce it?

I happen to have access to a Win32 virtual machine just now. CVS HEAD,
built from sources on the VM.

Seems to work fine. The test case runs for ages, I'm at about 1/3
through it, and no errors this far. I'm going to have to kill it now.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Heikki Linnakangas
Дата:
Wild Wezyr wrote:
> 2010/1/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
>
>> I happen to have access to a Win32 virtual machine just now. CVS HEAD,
>> built from sources on the VM.
>>
>> Seems to work fine. The test case runs for ages, I'm at about 1/3
>> through it, and no errors this far. I'm going to have to kill it now.
>
> What OS you have on this Win32 box? Is it Windows Vista (like mine)?

The login screen says:

Windows Server 2003 R2, Datacenter x64 Edition

> What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine?

A fresh checkout from CVS HEAD.

> What encoding/collation was your DB created with? Was it UTF8 /
> 'Polish_Poland.1250' ?

Yes. I used:

bin\initdb -D data3 -E UTF-8 --locale=Polish_Poland.1250

and

CREATE DATABASE spb
   WITH OWNER = "Administrator"
        ENCODING = 'UTF8'
        LC_COLLATE = 'Polish_Poland.1250'
        LC_CTYPE = 'Polish_Poland.1250'
        CONNECTION LIMIT = -1;

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
"Milen A. Radev"
Дата:
Heikki Linnakangas написа:
> Wild Wezyr wrote:
>> 2010/1/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>
>>
>>> I happen to have access to a Win32 virtual machine just now. CVS HEAD,
>>> built from sources on the VM.
>>>
>>> Seems to work fine. The test case runs for ages, I'm at about 1/3
>>> through it, and no errors this far. I'm going to have to kill it now.
>> What OS you have on this Win32 box? Is it Windows Vista (like mine)?
>
> The login screen says:
>
> Windows Server 2003 R2, Datacenter x64 Edition
>
>> What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine?
>
> A fresh checkout from CVS HEAD.
>
>> What encoding/collation was your DB created with? Was it UTF8 /
>> 'Polish_Poland.1250' ?
>
> Yes. I used:
>
> bin\initdb -D data3 -E UTF-8 --locale=Polish_Poland.1250
>
> and
>
> CREATE DATABASE spb
>    WITH OWNER = "Administrator"
>         ENCODING = 'UTF8'
>         LC_COLLATE = 'Polish_Poland.1250'
>         LC_CTYPE = 'Polish_Poland.1250'
>         CONNECTION LIMIT = -1;
>



I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2
(installed by using the One-click installer from EnterpriseDB):

...
NOTICE:  i = 642200
NOTICE:  i = 642300
NOTICE:  i = 642400
NOTICE:  i = 642500
ERROR:  duplicate key value violates unique constraint "spb_word_word_key"
CONTEXT:  SQL statement "insert into spb_word (word) values ( $1 )"
PL/pgSQL function "spb_runmesimple2" line 15 at SQL statement




--
Milen A. Radev

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Wild Wezyr
Дата:
2010/1/20 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>

>
> I happen to have access to a Win32 virtual machine just now. CVS HEAD,
> built from sources on the VM.
>
> Seems to work fine. The test case runs for ages, I'm at about 1/3
> through it, and no errors this far. I'm going to have to kill it now.
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>

What OS you have on this Win32 box? Is it Windows Vista (like mine)?

What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine?

What encoding/collation was your DB created with? Was it UTF8 /
'Polish_Poland.1250' ? As I posted - standard 'C' collation gives no errors,
so it is important to use 'Polish_Poland.1250' (this is one of three choices
I have on my box: 'C' / 'Polish_Poland.1250' / 'Posix').

If my code runs for long with no error - it is best to interrupt and than
run it again - it gives errors in unpredictable manner (but only for
'Polish_Poland.1250' collation)...

  (WW)

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Wild Wezyr
Дата:
2010/1/21 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>

> The login screen says:
>
> Windows Server 2003 R2, Datacenter x64 Edition
>
> > What version of Postgres you have used? Is it 8.3.3 or 8.4.2 like mine?
>
> A fresh checkout from CVS HEAD.
>
> > What encoding/collation was your DB created with? Was it UTF8 /
> > 'Polish_Poland.1250' ?
>
> Yes. I used:
>
> bin\initdb -D data3 -E UTF-8 --locale=Polish_Poland.1250
>
> and
>
> CREATE DATABASE spb
>   WITH OWNER = "Administrator"
>         ENCODING = 'UTF8'
>        LC_COLLATE = 'Polish_Poland.1250'
>        LC_CTYPE = 'Polish_Poland.1250'
>        CONNECTION LIMIT = -1;
>
> --
>   Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>

So - it seems like a problem with exactly versions 8.3.3 and 8.4.2 (I don't
know what version exactly you have after CSV HEAD) and Windows Vista.

And - what now? Is it over - because you cannot reproduce my error or I may
count on fix for this error? What else can I do - upgrade to newer version
of Postgres? Obviously - resolution of my problem is quite urgent to me...

  (WW)

Re: Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
Tom Lane
Дата:
"Milen A. Radev" <milen@radev.net> writes:
> I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2
> (installed by using the One-click installer from EnterpriseDB):

OK, so that means WildWezyr isn't just seeing things.  Possibilities
that occur to me now:

1. Maybe it's specific to Vista and/or Win64?

2. I wonder whether setting LC_COLLATE/LC_CTYPE during database creation
is really sufficient to establish the locale fully on Windows.  If
wcscoll() is partially dependent on some other bit of state, maybe that
would explain the problem.  What is the "surrounding" locale in your
test case?

            regards, tom lane

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
"Milen A. Radev"
Дата:
Tom Lane написа:
> "Milen A. Radev" <milen@radev.net> writes:
>> I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2
>> (installed by using the One-click installer from EnterpriseDB):
>
> OK, so that means WildWezyr isn't just seeing things.  Possibilities
> that occur to me now:
>
> 1. Maybe it's specific to Vista and/or Win64?
>
> 2. I wonder whether setting LC_COLLATE/LC_CTYPE during database creation
> is really sufficient to establish the locale fully on Windows.  If
> wcscoll() is partially dependent on some other bit of state, maybe that
> would explain the problem.  What is the "surrounding" locale in your
> test case?
[...]


For the test I changed all regional settings to "Polish (Poland)",
restarted the machine, created a new cluster and a new database (using
the commands from the last mail from Heikki).


A side-note: tested exactly the same function but on "Bulgarian
(Bulgaria)" system locale and cluster and database created with
"Bulgarian_Bulgaria.1251" locale - no errors.


--
Milen A. Radev

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
WildWezyr
Дата:
On Jan 21, 6:17=C2=A0pm, mi...@radev.net ("Milen A. Radev") wrote:
>
> [...]
>
> For the test I changed all regional settings to "Polish (Poland)",
> restarted the machine, created a new cluster and a new database (using
> the commands from the last mail from Heikki).
>
> A side-note: tested exactly the same function but on "Bulgarian
> (Bulgaria)" system locale and cluster and database created with
> "Bulgarian_Bulgaria.1251" locale - no errors.
>
> --
> Milen A. Radev
>

No error for Bulgarian locale might be related to absence of Bulgarian
national characters in words generated by my test code. I've put only
polish characters (=C4=85=C4=87=C4=99=C5=82=C5=84=C3=B3=C5=9B=C5=BA=C5=BC) =
- and it fails with polish collation,
but after removing polish characters - i get no error with polish
collation. I thing it may behave exactly the same for Bulgarian - to
produce errors you must modify letters to include Bulgarian chars
(line: letters varchar(255) :=3D   '=C4=85=C4=87=C4=99=C5=82=C5=84=C3=B3=C5=
=9B=C5=BA=C5=BCjklmnopqrstuvwxyz'; --<--
put these letters here).

Have you tried this?

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
"Milen A. Radev"
Дата:
WildWezyr написа:
> On Jan 21, 6:17 pm, mi...@radev.net ("Milen A. Radev") wrote:
>> [...]
>>
>> For the test I changed all regional settings to "Polish (Poland)",
>> restarted the machine, created a new cluster and a new database (using
>> the commands from the last mail from Heikki).
>>
>> A side-note: tested exactly the same function but on "Bulgarian
>> (Bulgaria)" system locale and cluster and database created with
>> "Bulgarian_Bulgaria.1251" locale - no errors.
>>
>> --
>> Milen A. Radev
>>
>
> No error for Bulgarian locale might be related to absence of Bulgarian
> national characters in words generated by my test code. I've put only
> polish characters (ąćęłńóśźż) - and it fails with polish collation,
> but after removing polish characters - i get no error with polish
> collation. I thing it may behave exactly the same for Bulgarian - to
> produce errors you must modify letters to include Bulgarian chars
> (line: letters varchar(255) :=   'ąćęłńóśźżjklmnopqrstuvwxyz'; --<--
> put these letters here).
>
> Have you tried this?
>

Yes, actually I tried it yesterday but just to be sure I tried it again
today - no errors:

....
NOTICE:  i = 9999800
NOTICE:  i = 9999900
NOTICE:  i = 10000000

Total query runtime: 2231947 ms.
1 row retrieved.


The "letters" variable used in the test (in case you have the time to
try it yourself):
letters varchar(255) :=   'юяьъщшчцфjklmnopqrstuvwxyz';

And the database definition:

CREATE DATABASE spb
   WITH OWNER = postgres
        ENCODING = 'UTF8'
        LC_COLLATE = 'Bulgarian_Bulgaria.1251'
        LC_CTYPE = 'Bulgarian_Bulgaria.1251'
        CONNECTION LIMIT = -1;


The system locale is "Bulgarian (Bulgaria)".


--
Milen A. Radev

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
WildWezyr
Дата:
On Jan 21, 6:17=A0pm, mi...@radev.net ("Milen A. Radev") wrote:
> [...]
>
> For the test I changed all regional settings to "Polish (Poland)",
> restarted the machine, created a new cluster and a new database (using
> the commands from the last mail from Heikki).
>
> A side-note: tested exactly the same function but on "Bulgarian
> (Bulgaria)" system locale and cluster and database created with
> "Bulgarian_Bulgaria.1251" locale - no errors.
>
> --
> Milen A. Radev


Strange (?), but when I substituted equality to like in this line:

        select id into wordId from spb_word where word like w; --
was: ... word =3D w;

there is no error with polish locale.


  (WW)

Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

От
"Milen A. Radev"
Дата:
Tom Lane написа:
> "Milen A. Radev" <milen@radev.net> writes:
>> I reproduced it on Windows Vista 64-bit SP2 with version 8.4.2
[...]
> 1. Maybe it's specific to Vista and/or Win64?
[...]

Also reproduced it on 32-bit Vista SP2 (with Postgres 8.4.2).

Couldn't reproduced it on neither Windows 7 32-bit RC nor Windows XP SP3
32-bit.

I have no access to 64-bit Windows XP or 64-bit Windows 7.


--
Milen A. Radev