Обсуждение: Strange inconsistency with UPDATE
I am trying to force a column to have lowercase because Postgresql is case-sensitive in queries. For the time being I've made an expression index on lower(KEY). But I would like to have just lower case data and then drop this expression index. However, I see some inconsisent behavior from Postgresql. When I issue an UPDATE command , it shows me a duplicate violation (which could be correct) -- -# update TABLE set ACOLUMN = lower(ACOLUMN); ERROR: duplicate key violates unique constraint "TABLE_ACOLUMN_key" So I try to find out the offending values of this ACOLUMN that become duplicated when lower(ACOLUMN) is issued: -# SELECT lower(ACOLUMN), count(*) FROM TABLE GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ; -------+------- lower | count -------+------- (0 rows) But this doesn't make sense! If there are no columns that get repeated, how can it violate the UNIQUE constraint? I am not sure if the following helps, but I'm including the EXPLAIN on this table. Penny for your thoughts! -PK. -# EXPLAIN SELECT lower(ACOLUMN), count(*) FROM TABLE GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ; QUERY PLAN ------------------------------------------------------------------------ GroupAggregate (cost=1031470.35..1171326.48 rows=4661871 width=10) Filter: (count(*) > 1) -> Sort (cost=1031470.35..1043125.03 rows=4661871 width=10) Sort Key: lower((ACOLUMN)::text) -> Seq Scan on TABLE (cost=0.00..228292.39 rows=4661871 width=10) (5 rows)
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 08/16/07 21:58, Phoenix Kiula wrote: > I am trying to force a column to have lowercase because Postgresql is > case-sensitive in queries. For the time being I've made an expression > index on lower(KEY). But I would like to have just lower case data and > then drop this expression index. > > However, I see some inconsisent behavior from Postgresql. When I issue > an UPDATE command , it shows me a duplicate violation (which could be > correct) -- > > -# update TABLE set ACOLUMN = lower(ACOLUMN); > ERROR: duplicate key violates unique constraint "TABLE_ACOLUMN_key" > > So I try to find out the offending values of this ACOLUMN that become > duplicated when lower(ACOLUMN) is issued: > > -# SELECT lower(ACOLUMN), count(*) FROM TABLE > GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ; > -------+------- > lower | count > -------+------- > (0 rows) > > But this doesn't make sense! If there are no columns that get > repeated, how can it violate the UNIQUE constraint? > > I am not sure if the following helps, but I'm including the EXPLAIN on > this table. Penny for your thoughts! Whatever the issue, you can bet your car that it's not a bug in PostgreSQL, but you who is misunderstanding how PG works. Write a script that loops thru the records one by one, updating only one record per loop iteration. That will find the problem record. - -- Ron Johnson, Jr. Jefferson LA USA Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGxRSUS9HxQb37XmcRApMyAKCGOmpm5xKkfuWR19OnbXLVZMMbkwCcCHmu 4OOXMnRnaixpp8lSjbrA/5w= =M3jw -----END PGP SIGNATURE-----
On Aug 16, 2007, at 21:58 , Phoenix Kiula wrote: > However, I see some inconsisent behavior from Postgresql. When I issue > an UPDATE command , it shows me a duplicate violation (which could be > correct) -- > > -# update TABLE set ACOLUMN = lower(ACOLUMN); > ERROR: duplicate key violates unique constraint > "TABLE_ACOLUMN_key" > > So I try to find out the offending values of this ACOLUMN that become > duplicated when lower(ACOLUMN) is issued: > > -# SELECT lower(ACOLUMN), count(*) FROM TABLE > GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ; > -------+------- > lower | count > -------+------- > (0 rows) > > But this doesn't make sense! If there are no columns that get > repeated, how can it violate the UNIQUE constraint? I suspect you're not showing us the exact queries you're running. For one, you can't have a table named TABLE (without quotes) in PostgreSQL. # create table TABLE (ACOLUMN text not null unique); ERROR: syntax error at or near "TABLE" LINE 1: create table TABLE (ACOLUMN text not null unique); Perhaps something else you changed when changing the table name, maybe to simplify the appearance of the query, affects the results you're seeing. Things appear to work as expected on my end. test=# select version(); version ------------------------------------------------------------------------ ---------------------------------------------------------------------- PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367) (1 row) test=# create table strings (a_string text primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "strings_pkey" for table "strings" CREATE TABLE test=# insert into strings (a_string) values ('string'), ('STRING'), ('String'), ('number'); INSERT 0 4 test=# select * from strings; a_string ---------- string STRING String number (4 rows) test=# select lower(a_string), count(*) from strings group by lower (a_string); lower | count --------+------- string | 3 number | 1 (2 rows) test=# select lower(a_string), count(*) from strings group by lower (a_string) having count(*) > 1;; lower | count --------+------- string | 3 (1 row) test=# update strings set a_string = lower(a_string); ERROR: duplicate key violates unique constraint "strings_pkey" Another possibility is the setting of LC_COLLATE used during initdb, but I would think that lower() would be self-consistent under all collations. You might want to check if you're using a collation other than C though. test=# show lc_collate; lc_collate ------------ C (1 row) Hope this gives additional information to help you debug this. Michael Glaesemann grzm seespotcode net
"Phoenix Kiula" <phoenix.kiula@gmail.com> writes: > However, I see some inconsisent behavior from Postgresql. When I issue > an UPDATE command , it shows me a duplicate violation (which could be > correct) -- > -# update TABLE set ACOLUMN = lower(ACOLUMN); > ERROR: duplicate key violates unique constraint "TABLE_ACOLUMN_key" > So I try to find out the offending values of this ACOLUMN that become > duplicated when lower(ACOLUMN) is issued: > -# SELECT lower(ACOLUMN), count(*) FROM TABLE > GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ; > -------+------- > lower | count > -------+------- > (0 rows) Yeah, that *is* pretty bizarre. We have seen some cases where strcoll() yields inconsistent answers (leading to arbitrarily silly behavior on Postgres' part) if it is expecting a character set encoding different from what Postgres is using. What is your lc_collate setting, and are you sure it matches the database encoding? Another possibility is that there's something corrupt about the TABLE_ACOLUMN_key index ... does reindexing it change the outcome? regards, tom lane
> I suspect you're not showing us the exact queries you're running. For > one, you can't have a table named TABLE (without quotes) in PostgreSQL. Of course. The data is a tad private, hence the simple table and column names represented in uppercase. > Perhaps something else you changed when changing the table name, > maybe to simplify the appearance of the query, affects the results > you're seeing. Things appear to work as expected on my end. Except for the name of the table and the column, everything is exactly the same as the queries I'm running. I just pasted those things into a text editor and find-replaced mytable to "TABLE" and mycolumn to "COLUMN". (Note: mytable and mycolumn are also aliases for posting to a public forum). The problem remains. The other tip -- to loop through each record and update it -- is precisely what I was trying to escape with a simple SQL query that seems fairly standard.
On 17/08/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Phoenix Kiula" <phoenix.kiula@gmail.com> writes: > > However, I see some inconsisent behavior from Postgresql. When I issue > > an UPDATE command , it shows me a duplicate violation (which could be > > correct) -- > > > -# update TABLE set ACOLUMN = lower(ACOLUMN); > > ERROR: duplicate key violates unique constraint "TABLE_ACOLUMN_key" > > > So I try to find out the offending values of this ACOLUMN that become > > duplicated when lower(ACOLUMN) is issued: > > > -# SELECT lower(ACOLUMN), count(*) FROM TABLE > > GROUP BY lower(ACOLUMN) HAVING count(*) > 1 ; > > -------+------- > > lower | count > > -------+------- > > (0 rows) > > Yeah, that *is* pretty bizarre. > > We have seen some cases where strcoll() yields inconsistent answers > (leading to arbitrarily silly behavior on Postgres' part) if it is > expecting a character set encoding different from what Postgres is > using. What is your lc_collate setting, and are you sure it matches > the database encoding? - lc_collate is "en_US.UTF-8" - database encoding is "utf-8" > Another possibility is that there's something corrupt about the > TABLE_ACOLUMN_key index ... does reindexing it change the outcome? Hmm, I can check. It's just a unique index. Should I drop it and recreate? Last time I ran a massive "UPDATE mytable SET mycol = lower(mycol)" query on about 6 millions records, the database seemed to be locked for eternity. Nothing could insert into it, nor update something else. Is this what is discussed on this list as "deadlock"? How can I avoid this if I were to reindex and such? Thanks!