Обсуждение: Weird behavior with unique constraint not respected, and random results on same queries

Поиск
Список
Период
Сортировка

Weird behavior with unique constraint not respected, and random results on same queries

От
Thomas SIMON
Дата:
Hi all,

I encounter strange behavior since a few days, and the promote of a 
recently installed server as master in my infrastructure (debian 10, was 
added as slave a few days ago)
I use postgresql 12.4 on master an slave.

I have unicity constraint on an items table, based on 2 fields :
     "items_account_id_key" UNIQUE CONSTRAINT, btree (account_id, sku)

Since the promote, I find on database some duplicated entries , despite 
the fact that the constraint should not allow this.
It seems to only happens with entries having "special" characters 
(understand " ", "-", "+", ...)

Example of duplicated entry:
account_id sku item_id
1234 "IP6S+64SILHA+" 45231
1234 "IP6S+64SILHA+" 478212

I've tried to insert manually other "duplicated" data, it sometimes 
works, and sometimes I got (normal) error of duplicated key entry.

(Other) strange thing is when I request database with only sku field, I 
got all duplicated entries (for example, say 10)
SELECT account_id,sku
FROM items
WHERE sku = 'IP6S+64SILHA+'
--> 10results

When I request database with account_id and sku, some entries aren't 
returned (of course, all these values belongs to requested account_id)
SELECT account_id,sku
FROM items
WHERE sku = 'IP6S+64SILHA+'
AND account_id = 1234;
--> 1 result

Using this same request a few time later, I had 9 results... (on the 10 
"real" entries in db)

And sometimes new added entries does not show up :

 > SELECT *
   FROM items
   WHERE account_id = 1234
   and sku = 'IP6S+64GRLMB'
[2020-10-12 13:37:32] 0 rows retrieved in 110 ms (execution: 30 ms, 
fetching: 80 ms)
 > INSERT INTO public.items (item_id, account_id, item_id) VALUES (1234, 
'IP6S+64GRLMB', 45231)
[2020-10-12 13:38:01] 1 row affected in 76 ms
 > SELECT *
   FROM items
   WHERE account_id = 1234
   and sku = 'IP6S+64GRLMB'
[2020-10-12 13:38:01] 0 rows retrieved in 66 ms (execution: 25 ms, 
fetching: 41 ms)

I've checked locales on new server, thinking that it could be related 
with the "special" characters problem, but I didn't see something 
relevant. (I'm using en_US.UTF-8).
I guess it is related to this new server, but on postgres side, I use 
same config as old master excepted some memory parameters, and 
pg_stat_statement acvivation, and on OS side, I don't know what to look for.

I'm a little overwhelmed by all of this, do you have any idea what the 
problem is?

Best regards
thanks
thomas




Re: Weird behavior with unique constraint not respected, and random results on same queries

От
Magnus Hagander
Дата:


On Mon, Oct 12, 2020 at 5:43 PM Thomas SIMON <tsimon@neteven.com> wrote:
Hi all,

I encounter strange behavior since a few days, and the promote of a
recently installed server as master in my infrastructure (debian 10, was
added as slave a few days ago)
I use postgresql 12.4 on master an slave.

Is the previous master also Debian 10?

Specifically, when upgrading to Debian 10 you need to reindex all indexes on text fields (see https://wiki.postgresql.org/wiki/Locale_data_changes). This can in particular cause problems in replication, whereby your primary and standby nodes have to run the same version of Debian since you cannot do a separate reindex on the standby.

E.g. if your master is Debian 9, then you can expect some lookups to be incorrect on the standby, and after you do a failover you may see entries with invalid values in a unique index go in because of the lack of reindex.


Since the promote, I find on database some duplicated entries , despite
the fact that the constraint should not allow this.
It seems to only happens with entries having "special" characters
(understand " ", "-", "+", ...)

As you can see from the locale page above, the changes are specifically around special characters, so this is an indication that it might be this problem.

If this is the problem, the fix is a REINDEX and to make sure that all nodes in the replication cluster use the same version of the collections in the future. If the reindex fails because of existing duplicate entries, you will have to find a way to clean those up before reindexing. You may have to drop the indexes, then clean up, and then recreate them.

-- 

Re: Weird behavior with unique constraint not respected, and random results on same queries

От
Thomas SIMON
Дата:

Hi Magnus,

thank you for you quick and relevant answer, you saved my day !

Le 12/10/2020 à 17:53, Magnus Hagander a écrit :


On Mon, Oct 12, 2020 at 5:43 PM Thomas SIMON <tsimon@neteven.com> wrote:
Hi all,

I encounter strange behavior since a few days, and the promote of a
recently installed server as master in my infrastructure (debian 10, was
added as slave a few days ago)
I use postgresql 12.4 on master an slave.

Is the previous master also Debian 10?
Yes old master was debian 9

Specifically, when upgrading to Debian 10 you need to reindex all indexes on text fields (see https://wiki.postgresql.org/wiki/Locale_data_changes). This can in particular cause problems in replication, whereby your primary and standby nodes have to run the same version of Debian since you cannot do a separate reindex on the standby.

E.g. if your master is Debian 9, then you can expect some lookups to be incorrect on the standby, and after you do a failover you may see entries with invalid values in a unique index go in because of the lack of reindex.
I haven't found this note before, ans it seems indeed to describe the exact problem that we have...
I just upgraded old master to avoid these kind of problems.


Since the promote, I find on database some duplicated entries , despite
the fact that the constraint should not allow this.
It seems to only happens with entries having "special" characters
(understand " ", "-", "+", ...)

As you can see from the locale page above, the changes are specifically around special characters, so this is an indication that it might be this problem.
Exactly yes

If this is the problem, the fix is a REINDEX and to make sure that all nodes in the replication cluster use the same version of the collections in the future. If the reindex fails because of existing duplicate entries, you will have to find a way to clean those up before reindexing. You may have to drop the indexes, then clean up, and then recreate them.

Yes, we are working on a script which will delete all duplicate values, and then we will use reindex with concurrently option on all listed lindexs

We'll do these actions tomorrow morning, and I'll keep you updated about the result.

Re: Weird behavior with unique constraint not respected, and random results on same queries

От
Thomas SIMON
Дата:

Hi Magnus,

Just for the record, you got it right with locale data change, after index rebuild, we no longer have duplicates.

thanks again, have a good day

thomas

Le 12/10/2020 à 19:47, Thomas SIMON a écrit :

Hi Magnus,

thank you for you quick and relevant answer, you saved my day !

Le 12/10/2020 à 17:53, Magnus Hagander a écrit :


On Mon, Oct 12, 2020 at 5:43 PM Thomas SIMON <tsimon@neteven.com> wrote:
Hi all,

I encounter strange behavior since a few days, and the promote of a
recently installed server as master in my infrastructure (debian 10, was
added as slave a few days ago)
I use postgresql 12.4 on master an slave.

Is the previous master also Debian 10?
Yes old master was debian 9

Specifically, when upgrading to Debian 10 you need to reindex all indexes on text fields (see https://wiki.postgresql.org/wiki/Locale_data_changes). This can in particular cause problems in replication, whereby your primary and standby nodes have to run the same version of Debian since you cannot do a separate reindex on the standby.

E.g. if your master is Debian 9, then you can expect some lookups to be incorrect on the standby, and after you do a failover you may see entries with invalid values in a unique index go in because of the lack of reindex.
I haven't found this note before, ans it seems indeed to describe the exact problem that we have...
I just upgraded old master to avoid these kind of problems.


Since the promote, I find on database some duplicated entries , despite
the fact that the constraint should not allow this.
It seems to only happens with entries having "special" characters
(understand " ", "-", "+", ...)

As you can see from the locale page above, the changes are specifically around special characters, so this is an indication that it might be this problem.
Exactly yes

If this is the problem, the fix is a REINDEX and to make sure that all nodes in the replication cluster use the same version of the collections in the future. If the reindex fails because of existing duplicate entries, you will have to find a way to clean those up before reindexing. You may have to drop the indexes, then clean up, and then recreate them.

Yes, we are working on a script which will delete all duplicate values, and then we will use reindex with concurrently option on all listed lindexs

We'll do these actions tomorrow morning, and I'll keep you updated about the result.

-- 
Thomas SIMON
Responsable Infrastructures
Neteven