Обсуждение: Duplicate Primary keys in postgresql tables

Поиск
Список
Период
Сортировка

Duplicate Primary keys in postgresql tables

От
De Lan
Дата:
Hi pgsql community,

Recently we found in a postgresql 11.19.0 alpine table there are two rows with duplicate primary keys.

The table schema (anonymized):

# \d+ app_specs;

                                      Table "public.app_specs"

         Column          |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description 

-------------------------+---------+-----------+----------+---------+----------+--------------+-------------

 ns                      | text    |           | not null |         | extended |              | 

 app                     | text    |           | not null |         | extended |              | 

 version                 | text    |           | not null |         | extended |              | 

 app_data                | text    |           |          |         | extended |              | 

 change_summary          | text    |           |          |         | extended |              | 

 created                 | bigint  |           |          |         | plain    |              | 

 latest                  | boolean |           |          |         | plain    |              | 

Indexes:

    "app_specs_pkey" PRIMARY KEY, btree (ns, app, version)

    "pktest" UNIQUE, btree (ns, app, version) INVALID

    "app_specs_created_idx" btree (created)

    "app_specs_latest_idx" btree (latest)




The rows with duplicate primary keys:


# SELECT ns,app,version,created,latest FROM public.app_specs WHERE ns='default' AND app='Quickstart_v2' AND version='-VERSION';

 namespace |       application       |  version  |    created    | latest 

-----------+-------------------------+-----------+---------------+--------

 default   | Quickstart_v2           | -VERSION  |               | 

 default   | Quickstart_v2           | -VERSION  | 1687965022134 | t

(2 rows)


It looks like the second row was originally in place. And during an "upsert" ddl transaction, the first row was inserted instead of updating the original row. We used the below statement for the mentioned upsert operation.

INSERT INTO simpletable (key1,key2,col1,col2,col3) VALUES (?,?,?,?,?) ON
* CONFLICT (key1,key2) DO UPDATE SET col1=EXCLUDED.col1,col2=EXCLUDED.col2,col3=EXCLUDED.col3;


Recreating the primary keys confirms that the constraint doesn't permit these duplicate rows: "ERROR:  duplicate key value violates unique constraint "app_specs_pkey""



Our investigation:


1. We found there were some previous discussions about duplicate primary keys caused by different collations like [Corruption with duplicate primary key(https://www.postgresql.org/message-id/CY4PR03MB269502786425690DA5B2EDBAA9370%40CY4PR03MB2695.namprd03.prod.outlook.com)


2. We checked our system as well. This container of the table was upgraded from a postgres:11.0.0 image to postgres:11.19.0-alpine. The former was based on Debian and the latter is on Alpine. Debian is using: C.UTF-8, en_US.utf8 for collation. The Alpine one uses the musl library version (1.2.3). From https://stackoverflow.com/a/74808009/21798098 it also talks about collation issues between Debian and Alpine.



Our questions:

Any ideas on what might cause this behavior other than the collation? if it is a well-known issue in the pgsql community or it really is the coalition that's the root cause, do we have mitigation for this kind of issue from happening in future?



De,

Thanks!



Re: Duplicate Primary keys in postgresql tables

От
Ken Marshall
Дата:
> 
> *Our questions:*
> 
> Any ideas on what might cause this behavior other than the collation? if it
> is a well-known issue in the pgsql community or it really is the
> coalition that's the root cause, do we have mitigation for this kind of
> issue from happening in future?
> 
> 
> 
> De,
> 
> Thanks!

Hi De,

Unless you are having a hardware issue this sort of problem is almost
always an unexpected change in the collation being used. You may be able
to use the ICU collation that has been added to minimize the impacts of
such a change. I would recommend monitoring your collation and if it
changes alert. Also, you need to verify all of the collations if you are
going to upgrade any OS or container OS. Containers are not magic in
this regards. The fix is to reindex if you have a collation change.

Regards,
Ken



Re: Duplicate Primary keys in postgresql tables

От
Adrian Klaver
Дата:
On 7/22/23 10:11, De Lan wrote:
> Hi pgsql community,
> 
> Recently we found in a postgresql 11.19.0 alpine table there are two 
> rows with duplicate primary keys.
> 

> 
> *Our questions:*
> 
> Any ideas on what might cause this behavior other than the collation? if 
> it is a well-known issue in the pgsql community or it really is the 
> coalition that's the root cause, do we have mitigation for this kind of 
> issue from happening in future?

Don't blindly update containers, test first.

> 
> 
> 
> De,
> 
> Thanks!
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Duplicate Primary keys in postgresql tables

От
De Lan
Дата:
Hi Ken and Adrian,

Thanks for the useful suggestions!

We've investigated a bit more on the collations and we're almost certain it is the RC:

For Debian 11.11:
user=# select 'a' > 'A';  ?column? ----------  f (1 row) user=# select 'a' < 'A';  ?column? ----------  t (1 row)

For Alpine 11.19:
user=# select 'a' > 'A';  ?column? ----------  t (1 row) user=# select 'a' < 'A';  ?column? ----------  f (1 row)


We'll fix the image issue and re-index the affected tables. 

Regards,
De

On Sat, Jul 22, 2023 at 11:02 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 7/22/23 10:11, De Lan wrote:
> Hi pgsql community,
>
> Recently we found in a postgresql 11.19.0 alpine table there are two
> rows with duplicate primary keys.
>

>
> *Our questions:*
>
> Any ideas on what might cause this behavior other than the collation? if
> it is a well-known issue in the pgsql community or it really is the
> coalition that's the root cause, do we have mitigation for this kind of
> issue from happening in future?

Don't blindly update containers, test first.

>
>
>
> De,
>
> Thanks!
>
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com