Обсуждение: BUG #16634: Conflicting names of indexes for partitioned tables

Поиск
Список
Период
Сортировка

BUG #16634: Conflicting names of indexes for partitioned tables

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16634
Logged by:          Michał Albrycht
Email address:      michalalbrycht@gmail.com
PostgreSQL version: 12.4
Operating system:   Ubuntu 18
Description:

While working on migration process from non-partitioned table to table with
partitions I have encountered strange behavior. I created new partitioned
table, copied data and the tried to recreate indexes. Everything was working
fine when I was creating all indexes from single thread. But when I tried to
parallelize work across more threads I got an error:

ERROR:  duplicate key value violates unique constraint
"pg_class_relname_nsp_index"
DETAIL:  Key (relname, relnamespace)=(child_1_lower_idx, 2200) already
exists.

Which is strange as I'm sure that all my indexes has unique names.

I spent few hours to understand what is happening and I was able to make
small example that reproduce that behavior.

Let's create table with partitions:

CREATE TABLE parent (
    id BIGINT,
    name TEXT,
    surname TEXT
) PARTITIONED BY hash(id);


CREATE TABLE child_1 partition of parent FOR VALUES WITH (MODULUS 3,
REMAINDER 0);
CREATE TABLE child_2 partition of parent FOR VALUES WITH (MODULUS 3,
REMAINDER 1);
CREATE TABLE child_3 partition of parent FOR VALUES WITH (MODULUS 3,
REMAINDER 2);


Then let's create indexes. First session:

BEGIN;
CREATE INDEX parent_lower_name_idx ON parent (LOWER(name));
SELECT pg_sleep(20);
END;


Second session:

BEGIN;
CREATE INDEX parent_lower_surname_idx ON parent (LOWER(name));
SELECT pg_sleep(20);
END;

When you run both of those sessions at the same time you should get an error
like mentioned above.

I think that whole problem is caused by auto-generated name for index for
partition tables. For parent table postgres is using name provided by user,
but for children tables it generates names automatically from name of the
table and fields that are used for index. But when we create index on
function it uses just function name. So for both indexes it generates name
like `child_1_lower_idx` despite the fact that one index is using
`lower(name)` and second one is using `lower(surname)`.
Why did it work with single thread? Apparently there is mechanism that tries
to solve conflicts as first index will get name: child_1_lower_idx and
second one will get child_1_lower_idx1 but for some reason this will not
work when indexes are created in parallel sessions.

Final question: Is that intended? Wouldn't it be better to make it use
function_name + column name which would result with `child_1_lower_name_idx`
and `child_1_lower_surname_idx` names? Is there a way to work around this
problem?


Re: BUG #16634: Conflicting names of indexes for partitioned tables

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> I think that whole problem is caused by auto-generated name for index for
> partition tables. For parent table postgres is using name provided by user,
> but for children tables it generates names automatically from name of the
> table and fields that are used for index. But when we create index on
> function it uses just function name. So for both indexes it generates name
> like `child_1_lower_idx` despite the fact that one index is using
> `lower(name)` and second one is using `lower(surname)`.
> Why did it work with single thread? Apparently there is mechanism that tries
> to solve conflicts as first index will get name: child_1_lower_idx and
> second one will get child_1_lower_idx1 but for some reason this will not
> work when indexes are created in parallel sessions.

Well, it won't work reliably anyway; that's inherent to the problem.
There's nothing here that's specific to partitioned tables, and I'm
afraid the answer is "don't do that".  It's not very hard to find
other examples where parallel creation of objects can hit conflicts.

            regards, tom lane



Re: BUG #16634: Conflicting names of indexes for partitioned tables

От
Michał Albrycht
Дата:
I don't understand why you say that it has nothing to do with partitioning. From my perspective it has a lot to do with partitioning. On a non-partitioned table I can create 2 indexes in parallel without any problems. But for partitioned tables I have to know how Postgres is generating names for indexes on partitions, know that it is not good at generating unique names and check all my indexes to figure out whether I can create them in parallel or not. Of course you can say that you shouldn't create objects in parallel, but I was inspired by pg_repack which does exactly this.

Regards,

Michał Albrycht

czw., 24 wrz 2020 o 16:08 Tom Lane <tgl@sss.pgh.pa.us> napisał(a):
PG Bug reporting form <noreply@postgresql.org> writes:
> I think that whole problem is caused by auto-generated name for index for
> partition tables. For parent table postgres is using name provided by user,
> but for children tables it generates names automatically from name of the
> table and fields that are used for index. But when we create index on
> function it uses just function name. So for both indexes it generates name
> like `child_1_lower_idx` despite the fact that one index is using
> `lower(name)` and second one is using `lower(surname)`.
> Why did it work with single thread? Apparently there is mechanism that tries
> to solve conflicts as first index will get name: child_1_lower_idx and
> second one will get child_1_lower_idx1 but for some reason this will not
> work when indexes are created in parallel sessions.

Well, it won't work reliably anyway; that's inherent to the problem.
There's nothing here that's specific to partitioned tables, and I'm
afraid the answer is "don't do that".  It's not very hard to find
other examples where parallel creation of objects can hit conflicts.

                        regards, tom lane

Re: BUG #16634: Conflicting names of indexes for partitioned tables

От
Michał Albrycht
Дата:
I just noticed that I have made a typo in second CREATE INDEX statement. It should be 
CREATE INDEX parent_lower_surname_idx ON parent (LOWER(surname))
So the first index is on the `name` field and the second one is on `surname`.
Regards,

Michał

czw., 24 wrz 2020 o 22:26 Michał Albrycht <michalalbrycht@gmail.com> napisał(a):
I don't understand why you say that it has nothing to do with partitioning. From my perspective it has a lot to do with partitioning. On a non-partitioned table I can create 2 indexes in parallel without any problems. But for partitioned tables I have to know how Postgres is generating names for indexes on partitions, know that it is not good at generating unique names and check all my indexes to figure out whether I can create them in parallel or not. Of course you can say that you shouldn't create objects in parallel, but I was inspired by pg_repack which does exactly this.

Regards,

Michał Albrycht

czw., 24 wrz 2020 o 16:08 Tom Lane <tgl@sss.pgh.pa.us> napisał(a):
PG Bug reporting form <noreply@postgresql.org> writes:
> I think that whole problem is caused by auto-generated name for index for
> partition tables. For parent table postgres is using name provided by user,
> but for children tables it generates names automatically from name of the
> table and fields that are used for index. But when we create index on
> function it uses just function name. So for both indexes it generates name
> like `child_1_lower_idx` despite the fact that one index is using
> `lower(name)` and second one is using `lower(surname)`.
> Why did it work with single thread? Apparently there is mechanism that tries
> to solve conflicts as first index will get name: child_1_lower_idx and
> second one will get child_1_lower_idx1 but for some reason this will not
> work when indexes are created in parallel sessions.

Well, it won't work reliably anyway; that's inherent to the problem.
There's nothing here that's specific to partitioned tables, and I'm
afraid the answer is "don't do that".  It's not very hard to find
other examples where parallel creation of objects can hit conflicts.

                        regards, tom lane