Обсуждение: BUG #16634: Conflicting names of indexes for partitioned tables
The following bug has been logged on the website: Bug reference: 16634 Logged by: Michał Albrycht Email address: michalalbrycht@gmail.com PostgreSQL version: 12.4 Operating system: Ubuntu 18 Description: While working on migration process from non-partitioned table to table with partitions I have encountered strange behavior. I created new partitioned table, copied data and the tried to recreate indexes. Everything was working fine when I was creating all indexes from single thread. But when I tried to parallelize work across more threads I got an error: ERROR: duplicate key value violates unique constraint "pg_class_relname_nsp_index" DETAIL: Key (relname, relnamespace)=(child_1_lower_idx, 2200) already exists. Which is strange as I'm sure that all my indexes has unique names. I spent few hours to understand what is happening and I was able to make small example that reproduce that behavior. Let's create table with partitions: CREATE TABLE parent ( id BIGINT, name TEXT, surname TEXT ) PARTITIONED BY hash(id); CREATE TABLE child_1 partition of parent FOR VALUES WITH (MODULUS 3, REMAINDER 0); CREATE TABLE child_2 partition of parent FOR VALUES WITH (MODULUS 3, REMAINDER 1); CREATE TABLE child_3 partition of parent FOR VALUES WITH (MODULUS 3, REMAINDER 2); Then let's create indexes. First session: BEGIN; CREATE INDEX parent_lower_name_idx ON parent (LOWER(name)); SELECT pg_sleep(20); END; Second session: BEGIN; CREATE INDEX parent_lower_surname_idx ON parent (LOWER(name)); SELECT pg_sleep(20); END; When you run both of those sessions at the same time you should get an error like mentioned above. I think that whole problem is caused by auto-generated name for index for partition tables. For parent table postgres is using name provided by user, but for children tables it generates names automatically from name of the table and fields that are used for index. But when we create index on function it uses just function name. So for both indexes it generates name like `child_1_lower_idx` despite the fact that one index is using `lower(name)` and second one is using `lower(surname)`. Why did it work with single thread? Apparently there is mechanism that tries to solve conflicts as first index will get name: child_1_lower_idx and second one will get child_1_lower_idx1 but for some reason this will not work when indexes are created in parallel sessions. Final question: Is that intended? Wouldn't it be better to make it use function_name + column name which would result with `child_1_lower_name_idx` and `child_1_lower_surname_idx` names? Is there a way to work around this problem?
PG Bug reporting form <noreply@postgresql.org> writes: > I think that whole problem is caused by auto-generated name for index for > partition tables. For parent table postgres is using name provided by user, > but for children tables it generates names automatically from name of the > table and fields that are used for index. But when we create index on > function it uses just function name. So for both indexes it generates name > like `child_1_lower_idx` despite the fact that one index is using > `lower(name)` and second one is using `lower(surname)`. > Why did it work with single thread? Apparently there is mechanism that tries > to solve conflicts as first index will get name: child_1_lower_idx and > second one will get child_1_lower_idx1 but for some reason this will not > work when indexes are created in parallel sessions. Well, it won't work reliably anyway; that's inherent to the problem. There's nothing here that's specific to partitioned tables, and I'm afraid the answer is "don't do that". It's not very hard to find other examples where parallel creation of objects can hit conflicts. regards, tom lane
I don't understand why you say that it has nothing to do with partitioning. From my perspective it has a lot to do with partitioning. On a non-partitioned table I can create 2 indexes in parallel without any problems. But for partitioned tables I have to know how Postgres is generating names for indexes on partitions, know that it is not good at generating unique names and check all my indexes to figure out whether I can create them in parallel or not. Of course you can say that you shouldn't create objects in parallel, but I was inspired by pg_repack which does exactly this.
Regards,
Michał Albrycht
czw., 24 wrz 2020 o 16:08 Tom Lane <tgl@sss.pgh.pa.us> napisał(a):
PG Bug reporting form <noreply@postgresql.org> writes:
> I think that whole problem is caused by auto-generated name for index for
> partition tables. For parent table postgres is using name provided by user,
> but for children tables it generates names automatically from name of the
> table and fields that are used for index. But when we create index on
> function it uses just function name. So for both indexes it generates name
> like `child_1_lower_idx` despite the fact that one index is using
> `lower(name)` and second one is using `lower(surname)`.
> Why did it work with single thread? Apparently there is mechanism that tries
> to solve conflicts as first index will get name: child_1_lower_idx and
> second one will get child_1_lower_idx1 but for some reason this will not
> work when indexes are created in parallel sessions.
Well, it won't work reliably anyway; that's inherent to the problem.
There's nothing here that's specific to partitioned tables, and I'm
afraid the answer is "don't do that". It's not very hard to find
other examples where parallel creation of objects can hit conflicts.
regards, tom lane
I just noticed that I have made a typo in second CREATE INDEX statement. It should be
CREATE INDEX parent_lower_surname_idx ON parent (LOWER(surname))
So the first index is on the `name` field and the second one is on `surname`.
Regards,
Michał
czw., 24 wrz 2020 o 22:26 Michał Albrycht <michalalbrycht@gmail.com> napisał(a):
I don't understand why you say that it has nothing to do with partitioning. From my perspective it has a lot to do with partitioning. On a non-partitioned table I can create 2 indexes in parallel without any problems. But for partitioned tables I have to know how Postgres is generating names for indexes on partitions, know that it is not good at generating unique names and check all my indexes to figure out whether I can create them in parallel or not. Of course you can say that you shouldn't create objects in parallel, but I was inspired by pg_repack which does exactly this.Regards,Michał Albrychtczw., 24 wrz 2020 o 16:08 Tom Lane <tgl@sss.pgh.pa.us> napisał(a):PG Bug reporting form <noreply@postgresql.org> writes:
> I think that whole problem is caused by auto-generated name for index for
> partition tables. For parent table postgres is using name provided by user,
> but for children tables it generates names automatically from name of the
> table and fields that are used for index. But when we create index on
> function it uses just function name. So for both indexes it generates name
> like `child_1_lower_idx` despite the fact that one index is using
> `lower(name)` and second one is using `lower(surname)`.
> Why did it work with single thread? Apparently there is mechanism that tries
> to solve conflicts as first index will get name: child_1_lower_idx and
> second one will get child_1_lower_idx1 but for some reason this will not
> work when indexes are created in parallel sessions.
Well, it won't work reliably anyway; that's inherent to the problem.
There's nothing here that's specific to partitioned tables, and I'm
afraid the answer is "don't do that". It's not very hard to find
other examples where parallel creation of objects can hit conflicts.
regards, tom lane