Обсуждение: Patch: Global Unique Index
Вложения
Hello Do we need new syntax actually? I think that a global unique index can be created automatically instead of raising an error"unique constraint on partitioned table must include all partitioning columns" regards, Sergei
Hello
Do we need new syntax actually? I think that a global unique index can be created automatically instead of raising an error "unique constraint on partitioned table must include all partitioning columns"
regards, Sergei
Sergei Kornilov <sk@zsrv.org> writes: > Do we need new syntax actually? I think that a global unique index can be created automatically instead of raising an error"unique constraint on partitioned table must include all partitioning columns" I'm not convinced that we want this feature at all: as far as I can see, it will completely destroy the benefits of making a partitioned table in the first place. But if we do want it, I don't think it should be so easy to create a global index by accident as that syntax approach would make it. I think there needs to be a pretty clear YES I WANT TO SHOOT MYSELF IN THE FOOT clause in the command. regards, tom lane
Sergei Kornilov <sk@zsrv.org> writes:
> Do we need new syntax actually? I think that a global unique index can be created automatically instead of raising an error "unique constraint on partitioned table must include all partitioning columns"
I'm not convinced that we want this feature at all: as far as I can see,
it will completely destroy the benefits of making a partitioned table
in the first place. But if we do want it, I don't think it should be
so easy to create a global index by accident as that syntax approach
would make it. I think there needs to be a pretty clear YES I WANT TO
SHOOT MYSELF IN THE FOOT clause in the command.
regards, tom lane
On Thu, 17 Nov 2022 at 22:01, Cary Huang <cary.huang@highgo.ca> wrote: > > Patch: Global Unique Index Let me start by expressing severe doubt on the usefulness of such a feature, but also salute your efforts to contribute. > In other words, a global unique index and a regular partitioned index are essentially the same in terms of their storagestructure except that one can do cross-partition uniqueness check, the other cannot. This is the only workable architecture, since it allows DETACH to be feasible, which is essential. You don't seem to mention that this would require a uniqueness check on each partition. Is that correct? This would result in O(N) cost of uniqueness checks, severely limiting load speed. I notice you don't offer any benchmarks on load speed or the overhead associated with this, which is not good if you want to be taken seriously, but at least it is recoverable. (It might be necessary to specify some partitions as READ ONLY, to allow us to record their min/max values for the indexed cols, allowing us to do this more quickly.) > - Supported Features - > 1. Global unique index is supported only on btree index type Why? Surely any index type that supports uniqueness is good. > - Not-supported Features - > 1. Global uniqueness check with Sub partition tables is not yet supported as we do not have immediate use case and it mayinvolve majoy change in current implementation Hmm, sounds like a problem. Arranging the calls recursively should work. > - Create a global unique index - > To create a regular unique index on a partitioned table, Postgres has to perform heap scan and sorting on every child partition.Uniqueness check happens during the sorting phase and will raise an error if multiple tuples with the same indexkey are sorted together. To achieve global uniqueness check, we make Postgres perform the sorting after all of the childpartitions have been scanned instead of on the "sort per partition" fashion. In otherwords, the sorting only happensonce at the very end and it sorts the tuples coming from all the partitions and therefore can ensure global uniqueness. My feeling is that performance on this will suck so badly that we must warn people away from it, and tell people if they want this, create the index at the start and let it load. Hopefully CREATE INDEX CONCURRENTLY still works. Let's see some benchmarks on this also please. You'll need to think about progress reporting early because correctly reporting the progress and expected run times are likely critical for usability. > Example: > > > CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a); > > CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10); > > CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20); > > CREATE UNIQUE INDEX global_unique_idx ON gidx_part USING BTREE(b) GLOBAL; > > INSERT INTO gidx_part values(5, 5, 'test'); > > INSERT INTO gidx_part values(15, 5, 'test'); > ERROR: duplicate key value violates unique constraint "gidx_part1_b_idx" > DETAIL: Key (b)=(5) already exists. Well done. > - DETACH - > Since we retain the same partitioned structure, detaching a partition with global unique index is straightforward. UponDETACH, Postgres will change its relkind from RELKIND_GLOBAL_INDEX to RELKIND_INDEX and remove their inheritance relationshipas usual. It's the only way that works > - Optimizer, query planning and vacuum - > Since no major modification is done on global unique index's structure and storage, it works in the same way as a regularpartitioned index. No major change is required to be done on optimizer, planner and vacuum process as they shouldwork in the same way as regular index. Agreed Making a prototype is a great first step. The next step is to understand the good and the bad aspects of it, so you can see what else needs to be done. You need to be honest and real about the fact that this may not actually be desirable in practice, or in a restricted use case. That means performance analysis of create, load, attach, detach, INSERT, SELECT, UPD/DEL and anything else that might be affected, together with algorithmic analysis of what happens for larger N and larger tables. Expect many versions; take provisions for many days. Best of luck -- Simon Riggs http://www.EnterpriseDB.com/
Tom Lane schrieb am 18.11.2022 um 16:06: >> Do we need new syntax actually? I think that a global unique index >> can be created automatically instead of raising an error "unique >> constraint on partitioned table must include all partitioning >> columns" > > I'm not convinced that we want this feature at all: as far as I can > see, it will completely destroy the benefits of making a partitioned > table in the first place. But if we do want it, I don't think it > should be so easy to create a global index by accident as that syntax > approach would make it. I think there needs to be a pretty clear YES > I WANT TO SHOOT MYSELF IN THE FOOT clause in the command. There are many Oracle users that find global indexes useful despite their disadvantages. I have seen this mostly when the goal was to get the benefits of partition pruning at runtime which turned the full table scan (=Seq Scan) on huge tables to partition scans on much smaller partitions. Partition wise joins were also helpful for query performance. The substantially slower drop partition performance was accepted in thos cases I think it would be nice to have the option in Postgres as well. I do agree however, that the global index should not be created automatically. Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better Just my 0.05€
Tom Lane schrieb am 18.11.2022 um 16:06:
>> Do we need new syntax actually? I think that a global unique index
>> can be created automatically instead of raising an error "unique
>> constraint on partitioned table must include all partitioning
>> columns"
>
> I'm not convinced that we want this feature at all: as far as I can
> see, it will completely destroy the benefits of making a partitioned
> table in the first place. But if we do want it, I don't think it
> should be so easy to create a global index by accident as that syntax
> approach would make it. I think there needs to be a pretty clear YES
> I WANT TO SHOOT MYSELF IN THE FOOT clause in the command.
There are many Oracle users that find global indexes useful despite
their disadvantages.
I have seen this mostly when the goal was to get the benefits of
partition pruning at runtime which turned the full table scan (=Seq Scan)
on huge tables to partition scans on much smaller partitions.
Partition wise joins were also helpful for query performance.
The substantially slower drop partition performance was accepted in thos cases
I think it would be nice to have the option in Postgres as well.
I do agree however, that the global index should not be created automatically.
Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better
Just my 0.05€
Pavel Stehule schrieb am 24.11.2022 um 07:03: > There are many Oracle users that find global indexes useful despite > their disadvantages. > > I have seen this mostly when the goal was to get the benefits of > partition pruning at runtime which turned the full table scan (=Seq Scan) > on huge tables to partition scans on much smaller partitions. > Partition wise joins were also helpful for query performance. > The substantially slower drop partition performance was accepted in thos cases > > > I think it would be nice to have the option in Postgres as well. > > I do agree however, that the global index should not be created automatically. > > Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better > > > Is it necessary to use special marks like GLOBAL if this index will > be partitioned, and uniqueness will be ensured by repeated > evaluations? > > Or you think so there should be really forced one relation based > index? > > I can imagine a unique index on partitions without a special mark, > that will be partitioned, and a second variant classic index created > over a partitioned table, that will be marked as GLOBAL. My personal opinion is, that a global index should never be created automatically. The user should consciously decide on using a feature that might have a serious impact on performance in some areas.
On Fri, Nov 18, 2022 at 3:31 AM Cary Huang <cary.huang@highgo.ca> wrote: > > Patch: Global Unique Index > - Optimizer, query planning and vacuum - > Since no major modification is done on global unique index's structure and storage, it works in the same way as a regularpartitioned index. No major change is required to be done on optimizer, planner and vacuum process as they shouldwork in the same way as regular index. It might not need changes in the vacuum to make it work. But this can not be really useful without modifying the vacuum the way it works. I mean currently, the indexes are also partitioned based on the table so whenever we do table vacuum it's fine to do index vacuum but now you will have one gigantic index and which will be vacuumed every time we vacuum any of the partitions. So for example, if you have 10000 partitions then by the time you vacuum the whole table (all 10000 partitions) the global index will be vacuumed 10000 times. There was some effort in past (though it was not concluded) about decoupling the index and heap vacuuming such that instead of doing the index vacuum for each partition we remember the dead tids and we only do the index vacuum when we think there are enough dead items so that the index vacuum makes sense[1]. [1] https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Nov 24, 2022 at 07:03:24AM +0100, Pavel Stehule wrote: > I can imagine a unique index on partitions without a special mark, that > will be partitioned, That exists since v11, as long as the index keys include the partition keys. > and a second variant classic index created over a partitioned table, > that will be marked as GLOBAL. That's not what this patch is about, though. On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote: > but now you will have one gigantic index and which will be vacuumed > every time we vacuum any of the partitions. This patch isn't implemented as "one gigantic index", though. -- Justin
---- On Thu, 24 Nov 2022 08:00:59 -0700 Thomas Kellerer wrote --- > Pavel Stehule schrieb am 24.11.2022 um 07:03: > > There are many Oracle users that find global indexes useful despite > > their disadvantages. > > > > I have seen this mostly when the goal was to get the benefits of > > partition pruning at runtime which turned the full table scan (=Seq Scan) > > on huge tables to partition scans on much smaller partitions. > > Partition wise joins were also helpful for query performance. > > The substantially slower drop partition performance was accepted in thos cases > > > > > > I think it would be nice to have the option in Postgres as well. > > > > I do agree however, that the global index should not be created automatically. > > > > Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better > > > > > > Is it necessary to use special marks like GLOBAL if this index will > > be partitioned, and uniqueness will be ensured by repeated > > evaluations? > > > > Or you think so there should be really forced one relation based > > index? > > > > I can imagine a unique index on partitions without a special mark, > > that will be partitioned, and a second variant classic index created > > over a partitioned table, that will be marked as GLOBAL. > > > My personal opinion is, that a global index should never be created > automatically. > > The user should consciously decide on using a feature > that might have a serious impact on performance in some areas. Agreed, if a unique index is created on non-partition key columns without including the special mark (partition key columns),it may be a mistake from user. (At least I make this mistake all the time). Current PG will give you a warning toinclude the partition keys, which is good. If we were to automatically turn that into a global unique index, user may be using the feature without knowing and experiencingsome performance impacts (to account for extra uniqueness check in all partitions).
On Thu, Nov 24, 2022 at 9:39 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote: > > but now you will have one gigantic index and which will be vacuumed > > every time we vacuum any of the partitions. > > This patch isn't implemented as "one gigantic index", though. If this patch is for supporting a global index then I expect that the global index across all the partitions is going to be big. Anyway, my point was about vacuuming the common index every time you vacuum any of the partitions of the table is not the right way and that will make global indexes less usable. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Nov 25, 2022 at 8:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Nov 24, 2022 at 9:39 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote: > > > but now you will have one gigantic index and which will be vacuumed > > > every time we vacuum any of the partitions. > > > > This patch isn't implemented as "one gigantic index", though. > > If this patch is for supporting a global index then I expect that the > global index across all the partitions is going to be big. Anyway, my > point was about vacuuming the common index every time you vacuum any > of the partitions of the table is not the right way and that will make > global indexes less usable. Okay, I got your point. After seeing the details it seems instead of supporting one common index it is just allowing uniqueness checks across multiple index partitions. Sorry for the noise. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Nov 21, 2022 at 12:33:30PM +0000, Simon Riggs wrote: > On Thu, 17 Nov 2022 at 22:01, Cary Huang <cary.huang@highgo.ca> wrote: > > > > Patch: Global Unique Index > > Let me start by expressing severe doubt on the usefulness of such a > feature, but also salute your efforts to contribute. > > > In other words, a global unique index and a regular partitioned index are essentially the same in terms of their storagestructure except that one can do cross-partition uniqueness check, the other cannot. > > This is the only workable architecture, since it allows DETACH to be > feasible, which is essential. I had trouble understanding this feature so I spent some time thinking about it. I don't think this is really a global unique index, meaning it is not one index with all the value in the index. Rather it is the enforcement of uniqueness across all of a partitioned table's indexes. I think global indexes have a limited enough use-case that this patch's approach is as close as we are going to get to it in the foreseeable future. Second, I outlined the three values of global indexes in this blog entry, based on a 2019 email thread: https://momjian.us/main/blogs/pgblog/2020.html#July_1_2020 https://www.postgresql.org/message-id/CA+Tgmob_J2M2+QKWrhg2NjQEkMEwZNTfd7a6Ubg34fJuZPkN2g@mail.gmail.com The three values are: 1. The ability to reference partitioned tables as foreign keys without requiring the partition key to be part of the foreign key reference; Postgres 12 allows such foreign keys if they match partition keys. 2. The ability to add a uniqueness constraint to a partitioned table where the unique columns are not part of the partition key. 3. The ability to index values that only appear in a few partitions, and are not part of the partition key. This patch should help with #1 and #2, but not #3. The uniqueness guarantee allows, on average, half of the partitioned table's indexes to be checked if there is a match, and all partitioned table's indexes if not. This is because once you find a match, you don't need to keep checking because the value is unique. Looking at the patch, I am unclear how the the patch prevents concurrent duplicate value insertion during the partitioned index checking. I am actually not sure how that can be done without locking all indexes or inserting placeholder entries in all indexes. (Yeah, that sounds bad, unless I am missing something.) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Hi Bruce,
Thank you for helping review the patches in such detail.
Looking at the patch, I am unclear how the the patch prevents concurrent duplicate value insertion during the partitioned index checking. I am actually not sure how that can be done without locking all indexes or inserting placeholder entries in all indexes. (Yeah, that sounds bad, unless I am missing something.)
For the uniqueness check cross all partitions, we tried to follow the implementation of uniqueness check on a single partition, and added a loop to check uniqueness on other partitions after the index tuple has been inserted to current index partition but before this index tuple has been made visible. The uniqueness check will wait `XactLockTableWait` if there is a valid transaction in process, and performs the uniqueness check again after the in-process transaction finished.
We tried to simulate this duplicate value case in blow steps:
1) prepare the partitioned table,
CREATE TABLE gidx_part (a int, b int, c text) PARTITION BY RANGE (a);
CREATE TABLE gidx_part1 partition of gidx_part FOR VALUES FROM (0) TO (10);
CREATE TABLE gidx_part2 partition of gidx_part FOR VALUES FROM (10) TO (20);
2) having two psql consoles hooked up with gdbs and set break points after _bt_doinsert
result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged, heapRel);
inside btinsert function in nbtree.c file.
3) first, execute `INSERT INTO gidx_part values(1, 1, 'test');` on console-1, and then execute `INSERT INTO gidx_part values(11, 1, 'test');` on console-2 (expect duplicated value '1' in the 2nd column to be detected),
The test results is that: console-2 query will have to wait until either console-1 committed or aborted. If console-1 committed, then console-2 reports duplicated value already exists; if console-1 aborted, then console-2 will report insert successfully. If there is a deadlock, then the one detected this deadlock will error out to allow the other one continue.
I am not quite sure if this is a proper way to deal with a deadlock in this case. It would be so grateful if someone could help provide some cases/methods to verify this cross all partitions uniqueness.
Best regards,
David
On Fri, Nov 25, 2022 at 05:03:06PM -0800, David Zhang wrote: > Hi Bruce, > > Thank you for helping review the patches in such detail. > > On 2022-11-25 9:48 a.m., Bruce Momjian wrote: > > Looking at the patch, I am unclear how the the patch prevents concurrent > duplicate value insertion during the partitioned index checking. I am > actually not sure how that can be done without locking all indexes or > inserting placeholder entries in all indexes. (Yeah, that sounds bad, > unless I am missing something.) > > For the uniqueness check cross all partitions, we tried to follow the > implementation of uniqueness check on a single partition, and added a loop to > check uniqueness on other partitions after the index tuple has been inserted to > current index partition but before this index tuple has been made visible. The > uniqueness check will wait `XactLockTableWait` if there is a valid transaction > in process, and performs the uniqueness check again after the in-process > transaction finished. I can't see why this wouldn't work, but I also can't think of any cases where we do this in our code already, so it will need careful consideration. We kind of do this for UPDATE and unique key conflicts, but only for a single index entry. where we peek and sleep on pending changes, but not across indexes. > I am not quite sure if this is a proper way to deal with a deadlock in this > case. It would be so grateful if someone could help provide some cases/methods > to verify this cross all partitions uniqueness. I assume you are using our existing deadlock detection code, and just sleeping in various indexes and expecting deadlock detection to happen. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
On Fri, Nov 18, 2022 at 12:03:53PM +0300, Sergei Kornilov wrote: > Hello > Do we need new syntax actually? I think that a global unique index can be created automatically instead of raising an error"unique constraint on partitioned table must include all partitioning columns" I may suggest even more of the new syntax. If someone has to implement sequential index checking on unique constraints, then it would be useful to be able to do that inde- pendent of partitioning also. E.g. for some kinds of manual partitions or for strangely de- signed datasets. Or for some of the table partitions instead for all of them. For that reason, perhaps some other type of unique index -- that is not an index per se, but a check against a set of indexes -- could be added. Or, perhaps, not an index, but an EXCLUDE con- straint of that kind.
On 11/24/22 19:15, Cary Huang wrote: > ---- On Thu, 24 Nov 2022 08:00:59 -0700 Thomas Kellerer wrote --- > > Pavel Stehule schrieb am 24.11.2022 um 07:03: > > > There are many Oracle users that find global indexes useful despite > > > their disadvantages. > > > > > > I have seen this mostly when the goal was to get the benefits of > > > partition pruning at runtime which turned the full table scan (=Seq Scan) > > > on huge tables to partition scans on much smaller partitions. > > > Partition wise joins were also helpful for query performance. > > > The substantially slower drop partition performance was accepted in thos cases > > > > > > > > > I think it would be nice to have the option in Postgres as well. > > > > > > I do agree however, that the global index should not be created automatically. > > > > > > Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better > > > > > > > > > Is it necessary to use special marks like GLOBAL if this index will > > > be partitioned, and uniqueness will be ensured by repeated > > > evaluations? > > > > > > Or you think so there should be really forced one relation based > > > index? > > > > > > I can imagine a unique index on partitions without a special mark, > > > that will be partitioned, and a second variant classic index created > > > over a partitioned table, that will be marked as GLOBAL. > > > > > > My personal opinion is, that a global index should never be created > > automatically. > > > > The user should consciously decide on using a feature > > that might have a serious impact on performance in some areas. > > > Agreed, if a unique index is created on non-partition key columns without including the special mark (partition key columns),it may be a mistake from user. (At least I make this mistake all the time). Current PG will give you a warning toinclude the partition keys, which is good. > > If we were to automatically turn that into a global unique index, user may be using the feature without knowing and experiencingsome performance impacts (to account for extra uniqueness check in all partitions). I disagree. A user does not need to know that a table is partitionned, and if the user wants a unique constraint on the table then making them type an extra word to get it is just annoying. -- Vik Fearing
On Tue, 2022-11-29 at 13:58 +0100, Vik Fearing wrote: > I disagree. A user does not need to know that a table is partitionned, > and if the user wants a unique constraint on the table then making them > type an extra word to get it is just annoying. Hmm. But if I created a primary key without thinking too hard about it, only to discover later that dropping old partitions has become a problem, I would not be too happy either. Yours, Laurenz Albe
On Fri, 25 Nov 2022 at 20:03, David Zhang <david.zhang@highgo.ca> wrote: > > Hi Bruce, > > Thank you for helping review the patches in such detail. > > On 2022-11-25 9:48 a.m., Bruce Momjian wrote: > > Looking at the patch, I am unclear how the the patch prevents concurrent > duplicate value insertion during the partitioned index checking. I am > actually not sure how that can be done without locking all indexes or > inserting placeholder entries in all indexes. (Yeah, that sounds bad, > unless I am missing something.) > > For the uniqueness check cross all partitions, we tried to follow the implementation of uniqueness check on a single partition,and added a loop to check uniqueness on other partitions after the index tuple has been inserted to current indexpartition but before this index tuple has been made visible. The uniqueness check will wait `XactLockTableWait` if thereis a valid transaction in process, and performs the uniqueness check again after the in-process transaction finished. I think this is the key issue to discuss. The rest is all UX bikeshedding (which is pretty important in this case) but this is the core uniqueness implementation. If I understand correctly you're going to insert into the local index for the partition using the normal btree uniqueness implementation. Then while holding an exclusive lock on the index do lookups on every partition for the new key. Effectively serializing inserts to the table? I think the precedent here are "exclusion constraints" which are documented in two places in the manual: https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-EXCLUSION https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-EXCLUDE These also work by doing lookups for violating entries and don't depend on any special index machinery like btree uniqueness. But I don't think they need to entirely serialize inserts either so it may be worth trying to figure out how they manage this to avoid imposing that overhead. There's a comment in src/backend/executor/execIndexing.c near the top about them but I'm not sure it covers all the magic needed for them to work... -- greg
Greg Stark <stark@mit.edu> writes: > If I understand correctly you're going to insert into the local index > for the partition using the normal btree uniqueness implementation. > Then while holding an exclusive lock on the index do lookups on every > partition for the new key. Effectively serializing inserts to the > table? ... not to mention creating a high probability of deadlocks between concurrent insertions to different partitions. If they each ex-lock their own partition's index before starting to look into other partitions' indexes, it seems like a certainty that such cases would fail. The rule of thumb about locking multiple objects is that all comers had better do it in the same order, and this isn't doing that. That specific issue could perhaps be fixed by having everybody examine all the indexes in the same order, inserting when you come to your own partition's index and otherwise just checking for conflicts. But that still means serializing insertions across all the partitions. And the fact that you need to lock all the partitions, or even just know what they all are, is going to play hob with a lot of assumptions we've made about different partitions being independent, and about what locks are needed for operations like ALTER TABLE ATTACH PARTITION. (I wonder BTW what the game plan is for attaching a partition to a partitioned table having a global index. Won't that mean having to check every row in the new partition against every one of the existing partitions? So much for ATTACH being fast.) I still think this is a dead end that will never get committed. If folks want to put time into perhaps finding an ingenious way around these problems, okay; but they'd better realize that there's a high probability of failure, or at least coming out with something nobody will want to use. regards, tom lane
On Tue, Nov 29, 2022 at 06:13:56PM -0500, Tom Lane wrote: > Greg Stark <stark@mit.edu> writes: > > If I understand correctly you're going to insert into the local index > > for the partition using the normal btree uniqueness implementation. > > Then while holding an exclusive lock on the index do lookups on every > > partition for the new key. Effectively serializing inserts to the > > table? > > ... not to mention creating a high probability of deadlocks between > concurrent insertions to different partitions. If they each > ex-lock their own partition's index before starting to look into > other partitions' indexes, it seems like a certainty that such > cases would fail. The rule of thumb about locking multiple objects > is that all comers had better do it in the same order, and this > isn't doing that. I am not sure why they would need to exclusive lock anything more than the unique index entry they are adding, just like UPDATE does. > I still think this is a dead end that will never get committed. > If folks want to put time into perhaps finding an ingenious > way around these problems, okay; but they'd better realize that > there's a high probability of failure, or at least coming out > with something nobody will want to use. Agreed, my earlier point was that this would need a lot of thought to get right since we don't do this often. The exclusion constraint is a close example, though that is in a single index. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Bruce Momjian <bruce@momjian.us> writes: > On Tue, Nov 29, 2022 at 06:13:56PM -0500, Tom Lane wrote: >> ... not to mention creating a high probability of deadlocks between >> concurrent insertions to different partitions. If they each >> ex-lock their own partition's index before starting to look into >> other partitions' indexes, it seems like a certainty that such >> cases would fail. The rule of thumb about locking multiple objects >> is that all comers had better do it in the same order, and this >> isn't doing that. > I am not sure why they would need to exclusive lock anything more than > the unique index entry they are adding, just like UPDATE does. Assuming that you are inserting into index X, and you've checked index Y to find that it has no conflicts, what prevents another backend from inserting a conflict into index Y just after you look? AIUI the idea is to prevent that by continuing to hold an exclusive lock on the whole index Y until you've completed the insertion. Perhaps there's a better way to do that, but it's not what was described. I actually think that that problem should be soluble with a slightly different approach. The thing that feels insoluble is that you can't do this without acquiring sufficient locks to prevent addition of new partitions while the insertion is in progress. That will be expensive in itself, and it will turn ATTACH PARTITION into a performance disaster. regards, tom lane
On Tue, Nov 29, 2022 at 09:16:23PM -0500, Tom Lane wrote: > Assuming that you are inserting into index X, and you've checked > index Y to find that it has no conflicts, what prevents another > backend from inserting a conflict into index Y just after you look? > AIUI the idea is to prevent that by continuing to hold an exclusive > lock on the whole index Y until you've completed the insertion. > Perhaps there's a better way to do that, but it's not what was > described. As I understood it, you insert into index X and then scan all other indexes to look for a conflict --- if you find one, you abort with a unique index conflict. Other index changes do the same. So, for example, one session inserts into index X and then scans all other indexes. During the index scan, another session inserts into index Y, but its scan sees the index X addition and gets a uniqueness conflict error. > I actually think that that problem should be soluble with a > slightly different approach. The thing that feels insoluble > is that you can't do this without acquiring sufficient locks > to prevent addition of new partitions while the insertion is > in progress. That will be expensive in itself, and it will > turn ATTACH PARTITION into a performance disaster. Yes, that would require index locks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
On 11/29/22 17:29, Laurenz Albe wrote: > On Tue, 2022-11-29 at 13:58 +0100, Vik Fearing wrote: >> I disagree. A user does not need to know that a table is partitionned, >> and if the user wants a unique constraint on the table then making them >> type an extra word to get it is just annoying. > > Hmm. But if I created a primary key without thinking too hard about it, > only to discover later that dropping old partitions has become a problem, > I would not be too happy either. I have not looked at this patch, but my understanding of its design is the "global" part of the index just makes sure to check a unique index on each partition. I don't see from that how dropping old partitions would be a problem. -- Vik Fearing
On Wed, 2022-11-30 at 10:09 +0100, Vik Fearing wrote: > On 11/29/22 17:29, Laurenz Albe wrote: > > On Tue, 2022-11-29 at 13:58 +0100, Vik Fearing wrote: > > > I disagree. A user does not need to know that a table is partitionned, > > > and if the user wants a unique constraint on the table then making them > > > type an extra word to get it is just annoying. > > > > Hmm. But if I created a primary key without thinking too hard about it, > > only to discover later that dropping old partitions has become a problem, > > I would not be too happy either. > > I have not looked at this patch, but my understanding of its design is > the "global" part of the index just makes sure to check a unique index > on each partition. I don't see from that how dropping old partitions > would be a problem. Right, I should have looked closer. But, according to the parallel discussion, ATTACH PARTITION might be a problem. A global index is likely to be a footgun one way or the other, so I think it should at least have a safety on (CREATE PARTITIONED GLOBAL INDEX or something). Yours, Laurenz Albe
On Tue, 29 Nov 2022 at 21:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I actually think that that problem should be soluble with a > slightly different approach. The thing that feels insoluble > is that you can't do this without acquiring sufficient locks > to prevent addition of new partitions while the insertion is > in progress. That will be expensive in itself, and it will > turn ATTACH PARTITION into a performance disaster. I think there`s a lot of room to manoeuvre here. This is a new feature that doesn't need to be 100% complete or satisfy any existing standard. There are lots of options for compromises that leave room for future improvements. 1) We could just say sure ATTACH is slow if you're attaching an non-empty partition 2) We could invent a concept like convalidated and let people attach a partition without validating the uniqueness and then validate it later concurrently 3) We could say ATTACH doesn't work now and come up with a better strategy in the future Also, don't I vaguely recall something in exclusion constraints about having some kind of in-memory "intent" list where you declared that you're about to insert a value, you validate it doesn't violate the constraint and then you're free to insert it because anyone else will see your intent in memory? There might be a need for some kind of global object that only holds inserted keys long enough that other sessions are guaranteed to see the key in the correct index. And that could maybe even be in memory rather than on disk. This isn't a simple project but I don't think it's impossible as long as we keep an open mind about the requirements. -- greg
Thanks a lot for all the comments. On 2022-11-29 3:13 p.m., Tom Lane wrote: > ... not to mention creating a high probability of deadlocks between > concurrent insertions to different partitions. If they each > ex-lock their own partition's index before starting to look into > other partitions' indexes, it seems like a certainty that such > cases would fail. The rule of thumb about locking multiple objects > is that all comers had better do it in the same order, and this > isn't doing that. In the current POC patch, the deadlock is happening when backend-1 inserts a value to index X(partition-1), and backend-2 try to insert a conflict value right after backend-1 released the buffer block lock but before start to check unique on index Y(partition-2). In this case, backend-1 holds ExclusiveLock on transaction-1 and waits for ShareLock on transaction-2 , while backend-2 holds ExclusiveLock on transaction-2 and waits for ShareLock on transaction-1. Based on my debugging tests, this only happens when backend-1 and backend-2 want to insert a conflict value. If this is true, then is it ok to either `deadlock` error out or `duplicated value` error out since this is a conflict value? (hopefully end users can handle it in a similar way). I think the probability of such deadlock has two conditions: 1) users insert a conflict value and plus 2) the uniqueness checking happens in the right moment (see above). > That specific issue could perhaps be fixed by having everybody > examine all the indexes in the same order, inserting when you > come to your own partition's index and otherwise just checking > for conflicts. But that still means serializing insertions > across all the partitions. And the fact that you need to lock > all the partitions, or even just know what they all are, Here is the main change for insertion cross-partition uniqueness check in `0004-support-global-unique-index-insert-and-update.patch`, result = _bt_doinsert(rel, itup, checkUnique, indexUnchanged, heapRel); + if (checkUnique != UNIQUE_CHECK_NO) + btinsert_check_unique_gi(itup, rel, heapRel, checkUnique); + pfree(itup); where, a cross-partition uniqueness check is added after the index tuple btree insertion on current partition. The idea is to make sure other backends can find out the ongoing index tuple just inserted (but before marked as visible yet), and the current partition uniqueness check can be skipped as it has already been checked. Based on this change, I think the insertion serialization can happen in two cases: 1) two insertions happen on the same buffer block (buffer lock waiting); 2) two ongoing insertions with duplicated values (transaction id waiting);
On 2022-11-29 6:16 p.m., Tom Lane wrote: > Assuming that you are inserting into index X, and you've checked > index Y to find that it has no conflicts, what prevents another > backend from inserting a conflict into index Y just after you look? > AIUI the idea is to prevent that by continuing to hold an exclusive > lock on the whole index Y until you've completed the insertion. > Perhaps there's a better way to do that, but it's not what was > described. Another main change in patch `0004-support-global-unique-index-insert-and-update.patch`, + search_global: + stack = _bt_search(iRel, insertstate.itup_key, + &insertstate.buf, BT_READ, NULL); + xwait = _bt_check_unique_gi(iRel, &insertstate, + hRel, checkUnique, &is_unique, + &speculativeToken, heapRel); + if (unlikely(TransactionIdIsValid(xwait))) + { ... ... + goto search_global; + } Here, I am trying to use `BT_READ` to require a LW_SHARED lock on the buffer block if a match found using `itup_key` search key. The cross-partition uniqueness checking will wait if the index tuple insertion on this buffer block has not done yet, otherwise runs the uniqueness check to see if there is an ongoing transaction which may insert a conflict value. Once the ongoing insertion is done, it will go back and check again (I think it can also handle the case that a potential conflict index tuple was later marked as dead in the same transaction). Based on this change, my test results are: 1) a select-only query will not be blocked by the ongoing insertion on index X 2) insertion happening on index Y may wait for the buffer block lock when inserting a different value but it does not wait for the transaction lock held by insertion on index X. 3) when an insertion inserting a conflict value on index Y, 3.1) it waits for buffer block lock if the lock has been held by the insertion on index X. 3.2) then, it waits for transaction lock until the insertion on index X is done.
On 2022-11-29 6:16 p.m., Tom Lane wrote:
> Assuming that you are inserting into index X, and you've checked
> index Y to find that it has no conflicts, what prevents another
> backend from inserting a conflict into index Y just after you look?
> AIUI the idea is to prevent that by continuing to hold an exclusive
> lock on the whole index Y until you've completed the insertion.
> Perhaps there's a better way to do that, but it's not what was
> described.
Another main change in patch
`0004-support-global-unique-index-insert-and-update.patch`,
+ search_global:
+ stack = _bt_search(iRel, insertstate.itup_key,
+ &insertstate.buf, BT_READ,
NULL);
+ xwait = _bt_check_unique_gi(iRel, &insertstate,
+ hRel, checkUnique,
&is_unique,
+ &speculativeToken, heapRel);
+ if (unlikely(TransactionIdIsValid(xwait)))
+ {
... ...
+ goto search_global;
+ }
Here, I am trying to use `BT_READ` to require a LW_SHARED lock on the
buffer block if a match found using `itup_key` search key. The
cross-partition uniqueness checking will wait if the index tuple
insertion on this buffer block has not done yet, otherwise runs the
uniqueness check to see if there is an ongoing transaction which may
insert a conflict value. Once the ongoing insertion is done, it will go
back and check again (I think it can also handle the case that a
potential conflict index tuple was later marked as dead in the same
transaction). Based on this change, my test results are:
1) a select-only query will not be blocked by the ongoing insertion on
index X
2) insertion happening on index Y may wait for the buffer block lock
when inserting a different value but it does not wait for the
transaction lock held by insertion on index X.
3) when an insertion inserting a conflict value on index Y,
3.1) it waits for buffer block lock if the lock has been held by
the insertion on index X.
3.2) then, it waits for transaction lock until the insertion on
index X is done.
On 2022-12-19 7:51 a.m., Nikita Malakhov wrote: > Sorry to bother - but is this patch used in IvorySQL? > Here: > https://www.ivorysql.org/docs/Global%20Unique%20Index/create_global_unique_index > According to syntax it definitely looks like this patch. The global unique index is one of the features required in IvorySQL development. We want to share it to the communities to get more feedback, and then hopefully we could better contribute it back to PostgreSQL. Best regards, David
On 2022-11-29 6:16 p.m., Tom Lane wrote: > Assuming that you are inserting into index X, and you've checked > index Y to find that it has no conflicts, what prevents another > backend from inserting a conflict into index Y just after you look? > AIUI the idea is to prevent that by continuing to hold an exclusive > lock on the whole index Y until you've completed the insertion. > Perhaps there's a better way to do that, but it's not what was > described. During inserts, global unique index patch does not acquire exclusive lock on the whole index Y while checking it for the uniqueness; it acquires a low level AccessShareLock on Y and will release after checking. So while it is checking, another backend can still insert a duplicate in index Y. If this is the case, a "transaction level lock" will be triggered. For example. Say backend A inserts into index X, and checks index Y to find no conflict, and backend B inserts a conflict into index Y right after. In this case, backend B still has to check index X for conflict and It will fetch a duplicate tuple that has been inserted by A, but it cannot declare a duplicate error yet. This is because the transaction inserting this conflict tuple started by backend A is still in progress. At this moment, backend B has to wait for backend A to commit / abort before it can continue. This is how "transaction level lock" prevents concurrent insert conflicts. There is a chance of deadlock if the conflicting insertions done by A and B happen at roughly the same time, where both backends trigger "transaction level lock" to wait for each other to commit/abort. If this is the case, PG's deadlock detection code will error out one of the backends. It should be okay because it means one of the backends tries to insert a conflict. The purpose of global unique index is also to error out backends trying to insert duplicates. In the end the effects are the same, it's just that the error says deadlock detected instead of duplicate detected. If backend B did not insert a conflicting tuple, no transaction lock wait will be triggered, and therefore no deadlock will happen. Regards Cary Huang ----------------------- HighGo Software Canada
On 2022-11-30 2:30 p.m., Greg Stark wrote: > On Tue, 29 Nov 2022 at 21:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I actually think that that problem should be soluble with a >> slightly different approach. The thing that feels insoluble >> is that you can't do this without acquiring sufficient locks >> to prevent addition of new partitions while the insertion is >> in progress. That will be expensive in itself, and it will >> turn ATTACH PARTITION into a performance disaster. > I think there`s a lot of room to manoeuvre here. This is a new feature > that doesn't need to be 100% complete or satisfy any existing > standard. There are lots of options for compromises that leave room > for future improvements. > > 1) We could just say sure ATTACH is slow if you're attaching an > non-empty partition > 2) We could invent a concept like convalidated and let people attach a > partition without validating the uniqueness and then validate it later > concurrently > 3) We could say ATTACH doesn't work now and come up with a better > strategy in the future > > Also, don't I vaguely recall something in exclusion constraints about > having some kind of in-memory "intent" list where you declared that > you're about to insert a value, you validate it doesn't violate the > constraint and then you're free to insert it because anyone else will > see your intent in memory? There might be a need for some kind of > global object that only holds inserted keys long enough that other > sessions are guaranteed to see the key in the correct index. And that > could maybe even be in memory rather than on disk. > > This isn't a simple project but I don't think it's impossible as long > as we keep an open mind about the requirements. In the current global unique index implementation, ATTACH can be slow if there are concurrent inserts happening. ATTACH tries to acquire shareLock on all existing partitions and partition-to-be before it scans and sorts them for uniqueness check. It will release them only after all partitions have been checked. If there are concurrent inserts, ATTACH has to wait for all inserts complete. Likewise, if ATTACH is in progress, inserts have to wait as well. This is an issue now. If we were to make ATTACH acquire a lower level lock (AccessShareLock), scans a partition, and then release it. there is nothing stopping any concurrent inserts from inserting a conflict right after it finishes checking. This is another issue. There is no transaction level lock being triggered here like in multiple concurent inserts case Another email thread called "create index concurrently on partitioned index" discuss some approaches that may be used to solve the attach issue here, basically to allow ATTACH PARTITION CONCURRENTLY... regards Cary Huang --------------------------------- HighGo Software Canada