Обсуждение: An out-of-date comment in nodeIndexonlyscan.c
Hello, Since commit cdf91edb (2012), nodeIndexonlyscan.c says: /* * Predicate locks for index-only scans must be acquired at the page * level when the heap is not accessed, since tuple-level predicate * locks need the tuple's xmin value. If we had to visit the tuple * anyway, then we already have the tuple-level lock and can skip the * page lock. */ if (tuple == NULL) PredicateLockPage(scandesc->heapRelation, ItemPointerGetBlockNumber(tid), estate->es_snapshot); The first sentence of that comment is no longer true as of commit c01262a8 (2013). As for whether it's necessary to predicate-lock the whole eheap page (rather than the heap tuple) anyway because of HOT update chains, I don't know, so I'm not sure what wording to propose instead. -- Thomas Munro http://www.enterprisedb.com
On 14/05/18 02:15, Thomas Munro wrote: > Hello, > > Since commit cdf91edb (2012), nodeIndexonlyscan.c says: > > /* > * Predicate locks for index-only scans must be > acquired at the page > * level when the heap is not accessed, since > tuple-level predicate > * locks need the tuple's xmin value. If we had to > visit the tuple > * anyway, then we already have the tuple-level lock > and can skip the > * page lock. > */ > if (tuple == NULL) > PredicateLockPage(scandesc->heapRelation, > > ItemPointerGetBlockNumber(tid), > estate->es_snapshot); > > The first sentence of that comment is no longer true as of commit > c01262a8 (2013). As for whether it's necessary to predicate-lock the > whole eheap page (rather than the heap tuple) anyway because of HOT > update chains, I don't know, so I'm not sure what wording to propose > instead. Hmm. If there are any updated tuples, HOT or not, the visibility map bit will not be set, and we won't reach this code. So I think we could acquire the tuple lock here. - Heikki
On Thu, May 17, 2018 at 3:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 14/05/18 02:15, Thomas Munro wrote: > > Since commit cdf91edb (2012), nodeIndexonlyscan.c says: > > > > /* > > * Predicate locks for index-only scans must be > > acquired at the page > > * level when the heap is not accessed, since > > tuple-level predicate > > * locks need the tuple's xmin value. If we had to > > visit the tuple > > * anyway, then we already have the tuple-level lock > > and can skip the > > * page lock. > > */ > > if (tuple == NULL) > > PredicateLockPage(scandesc->heapRelation, > > > > ItemPointerGetBlockNumber(tid), > > estate->es_snapshot); > > > > The first sentence of that comment is no longer true as of commit > > c01262a8 (2013). As for whether it's necessary to predicate-lock the > > whole eheap page (rather than the heap tuple) anyway because of HOT > > update chains, I don't know, so I'm not sure what wording to propose > > instead. > > Hmm. If there are any updated tuples, HOT or not, the visibility map bit > will not be set, and we won't reach this code. So I think we could > acquire the tuple lock here. Right. CC Kevin. I think we should at least fix the comment. As for changing the lock level, PredicateLockTuple() wants a heap tuple that we don't have, so we'd probably need to add a PredicateLockTid() function. -- Thomas Munro http://www.enterprisedb.com
On Fri, Feb 8, 2019 at 4:58 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, May 17, 2018 at 3:49 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 14/05/18 02:15, Thomas Munro wrote: > > > The first sentence of that comment is no longer true as of commit > > > c01262a8 (2013). As for whether it's necessary to predicate-lock the > > > whole eheap page (rather than the heap tuple) anyway because of HOT > > > update chains, I don't know, so I'm not sure what wording to propose > > > instead. > > > > Hmm. If there are any updated tuples, HOT or not, the visibility map bit > > will not be set, and we won't reach this code. So I think we could > > acquire the tuple lock here. > > Right. CC Kevin. I think we should at least fix the comment. As for > changing the lock level, PredicateLockTuple() wants a heap tuple that > we don't have, so we'd probably need to add a PredicateLockTid() > function. Here's a patch I'd like to commit to fix the comment. We could look at improving the actual code after https://commitfest.postgresql.org/23/2169/ is done. I wonder if it might be possible to avoid page locks on both the heap *and* index in some limited cases, as I mentioned over here (just an idea, could be way off base): https://www.postgresql.org/message-id/CA%2BhUKGJGDVfhHmoUGzi-_J%2BN8FmRjmWTY0MCd1ZV5Fj9qdyb1w%40mail.gmail.com -- Thomas Munro https://enterprisedb.com
Вложения
On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Here's a patch I'd like to commit to fix the comment. We could look
at improving the actual code after
https://commitfest.postgresql.org/23/2169/ is done.
Change looks good to me.
I wonder if it might be possible to avoid page locks on both the heap
*and* index in some limited cases, as I mentioned over here (just an
idea, could be way off base):
https://www.postgresql.org/message-id/CA%2BhUKGJGDVfhHmoUGzi-_J%2BN8FmRjmWTY0MCd1ZV5Fj9qdyb1w%40mail.gmail.com
On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote: > Change looks good to me. Pushed, thanks. -- Thomas Munro https://enterprisedb.com
On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote: > On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> I wonder if it might be possible to avoid page locks on both the heap >> *and* index in some limited cases, as I mentioned over here (just an >> idea, could be way off base): >> >> https://www.postgresql.org/message-id/CA%2BhUKGJGDVfhHmoUGzi-_J%2BN8FmRjmWTY0MCd1ZV5Fj9qdyb1w%40mail.gmail.com > > I am in same boat as you. One can get serializable conflict error today based on tuple gets HOT updated or not. HOT islogically internal code optimization and not so much user functionality, so ideally feels shouldn't affect serializablebehavior. But it does currently, again, due to index locking. Disable HOT update and 4 isolation tests fail dueto "could not serialize access due to read/write dependencies among transactions" otherwise not. If the tuple happensto fit on same page user will not get the error, if the tuple gets inserted to different page the error can happen,due to index page locking. I had discussed this with Heikki and thinking is we shouldn't need to take the lock onthe index page, if the index key was not changed, even if it was a non-HOT update. Not sure of complications and implications,but just a thought. Oh, I think the idea I was suggesting might be the same as this item from README-SSI (unrelated to HOT, but related to index uniqueness, particularly in index-only-scan where you might be able to skip the btree page lock): "Several optimizations are possible, though not all are implemented yet: ... * An index scan which is comparing for equality on the entire key for a unique index need not acquire a predicate lock as long as a key is found corresponding to a visible tuple which has not been modified by another transaction -- there are no "between or around" gaps to cover." -- Thomas Munro https://enterprisedb.com
On Fri, Jun 28, 2019 at 12:55 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote: > On Thu, Jun 27, 2019 at 4:33 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> >> Here's a patch I'd like to commit to fix the comment. We could look >> at improving the actual code after >> https://commitfest.postgresql.org/23/2169/ is done. > > Change looks good to me. ... and here is the corresponding code change, with a test to demonstrate the change. I'm working on a proof-of-concept to skip the btree page lock sometimes too, which seems promising, but it requires some planner work which has temporarily pretzeled my brain.
Вложения
On Sat, Jun 12, 2021 at 2:35 PM Thomas Munro <thomas.munro@gmail.com> wrote: > ... and here is the corresponding code change, with a test to > demonstrate the change. > > I'm working on a proof-of-concept to skip the btree page lock > sometimes too, which seems promising, but it requires some planner > work which has temporarily pretzeled my brain. Here's a highly experimental patch I came up with that seems to produce the right results in simple cases, without (yet) involving the planner. The regression tests show single table queries, but it works also for nest loop joins, which is where this optimisation should be most interesting, I think. There are a few weird things about this patch though, and there could well be much better ways to do it, as noted in the commit message and comments. It's a start on the problem...
Вложения
On Mon, 14 Jun 2021 at 00:02, Thomas Munro <thomas.munro@gmail.com> wrote: > Here's a highly experimental patch I came up with that seems to > produce the right results in simple cases, without (yet) involving the > planner. + /* Find all equality quals. */ + for (int i = 0; i < n_scan_keys; ++i) + { + if (scan_keys[i].sk_strategy == BTEqualStrategyNumber) + attnos[nattnos++] = scan_keys[i].sk_attno; + } + + /* Are all attributes covered? */ + /* XXX is this check enough or do we need to work harder? */ + qsort(attnos, nattnos, sizeof(AttrNumber), compare_int16); + nattnos = qunique(attnos, nattnos, sizeof(AttrNumber), compare_int16); + if (nattnos == index->rd_index->indnkeyatts) I think a more optimal and nicer way of doing that would be setting bits in a Bitmapset then checking bms_num_members is equal to n_scan_keys. David
On Mon, Jun 14, 2021 at 12:54 AM David Rowley <dgrowleyml@gmail.com> wrote: > I think a more optimal and nicer way of doing that would be setting > bits in a Bitmapset then checking bms_num_members is equal to > n_scan_keys. Shouldn't it be compared with indnkeyatts? Yes, much nicer, thanks!
Вложения
On Mon, 14 Jun 2021 at 10:03, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Mon, Jun 14, 2021 at 12:54 AM David Rowley <dgrowleyml@gmail.com> wrote: > > I think a more optimal and nicer way of doing that would be setting > > bits in a Bitmapset then checking bms_num_members is equal to > > n_scan_keys. > > Shouldn't it be compared with indnkeyatts? Yes, much nicer, thanks! Oh yeah, I did mean that. Thanks for the correction. Have you also thought about deferrable unique / primary key constraints? It's possible to the uniqueness temporarily violated during a transaction when the unique constraint is deferred, For example: create table t (id int primary key deferrable initially deferred); begin; insert into t values(1),(1); select * from t; id ---- 1 1 (2 rows) I think you'd just need to add a check to ensure that indimmediate is true around where you're checking the indisunique flag. David
On Mon, Jun 14, 2021 at 2:29 PM David Rowley <dgrowleyml@gmail.com> wrote: > Have you also thought about deferrable unique / primary key constraints? > > It's possible to the uniqueness temporarily violated during a > transaction when the unique constraint is deferred, Oh, yeah, very good question. I think many scenarios involving duplicates work correctly, but I think there is a danger like this: create table t (i int primary key deferrable initially deferred, j int); create unique index on t(j); insert into t values (999, 999); -- avoid empty index set enable_seqscan = off; begin transaction isolation level serializable; insert into t values (1, 2); -- create phantom row select * from t where i = 1; delete from t where j = 2; -- remove phantom row SELECT locktype, relation::regclass, page, tuple FROM pg_locks WHERE mode = 'SIReadLock'; commit; master: locktype | relation | page | tuple ----------+----------+------+------- page | t_pkey | 1 | page | t_j_idx | 1 | (2 rows) v3 patch: locktype | relation | page | tuple ----------+----------+------+------- (0 rows) In fact the lock on t_pkey's page 1 was important: it represents the search for a tuple with i = 1, other than our temporary phantom (only allowed because constraint deferred). If someone else inserts a row matching i = 1, the SSI system will not know that we tried to look for it, because our own temporary phantom confused us. > I think you'd just need to add a check to ensure that indimmediate is > true around where you're checking the indisunique flag. Yeah, that fixes the problem in this case at least. With v4: locktype | relation | page | tuple ----------+----------+------+------- page | t_pkey | 1 | (1 row) (It's expected that t_j_idx is not locked: the delete query benefits from the optimisation when accessing the index on t(j)). That test case is a little confusing, because at no point does it ever actually create a duplicate, but I suspect the problem is avoided without the deferred constraint because you'd either get a UCV on insertion, or block anyone else from inserting until after you commit (even after you delete the phantom), and I think that may avoid the hazard. I could be confused about that. If I am wrong, then a possible general solution may be to apply the optimisation only if we find a match that wasn't written by this transaction, though I'm not sure how given the current layering.
Вложения
This patch now fails to apply, probably due to a mostly trivial collision with fdd88571454e2b00dbe446e8609c6e4294ca89ae in the test files. Can you submit a rebased version? -- Daniel Gustafsson https://vmware.com/
> On 9 Nov 2021, at 15:28, Daniel Gustafsson <daniel@yesql.se> wrote: > > This patch now fails to apply, probably due to a mostly trivial collision with > fdd88571454e2b00dbe446e8609c6e4294ca89ae in the test files. Can you submit a > rebased version? I've marked this Returned with Feedback, please feel free to resubmit when a new version of the patch is available. -- Daniel Gustafsson https://vmware.com/