Re: Remove HeapTuple and Buffer dependency for predicate locking functions

Поиск
Список
Период
Сортировка
От Ashwin Agrawal
Тема Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Дата
Msg-id CALfoeivQ6xJdwmgSfcP+_NjTh8yjgjoHqrRqpbXxG_fRaytU4Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Remove HeapTuple and Buffer dependency for predicate lockingfunctions  (Andres Freund <andres@anarazel.de>)
Ответы Re: Remove HeapTuple and Buffer dependency for predicate locking functions  (Thomas Munro <thomas.munro@gmail.com>)
Re: Remove HeapTuple and Buffer dependency for predicate locking functions  (Ashwin Agrawal <aagrawal@pivotal.io>)
Список pgsql-hackers

On Wed, Jul 31, 2019 at 2:06 PM Andres Freund <andres@anarazel.de> wrote:
Looking at the code as of master, we currently have:

Super awesome feedback and insights, thank you!

- PredicateLockTuple() calls SubTransGetTopmostTransaction() to figure
  out a whether the tuple has been locked by the current
  transaction. That check afaict just should be
  TransactionIdIsCurrentTransactionId(), without all the other
  stuff that's done today.

Agree. v1-0002 patch attached does that now. Please let me know if that's what you meant.

  TransactionIdIsCurrentTransactionId() imo ought to be optimized to
  always check for the top level transactionid first - that's a good bet
  today, but even moreso for the upcoming AMs that won't have separate
  xids for subtransactions.  Alternatively we shouldn't make that a
  binary search for each subtrans level, but just have a small
  simplehash hashtable for xids.

v1-0001 patch checks for GetTopTransactionIdIfAny() first in TransactionIdIsCurrentTransactionId() which seems yes better in general and more for future. That mostly meets the needs for current discussion.

The alternative of not using binary search seems bigger refactoring and should be handled as separate optimization exercise outside of this thread.
 
- CheckForSerializableConflictOut() wants to get the toplevel xid for
  the tuple, because that's the one the predicate hashtable stores.

  In your patch you've already moved the HTSV() call etc out of
  CheckForSerializableConflictOut(). I'm somewhat inclined to think that
  the SubTransGetTopmostTransaction() call ought to go along with that.
  I don't really think that belongs in predicate.c, especially if
  most/all new AMs don't use subtransaction ids.

  The only downside is that currently the
  TransactionIdEquals(xid, GetTopTransactionIdIfAny()) check
  avoids the SubTransGetTopmostTransaction() check.

  But again, the better fix for that seems to be to improve the generic
  code. As written the check won't prevent a subtrans lookup for heap
  when subtransactions are in use, and it's IME pretty common for tuples
  to get looked at again in the transaction that has created them. So
  I'm somewhat inclined to think that SubTransGetTopmostTransaction()
  should have a fast-path for the current transaction - probably just
  employing TransactionIdIsCurrentTransactionId().

That optimization, as Kuntal also mentioned, seems something which can be done on-top afterwards on current patch.
 
I don't really see what we gain by having the subtrans handling in the
predicate code. Especially given that we've already moved the HTSV()
handling out, it seems architecturally the wrong place to me - but I
admit that that's a fuzzy argument.  The relevant mapping should be one
line in the caller.

Okay, I moved the sub transaction handling out of CheckForSerializableConflictOut() and have it along side HTSV() now.

The reason I felt leaving subtransaction handling in generic place, was it might be premature to thing no future AM will need it. Plus, all serializable function api's having same expectations is easier. Like PredicateLockTuple() can be passed top or subtransaction id and it can handle it but with the change CheckForSerializableConflictOut() only be feed top transaction ID. But its fine and can see the point of AM needing it can easily get top transaction ID and feed it as heap.
 
I wonder if it'd be wroth to combine the
TransactionIdIsCurrentTransactionId() calls in the heap cases that
currently do both, PredicateLockTuple() and
HeapCheckForSerializableConflictOut(). The heap_fetch() case probably
isn't commonly that hot a pathq, but heap_hot_search_buffer() is.

Maybe, will give thought to it separate from the current patch.
 
Minor notes:
- I don't think 'insert_xid' is necessarily great - it could also be the
  updating xid etc. And while you can argue that an update is an insert
  in the current heap, that's not the case for future AMs.
- to me
@@ -1621,7 +1622,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
                        if (valid)
                        {
                                ItemPointerSetOffsetNumber(tid, offnum);
-                               PredicateLockTuple(relation, heapTuple, snapshot);
+                               PredicateLockTID(relation, &(heapTuple)->t_self, snapshot,
+                                                                HeapTupleHeaderGetXmin(heapTuple->t_data));
                                if (all_dead)
                                        *all_dead = false;
                                return true;

 What are those parens - as placed they can't do anything. Did you
 intend to write &(heapTuple->t_self)? Even that is pretty superfluous,
 but it at least clarifies the precedence.

Fixed. No idea what I was thinking there, mostly feel I intended to have it as like &(heapTuple->t_self).

  I'm also a bit confused why we don't need to pass in the offset of the
 current tuple, rather than the HOT root tuple here. That's not related
 to this patch. But aren't we locking the wrong tuple here, in case of
 HOT?

Yes, root is being locked here instead of the HOT. But I don't have full context on the same. If we wish to fix it though, can be easily done now with the patch by passing "tid" instead of &(heapTuple->t_self).

- I wonder if CheckForSerializableConflictOutNeeded() shouldn't have a
  portion of it's code as a static inline. In particular, it's a shame
  that we currently perform external function calls at quite the
  frequency when serializable isn't even in use.

I am not sure on portion of the code part? SerializationNeededForRead() is static inline function in C file. Can't inline CheckForSerializableConflictOutNeeded() without moving SerializationNeededForRead() and some other variables to header file. CheckForSerializableConflictOut() wasn't inline either, so a function call was performed earlier as well when serializable isn't even in use.

I understand that with refactor, HeapCheckForSerializableConflictOut() is called which calls CheckForSerializableConflictOutNeeded(). If that's the problem, for addressing the same, I had proposed alternative way to refactor. CheckForSerializableConflictOut() can take callback function and void* callback argument for AM specific check instead. So, the flow would be AM calling CheckForSerializableConflictOut() as today and only if serializable in use will invoke the callback to check with AM if more work should be performed or not. Essentially HeapCheckForSerializableConflictOut() will become callback function instead. Due to void* callback argument aspect I didn't like that solution and felt AM performing checks and calling CheckForSerializableConflictOut() seems more straight forward.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ian Barwick
Дата:
Сообщение: Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions