Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Robert Haas
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date:
Msg-id: CA+TgmoaBXekYM5j8hbrvUSwV90Au8VPmwVTtCXba=CPaYHDHsw@mail.gmail.com
Whole thread Raw
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Mon, Sep 23, 2013 at 7:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> It suits my purposes to have the value locks be held for only an
> instant, because:
>
> [ detailed explanation ]

I don't really disagree with any of that.  TBH, I think the question
of how long value locks (as you call them) are held is going to boil
down to a question of how they end up being implemented.   As I
mentioned to you at PG Open (going through the details here for those
following along at home), we could optimistically insert the new heap
tuple, then go add index entries for it, and if we find a conflict,
then instead of erroring out, we mark the tuple we were inserting dead
and go try to update the conflicting tuple instead.  In that
implementation, if we find that we have to wait for some other
transaction along the way, it's probably not worth reversing out the
index entries already inserted, because getting them into the index in
the first place was a WAL-logged operation, and therefore relatively
expensive, and IMHO it's most likely better to just hope things work
out than to risk having to redo all of that.
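The optimistic scheme described above can be sketched as a toy in-memory model. Everything here (the `optimistic_insert` function, the dict standing in for a unique index) is illustrative shorthand for the flow, not PostgreSQL code:

```python
# Toy model of the optimistic approach: insert the heap tuple first,
# then probe the unique index; on a conflict, mark the new tuple dead
# instead of erroring out, and hand back the conflicting tuple so the
# caller can try to update it. All names are hypothetical.

def optimistic_insert(heap, unique_index, row):
    heap.append(row)              # heap insert happens first, optimistically
    row["dead"] = False
    key = row["key"]
    if key in unique_index:       # unique-index conflict detected
        row["dead"] = True        # mark our just-inserted tuple dead...
        return unique_index[key]  # ...and return the conflicting tuple
    unique_index[key] = row
    return None                   # no conflict: plain insert succeeded
```

Note that the "dead" loser tuple stays in the heap, mirroring the point that reversing out already-WAL-logged work is not worth the cost.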

On the other hand, if the locks are strictly in-memory, then the cost
of releasing them all before we go to wait, and of reacquiring them
after we finish waiting, is pretty low.  There might be some
modularity issues to work through there, but they might not turn out
to be very painful, and the advantages you mention are certainly worth
accruing if it turns out to be fairly straightforward.
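The release-before-wait discipline for purely in-memory value locks might look roughly like this; the lock table here is just a set, and the names are illustrative assumptions, not a proposal for the actual interface:

```python
# Toy sketch: release all held in-memory value locks before blocking
# on another transaction, then try to reacquire them afterwards.

def wait_with_lock_release(held_keys, lock_table, wait_for_xact):
    # Releasing is cheap precisely because the locks are memory-only.
    for key in held_keys:
        lock_table.discard(key)
    wait_for_xact()               # block until the other transaction ends
    # Reacquire; a conflict here means the caller must retry from scratch.
    for key in held_keys:
        if key in lock_table:
            return False          # someone else took the lock while we waited
        lock_table.add(key)
    return True
```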

Personally, I think that trying to keep it all in-memory is going to
be hard.  The problem is that we can't de-optimize regular inserts or
updates to any significant degree to cater to this feature - because
as valuable as this feature is, the number of times it gets used is
still going to be a whole lot smaller than the number of times it
doesn't get used.  Also, I tend to think that we might want to define
the operation as a REPLACE-type operation with respect to a certain
set of key columns; and so we'll do the insert-or-update behavior with
respect only to the index on those columns and let the chips fall
where they may with respect to any others.  In that case this all
becomes much less urgent.

> Even *considering* this is largely academic, though, because without
> some kind of miracle guaranteeing serial ordering, a miracle that
> doesn't allow for serialization failures and also doesn't seriously
> slow down, for example, updates (by making them care about value locks
> *before* they do anything, or in the case of HOT updates *at all*),
> all of this is _impossible_. So, I say let's just do the
> actually-serial-ordering for value lock acquisition with serialization
> failures where we're > read committed. I've seriously considered what
> it would take to do it any other way so things would work how you and
> Andres expect for read committed, and it makes my head hurt, because
> apart from seeming unnecessary to me, it also seems completely
> hopeless.
>
> Am I being too negative here? Well, I guess that's possible. The fact
> is that it's really hard to judge, because all of this is really hard
> to reason about. That's what I really don't like about it.

Suppose we define the operation as REPLACE rather than INSERT...ON
DUPLICATE KEY LOCK FOR UPDATE.  Then we could do something like this:

1. Try to insert a tuple.  If no unique index conflicts occur, stop.
2. Note the identity of the conflicting tuple and mark the inserted
heap tuple dead.
3. If the conflicting tuple's inserting transaction is still in
progress, wait for the inserting transaction to end.
4. If the conflicting tuple is dead (e.g. because the inserter
aborted), start over.
5. If the conflicting tuple's key columns no longer match the key
columns of the REPLACE operation, start over.
6. If the conflicting tuple has a valid xmax, wait for the deleting or
locking transaction to end.  If xmax is still valid, follow the CTID
chain to the updated tuple, let that be the new conflicting tuple, and
resume from step 5.
7. Update the tuple, even though it may be invisible to our snapshot
(a deliberate MVCC violation!).

While this behavior is admittedly wonky from an MVCC perspective, I
suspect that it would make a lot of people happy.
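The retry loop in steps 1-7 can be modeled with a toy in-memory table. The `Tuple` class and the dict standing in for a unique index are purely illustrative; real tuple visibility and CTID chains are far more involved:

```python
# Toy model of the REPLACE loop: insert, and on a unique conflict chase
# the conflicting tuple's update chain and update it in place.

class Tuple:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.dead = False      # stands in for an aborted inserter / deletion
        self.ctid_next = None  # stands in for the CTID update chain

def replace(index, key, value):
    while True:
        conflict = index.get(key)           # step 1: probe the unique index
        if conflict is None:
            index[key] = Tuple(key, value)  # no conflict: plain insert
            return "inserted"
        if conflict.dead:                   # step 4: conflicting tuple is
            del index[key]                  # dead, so start over
            continue
        while conflict.ctid_next is not None:
            conflict = conflict.ctid_next   # step 6: follow the CTID chain
        if conflict.key != key:             # step 5: key columns no longer
            del index[key]                  # match, so start over
            continue
        conflict.value = value              # step 7: update the survivor,
        return "updated"                    # MVCC visibility aside
```

The waits in steps 3 and 6 are elided here; this only shows the restart logic that makes the loop converge.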

>>> I respectfully
>>> suggest that that exact definition of upsert isn't a useful one,
>>> because other snapshot isolation/MVCC systems operating within the
>>> same constraints must have the same issues, and yet they manage to
>>> implement something that could be called upsert that people seem happy
>>> with.
>>
>> Yeah.  I wonder how they do that.
>
> My guess is that they have some fancy snapshot type that is used by
> the equivalent of ModifyTable subplans, that is appropriately paranoid
> about the Halloween problem and so on. How that actually might work is
> far from clear, but it's a question that I have begun to consider. As
> I said, a concern is that it would be in tension with the generalized,
> composable syntax, where we don't explicitly have a "special update".
> I'd really like to hear other opinions, though.

The tension here feels fairly fundamental to me; I don't think our
implementation is to blame.  I think the problem isn't so much to
figure out a clever trick that will make this all work in a truly
elegant fashion as it is to decide exactly how we're going to
compromise MVCC semantics in the least blatant way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


