Re: Promise index tuples for UPSERT

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Promise index tuples for UPSERT
Дата
Msg-id 542E6647.3070400@vmware.com
обсуждение исходный текст
Ответ на Re: Promise index tuples for UPSERT  (Simon Riggs <simon@2ndquadrant.com>)
Ответы Re: Promise index tuples for UPSERT
Re: Promise index tuples for UPSERT
Список pgsql-hackers
On 10/03/2014 11:07 AM, Simon Riggs wrote:
> On 1 October 2014 20:54, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> On 10/01/2014 02:34 PM, Simon Riggs wrote:
>>>
> ...
>>> When later insert scans see the promise tuple they perform
>>> XactLockTableWait() and when they get control they look again for the
>>> key. If they find a promise tuple with an aborted xid they replace
>>> that value with their own xid and continue as a). Otherwise b).
>>
>>
>> XactLockTableWait() waits until the end of transaction, that's not you want
>> here. If the backend that inserted the promise tuple decides to not proceed
>> with the insertion, and removes the promise tuple, the backend waiting on it
>> needs to be woken up more or less immediately, not when the transaction
>> completes.
>
> There is no "remove promise tuple" part of the above design.
>
>> I had this exact same issue in my earlier patch versions, fixed it with a
>> new kind of heavy-weight lock, and a new field in PGPROC
>> (http://www.postgresql.org/message-id/52D00D2D.6030307@vmware.com). That
>> wasn't very pretty, but it got the job done. Some other design might be
>> better.
>
> Currently, if some fool developer decides to insert rows that already
> duplicate values in the table, then currently he inserts a heap row,
> then sees a potential conflict in the index and waits for the
> conflicting xact to end. His fool insert, his wait. That's how btree
> does this now.
> I don't see any reason why we need to do better for Upsert.
>
> If the row already exists we need to be able to quickly change into an
> update, but we still only support one write xact at a time.

That lowers the bar from what I thought everyone agreed on. Namely, if 
two backends run a similar UPSERT command concurrently on a table that 
has more than one unique constraint, they might deadlock, causing one of 
them to throw an error instead of INSERTing or UPDATEing anything.

I'm sure that's useful enough in many applications, but I'd like to have 
a more robust implementation. The shorter we can keep the list of 
caveats, the better.

> The value in the index needs to be protected by a block level lock, so
> we can check it quickly, but the eventual heap work is serialized by
> transactional semantics.
>
> I think a little perspective is due here and we should stick to the
> main use case, not cater for bizarre edge cases.

I'm trying to bisect your thoughts on exactly what use cases you think 
we must support, and which ones you consider bizarre edge cases, and 
what exactly is acceptable behavior in those edge cases.

> What we are looking for is something that can decide whether it is an
> insert or an update and progress quickly with that. It needs to be
> correct, but there is no requirement to be faster than currently
> possible, in the case of concurrency.

I believe we all agree on that.

> Any form of tuple locking that uses the general lock manager will not
> be usable. I can't see it is worth the overhead of doing that to
> protect against deadlocks that would only be experienced by people
> doing foolish things.

Maybe, maybe not, but let's define the acceptable behavior first, and 
think about the implementation second. I'm pretty sure all of the 
approaches discussed so far can be made fast enough, and the bloat 
issues can be made small enough, that it doesn't matter much which one 
we choose from a performance point of view. The differences are in what 
use cases they can support, and the maintainability of the code.

- Heikki




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dimitri Fontaine
Дата:
Сообщение: Re: DDL Damage Assessment
Следующее
От: Kouhei Kaigai
Дата:
Сообщение: Re: How to make ResourceOwnerForgetBuffer() O(1), instead of O(N^2) scale