Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From Heikki Linnakangas
Subject Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Date
Msg-id 5509C6EF.5040606@iki.fi
In response to Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0  (Peter Geoghegan <pg@heroku.com>)
Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 03/18/2015 06:23 PM, Robert Haas wrote:
> On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> I'm still not sure the way the speculative locking works is the best
>>> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
>>> tuple seems more straightforward. And we could put the speculative insertion
>>> token in t_ctid, instead of stashing it in the PGPROC array. That would
>>> again seem more straightforward.
>>
>> I see the appeal of that approach. What concerns me about that
>> approach is that it makes it the problem of the inserter to confirm
>> its insertion, even though overwhelmingly, speculative insertions
>> succeeds (the race window is small due to the pre-check). The current
>> speculative token is legitimately a very short lived thing, a thing
>> that I hesitate to write to disk at all. So our optimistic approach to
>> inserting speculatively loses its optimism, which I don't like, if you
>> know what I mean.
>
> Modifying a shared buffer is not the same as writing to disk.

AFAICS, there is no need to go and clear the tag after the insert has 
completed.

Here's what I had in mind: the inserter tags the tuple with the 
speculative insertion token, by storing the token in the t_ctid field. 
If the inserter needs to super-delete the tuple, it sets xmax like in a 
regular deletion, but also sets another flag to indicate that it was a 
super-deletion.
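The two transitions above can be sketched with an illustrative mock of the tuple header (the struct, flag bit, and function names are assumptions for the sketch, not PostgreSQL's actual definitions):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative mock of a heap tuple header; not the real HeapTupleHeaderData. */
#define MOCK_SUPER_DELETED 0x0001   /* assumed flag bit marking a super-deletion */

typedef struct MockTuple {
    uint32_t xmin;        /* inserting transaction id */
    uint32_t xmax;        /* deleting transaction id, 0 if none */
    uint16_t flags;       /* infomask-style flag bits */
    uint32_t ctid_token;  /* t_ctid reused to hold the speculative token */
} MockTuple;

/* Inserter tags the new tuple with its speculative insertion token,
 * stored in t_ctid rather than in the PGPROC array. */
static void speculative_insert(MockTuple *tup, uint32_t xid, uint32_t token)
{
    tup->xmin = xid;
    tup->xmax = 0;
    tup->flags = 0;
    tup->ctid_token = token;
}

/* Super-deletion: set xmax like a regular deletion, plus a marker flag. */
static void super_delete(MockTuple *tup, uint32_t xid)
{
    tup->xmax = xid;
    tup->flags |= MOCK_SUPER_DELETED;
}

static bool is_super_deleted(const MockTuple *tup)
{
    return (tup->flags & MOCK_SUPER_DELETED) != 0;
}
```

Note that xmin stays intact throughout, which is the point of this scheme versus clearing xmin on super-deletion.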

When another backend inserts, and notices that it has a potential 
conflict with the first tuple, it tries to acquire a hw-lock on the 
token. In most cases, the inserter has long since completed the 
insertion, and the acquisition succeeds immediately; but you have to 
check, because the token is not cleared on a completed insertion. It 
would be slightly faster for the second backend if the inserter actually 
removed the token after the insertion has completed: it wouldn't need to 
check the lock if the tuple was not tagged in the first place. But 
that'd be just a tiny optimization, for the rare case that there is a 
conflict, and surely not worth it.
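The waiter's side of that protocol might look like the following sketch, with a trivial in-process table standing in for the heavyweight lock manager (all names here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the lock manager: which speculative tokens are
 * currently held by an in-progress insertion. */
#define TOKEN_TABLE_SIZE 1024
static bool token_held[TOKEN_TABLE_SIZE];

static void token_lock_acquire(uint32_t token)
{
    token_held[token % TOKEN_TABLE_SIZE] = true;
}

static void token_lock_release(uint32_t token)
{
    token_held[token % TOKEN_TABLE_SIZE] = false;
}

/*
 * A second inserter found a potential conflict with a tuple whose t_ctid
 * carries 'token'.  Tokens are not cleared when an insertion completes,
 * so the waiter must always try the lock; almost always the inserter has
 * long since finished and the acquisition succeeds immediately.  Returns
 * true if the insertion is still in progress (i.e. we would block).
 */
static bool must_wait_on_token(uint32_t token)
{
    return token_held[token % TOKEN_TABLE_SIZE];
}
```

In PostgreSQL the wait would block in the lock manager until the inserter releases the token lock; the sketch only models the "is it still held" decision.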

Hmm, I just realized a little fly in that ointment: if the inserter 
inserts enough tuples to wrap around the token counter (4 billion?) in a 
single transaction, another backend that inserts could confuse a new 
speculative insertion with one that completed a long time ago. The risk 
seems small enough that we could just accept it. I don't think anything 
particularly bad would happen on a false positive here. Or we could also 
use all 48 bits available in the t_ctid to push it truly in the realm of 
ain't-gonna-happen. Or we could use (relfilenode, block, token) as the 
lock tag for the SpeculativeInsertionLock.
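The arithmetic behind those options can be checked with a quick sketch (illustrative code; the 48-bit packing just reflects t_ctid's 32-bit block number plus 16-bit offset number):

```c
#include <stdint.h>

/* A per-backend 32-bit token counter wraps around after 2^32 (about
 * 4 billion) speculative insertions; unsigned overflow is well-defined
 * in C, so the wraparound itself is silent. */
static uint32_t next_token32(uint32_t *counter)
{
    return ++(*counter);
}

/* Using all 48 bits available in t_ctid (16-bit offset number plus
 * 32-bit block number) pushes the wraparound out to 2^48 insertions
 * per backend. */
static uint64_t pack_token48(uint16_t offset_part, uint32_t block_part)
{
    return ((uint64_t) offset_part << 32) | block_part;
}
```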

Regarding the physical layout: We can use a magic OffsetNumber value 
above MaxOffsetNumber to indicate that the t_ctid field stores a token 
rather than a regular ctid value. And another magic t_ctid value to 
indicate that a tuple has been super-deleted. The token and the 
super-deletion flag are quite ephemeral; they are not needed after the 
inserting transaction has completed, so it's nice to not consume the 
valuable infomask bits for these things. Those states are conveniently 
not possible on an updated tuple, when we would need the t_ctid field 
for its current purpose.
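That layout might be sketched as follows, with an illustrative t_ctid mock and assumed magic values (the real magic constants, if any, could differ; any offset above MaxOffsetNumber, which is on the order of 2048 for an 8 kB page, would do):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative t_ctid mock: 32-bit block number + 16-bit offset number. */
typedef struct MockItemPointer {
    uint32_t ip_blkid;
    uint16_t ip_posid;
} MockItemPointer;

/* Assumed magic offset numbers, well above any valid MaxOffsetNumber. */
#define SPEC_TOKEN_OFFSET    0xfffe   /* t_ctid carries a speculative token */
#define SUPER_DELETED_OFFSET 0xfffd   /* tuple has been super-deleted */

static void ctid_set_token(MockItemPointer *ctid, uint32_t token)
{
    ctid->ip_blkid = token;             /* token lives in the block field */
    ctid->ip_posid = SPEC_TOKEN_OFFSET; /* magic marker in the offset field */
}

static bool ctid_has_token(const MockItemPointer *ctid)
{
    return ctid->ip_posid == SPEC_TOKEN_OFFSET;
}

static uint32_t ctid_get_token(const MockItemPointer *ctid)
{
    return ctid->ip_blkid;
}

static void ctid_mark_super_deleted(MockItemPointer *ctid)
{
    ctid->ip_posid = SUPER_DELETED_OFFSET;
}

static bool ctid_is_super_deleted(const MockItemPointer *ctid)
{
    return ctid->ip_posid == SUPER_DELETED_OFFSET;
}
```

Since an updated tuple always has a real (block, offset) pair in t_ctid, neither magic offset can collide with a genuine forward pointer.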

>>> If I'm reading this correctly, if there are multiple indexes that match the
>>> unique index inference specification, we pick the cheapest one. Isn't that
>>> unstable? Two backends running the same INSERT ON CONFLICT statement might
>>> pick different indexes, and the decision might change over time as the table
>>> is analyzed. I think we should have a more robust rule. Could we easily just
>>> use all matching indexes?
>>
>> Robert feel pretty strongly that there should be a costing aspect to
>> this. Initially I was skeptical of this, but just did what he said all
>> the same. But I've since come around to his view entirely because we
>> now support partial unique indexes.
>
> I think Heikki's concern is something different, although I am not
> altogether up to speed on this and may be confused.  The issue is:
> suppose that process A and process B are both furiously upserting into
> the same table with the same key column but, because they have
> different costing parameters, process A consistently chooses index X
> and process B consistently chooses index Y.  In that situation, will
> we deadlock, livelock, error out, bloat, or get any other undesirable
> behavior, or is that A-OK?

Right, that's what I had in mind.

- Heikki


