Re: Making joins involving ctid work for the benefit of UPSERT

From: Robert Haas
Subject: Re: Making joins involving ctid work for the benefit of UPSERT
Date:
Msg-id: CA+TgmoZdW5EfRVRgkdJzXG2qun2FnSf_c1eNunO0gS+svhp6+Q@mail.gmail.com
In reply to: Re: Making joins involving ctid work for the benefit of UPSERT  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Making joins involving ctid work for the benefit of UPSERT  (Peter Geoghegan <pg@heroku.com>)
Re: Making joins involving ctid work for the benefit of UPSERT  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
On Wed, Jul 23, 2014 at 7:35 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> It's certainly arguable whether you should INSERT and then turn failures
>> into an update or try to UPDATE and then turn failures into an INSERT;
>> we might even want to have both options available, though that smells
>> a little like airing too much of our dirty laundry.  But I think I
>> generally favor optimizing for the UPDATE case for more or less the
>> same reasons Kevin articulated.
>
> I don't see the connection between this and Kevin's remarks. And FWIW,
> I don't see a reason to favor inserts or updates. Fortunately, what I
> have balances both cases very well, and doesn't cause bloat. The work
> of descending the index to lock it isn't wasted if an update is
> required. My implementation decides to either insert or update at
> literally the latest possible moment.

AFAIUI, this is because your implementation uses lwlocks in a way that
Andres and I both find unacceptable.  My suspicion is that any version
of this that ends up getting committed is going to involve a risk of
bloat in cases involving retries, and I think it will be easier to
minimize bloat in an update-driven implementation.  But I suppose
that's speculative.

>> Here you seem to be suggesting that I intended to propose your existing
>> design rather than something else, which I didn't.  In this design,
>> you find the conflict (at most one) by scanning for the tuple to be
>> updated.
>
> Yes, but what if you don't see a conflict because it isn't visible to
> your snapshot, and then you insert, and only then (step 5), presumably
> with a dirty snapshot, you find a conflict? How does the loop
> terminate if that brings you back to step 1 with the same MVCC
> snapshot feeding the update?
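The non-termination scenario Peter describes above can be illustrated with a toy model (plain Python, not PostgreSQL internals; the function and variable names are purely illustrative). The UPDATE step scans with an MVCC snapshot that cannot see a concurrently committed conflicting row; the INSERT step then fails on a unique violation that only a dirty snapshot sees; looping back with the same MVCC snapshot reproduces the same outcome indefinitely:

```python
# Toy model of one iteration of an update-then-insert upsert loop.
# mvcc_visible: keys visible to our (unchanging) MVCC snapshot.
# all_committed: keys present in the unique index, as a dirty snapshot
# would see them. Both are hypothetical simplifications.

def upsert_attempt(mvcc_visible, all_committed, key):
    """Return 'updated', 'inserted', or 'conflict' for one iteration."""
    if key in mvcc_visible:      # UPDATE step: scan finds a matching row
        return "updated"
    if key in all_committed:     # INSERT step: unique violation detected
        return "conflict"        # ...which sends us back to the UPDATE step
    return "inserted"

# Another transaction committed a conflicting row after our snapshot was
# taken: the MVCC snapshot misses it, but the index (dirty snapshot) sees it.
mvcc_visible = set()
all_committed = {("k1",)}

# Every retry with the unchanged snapshot yields the same result: livelock.
outcomes = [upsert_attempt(mvcc_visible, all_committed, ("k1",))
            for _ in range(3)]
```
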

Good point.  Maybe the syntax should be something like:

UPSERT table (keycol [, keycol] ...)
    { VALUES (val [, val] ...) [, ...] | select_query }

That would address both the concern about being able to pipe multiple
tuples through it and the point you just raised.  We look for a row
that matches each given tuple on the key columns; if one is found, we
update it; if none is found, we insert.
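The per-tuple semantics sketched above (match on key columns, update on a hit, insert on a miss) can be simulated in a few lines of Python; the table-as-list-of-dicts representation is an assumption for illustration only, not anything proposed for the server:

```python
# Sketch of the proposed UPSERT semantics: for each incoming tuple, probe
# for an existing row matching on the key columns; update it if found,
# otherwise insert it. Concurrency is deliberately out of scope here.

def upsert(table, key_cols, rows):
    """table: list of dicts; key_cols: tuple of column names;
    rows: incoming tuples as dicts. Returns (updated, inserted) counts."""
    updated = inserted = 0
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        for existing in table:
            if tuple(existing[c] for c in key_cols) == key:
                existing.update(row)       # match on key columns: update
                updated += 1
                break
        else:
            table.append(dict(row))        # no match: insert
            inserted += 1
    return updated, inserted
```

This also shows how multiple tuples pipe through one command naturally: each VALUES row (or select_query row) is processed by the same probe-then-act step.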

> I agree that you want to uniquely identify each tuple. What I meant
> was, why should we not be able to upsert multiple rows in a single
> command? What's wrong with that?

Nothing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



In the pgsql-hackers list, by send date:

Previous
From: Kevin Grittner
Date:
Subject: Re: delta relations in AFTER triggers
Next
From: Robert Haas
Date:
Subject: Re: gaussian distribution pgbench -- splits v4