Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Дата
Msg-id CA+U5nM+K85+iT6GnY2gUBDeW2FrtOLOzEHsBrjqkhua5skzhuA@mail.gmail.com
обсуждение исходный текст
Ответ на INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Robert Haas <robertmhaas@gmail.com>)
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
On 28 August 2014 03:43, Peter Geoghegan <pg@heroku.com> wrote:

> The patch currently lacks a way of referencing datums rejected for
> insertion when updating. The way MySQL handles the issue seems
> questionable. They allow you to do something like this:
>
> INSERT INTO upsert (key, val) VALUES (1 'val') ON DUPLICATE KEY UPDATE
> val = VALUES(val);
>
> The implication is that the updated value comes from the INSERT's
> VALUES() list, but emulating that seems like a bad idea. In general,
> at least with Postgres it's entirely possible that values rejected
> differ from the values appearing in the VALUES() list, due to the
> effects of before triggers. I'm not sure whether or not we should
> assume equivalent transformations during any UPDATE before triggers.
>
> This is an open item. I think it makes sense to deal with it a bit later.

IMHO it is impossible to know if any of the other code is correct
until we have a clear and stable vision of what the command is
supposed to perform.

The inner workings are less important than what the feature does.

FWIW, the row available at the end of all BEFORE triggers is clearly
the object we should be manipulating, not the original VALUES()
clause. Otherwise this type of INSERT would behave differently from
normal INSERTs. Which would likely violate RLS, if nothing else.


> As I mentioned, I have incorporated feedback from Kevin Grittner. You
> may specify a unique index to merge on from within the INSERT
> statement, thus avoiding the risk of inadvertently having the update
> affect the wrong tuple due to the user failing to consider that there
> was a would-be unique violation within some other unique index
> constraining some other attribute. You may write the DML statement
> like this:
>
> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
> upsert_pkey UPDATE SET val = 'update';
>
> I think that there is a good chance that at least some people will
> want to make this mandatory. I guess that's fair enough, but I
> *really* don't want to *mandate* that users specify the name of their
> unique index in DML for obvious reasons. Perhaps we can come up with a
> more tasteful syntax that covers all interesting cases (consider the
> issues with partial unique indexes and before triggers for example,
> where a conclusion reached about which index to use during parse
> analysis may subsequently be invalidated by user-defined code, or
> ambiguous specifications in the face of overlapping attributes between
> two unique composite indexes, etc). The Right Thing is far from
> obvious, and there is very little to garner from other systems, since
> SQL MERGE promises essentially nothing about concurrency, both as
> specified by the standard and in practice. You don't need a unique
> index at all, and as I showed in my pgCon talk, there are race
> conditions even for a trivial UPSERT operations in all major SQL MERGE
> implementations.

Surely if there are multiple unique indexes then the result row must
be validated against all unique indexes before it is allowed at all?

The only problem I see is if the newly inserted row matches one row on
one unique value and a different row on a different unique index.
Turning the INSERT into an UPDATE will still fail on one or other, no
matter which index you pick. If there is one row for ALL unique
indexes then it is irrelevant which index you pick. So either way, I
cannot see a reason to specify an index.

If we do need such a construct, we have already the concept of an
IDENTITY for a table, added in 9.4, currently targeted at replication.
Listing indexes or columns in the DML statement is more pushups for
developers and ORMs, so lets KISS.


The way forwards, in my view, is to define precisely the behaviour we
wish to have. That definition will include the best current mechanism
for running an UPSERT using INSERT/UPDATE/loops and comparing that
against what is being provided here. We will then have a functional
test of equivalence of the approaches, and a basis for making a
performance test that shows that performance is increased without any
loss of concurrency.

Once we have that, we can then be certain our time spent on internals
is not wasted by overlooking a simple userland gotcha.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Marko Tiikkaja
Дата:
Сообщение: Re: pgcrypto: PGP armor headers
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Inefficient barriers on solaris with sun cc