Re: [HACKERS] INSERT ... ON CONFLICT () SELECT

Поиск
Список
Период
Сортировка
От Matt Pulver
Тема Re: [HACKERS] INSERT ... ON CONFLICT () SELECT
Дата
Msg-id CAHiCE4XHu=7EoupTTqVT+XPQDweKAK1-+Wt2AuSp-AXnKSr8eA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] INSERT ... ON CONFLICT () SELECT  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [HACKERS] INSERT ... ON CONFLICT () SELECT  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Sat, Jun 17, 2017 at 9:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Jun 17, 2017 at 7:49 AM, Matt Pulver <mpulver@unitytechgroup.com> wrote:
> With the proposed "INSERT ... ON CONFLICT () SELECT" feature, the
> get_or_create_id() function is simplified to:

Are you locking the existing rows? Because otherwise, the
determination that they're conflicting can become obsolete immediately
afterwards. (I guess you would be.)

If that is required in order to return the rows in their conflicted state, then yes.


The problem with this design and similar designs is that presumably
the user is sometimes projecting the conflicting rows with the
intention of separately updating them in a wCTE. That might not work,
because only ON CONFLICT doesn't use the MVCC snapshot, in order to
ensure that an UPDATE is guaranteed when an INSERT cannot go ahead.
That isn't what you're doing in the example you gave, but surely some
users would try to do things like that, and get very confused.

Ultimately the proposed "INSERT ... ON CONFLICT () DO SELECT" syntax is still an INSERT statement, not a SELECT, so a user should not expect rows returned from it to be available for UPDATE/DELETE in another part of a wCTE. Anyone who understands this behavior for an INSERT statement, let alone the current "INSERT ... ON CONFLICT DO UPDATE" should not be too surprised if the same thing applies to the new "INSERT ... ON CONFLICT DO SELECT".


I think that what you propose to do here would likely create a lot of
confusion by mixing MVCC semantics with special UPSERT visibility
semantics ("show me the latest row version visible to any possible
snapshot for the special update") even without a separate UPDATE, in
fact. Would you be okay if "id" appeared duplicated in the rows you
project in your new syntax, even when there is a separate unique
constraint on that column? I suppose that there is some risk of things
like that today, but this would make the "sleight of hand" used by ON
CONFLICT DO UPDATE more likely to cause problems.

 Good point. Here is an example using the example table from my previous email:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO SELECT
RETURNING *

Here are a couple options of how to handle this:

1) Return two identical rows (with the same id).
2) Produce an error, with error message:
"ERROR:  ON CONFLICT DO SELECT command cannot reference row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."

This would be nearly identical to the existing error message that is produced when running:

INSERT INTO example (name) VALUES ('foo'), ('foo')
ON CONFLICT (name) DO UPDATE SET value=1
RETURNING *

which gives the error message:
"ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
HINT:  Ensure that no rows proposed for insertion within the same command have duplicate constrained values."

Technically, an error doesn't need to be produced for the above "ON CONFLICT DO SELECT" statement - I think it is still perfectly well-defined to return duplicate rows as in option 1. Option 2 is more for the benefit of the user who is probably doing something wrong by attempting to INSERT a set of rows that violate a constraint. What do you think would be best?

Thank you for the discussion.

Best regards,
Matt

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: [HACKERS] Typo in insert.sgml
Следующее
От: Shubham Barai
Дата:
Сообщение: Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index