Re: [EXTERNAL]: Re: UPSERT in Postgres

Поиск
Список
Период
Сортировка
От Louis Tian
Тема Re: [EXTERNAL]: Re: UPSERT in Postgres
Дата
Msg-id SY4P282MB10529BE09B553228FAA71163A6919@SY4P282MB1052.AUSP282.PROD.OUTLOOK.COM
обсуждение исходный текст
Ответ на Re: UPSERT in Postgres  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [EXTERNAL]: Re: UPSERT in Postgres  (Israel Brewster <ijbrewster@alaska.edu>)
Re: [EXTERNAL]: Re: UPSERT in Postgres  (Adrian Klaver <adrian.klaver@aklaver.com>)
Re: [EXTERNAL]: Re: UPSERT in Postgres  (Alban Hertroys <haramrae@gmail.com>)
Re: [EXTERNAL]: Re: UPSERT in Postgres  (Francisco Olarte <folarte@peoplecall.com>)
Список pgsql-general
Hi Peter,

Thanks for your reply. Appreciate the help and discussion.

> In general UPSERT (or any definition of it that I can think of) does
> not imply idempotency.

"Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia. 
the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof? 
If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
If a row already exists, the first statement will update the row so does any subsequent statements. executing the same
updatemultiple time is the same as executing it only once. 
If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert.  
So by defintion, upsert is idempotent.

> It could just be a unique index or a unique constraint. So you can
> upsert on any individual unique constraint/index, or the primary key.
> Of course there might be several on a given table, but you can only
> use one as the "conflict arbiter" per statement.

Understand that I can use any unique constraint with on conflict. 
But semantically the only correct one is the primary key, since that's what identifies a row logically.
In that sense, any unique column(s) is a potential candidate for primary key.
It's more of a pedantic point rather than pragmatic one.
It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at implementation
level,since all index points to the tuple directly 
Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

> It sort of has to work that way, though. In general your example might
> *not* fail, due to a row-level before trigger in the insert path.

Do you mean having the trigger to check whether new."name" is set if not use the old."name" as fallback ?

> Why doesn't your proposed upsert syntax have the same problem? I mean,
> how could it not? I guess it doesn't if you assume that it'll never
> take the insert path with your not NULL constraint example?

Use some pesudo code might be helpful here to explain the difference. 

How on conflict works at the moment.

try {
 insert row
} catch (duplicated key error) {
  update row
}

How I think it upsert should work

if (new.id exists) {
 update row 
} else {
 insert row
}

I would argue that later is a correct form of upsert given it's definition. 
The two are not equivalent when there is not null constraint on any non primary key column.

> But if you know that for sure, why not just use a regular update statement? 
Yes, in general it is not know whether the insert or update path should be taken. 

> On the other hand, if you're not sure if the insert path can be taken,
> then why is it actually helpful to not just throw an error at the
> earliest opportunity?
I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results
ina not null constraint error. 
While I am expecting the insert is never executed in the first place when that row already exist (as identified by it
primarykey). So the update execute without error. 
I hope the pesudo code above is enough to clarify the difference?

> The MERGE command has various race conditions that are particularly
> relevant to UPSERT type use cases. See the wiki page you referenced
> for a huge amount of information on this.
Thanks for the pointer. Reading into it.

Cheers,
Louis Tian


From: Peter Geoghegan <pg@bowt.ie>
Sent: Friday, April 7, 2023 6:58 AM
To: Louis Tian <louis.tian@aquamonix.com.au>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: [EXTERNAL]: Re: UPSERT in Postgres
 
CAUTION: This email originated from outside of Envirada. Do not click links or open attachments unless you recognize
thesender and know the content is safe. 


On Thu, Apr 6, 2023 at 1:21 PM Louis Tian <louis.tian@aquamonix.com.au> wrote:
> An implicit assumption behind this definition is that table must have a primary key for the upsert operation to make
sensesince it's the primary key that uniquely identifies a row. 

It could just be a unique index or a unique constraint. So you can
upsert on any individual unique constraint/index, or the primary key.
Of course there might be several on a given table, but you can only
use one as the "conflict arbiter" per statement.

> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet
seemsto suggest. 
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id,
is_active=excluded.is_active
>
> Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness
constraintis violated. 
> Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.

It sort of has to work that way, though. In general your example might
*not* fail, due to a row-level before trigger in the insert path.

Why doesn't your proposed upsert syntax have the same problem? I mean,
how could it not? I guess it doesn't if you assume that it'll never
take the insert path with your not NULL constraint example? But if you
know that for sure, why not just use a regular update statement? On
the other hand, if you're not sure if the insert path can be taken,
then why is it actually helpful to not just throw an error at the
earliest opportunity?

Surely upsert means "update or insert", so why wouldn't the user expect
to see an error like this, independent of the specifics of the row in question?
Isn't the user tacitly saying "I don't specifically know if the update or insert
path will be taken in respect of any given row" by using ON CONFLICT
DO UPDATE in the first place?

> That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key
columns.

I don't know what you mean by that. "Valid"?

> The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned
limitation.
> The downside is it is rather verbose.

The MERGE command has various race conditions that are particularly
relevant to UPSERT type use cases. See the wiki page you referenced
for a huge amount of information on this.

> *Feature Request*
> Given that UPSERT is an *idempotent* operator it is extremely useful.

In general UPSERT (or any definition of it that I can think of) does
not imply idempotency.


--
Peter Geoghegan
--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/259w6P5THe/6Ld9hKnxHU77IFkjdp0Xsh/0.1

В списке pgsql-general по дате отправления:

Предыдущее
От: Atul Kumar
Дата:
Сообщение: Backup schema without data
Следующее
От: Michael Loftis
Дата:
Сообщение: Re: Backup schema without data