Обсуждение: UPSERT in Postgres

Поиск

Список

Период

Сортировка

UPSERT in Postgres

От

Louis Tian

Дата:

06 апреля 2023 г., 06:21:34

This is a question/feature request.

Virtually all references on the internet today suggests one can "upsert" in PostgreSQL using the "insert ... on
conflictdo update ..." statement.
But this is not complete true.

The PostgreSQL's own wiki page (https://wiki.postgresql.org/wiki/UPSERT#.22UPSERT.22_definition) defines UPSERT as
"UPSERT" is a DBMS feature that allows a DML statement's author to atomically either insert a row, or on the basis of
therow already existing, UPDATE that existing row instead,
I believe this definition is correct and consistent with defintion elsewhere (https://en.wiktionary.org/wiki/upsert).

An implicit assumption behind this definition is that table must have a primary key for the upsert operation to make
sensesince it's the primary key that uniquely identifies a row.
let's say we have a very simple table

```
create table person (
id int primary key,
name text not null,
is_active boolean
)
```

Given the definition of upsert, I'd expect an upsert command to do the following.
- `upsert into person (id, name) values (0, 'foo')` to insert a new row
- `upsert into person (id, is_active) values (0, true)` updates the is_active column for the row inserted above

Naturally, since there isn't a real upsert command in PostgreSQL this won't work today.
But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet
seemsto suggest.

insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id,
is_active=excluded.is_active

Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness
constraintis violated.
Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.
That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key
columns.
So, if my experience/use case is typical (meaning the main purpose / use case for ON CONFLICT DO UPDATE is to support
upsert)then it can be argue the current behavior is incorrect?

This has been a source confusion to say at least.
https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f

The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned
limitation.
The downside is it is rather verbose.

*Question*
This there a way to do an upsert proper prior to PG15?

*Feature Request*
Given that UPSERT is an *idempotent* operator it is extremely useful.
Would love to see an UPSERT command in PostgreSQL so one can 'upsert' properly and easily.

Regards,
Louis Tian

Re: UPSERT in Postgres

От

Peter Geoghegan

Дата:

06 апреля 2023 г., 20:58:32

On Thu, Apr 6, 2023 at 1:21 PM Louis Tian <louis.tian@aquamonix.com.au> wrote:
> An implicit assumption behind this definition is that table must have a primary key for the upsert operation to make
sensesince it's the primary key that uniquely identifies a row. 

It could just be a unique index or a unique constraint. So you can
upsert on any individual unique constraint/index, or the primary key.
Of course there might be several on a given table, but you can only
use one as the "conflict arbiter" per statement.

> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet
seemsto suggest. 
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id,
is_active=excluded.is_active
>
> Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness
constraintis violated. 
> Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.

It sort of has to work that way, though. In general your example might
*not* fail, due to a row-level before trigger in the insert path.

Why doesn't your proposed upsert syntax have the same problem? I mean,
how could it not? I guess it doesn't if you assume that it'll never
take the insert path with your not NULL constraint example? But if you
know that for sure, why not just use a regular update statement? On
the other hand, if you're not sure if the insert path can be taken,
then why is it actually helpful to not just throw an error at the
earliest opportunity?

Surely upsert means "update or insert", so why wouldn't the user expect
to see an error like this, independent of the specifics of the row in question?
Isn't the user tacitly saying "I don't specifically know if the update or insert
path will be taken in respect of any given row" by using ON CONFLICT
DO UPDATE in the first place?

> That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key
columns.

I don't know what you mean by that. "Valid"?

> The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned
limitation.
> The downside is it is rather verbose.

The MERGE command has various race conditions that are particularly
relevant to UPSERT type use cases. See the wiki page you referenced
for a huge amount of information on this.

> *Feature Request*
> Given that UPSERT is an *idempotent* operator it is extremely useful.

In general UPSERT (or any definition of it that I can think of) does
not imply idempotency.

--
Peter Geoghegan

Re: UPSERT in Postgres

От

Adrian Klaver

Дата:

06 апреля 2023 г., 21:00:56

On 4/5/23 23:21, Louis Tian wrote:
> This is a question/feature request.
> 

> Given the definition of upsert, I'd expect an upsert command to do the following.
> - `upsert into person (id, name) values (0, 'foo')` to insert a new row
> - `upsert into person (id, is_active) values (0, true)` updates the is_active column for the row inserted above
> 
> Naturally, since there isn't a real upsert command in PostgreSQL this won't work today.
> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet
seemsto suggest.
 
> 
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id,
is_active=excluded.is_active

insert into person (id, name, is_active) values (0, '', true) on 
conflict ("id") do update set id=excluded.id, name=person.name, 
is_active=excluded.is_active ;
INSERT 0 1

select * from person;
  id | name | is_active
----+------+-----------
   0 | foo  | t

> 
> Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness
constraintis violated.
 
> Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.
> That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key
columns.
> So, if my experience/use case is typical (meaning the main purpose / use case for ON CONFLICT DO UPDATE is to support
upsert)then it can be argue the current behavior is incorrect?
 
> 
> This has been a source confusion to say at least.
> https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
> https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
> 
> The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned
limitation.
> The downside is it is rather verbose.
> 
> *Question*
> This there a way to do an upsert proper prior to PG15?
> 
> *Feature Request*
> Given that UPSERT is an *idempotent* operator it is extremely useful.
> Would love to see an UPSERT command in PostgreSQL so one can 'upsert' properly and easily.
> 
> 
> Regards,
> Louis Tian
> 
> 
> 
> 
> 
> 
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: UPSERT in Postgres

От

Benedict Holland

Дата:

06 апреля 2023 г., 21:42:16

That is the answer. Postgresql can upsert easily via triggers and on conflict.

Thanks,

Ben

On Thu, Apr 6, 2023, 5:01 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 4/5/23 23:21, Louis Tian wrote:
> This is a question/feature request.
>

> Given the definition of upsert, I'd expect an upsert command to do the following.
> - `upsert into person (id, name) values (0, 'foo')` to insert a new row
> - `upsert into person (id, is_active) values (0, true)` updates the is_active column for the row inserted above
>
> Naturally, since there isn't a real upsert command in PostgreSQL this won't work today.
> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet seems to suggest.
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id, is_active=excluded.is_active

insert into person (id, name, is_active) values (0, '', true) on
conflict ("id") do update set id=excluded.id, name=person.name,
is_active=excluded.is_active ;
INSERT 0 1

select * from person;
id | name | is_active
----+------+-----------
0 | foo | t

>
> Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness constraint is violated.
> Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.
> That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key columns.
> So, if my experience/use case is typical (meaning the main purpose / use case for ON CONFLICT DO UPDATE is to support upsert) then it can be argue the current behavior is incorrect?
>
> This has been a source confusion to say at least.
> https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
> https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
>
> The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned limitation.
> The downside is it is rather verbose.
>
> *Question*
> This there a way to do an upsert proper prior to PG15?
>
> *Feature Request*
> Given that UPSERT is an *idempotent* operator it is extremely useful.
> Would love to see an UPSERT command in PostgreSQL so one can 'upsert' properly and easily.
>
>
> Regards,
> Louis Tian
>
>
>
>
>
>
>
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Louis Tian

Дата:

07 апреля 2023 г., 00:49:35

Hi Peter,

Thanks for your reply. Appreciate the help and discussion.

> In general UPSERT (or any definition of it that I can think of) does
> not imply idempotency.

"Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia. 
the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof? 
If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
If a row already exists, the first statement will update the row so does any subsequent statements. executing the same
updatemultiple time is the same as executing it only once. 
If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert.  
So by defintion, upsert is idempotent.

> It could just be a unique index or a unique constraint. So you can
> upsert on any individual unique constraint/index, or the primary key.
> Of course there might be several on a given table, but you can only
> use one as the "conflict arbiter" per statement.

Understand that I can use any unique constraint with on conflict. 
But semantically the only correct one is the primary key, since that's what identifies a row logically.
In that sense, any unique column(s) is a potential candidate for primary key.
It's more of a pedantic point rather than pragmatic one.
It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at implementation
level,since all index points to the tuple directly 
Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

> It sort of has to work that way, though. In general your example might
> *not* fail, due to a row-level before trigger in the insert path.

Do you mean having the trigger to check whether new."name" is set if not use the old."name" as fallback ?

> Why doesn't your proposed upsert syntax have the same problem? I mean,
> how could it not? I guess it doesn't if you assume that it'll never
> take the insert path with your not NULL constraint example?

Use some pesudo code might be helpful here to explain the difference. 

How on conflict works at the moment.

try {
 insert row
} catch (duplicated key error) {
  update row
}

How I think it upsert should work

if (new.id exists) {
 update row 
} else {
 insert row
}

I would argue that later is a correct form of upsert given it's definition. 
The two are not equivalent when there is not null constraint on any non primary key column.

> But if you know that for sure, why not just use a regular update statement? 
Yes, in general it is not know whether the insert or update path should be taken. 

> On the other hand, if you're not sure if the insert path can be taken,
> then why is it actually helpful to not just throw an error at the
> earliest opportunity?
I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results
ina not null constraint error. 
While I am expecting the insert is never executed in the first place when that row already exist (as identified by it
primarykey). So the update execute without error. 
I hope the pesudo code above is enough to clarify the difference?

> The MERGE command has various race conditions that are particularly
> relevant to UPSERT type use cases. See the wiki page you referenced
> for a huge amount of information on this.
Thanks for the pointer. Reading into it.

Cheers,
Louis Tian

From: Peter Geoghegan <pg@bowt.ie>
Sent: Friday, April 7, 2023 6:58 AM
To: Louis Tian <louis.tian@aquamonix.com.au>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: [EXTERNAL]: Re: UPSERT in Postgres

CAUTION: This email originated from outside of Envirada. Do not click links or open attachments unless you recognize
thesender and know the content is safe. 

On Thu, Apr 6, 2023 at 1:21 PM Louis Tian <louis.tian@aquamonix.com.au> wrote:
> An implicit assumption behind this definition is that table must have a primary key for the upsert operation to make
sensesince it's the primary key that uniquely identifies a row. 

It could just be a unique index or a unique constraint. So you can
upsert on any individual unique constraint/index, or the primary key.
Of course there might be several on a given table, but you can only
use one as the "conflict arbiter" per statement.

> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet
seemsto suggest. 
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id,
is_active=excluded.is_active
>
> Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness
constraintis violated. 
> Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.

It sort of has to work that way, though. In general your example might
*not* fail, due to a row-level before trigger in the insert path.

Why doesn't your proposed upsert syntax have the same problem? I mean,
how could it not? I guess it doesn't if you assume that it'll never
take the insert path with your not NULL constraint example? But if you
know that for sure, why not just use a regular update statement? On
the other hand, if you're not sure if the insert path can be taken,
then why is it actually helpful to not just throw an error at the
earliest opportunity?

Surely upsert means "update or insert", so why wouldn't the user expect
to see an error like this, independent of the specifics of the row in question?
Isn't the user tacitly saying "I don't specifically know if the update or insert
path will be taken in respect of any given row" by using ON CONFLICT
DO UPDATE in the first place?

> That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key
columns.

I don't know what you mean by that. "Valid"?

> The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned
limitation.
> The downside is it is rather verbose.

The MERGE command has various race conditions that are particularly
relevant to UPSERT type use cases. See the wiki page you referenced
for a huge amount of information on this.

> *Feature Request*
> Given that UPSERT is an *idempotent* operator it is extremely useful.

In general UPSERT (or any definition of it that I can think of) does
not imply idempotency.

--
Peter Geoghegan
--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/259w6P5THe/6Ld9hKnxHU77IFkjdp0Xsh/0.1

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Louis Tian

Дата:

07 апреля 2023 г., 01:27:59

Hi Adrian,

Thank you. I think this is a better approach than trigger-based solution, at least for my taste.

That being said, it does require some logic to push to the client side (figuring out which required column value is missing and set it value to the existing one via reference of the table name).

Still wish there would be UPSERT statement that can handle this and make dev experience better.

Cheers,

Louis Tian

From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Friday, April 7, 2023 7:00 AM
To: Louis Tian <louis.tian@aquamonix.com.au>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: [EXTERNAL]: Re: UPSERT in Postgres

CAUTION: This email originated from outside of Envirada. Do not click links or open attachments unless you recognize the sender and know the content is safe.

On 4/5/23 23:21, Louis Tian wrote:
> This is a question/feature request.
>

> Given the definition of upsert, I'd expect an upsert command to do the following.
> - `upsert into person (id, name) values (0, 'foo')` to insert a new row
> - `upsert into person (id, is_active) values (0, true)` updates the is_active column for the row inserted above
>
> Naturally, since there isn't a real upsert command in PostgreSQL this won't work today.
> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE" like a lot of references on the internet seems to suggest.
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id, is_active=excluded.is_active

insert into person (id, name, is_active) values (0, '', true) on
conflict ("id") do update set id=excluded.id, name=person.name,
is_active=excluded.is_active ;
INSERT 0 1

select * from person;
id | name | is_active
----+------+-----------
0 | foo | t

>
> Unfortunately. the second statement will fail due to violation of the not null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first. and only fallback to update when the uniqueness constraint is violated.
> Is this behavior wrong? maybe not, I think it is doing what it reads quite literally.
> That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is absent of any NOT NULL constraint on it's non primary key columns.
> So, if my experience/use case is typical (meaning the main purpose / use case for ON CONFLICT DO UPDATE is to support upsert) then it can be argue the current behavior is incorrect?
>
> This has been a source confusion to say at least.
> https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
> https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
>
> The MERGE command introduced in PG15 in theory can be used to do UPSERT properly that is void of the aforementioned limitation.
> The downside is it is rather verbose.
>
> *Question*
> This there a way to do an upsert proper prior to PG15?
>
> *Feature Request*
> Given that UPSERT is an *idempotent* operator it is extremely useful.
> Would love to see an UPSERT command in PostgreSQL so one can 'upsert' properly and easily.
>
>
> Regards,
> Louis Tian
>
>
>
>
>
>
>
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Message protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/259w94L3yw/5Vb4QHWbRwfP3KeOMAvzU8/-2.1

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Adrian Klaver

Дата:

07 апреля 2023 г., 20:43:36

On 4/6/23 18:27, Louis Tian wrote:
> Hi Adrian,
> 
> Thank you. I think this is a better approach than trigger-based 
> solution, at least for my taste.
> That being said, it does require some logic to push to the client side 
> (figuring out which required column value is missing and set it value to 
> the existing one via reference of the table name).
> Still wish there would be UPSERT statement that can handle this and make 
> dev experience better.

It does what is advertised on the tin:

https://www.postgresql.org/docs/current/sql-insert.html

The optional ON CONFLICT clause specifies an alternative action to 
raising a unique violation or exclusion constraint violation error

[...]

ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; 
provided there is no independent error, one of those two outcomes is 
guaranteed, even under high concurrency. This is also known as UPSERT — 
“UPDATE or INSERT”.

You got caught by the '...independent error...' part. The same thing 
would have happened if you had just done:

insert into person (id, is_active) values(0, true);
ERROR:  null value in column "name" of relation "person" violates 
not-null constraint

The insert has to be valid on its own before you get to the 'alternative 
action to raising a unique violation or exclusion constraint violation 
error' part. Otherwise you are asking Postgres to override this 'insert 
into person (id, is_active)' and guess you really wanted something like:

insert into person (id, name, is_active) values(0, <existing value>, true)

I'm would not like the server making those guesses on my behalf.

> ,
> Cheers,
> Louis Tian
> ------------------------------------------------------------------------

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Adrian Klaver

Дата:

07 апреля 2023 г., 20:52:17

On 4/6/23 18:27, Louis Tian wrote:
> Hi Adrian,
> 
> Thank you. I think this is a better approach than trigger-based 
> solution, at least for my taste.
> That being said, it does require some logic to push to the client side 
> (figuring out which required column value is missing and set it value to 
> the existing one via reference of the table name).
> Still wish there would be UPSERT statement that can handle this and make 
> dev experience better.

Another way to make the experience easier:

alter table person alter COLUMN name set default 'cat';

  \d person
                   Table "public.person"
   Column   |  Type   | Collation | Nullable |   Default
-----------+---------+-----------+----------+-------------
  id        | integer |           | not null |
  name      | text    |           | not null | 'cat'::text
  is_active | boolean |           |          |

select * from person;
  id | name | is_active
----+------+-----------
   0 | foo  | NULL


insert into person(id, is_active) values (0,  true) on conflict ("id") 
do update set (id, is_active) = (excluded.id,  excluded.is_active);
INSERT 0 1

select * from person;
  id | name | is_active
----+------+-----------
   0 | foo  | t


> 
> Cheers,
> Louis Tian

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Israel Brewster

Дата:

08 апреля 2023 г., 17:09:39

> On Apr 6, 2023, at 4:49 PM, Louis Tian <louis.tian@aquamonix.com.au> wrote:
>
> Hi Peter,
>
> Thanks for your reply. Appreciate the help and discussion.
>
>>  In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
>
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia. 
> the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof? 
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.

Not necessarily. Consider the following UPSERT statement:

INSERT INTO customer_order_counts (customer_id,order_count) VALUES (123456, 1) ON CONFLICT (customer_id) DO UPDATE SET
order_count=customer_order_counts.order_count+1;

That is completely valid (I tested it), and actually makes sense as something you might want to do - keep track of how
manyorders a customer has placed, for example if you only keep the order records for 6 months, but still want to know
thetotal number of orders the customer has placed. If it is a new customer, you insert a record for the customer with
anorder count of 1. Otherwise, you update the record to increment the order count. Clearly this is NOT an idempotent
operation- every time you run it, it changes the order count, so the state of the database does NOT remain the same as
ifyou only execute it once. 

> If a row already exists, the first statement will update the row so does any subsequent statements. executing the
sameupdate multiple time is the same as executing it only once. 
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert.  
> So by defintion, upsert is idempotent.
>
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary key.
>> Of course there might be several on a given table, but you can only
>> use one as the "conflict arbiter" per statement.
>
> Understand that I can use any unique constraint with on conflict.
> But semantically the only correct one is the primary key, since that's what identifies a row logically.

Not really. Generally when I am doing an UPSERT, I am NOT using the primary key, but rather some other UNIQUE-ly
indexedcolumn(s). My primary key is typically an ID column that is defined as a serial, and automatically generated by
thedatabase. The unique column I use for the upset, however, is generally something that would identify the row to a
human- such as Station ID and timestamp  columns in a database I have of seismic readings. Each reading gets a unique
ID(the primary key) that identifies it to the database and is used in joins. However, occasionally a reading gets
updated,so I update the row, based not on the primary key (which I don’t know for incoming data), but on the station
andtimestamp. UPSERT, but not based on the primary key. 

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> In that sense, any unique column(s) is a potential candidate for primary key.
> It's more of a pedantic point rather than pragmatic one.
> It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at
implementationlevel, since all index points to the tuple directly 
> Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Adrian Klaver

Дата:

08 апреля 2023 г., 21:51:08

On 4/6/23 17:49, Louis Tian wrote:
> Hi Peter,
> 
> Thanks for your reply. Appreciate the help and discussion.
> 
>>   In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
> 
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia.
 
> the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof?
 
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any subsequent statements. executing the
sameupdate multiple time is the same as executing it only once.
 
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert.
 
> So by defintion, upsert is idempotent.

No it is not as Israel Brewster pointed out.

To his example I would add:

alter some_table add column ts_upsert_update timestamptz;

insert into some_table values('foo', 'bar') on conflict(tbl_id) do 
update set foo_fld = excluded.foo_fld, bar_fld = some_table.bar_fld, 
ts_upsert_update = now();

You are substituting whatever definition you have in your head for the 
definition as it actually exists.

> 
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary key.
>> Of course there might be several on a given table, but you can only
>> use one as the "conflict arbiter" per statement.
> 
> Understand that I can use any unique constraint with on conflict.
> But semantically the only correct one is the primary key, since that's what identifies a row logically.
> In that sense, any unique column(s) is a potential candidate for primary key.
> It's more of a pedantic point rather than pragmatic one.
> It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at
implementationlevel, since all index points to the tuple directly
 
> Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

Again you are dealing with the imagined instead of the reality. Along 
that line you left out that a 'exclusion constraint violation error' can 
also trigger the ON CONFLICT.


> Use some pesudo code might be helpful here to explain the difference.
> 
> How on conflict works at the moment.
> 
> try {
>   insert row
> } catch (duplicated key error) {
>    update row
> }
> 
> How I think it upsert should work

And therein lies your problem, you are imagining something that does not 
exist and more to the point will most likely not exist as it would break 
all code that depends on above behavior.

> 
> if (new.id exists) {
>   update row
> } else {
>   insert row
> }
> 

> I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results
ina not null constraint error.
 

To be expected, as after all the command is:

INSERT INTO <some_table> ...

> While I am expecting the insert is never executed in the first place when that row already exist (as identified by it
primarykey). So the update execute without error.
 
> I hope the pesudo code above is enough to clarify the difference?
> 

> Cheers,
> Louis Tian
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Benedict Holland

Дата:

08 апреля 2023 г., 22:05:12

This went on too long. On conflict is the solution. It has been since at least 9. I have run that in a production stored proc without a single problem.

This is an actual and literal solution.

Thanks,

Ben

On Sat, Apr 8, 2023, 5:51 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 4/6/23 17:49, Louis Tian wrote:
> Hi Peter,
>
> Thanks for your reply. Appreciate the help and discussion.
>
>> In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
>
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application." from Wikipedia.
> the concept of Idempotence when applies to HTTP is consistent with the above. https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you going by a different defintion that I am not aware of?
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any subsequent statements. executing the same update multiple time is the same as executing it only once.
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the update has no real effect since it the value is exactly the same as the insert.
> So by defintion, upsert is idempotent.

No it is not as Israel Brewster pointed out.

To his example I would add:

alter some_table add column ts_upsert_update timestamptz;

insert into some_table values('foo', 'bar') on conflict(tbl_id) do
update set foo_fld = excluded.foo_fld, bar_fld = some_table.bar_fld,
ts_upsert_update = now();

You are substituting whatever definition you have in your head for the
definition as it actually exists.

>
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary key.
>> Of course there might be several on a given table, but you can only
>> use one as the "conflict arbiter" per statement.
>
> Understand that I can use any unique constraint with on conflict.
> But semantically the only correct one is the primary key, since that's what identifies a row logically.
> In that sense, any unique column(s) is a potential candidate for primary key.
> It's more of a pedantic point rather than pragmatic one.
> It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at implementation level, since all index points to the tuple directly
> Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

Again you are dealing with the imagined instead of the reality. Along
that line you left out that a 'exclusion constraint violation error' can
also trigger the ON CONFLICT.

> Use some pesudo code might be helpful here to explain the difference.
>
> How on conflict works at the moment.
>
> try {
> insert row
> } catch (duplicated key error) {
> update row
> }
>
> How I think it upsert should work

And therein lies your problem, you are imagining something that does not
exist and more to the point will most likely not exist as it would break
all code that depends on above behavior.

>
> if (new.id exists) {
> update row
> } else {
> insert row
> }
>

> I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results in a not null constraint error.

To be expected, as after all the command is:

INSERT INTO <some_table> ...

> While I am expecting the insert is never executed in the first place when that row already exist (as identified by it primary key). So the update execute without error.
> I hope the pesudo code above is enough to clarify the difference?
>

> Cheers,
> Louis Tian
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Louis Tian

Дата:

09 апреля 2023 г., 01:42:21

From: Israel Brewster <ijbrewster@alaska.edu>
Sent: Sunday, April 9, 2023 3:09 AM
To: Louis Tian <louis.tian@aquamonix.com.au>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

Thanks Israel. Your example really helped me to understand where we differ.

> Not necessarily. Consider the following UPSERT statement:

> INSERT INTO customer_order_counts (customer_id,order_count) VALUES (123456, 1) ON CONFLICT (customer_id) DO UPDATE SET order_count=customer_order_counts.order_count+1;

Yes, you are 100% right that is not idempotent statement. But that's not because UPSERT is idempotent *operation*. I am making a distinction between an operation and statement here.

This statement is not idempotent is not because UPSERT operation is not idempotent, it is because the value being upsert is different in the statement each time you run that statement.

> Not really. Generally when I am doing an UPSERT, I am NOT using the primary key, but rather some other UNIQUE-ly indexed column(s). My primary key is typically an ID column that is defined as a serial, > and automatically generated by the database. The unique column I use for the upset, however, is generally something that would identify the row to a human - such as Station ID and timestamp

> columns in a database I have of seismic readings. Each reading gets a unique ID (the primary key) that identifies it to the database and is used in joins. However, occasionally a reading gets updated, so I > update the row, based not on the primary key (which I don’t know for incoming data), but on the station and timestamp. UPSERT, but not based on the primary key.

I am using "primary key" to refer to the "row identifier" in a conceptual way.

Definition from Wiki, "In the relational model of databases, a primary key is a specific choice of a minimal set of attributes (columns) that uniquely specify a tuple (row) in a relation (table)."

So, it's not necessarily the primary key you declared on the table (for example, the id column in your example).

To put it in another way, the Station ID and timestamp columns work in the capability of a primary key.

I was merely setting a scene in my orginal email. What I was trying to say is "UPSERT only makes sense when each row is uniquely identifiable".

So please don't get me wrong there, I am not arguing that being able to do on conflict on any unique index is wrong or bad.

Cheers,

Louis Tian

> On Apr 6, 2023, at 4:49 PM, Louis Tian <louis.tian@aquamonix.com.au> wrote:
>
> Hi Peter,
>
> Thanks for your reply. Appreciate the help and discussion.
>
>> In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
>
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application." from Wikipedia.
> the concept of Idempotence when applies to HTTP is consistent with the above. https://developer.mozilla.org/en-US/docs/Glossary/Idempotent. Or are you going by a different defintion that I am not aware of?
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.

Not necessarily. Consider the following UPSERT statement:

INSERT INTO customer_order_counts (customer_id,order_count) VALUES (123456, 1) ON CONFLICT (customer_id) DO UPDATE SET order_count=customer_order_counts.order_count+1;

That is completely valid (I tested it), and actually makes sense as something you might want to do - keep track of how many orders a customer has placed, for example if you only keep the order records for 6 months, but still want to know the total number of orders the customer has placed. If it is a new customer, you insert a record for the customer with an order count of 1. Otherwise, you update the record to increment the order count. Clearly this is NOT an idempotent operation - every time you run it, it changes the order count, so the state of the database does NOT remain the same as if you only execute it once.

> If a row already exists, the first statement will update the row so does any subsequent statements. executing the same update multiple time is the same as executing it only once.
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the update has no real effect since it the value is exactly the same as the insert.
> So by defintion, upsert is idempotent.
>
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary key.
>> Of course there might be several on a given table, but you can only
>> use one as the "conflict arbiter" per statement.
>
> Understand that I can use any unique constraint with on conflict.
> But semantically the only correct one is the primary key, since that's what identifies a row logically.

Not really. Generally when I am doing an UPSERT, I am NOT using the primary key, but rather some other UNIQUE-ly indexed column(s). My primary key is typically an ID column that is defined as a serial, and automatically generated by the database. The unique column I use for the upset, however, is generally something that would identify the row to a human - such as Station ID and timestamp columns in a database I have of seismic readings. Each reading gets a unique ID (the primary key) that identifies it to the database and is used in joins. However, occasionally a reading gets updated, so I update the row, based not on the primary key (which I don’t know for incoming data), but on the station and timestamp. UPSERT, but not based on the primary key.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

> In that sense, any unique column(s) is a potential candidate for primary key.
> It's more of a pedantic point rather than pragmatic one.
> It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at implementation level, since all index points to the tuple directly
> Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.
--
Message protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/25ah5yO8hL/7ACQJux9EWGExcgeAdX4X1/5.6

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Louis Tian

Дата:

09 апреля 2023 г., 02:46:33

Hi Adrian, 

> No it is not as Israel Brewster pointed out.
I think we need make a distinction between an operation and a statement here
The examples Israel presented and yours are both non-idempotent statements because the different value being upserted
eachtime the statement is executed. Not because upsert as an operation is not idempotent.  
Likewise, In HTTP, PUT method is defined as idempotent. No one would expect different PUT requests with different
payloadto be idempotent.  

Also, I don't think I am not the only one who think upsert is logically idempotent. Just a few seconds of searching on
google.
- "The upsert action is a combination of insert and update. It allows us to write idempotent statements".
(https://jonmeyers.io/blog/use-on-conflict-to-upsert-in-postgresql.) 
-"One common example of an idempotent operation is an upsert (update or insert) statement in a database."
(https://www.arecadata.com/core-data-engineering-concepts-idempotency)
- "A POST request means that the request is not idempotent, but an "upsert" request is idempotent."
(https://softwareengineering.stackexchange.com/questions/426225/rest-how-to-upsert-a-resource-without-an-specific-resource-url#:~:text=A%20POST%20request%20means%20that,%22upsert%22%20request%20is%20idempotent.)
While no as direct as the article above, but the issue reported in those places are essentially stem from the same
understanding. 
- https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
- https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
So at least I am not only one imaging things?
- "a ready made statement for idempotent data like Mongo's upsert"
(https://www.compose.com/articles/the-potency-of-idempotent-with-rabbitmq-and-mongodb-upsert/)

> And therein lies your problem, you are imagining something that does not
> exist and more to the point will most likely not exist as it would break all code that depends on above behavior.
I never argue the way postgresql does for insert ... on conflict is wrong and need to change.
I said "I think it is doing what it reads quite literally" just like in your words "It does what is advertised on the
tin".So there is no disagreement here. So I am not saying "insert on conflict" need to change.  
What I am point out here is, the "insert...on conflict do update" is not a true "upsert".
I am saying in my opinion it would be nice see UPSERT statement to be added to PostgreSQL so upsert can work "out of
box"even when there're not null constraints on that table. 

It might never happen judging from the replies I am getting but that's fine. The solution you have showed me is good
enoughas a workaround (to not depreciate it's value, just not as easy/bullet proof as it could be in my opinion).  
It was just a question just to confirm my understanding so I got what I need, so thank you all for that.


Cheers,
Louis Tian


From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, April 9, 2023 7:51 AM
To: Louis Tian <louis.tian@aquamonix.com.au>; Peter Geoghegan <pg@bowt.ie>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres
 
CAUTION: This email originated from outside of Envirada. Do not click links or open attachments unless you recognize
thesender and know the content is safe. 


On 4/6/23 17:49, Louis Tian wrote:
> Hi Peter,
>
> Thanks for your reply. Appreciate the help and discussion.
>
>>   In general UPSERT (or any definition of it that I can think of) does
>> not imply idempotency.
>
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia. 
> the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof? 
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any subsequent statements. executing the
sameupdate multiple time is the same as executing it only once. 
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert. 
> So by defintion, upsert is idempotent.

No it is not as Israel Brewster pointed out.

To his example I would add:

alter some_table add column ts_upsert_update timestamptz;

insert into some_table values('foo', 'bar') on conflict(tbl_id) do
update set foo_fld = excluded.foo_fld, bar_fld = some_table.bar_fld,
ts_upsert_update = now();

You are substituting whatever definition you have in your head for the
definition as it actually exists.

>
>> It could just be a unique index or a unique constraint. So you can
>> upsert on any individual unique constraint/index, or the primary key.
>> Of course there might be several on a given table, but you can only
>> use one as the "conflict arbiter" per statement.
>
> Understand that I can use any unique constraint with on conflict.
> But semantically the only correct one is the primary key, since that's what identifies a row logically.
> In that sense, any unique column(s) is a potential candidate for primary key.
> It's more of a pedantic point rather than pragmatic one.
> It's less of a problem for PostgreSQL where the semantic importance of primary key is not manifested at
implementationlevel, since all index points to the tuple directly 
> Whereas it is more import for Databaes like MySQL where the secondary index points to the primary key index.

Again you are dealing with the imagined instead of the reality. Along
that line you left out that a 'exclusion constraint violation error' can
also trigger the ON CONFLICT.


> Use some pesudo code might be helpful here to explain the difference.
>
> How on conflict works at the moment.
>
> try {
>   insert row
> } catch (duplicated key error) {
>    update row
> }
>
> How I think it upsert should work

And therein lies your problem, you are imagining something that does not
exist and more to the point will most likely not exist as it would break
all code that depends on above behavior.

>
> if (new.id exists) {
>   update row
> } else {
>   insert row
> }
>

> I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results
ina not null constraint error. 

To be expected, as after all the command is:

INSERT INTO <some_table> ...

> While I am expecting the insert is never executed in the first place when that row already exist (as identified by it
primarykey). So the update execute without error. 
> I hope the pesudo code above is enough to clarify the difference?
>

> Cheers,
> Louis Tian
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/25am4N38I1/7rIyiJl23oANcTXYht5xW0/0

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Alban Hertroys

Дата:

09 апреля 2023 г., 09:26:16

> On 7 Apr 2023, at 2:49, Louis Tian <louis.tian@aquamonix.com.au> wrote:

(…)

> I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results
ina not null constraint error. 
> While I am expecting the insert is never executed in the first place when that row already exist (as identified by it
primarykey). So the update execute without error. 
> I hope the pesudo code above is enough to clarify the difference?

Your assumption on what the problem is, is not correct. The problem is not with the conflict resolution, it is with
yourstatement violating a not null constraint. 

It doesn’t matter whether you insert first or update first, either operation is going to violate that constraint.
You’respecifying a NULL value for a column that doesn’t accept that because it has a NOT NULL constraint. That is your
problem.

Alban Hertroys
--
There is always an exception to always.

Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Francisco Olarte

Дата:

09 апреля 2023 г., 09:32:11

(not the OP on idempotency)

On Sat, 8 Apr 2023 at 18:33, Louis Tian <louis.tian@aquamonix.com.au> wrote:
> > In general UPSERT (or any definition of it that I can think of) does
> > not imply idempotency.
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia.
 
> the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof?
 
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any subsequent statements. executing the
sameupdate multiple time is the same as executing it only once.
 
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert.
 
> So by defintion, upsert is idempotent.

Only on a narrow definition of upsert.

You are thinking on a narrow ( but very frequent ) use of "upsert"
statements, something like:

    insert on users(id,name) values (1,'x') on conflict(id) update set name='x'

But upsert can be used for things like:

   insert into last_access(id,cuando) values (1,current_timestamp) on
conflict(id) set cuando=current_timestamp

   insert into access_count(id, access_count) values (1,1) on
conflict(id) set access_count=access_count+1

Which are not idempotent ( and also frequent, I use both variants )

Francisco Olarte.

RE: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Louis Tian

Дата:

10 апреля 2023 г., 01:05:30

Hi Francisco, 

I think we need to make a distinction between an "operation" and a "statement".
The concept of idempotency applies to an "operation" not an entire statement. 
Like how HTTP "PUT" method is defined as "idempotent", you don't say actual HTTP PUT request is idempotent. 
With the "current_datetime" and "access_count+1", you are effectively changing the value passing to the UPSERT
operator.
 
Just like how you changed the payload of a PUT, then obviously there is no reason to expect the state of the database
toremain the same. 
 

Cheers,
Louis Tian


-----Original Message-----
From: Francisco Olarte <folarte@peoplecall.com> 
Sent: Sunday, April 9, 2023 7:32 PM
To: Louis Tian <louis.tian@aquamonix.com.au>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-general@lists.postgresql.org
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

CAUTION: This email originated from outside of Envirada. Do not click links or open attachments unless you recognize
thesender and know the content is safe.
 


(not the OP on idempotency)

On Sat, 8 Apr 2023 at 18:33, Louis Tian <louis.tian@aquamonix.com.au> wrote:
> > In general UPSERT (or any definition of it that I can think of) does 
> > not imply idempotency.
> "Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied
multipletimes without changing the result beyond the initial application." from Wikipedia.
 
> the concept of Idempotence when applies to HTTP is consistent with the above.
https://developer.mozilla.org/en-US/docs/Glossary/Idempotent.Or are you going by a different defintion that I am not
awareof?
 
> If you execute the same upsert multiple times, the state of the database remains the same as if only execute once.
> If a row already exists, the first statement will update the row so does any subsequent statements. executing the
sameupdate multiple time is the same as executing it only once.
 
> If the row doesn't exist, the first statement will insert that row and any subsequent will try to update, but the
updatehas no real effect since it the value is exactly the same as the insert.
 
> So by defintion, upsert is idempotent.

Only on a narrow definition of upsert.

You are thinking on a narrow ( but very frequent ) use of "upsert"
statements, something like:

    insert on users(id,name) values (1,'x') on conflict(id) update set name='x'

But upsert can be used for things like:

   insert into last_access(id,cuando) values (1,current_timestamp) on
conflict(id) set cuando=current_timestamp

   insert into access_count(id, access_count) values (1,1) on
conflict(id) set access_count=access_count+1

Which are not idempotent ( and also frequent, I use both variants )

Francisco Olarte.
--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/25ayvBFOMd/7rJjrYVDtTx03A1wSEIfeL/0.1

RE: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Louis Tian

Дата:

10 апреля 2023 г., 01:55:37

Hi Alban, 

"I am not expecting an error here", by "here" I means when doing a TRUE UPSERT (an upsert current does not exist in
Postgres).

I am NOT referring to an "Insert on conflict do update" (which despite its intention and wide acceptance is not fully
equivalentto a true upsert).

I understand the error I am getting now is due to not null constraint given how "insert on conflict" works. 

An UPSERT checks whether a row exists, if so, it does an update, if not it does an insert. This is the literal
definition.
An UPSERT is NOT defined as try do an INSERT first, if violate uniqueness constraint, do update. This is what on
conflictdo update is doing. 

We cannot define UPSERT with what "ON CONFLICT DO UPDATE" is doing. That is a logical fallacy.
 UPSERT is a higher-level abstract concept. ON CONFLICT DO UPDATE is an implementation of UPSERT. not the other way
around.

When doing a true UPSERT, if a row already exists, there is no need provide an (not null) column, since I am doing an
update.

With `UPSERT person (id, is_active)` VALUES (0, true). Is it necessary to provide the not null "name" column here
logically?

Not really, I already specified the row with the `id` column, then I specify the column I want to update `is_active`. 
* the id does exist; the update can be executed without any issue or assumptions
* the id does not exit; then I am expecting a violate not null constraint. 

On contrast, with `INSERT person (id, is_active) value (0, true) ON CONFLICT DO UPDATE set is_active=true`,
this statement will always fail regardless of whether the id exists or not. 
So the behavior of ON CONFLICT DO UPDATE differs from a UPSERT with the presence of a not null column. 
Hence why ON CONFLICT DO UPDATE is not complete equivalent to a TRUE upsert.

Cheers,
Louis Tian

-----Original Message-----
From: Alban Hertroys <haramrae@gmail.com> 
Sent: Sunday, April 9, 2023 7:26 PM
To: Louis Tian <louis.tian@aquamonix.com.au>
Cc: Peter Geoghegan <pg@bowt.ie>; pgsql-general@lists.postgresql.org
Subject: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

CAUTION: This email originated from outside of Envirada. Do not click links or open attachments unless you recognize
thesender and know the content is safe.

> On 7 Apr 2023, at 2:49, Louis Tian <louis.tian@aquamonix.com.au> wrote:

(…)

> I am not expecting an error here. The problem is with no conflict it always go down the insert path first and results
ina not null constraint error.

> While I am expecting the insert is never executed in the first place when that row already exist (as identified by it
primarykey). So the update execute without error.

> I hope the pesudo code above is enough to clarify the difference?

Your assumption on what the problem is, is not correct. The problem is not with the conflict resolution, it is with
yourstatement violating a not null constraint.

It doesn’t matter whether you insert first or update first, either operation is going to violate that constraint.
You’respecifying a NULL value for a column that doesn’t accept that because it has a NOT NULL constraint. That is your
problem.

Alban Hertroys
--
There is always an exception to always.

--
Message  protected by MailGuard: e-mail anti-virus, anti-spam and content filtering.https://www.mailguard.com.au/mg
Click here to report this message as spam:
https://console.mailguard.com.au/ras/25ayougcIv/4FIia1zrtWT2nnuHlesEOS/1.8

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Rob Sargent

Дата:

10 апреля 2023 г., 02:16:05

On 4/9/23 19:55, Louis Tian wrote:
> Hi Alban,
>
> "I am not expecting an error here", by "here" I means when doing a TRUE UPSERT (an upsert current does not exist in
Postgres).
> I am NOT referring to an "Insert on conflict do update" (which despite its intention and wide acceptance is not fully
equivalentto a true upsert).
 
> I understand the error I am getting now is due to not null constraint given how "insert on conflict" works.
>
> An UPSERT checks whether a row exists, if so, it does an update, if not it does an insert. This is the literal
definition.

This the part that's always eluded me: How does the client, the 
UPSERTer, come to hold an id and not know whether or not it's already in 
the database.

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Adrian Klaver

Дата:

10 апреля 2023 г., 04:30:54

On 4/9/23 19:16, Rob Sargent wrote:
> On 4/9/23 19:55, Louis Tian wrote:
>> Hi Alban,
>>
>> "I am not expecting an error here", by "here" I means when doing a 
>> TRUE UPSERT (an upsert current does not exist in Postgres).
>> I am NOT referring to an "Insert on conflict do update" (which despite 
>> its intention and wide acceptance is not fully equivalent to a true 
>> upsert).
>> I understand the error I am getting now is due to not null constraint 
>> given how "insert on conflict" works.
>>
>> An UPSERT checks whether a row exists, if so, it does an update, if 
>> not it does an insert. This is the literal definition.
> 
> This the part that's always eluded me: How does the client, the 
> UPSERTer, come to hold an id and not know whether or not it's already in 
> the database.

My use case is for bulk loading data into a table I know has data that 
will create a PK/Unique violation with the inserted data. It's a quick 
and dirty way to avoid queries that look for potential violations ahead 
of time, basically 'Ask forgiveness' vs 'Get permission'.

-- 
Adrian Klaver
adrian.klaver@aklaver.com

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Francisco Olarte

Дата:

10 апреля 2023 г., 07:37:08

Hi Louis:

On Mon, 10 Apr 2023 at 03:05, Louis Tian <louis.tian@aquamonix.com.au> wrote:
> I think we need to make a distinction between an "operation" and a "statement".
OK

> The concept of idempotency applies to an "operation" not an entire statement.
I think I'll need a definition of both to say anything on this.

> Like how HTTP "PUT" method is defined as "idempotent", you don't say actual HTTP PUT request is idempotent.
Well, in HTTP a request is half defined.

> With the "current_datetime" and "access_count+1", you are effectively changing the value passing to the UPSERT
operator.
I can agree with current_datetime ( I doubt I used that... checks the
BOTTOM QUOTE ... effectively I did not ) which can be thought as a
fancy macro for passing a value, but access_count+1 is an expression,
I do not and could not pass it, the engine has to read the row(s)
affected by the insert to use it.

Anyway, what I was trying to point is that UPSERT has no standard
definition, and that the usual implementations of the concept can be
used to build idempotent "requests", but they are many times used for
non-idempotent ones, like timestamp logging or counter updates.

As postgres does not have, AFAIK, an "UPSERT" statement I think you
will need to define it before further discussion, something like
mapping it to an insert on conflict or similar.

> Just like how you changed the payload of a PUT, then obviously there is no reason to expect the state of the database
toremain the same.

I did not change the payload of a put, I assume you refer to the
database request, but I'm too old to go hunting for the exact thing
you refer to on a response which quotes even my signatures and the
virus scanner lines.

You can repeat a PUT request or not repeat it. HTTP request cover body
too, changing body is like changing URI, or changing method, it is a
different request. Idempotent in HTTP is same request, body included,
same response. Not similar request, similar response.

Francisco Olarte.

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Francisco Olarte

Дата:

10 апреля 2023 г., 07:41:15

Hi Rob:

On Mon, 10 Apr 2023 at 04:16, Rob Sargent <robjsargent@gmail.com> wrote:
> > An UPSERT checks whether a row exists, if so, it does an update, if not it does an insert. This is the literal
definition.
> This the part that's always eluded me: How does the client, the
> UPSERTer, come to hold an id and not know whether or not it's already in
> the database.

This is extremely easy to do if you have natural instead of surrogate keys.

I work in telephony, upserting the last incoming call timestamp for a
phone number will be exactly that.

Francisco Olarte.

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Karsten Hilbert

Дата:

10 апреля 2023 г., 09:39:53

Am Mon, Apr 10, 2023 at 09:41:15AM +0200 schrieb Francisco Olarte:

> On Mon, 10 Apr 2023 at 04:16, Rob Sargent <robjsargent@gmail.com> wrote:
> > > An UPSERT checks whether a row exists, if so, it does an update, if not it does an insert. This is the literal
definition.
> > This the part that's always eluded me: How does the client, the
> > UPSERTer, come to hold an id and not know whether or not it's already in
> > the database.
>
> This is extremely easy to do if you have natural instead of surrogate keys.
>
> I work in telephony, upserting the last incoming call timestamp for a
> phone number will be exactly that.

timezones ?

DST ?

spoofing ?

...

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Francisco Olarte

Дата:

10 апреля 2023 г., 11:33:41

Hi karsten:

On Mon, 10 Apr 2023 at 11:40, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
>
> Am Mon, Apr 10, 2023 at 09:41:15AM +0200 schrieb Francisco Olarte:
>
> > On Mon, 10 Apr 2023 at 04:16, Rob Sargent <robjsargent@gmail.com> wrote:
> > > > An UPSERT checks whether a row exists, if so, it does an update, if not it does an insert. This is the literal
definition.
> > > This the part that's always eluded me: How does the client, the
> > > UPSERTer, come to hold an id and not know whether or not it's already in
> > > the database.
> >
> > This is extremely easy to do if you have natural instead of surrogate keys.
> >
> > I work in telephony, upserting the last incoming call timestamp for a
> > phone number will be exactly that.
>
> timezones ?
> DST ?

A timestamp is a point in the time line, this is what I insert, just a
real number marking a line, timezones and dst are presentation stuff.

> spoofing ?

¿ Of what ? I do it for a phone number, not for a customer, it does
not matter to me if the number came from a legit customer or from a
spoofer, I want it for the phone number.

Francisco Olarte.

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

Karsten Hilbert

Дата:

10 апреля 2023 г., 12:35:38

Am Mon, Apr 10, 2023 at 01:33:41PM +0200 schrieb Francisco Olarte:

> > > > This the part that's always eluded me: How does the client, the
> > > > UPSERTer, come to hold an id and not know whether or not it's already in
> > > > the database.
> > >
> > > This is extremely easy to do if you have natural instead of surrogate keys.
> > >
> > > I work in telephony, upserting the last incoming call timestamp for a
> > > phone number will be exactly that.
> >
> > timezones ?
> > DST ?
>
> A timestamp is a point in the time line, this is what I insert, just a
> real number marking a line, timezones and dst are presentation stuff.

Indeed, as is the assumption which time line the numbers are
referring to. Hence the incoming call timestamp is usable as
a (natural) PK with respect to a given time line only, right?

> > spoofing ?
>
> ¿ Of what ?

The time stamp. But then I assume that is obtained on the
logging system.

All I really wanted to hint at is that "incoming call
timestamp" may work pretty well in given settings but does
not _always_ make for a "unique enough" key.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B

Re: [EXTERNAL]: Re: [EXTERNAL]: Re: UPSERT in Postgres

От

"Peter J. Holzer"

Дата:

10 апреля 2023 г., 14:43:59

On 2023-04-10 14:35:38 +0200, Karsten Hilbert wrote:
> All I really wanted to hint at is that "incoming call
> timestamp" may work pretty well in given settings but does
> not _always_ make for a "unique enough" key.

This is true for all natural primary keys: Any attribute of an entity
which is unique for a given application may not be unique for other
applications.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

signature.asc

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: UPSERT in Postgres

Вложения