Discussion: INSERT ... ON CONFLICT DO UPDATE


INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:
Hello,

I've just started to read through the postgres-9.5 "what's new" ... before
giving it a try. The "insert ... on conflict do update" feature is particularly
attractive to me; but I was wondering why it does not cover a third
usage scenario - the action a programmer may need for a PK conflict
during INSERT.

In my experience, most often I generate a random value for the PK, with that
random value becoming a unique ticket like a voucher (related to
monetary value). For that I do:

CREATE TABLE vouchers (a_voucher bigint PRIMARY KEY default
(random()*1000000000)::bigint, issued date default now(), .....);

Naturally:
1. The A_VOUCHER range space is always significantly larger than the currently
issued voucher count - so conflicts are rare.
2. With the current (as of 9.5) implementation I think I can always do "ON
CONFLICT DO NOTHING", and retry the INSERT from the application level (a
minimal sketch below).
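
A minimal sketch of what I mean in point 2 (my own pseudo-recipe, nothing
official) - the application simply re-runs this statement until a row comes
back:

  INSERT INTO vouchers DEFAULT VALUES
  ON CONFLICT (a_voucher) DO NOTHING
  RETURNING a_voucher;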

But it would be immensely more comfortable if one could say: "INSERT ... ON
CONFLICT (a_voucher) DO RETRY"; with the semantics of that statement being:
1. prepare should check that there is a DEFAULT for the specified "conflict
column" (here: "a_voucher"), and fail if there isn't one.
2. prepare should check that the default is a VOLATILE function ... or fail.
3. when all that passes, the prepared insert, when executed and hitting a
conflict, should be re-attempted with a NEW call to that DEFAULT function of
the indicated CONFLICT column(s).
4. and there should be a postgresql.conf parameter limiting the
number of retries for a single conflict - as a programmer I know that
if I need to retry more than twice, the space is too dense, always. So I
need to change the DEFAULT function, not increase the retry count ...
thus having DDL that allows changing the DEFAULT function means it's
not necessary to allow changing the retry count (during the database's
life) - and when the latter lives in the config, it's less prone to typos
by application authors.

Was the above considered for "ON CONFLICT" implementation before?

If so, can someone please point me to the criticism it received?

If not: is it unreasonable? Why?

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
"Charles Clavadetscher"
Date:
Hello

> I've just started to read through the postgres-9.5 "what's new" ... before giving it
> a try. The "insert ... on conflict do update" feature is particularly attractive to me; but I
> was wondering why it does not cover a third usage scenario - the action a
> programmer may need for a PK conflict during INSERT.
>
> In my experience, most often I generate a random value for the PK, with that
> random value becoming a unique ticket like a voucher (related to monetary
> value). For that I do:
>
> CREATE TABLE vouchers (a_voucher bigint PRIMARY KEY default
> (random()*1000000000)::bigint, issued date default now(), .....);
>
> Naturally:
> 1. The A_VOUCHER range space is always significantly larger than the currently issued
> voucher count - so conflicts are rare.
> 2. With the current (as of 9.5) implementation I think I can always do "ON CONFLICT
> DO NOTHING", and retry the INSERT from the application level.

An UPSERT is "try an INSERT and if there is a conflict, do nothing or UPDATE some values of the existing record". The
scenario that you suggest is not an UPSERT, because what you want is to try a new INSERT, hoping that this
one works.
What speaks against using a sequence for the primary key column a_voucher? That would guarantee that you don't have a
conflict.
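
For illustration, a minimal sketch of what I mean (column names taken from
your example, so this is just a suggestion):

  CREATE TABLE vouchers (
    a_voucher bigserial PRIMARY KEY,
    issued    date DEFAULT now()
  );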

> But it would be immensely more comfortable if one could say: "INSERT ... ON
> CONFLICT (a_voucher) DO RETRY"; with the semantics of that statement being:
> 1. prepare should check that there is a DEFAULT for the specified "conflict column"
> (here: "a_voucher"), and fail if there isn't one.
> 2. prepare should check that the default is a VOLATILE function ... or fail.
> 3. when all that passes, the prepared insert, when executed and hitting a conflict,
> should be re-attempted with a NEW call to that DEFAULT function of the
> indicated CONFLICT column(s).
> 4. and there should be a postgresql.conf parameter limiting the
> number of retries for a single conflict - as a programmer I know that if I need
> to retry more than twice, the space is too dense, always. So I need to change
> the DEFAULT function, not increase the retry count ...
> thus having DDL that allows changing the DEFAULT function means it's
> not necessary to allow changing the retry count (during the database's
> life) - and when the latter lives in the config, it's less prone to typos
> by application authors.
>
> Was the above considered for "ON CONFLICT" implementation before?
>
> If so, can someone please point me to the criticism it received?
>
> If not: is it unreasonable? Why?

IMHO, as I mentioned, this is not an UPSERT use case, but maybe the implementors of the feature have different
arguments. You could implement that in a function instead of the application, if you prefer.
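
For example, a rough sketch of such a function (untested; names taken from
your example, and it relies on the column DEFAULT being re-evaluated on
every attempt):

  CREATE OR REPLACE FUNCTION issue_voucher() RETURNS bigint AS $$
  DECLARE
    v bigint;
  BEGIN
    LOOP
      -- DO NOTHING suppresses the duplicate-key error; on a conflict no
      -- row is returned, v stays NULL, and we simply try again
      INSERT INTO vouchers DEFAULT VALUES
      ON CONFLICT (a_voucher) DO NOTHING
      RETURNING a_voucher INTO v;
      IF v IS NOT NULL THEN
        RETURN v;
      END IF;
    END LOOP;
  END;
  $$ LANGUAGE plpgsql;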

Bye
Charles




Re: INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:
Hi,

On 19.07.2015 at 09:33, Charles Clavadetscher wrote:
[---------------]
>> 2. With the current (as of 9.5) implementation I think I can always do "ON CONFLICT
>> DO NOTHING", and retry the INSERT from the application level.
>
> An UPSERT is "try an INSERT and if there is a conflict, do nothing or UPDATE some values of the existing record". The
scenario that you suggest is not an UPSERT, because what you want is to try a new INSERT, hoping that this
one works.
> What speaks against using a sequence for the primary key column a_voucher? That would guarantee that you don't have a
conflict.
>

It has to be random, since it bears a "sort of monetary" value. The
vouchers are destined to be one-time authorization tokens; they have to
be harder to guess than those drawn from a sequence would be.

[------------]
>>
>> If not: is it unreasonable? why?
>
> IMHO, as I mentioned, this is not an UPSERT use case, but maybe the implementors of the feature have different
arguments. You could implement that in a function instead of the application, if you prefer.
>

I'm not particularly fond of using functions to access an RDBMS instead
of tables.

And I'm not particularly fond of "workarounds".

But if that usage scenario is not appreciated here, then I guess I have to
live with what is available. And the availability of ON CONFLICT is a
great improvement anyway.

Thanks,

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
Geoff Winkless
Date:
On 19 July 2015 at 09:11, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
I'm not particularly fond of using functions to access an RDBMS instead
of tables.

And I'm not particularly fond of "workarounds".

Use a combination of factors (a sequence ID and the key) for your authorization. So in the extremely unlikely event that the random key isn't unique, it doesn't matter. That's not a workaround, it's good design.
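
Something like this, say (a sketch only; column names are arbitrary):

  CREATE TABLE vouchers (
    id     bigserial PRIMARY KEY,   -- guaranteed unique
    key    bigint NOT NULL DEFAULT (random()*1000000000)::bigint,
    issued date DEFAULT now()
  );

  -- authorization then checks the (id, key) pair, so the key itself
  -- carries no uniqueness requirement at all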

You're asking for a feature that is completely unnecessary and is easily resolved. UPSERT is designed for a situation which is neither.

Geoff

Re: INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:
Hi,

On 19.07.2015 at 10:27, Geoff Winkless wrote:
> On 19 July 2015 at 09:11, Rafal Pietrak <rafal@ztk-rp.eu
> <mailto:rafal@ztk-rp.eu>> wrote:
>
>     I'm not particularly fond of using functions to access an RDBMS instead
>     of tables.
>
>     And I'm not particularly fond of "workarounds".
>
>
> Use a combination of factors (a sequence ID and the key) for your
> authorization. So in the extremely unlikely event that the random key
> isn't unique, it doesn't matter. That's not a workaround, it's good design.

I'm sorry. At this point I don't want to prolong the discussion (like
flaming), but I feel I have to defend myself a little.

Regarding the above: when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
which are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

In my current implementations I capture the (rare) PK conflict
exceptions and redo the INSERT at the application level. It works
sufficiently well. I was just curious whether that usage scenario is
also covered by the current ON CONFLICT, or not.

>
> You're asking for a feature that is completely unnecessary and is easily
> resolved. UPSERT is designed for a situation which is neither.
>

1. Despite possibly sounding like one, I wasn't actually asking for a
feature - I wasn't sure whether that could possibly be implemented using
currently available PostgreSQL features. So I just explained a usage
scenario (describing the semantics using "invented pseudo-syntax") which
I've experienced.

2. That usage scenario, IMHO, wasn't obviously covered (as of my first
reading of "the upsert" implementation). It might have been covered ...
only I wasn't seeing it; so I brought it up.

3. And obviously that usage scenario (despite my personal experience)
might actually be very rare - not worth the implementation effort and thus
qualifying for workarounds. This happens, I understand.

I hope this explains my point better.

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
Geoff Winkless
Date:
On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
when I have to invent/introduce additional
features/columns/attributes (like a key in addition to a sequence)
which are not required by the design but are necessary for the
implementation, that is a workaround (almost by definition).

I'm sorry that you feel defensive about this, and apologies for repeating myself, but the fact that the random key can be duplicated means it should not be used as a primary key, so using a sequence as a primary key is not a workaround, it's a correction to the design. 

Notwithstanding that, the reason UPSERT is required is because it's possible that two competing transactions can end up fighting over an INSERT and the workarounds that are required are either highly complex or not 100% successful (eg http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/). 

Conversely, the workaround in the above case (even if you don't want to change the primary key) is trivial - as you yourself described.

Geoff

Re: INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:
Hi,

On 19.07.2015 at 14:10, Geoff Winkless wrote:
> On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
> <mailto:rafal@ztk-rp.eu>> wrote:
>
>     when I have to invent/introduce additional
>     features/columns/attributes (like a key in addition to a sequence)
>     which are not required by the design but are necessary for the
>     implementation, that is a workaround (almost by definition).
>
>
> I'm sorry that you feel defensive about this, and apologies for
> repeating myself, but the fact that the random key can be duplicated
> means it should not be used as a primary key, so using a sequence as a
> primary key is not a workaround, it's a correction to the design.

OK. I think I need to apologize, too. I hope my defense wasn't too
fierce.

But I need to clarify one thing:

Although "a random" can duplicate its previous values, "my random(s)"
(which are created for this application's purpose) cannot be duplicated
while they are stored in the database as "live active data". I understand
that a UNIQUE constraint is precisely the RDBMS tool to guarantee that.

Naturally, whether I put a UNIQUE constraint on that column or make it a PK
is just a matter of choice here. That shouldn't raise concern. I just use
the tools the RDBMS provides for the "semantics" the application needs.


>
> Notwithstanding that, the reason UPSERT is required is because it's
> possible that two competing transactions can end up fighting over an
> INSERT and the workarounds that are required are either highly complex
> or not 100% successful (eg
> http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).
>

I knew of that Depesz publication before.

Actually it was the reason I brought up "my usage scenario" here now.
I'm not as competent as Depesz, so:

1. I worry that while restarting a failed INSERT transaction at the
application level I'll miss something important (which you people know by
heart) and unwittingly corrupt and/or "suboptimize" my application/data
(much to the point Depesz described).

2. But, since the majority of the hard work of the PostgreSQL UPSERT
implementation is already done, I wanted to check whether the usage
scenario I pointed out falls into it as a "case", or is covered by
some "idiomatic SQL sequence", or otherwise. From the current discussion I
gather: "it's otherwise" - it isn't considered applicable. (So I
conclude: I'll live with manual re-attempts of failed inserts.)

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
Adrian Klaver
Date:
On 07/19/2015 06:47 AM, Rafal Pietrak wrote:
> Hi,
>
> On 19.07.2015 at 14:10, Geoff Winkless wrote:
>> On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
>> <mailto:rafal@ztk-rp.eu>> wrote:
>>
>>      when I have to invent/introduce additional
>>      features/columns/attributes (like a key in addition to a sequence)
>>      which are not required by the design but are necessary for the
>>      implementation, that is a workaround (almost by definition).
>>
>>
>> I'm sorry that you feel defensive about this, and apologies for
>> repeating myself, but the fact that the random key can be duplicated
>> means it should not be used as a primary key, so using a sequence as a
>> primary key is not a workaround, it's a correction to the design.
>
> OK. I think I need to apologize, too. I hope my defense wasn't too
> fierce.
>
> But I need to clarify one thing:
>
> Although "a random" can duplicate its previous values, "my random(s)"
> (which are created for this application's purpose) cannot be duplicated
> while they are stored in the database as "live active data". I understand
> that a UNIQUE constraint is precisely the RDBMS tool to guarantee that.

From my perspective the issue is: you are using a 'unique' key
generator that you know is not creating unique keys, and then asking the
database to make it right. Sort of like making a square peg fit a round
hole by shaving the corners. It is possible, but it has sort of a messy
feel to it.

>
> Naturally, whether I put a UNIQUE constraint on that column or make it a PK
> is just a matter of choice here. That shouldn't raise concern. I just use
> the tools the RDBMS provides for the "semantics" the application needs.
>
>
>>
>> Notwithstanding that, the reason UPSERT is required is because it's
>> possible that two competing transactions can end up fighting over an
>> INSERT and the workarounds that are required are either highly complex
>> or not 100% successful (eg
>> http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).
>>
>
> I knew of that Depesz publication before.
>
> Actually it was the reason I brought up "my usage scenario" here now.
> I'm not as competent as Depesz, so:
>
> 1. I worry that while restarting a failed INSERT transaction at the
> application level I'll miss something important (which you people know by
> heart) and unwittingly corrupt and/or "suboptimize" my application/data
> (much to the point Depesz described).
>
> 2. But, since the majority of the hard work of the PostgreSQL UPSERT
> implementation is already done, I wanted to check whether the usage
> scenario I pointed out falls into it as a "case", or is covered by
> some "idiomatic SQL sequence", or otherwise. From the current discussion I
> gather: "it's otherwise" - it isn't considered applicable. (So I
> conclude: I'll live with manual re-attempts of failed inserts.)

As noted upstream, what you want is not an UPSERT. An UPSERT is based on
the premise that if you try an INSERT where the uniquely constrained value
already exists, then the INSERT is turned into an UPDATE.

To be fair:

http://www.postgresql.org/docs/9.5/static/sql-insert.html
"
  ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome -
provided there is no independent error, one of those two outcomes is
guaranteed, even under high concurrency. This feature is also known as
UPSERT"

So an UPSERT is just one feature of ON CONFLICT. The other being DO
NOTHING. Therefore I could see an argument made for adding other ON
CONFLICT clauses. How difficult/plausible that would be is above my
level of expertise.


>
> -R
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:

On 19.07.2015 at 16:33, Adrian Klaver wrote:
> On 07/19/2015 06:47 AM, Rafal Pietrak wrote:
>> Hi,
>>
>> On 19.07.2015 at 14:10, Geoff Winkless wrote:
>>> On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
[---------------]
>> Although "a random" can duplicate its previous values, "my random(s)"
>> (which are created for this application's purpose) cannot be duplicated
>> while they are stored in the database as "live active data". I understand
>> that a UNIQUE constraint is precisely the RDBMS tool to guarantee that.
>
> From my perspective the issue is: you are using a 'unique' key generator
> that you know is not creating unique keys, and then asking the database
> to make it right. Sort of like making a square peg fit a round hole by
> shaving the corners. It is possible, but it has sort of a messy feel to it.

Hmmm, yes.

Yet I don't feel as guilty, since this is quite similar to a
unique key on a database "username": the "generator" (the human mind)
does not guarantee uniqueness either. The DB just makes sure it holds.

[--------------]
>
> So an UPSERT is just one feature of ON CONFLICT. The other being DO
> NOTHING. Therefore I could see an argument made for adding other ON
> CONFLICT clauses. How difficult/plausible that would be is above my
> level of expertise.
>

Mine too. But I'd say that the above wording makes exactly the point I
was trying to make. Thank you.

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
Adrian Klaver
Date:
On 07/19/2015 08:04 AM, Rafal Pietrak wrote:
>
>
> On 19.07.2015 at 16:33, Adrian Klaver wrote:
>> On 07/19/2015 06:47 AM, Rafal Pietrak wrote:
>>> Hi,
>>>
>>> On 19.07.2015 at 14:10, Geoff Winkless wrote:
>>>> On 19 July 2015 at 11:30, Rafal Pietrak <rafal@ztk-rp.eu
> [---------------]
>>> Although "a random" can duplicate its previous values, "my random(s)"
>>> (which are created for this application's purpose) cannot be duplicated
>>> while they are stored in the database as "live active data". I understand
>>> that a UNIQUE constraint is precisely the RDBMS tool to guarantee that.
>>
>> From my perspective the issue is: you are using a 'unique' key generator
>> that you know is not creating unique keys, and then asking the database
>> to make it right. Sort of like making a square peg fit a round hole by
>> shaving the corners. It is possible, but it has sort of a messy feel to it.
>
> Hmmm, yes.
>
> Yet I don't feel as guilty, since this is quite similar to a
> unique key on a database "username": the "generator" (the human mind)
> does not guarantee uniqueness either. The DB just makes sure it holds.

I think the argument to be made here is that you have no control over what
people choose as a username; you do have control over what your key
generator outputs.

>
> [--------------]
>>
>> So an UPSERT is just one feature of ON CONFLICT. The other being DO
>> NOTHING. Therefore I could see an argument made for adding other ON
>> CONFLICT clauses. How difficult/plausible that would be is above my
>> level of expertise.
>>
>
> Mine too. But I'd say that the above wording makes exactly the point I
> was trying to make. Thank you.
>
> -R
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: INSERT ... ON CONFLICT DO UPDATE

From
Melvin Davidson
Date:
Aside from Tom Lane's comments, it seems to me you are reinventing the wheel by generating random values for keys. Why not just use UUID (http://www.postgresql.org/docs/9.5/static/datatype-uuid.html)
or serial (http://www.postgresql.org/docs/9.5/static/datatype-numeric.html#DATATYPE-SERIAL)?
Wouldn't that simplify things by ensuring uniqueness?
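
For example, a minimal sketch of the UUID variant (this assumes the pgcrypto
extension for gen_random_uuid(); uuid-ossp's uuid_generate_v4() would do as
well):

  CREATE EXTENSION IF NOT EXISTS pgcrypto;

  CREATE TABLE vouchers (
    a_voucher uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    issued    date DEFAULT now()
  );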




--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: INSERT ... ON CONFLICT DO UPDATE

From
"Daniel Verite"
Date:
    Rafal Pietrak wrote:

> CREATE TABLE vouchers (a_voucher bigint PRIMARY KEY default
> (random()*1000000000)::bigint, issued date default now(), .....);

Generators of truly unique pseudo-random values provide a
better ground for this. Consider for example:

https://wiki.postgresql.org/wiki/Pseudo_encrypt

You may replace the round function with your own secret function,
so you'll get the required randomness, secrecy and uniqueness.
No need to deal with collisions on insertion as there are none.
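
For illustration, a minimal sketch of that approach (assuming the
pseudo_encrypt() function from the wiki page has been created; being a
permutation, it maps distinct sequence values to distinct outputs):

  CREATE SEQUENCE voucher_seq;

  CREATE TABLE vouchers (
    a_voucher int PRIMARY KEY
              DEFAULT pseudo_encrypt(nextval('voucher_seq')::int),
    issued    date DEFAULT now()
  );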

> 2. With the current (as of 9.5) implementation I think I can always do "ON
> CONFLICT DO NOTHING", and retry the INSERT from the application level.

Yes, but retrying is now easy, let's not underappreciate that.

As a test, with 9.5alpha1, I create a table with 100k unique
random numbers:

  CREATE TABLE vouchers(id int primary key);

Then try to populate it immediately with 100k rows:

  INSERT INTO vouchers
    SELECT (random()*1000000000)::int
    FROM generate_series(1,100000)
    ON CONFLICT DO NOTHING;

psql result: INSERT 0 99995

Note how 5 values conflicted right from the beginning,
even though we're claiming only 10^5 out of the 10^9 output
range (or 0.01%).
The probability of at least one collision is pretty high,
see the "birthday paradox" for the theory on that.

Anyway the collisions got eliminated without any effort
from me and that's quite useful already.

Now trying to insert 10k rows at a time:

  INSERT INTO vouchers
      SELECT (random()*1000000000)::int
      FROM generate_series(1,10000)
      ON CONFLICT DO NOTHING
      RETURNING id;

When run repeatedly, it tends to return between 9995 and 10000 values.
If we want exactly N rows and we get back N-epsilon, then we need to
re-ask for epsilon rows, but this will converge quickly to completion
(that is, until you have enough values that the birthday paradox effect
really kicks in).
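
A rough sketch of that top-up loop (my own wording, not part of the test
above; the loop could equally live in the application):

  DO $$
  DECLARE
    want int := 10000;   -- rows we still need
    got  int;
  BEGIN
    LOOP
      INSERT INTO vouchers
        SELECT (random()*1000000000)::int
        FROM generate_series(1, want)
        ON CONFLICT DO NOTHING;
      GET DIAGNOSTICS got = ROW_COUNT;  -- rows actually inserted
      want := want - got;
      EXIT WHEN want <= 0;
    END LOOP;
  END $$;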

My point is that we can now achieve that without any exception handling
or transaction retry, and no plpgsql function to create, so it's really
a significant improvement in ease of use. And presumably in
performance too.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @ManitouMail


Re: INSERT ... ON CONFLICT DO UPDATE

From
"Daniel Verite"
Date:
    Melvin Davidson wrote:

> Aside from Tom Lane's comments, it seems to me you are reinventing the wheel
> by generating random values for keys. Why not just use UUID
> http://www.postgresql.org/docs/9.5/static/datatype-uuid.html
> or serial
> http://www.postgresql.org/docs/9.5/static/datatype-numeric.html#DATATYPE-SERIAL?
> Wouldn't that simplify things by ensuring uniqueness?

UUIDs are 36 characters wide; it's too boring and error-prone
for a person to type this on a keyboard or spell it over the phone
to an operator.

With SERIAL, it's too obvious what the next one will be,
so malicious people could claim access codes or vouchers
they don't own.

The constraint is that such codes must be reasonably short, but
someone who tries to make up one must have a near-zero chance
of guessing one that actually exists.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @ManitouMail


Re: INSERT ... ON CONFLICT DO UPDATE

From
Francisco Olarte
Date:
Hi Daniel:

On Sun, Jul 19, 2015 at 9:03 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
> With SERIAL, it's too obvious what the next one will be,
> so malicious people could claim access codes or vouchers
> they don't own.

Why don't you use encryption? Specifically, only on the external side:
you use a serial in the DB and send the encrypted serial as the voucher
code (this way you do not need database-resident encryption).
Then, when you receive the code in the program, you decrypt it and
are done. And having a serial in the DB can be good for your internal
operations. Encryption - reversible and collision-free - not hashing.

> The constraint is that such codes must be reasonably short, but
> someone who tries to make up one must have a near-zero chance
> of guessing one that actually exists.

If you can live with a slightly longer voucher (it seems you use 10^9
in random()), you can use 2^32, which is just 9.5 digits, and look
for a 32-bit block cipher (or build it yourself; it's not that hard
using stream ciphers or other tools).

I also think random UUIDs are not OK - not because they are long but
because they are random, and can collide (encrypted serials cannot).

Francisco Olarte.


Re: INSERT ... ON CONFLICT DO UPDATE

From
Deven Phillips
Date:
For this purpose, I have seen it recommended to use a UUID instead of a randomly generated integer. I do this myself for production applications, and over millions of records I have yet to log a conflict. Also, as stated above, you could create a plpgsql function which would achieve exactly what you want (retry the insert until it succeeds).

Just my 2 cents,

Deven


Re: INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:
If I'm not mistaken, the conclusions from the posts in this thread are:

1. recognizing a "RETRY" action as a separate case of "ON CONFLICT"
transaction continuation is not generally appreciated.

2. I shouldn't expect any hidden corruption/performance obstacles when
simply re-attempting an INSERT at the application level (should a
constraint conflict arise).

3. there are methods (like a cryptographic "random" sequence) which
guarantee no conflicts, so one should resort to those.

Regarding the last point. Usually, I implement one-time-use vouchers as
rows in a table like:

CREATE TABLE vouchers (voucher int not null, consumed bool,
expire timestamp not null default now()::timestamp + '2 min'::interval,
..., unique (voucher, consumed));

with the CONSUMED column set to NULL when a voucher is used (NULLs don't
count as duplicates under the unique constraint). The entire row of a
consumed voucher is purged after clearance and verification, which
happen significantly later.
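
The "consume" step I mean is then just (a sketch; 123456 is a made-up
voucher number):

  UPDATE vouchers
     SET consumed = NULL      -- NULLs never collide in the unique
   WHERE voucher = 123456     -- constraint, so the number may later
     AND consumed;            -- be re-issued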

Such a short-lived (when active) voucher is usually just 6 digits long, to
make it easy for people to enter.

I don't know much about cryptography, but would a generic encryption
function (like the one indicated by Daniel) have the same "walking through
the entire range-space" behavior as the original when that range-space
is externally (by my application) truncated to those 6 digits? If not,
would it be as efficient at conflict avoidance as when used with the
original 32-bit range-space?

Then again: is it really "good practice" to rely on a programmer to
pick a "proper/correct encryption helper", instead of providing him/her
with a database-integrated tool for a well-defined and not-so-rare
usage scenario such as a "random default" for a UNIQUE/PK column?

So my conclusion from this thread is that this usage scenario does
not seem to be foreseen by the current implementation of ON CONFLICT,
but a workaround exists (like a cryptographic range-walker).
It being a workaround, I'd vote for some direct support of that
scenario at the database level in the future.

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
Geoff Winkless
Date:
On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
If I'm not mistaken, the conclusions from the posts in this thread are:

3. there are methods (like a cryptographic "random" sequence) which
guarantee no conflicts, so one should resort to those.


Some web research suggests that random sequences are not great for indexes because of the resultant "keyspace fragmentation". I'm assuming that means a low number of entries in the btree leaves, so an increase in memory usage for the index?

Just a thought.

Geoff

Re: INSERT ... ON CONFLICT DO UPDATE

From
Alvaro Herrera
Date:
Geoff Winkless wrote:
> On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>
> > If I'm not mistaken, the conclusions from the posts in this thread are:
> >
> > 3. there are methods (like a cryptographic "random" sequence) which
> > guarantee no conflicts, so one should resort to those.
> >
> >
> Some web research suggests that random sequences are not great for indexes
> because of the resultant "keyspace fragmentation". I'm assuming that means
> a low number of entries in the btree leaves, so an increase in memory usage
> for the index?

Not sure what type of indexes would be affected by that problem, but I
don't think Postgres' btrees would be.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: INSERT ... ON CONFLICT DO UPDATE

From
Geoff Winkless
Date:
On 20 July 2015 at 15:07, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Not sure what type of indexes would be affected by that problem, but I
don't think Postgres' btrees would be.

I admit it's not really my area.

Take it up with Drew Blas, I guess :)


My first stab at comprehending what he means:

Imagine you have around a thousand entries in your index. In a sequential set, the btree could be (with each leaf containing 100 values):

                499
      299                 799
100 200 300 400    500 600 700 800 900

In a (16-bit, say) random set, the btree could be:

                   33208
       21728                         45220
927 15927 21729 29661 33209 34917 40121 45221 49826

Inserting a new value with a sequential key always goes into the 900 node. When that fills, you can add a new node and just rebalance the parent branches (assuming postgres doesn't try to keep a proper balance across the nodes too?).

However, in the random tree a new value could be inserted into _any_ of the leaf nodes, which means you either have to create more leaf nodes than you require (in order to leave plenty of blank space - and even if you do that you will probably end up with uneven fill, which means some searches will be slower than others) or you end up having to refactor all of your nodes every time you insert a value.

Geoff

Re: INSERT ... ON CONFLICT DO UPDATE

From
Igor Neyman
Date:
Geoff Winkless wrote:
> On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>
> > If I'm not mistaken, the conclusions from the posts in this thread are:
> >
> > 3. there are methods (like a cryptographic "random" sequence) which
> > guarantee no conflicts, so one should resort to those.
> >
> >
> Some web research suggests that random sequences are not great for
> indexes because of the resultant "keyspace fragmentation". I'm
> assuming that means a low number of entries in the btree leaves, so an
> increase in memory usage for the index?

Not sure what type of indexes would be affected by that problem, but I don't think Postgres' btrees would be.


___

Well, there is a caveat.
If I create a table and a couple of indexes like this:

create table test_index_size(c1 int, c2 int, constraint U2 unique (c2));
create index U1 on test_index_size(c1);

and populate them:

insert into test_index_size(c1, c2)
select round(random()*1000000), a from generate_series(1,1000000) a limit 100000;

and then check the size of the indexes:
for "select pg_relation_size('U1')" I get 2834432,
while "select pg_relation_size('U2')" returns 2285568.

So the index based on the randomly populated column is bigger than the one based on the sequentially populated column.

But, on the other hand, after:

reindex table test_index_size;

both indexes are of the same size: 2260992.

Regards,
Igor Neyman






Re: INSERT ... ON CONFLICT DO UPDATE

From
Jeff Janes
Date:
On Mon, Jul 20, 2015 at 7:01 AM, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
On 20 July 2015 at 14:33, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
If I'm not mistaken, the conclusions from the posts in this thread are:

3. there are methods (like a cryptographic "random" sequence) which
guarantee no conflicts, so one should resort to those.


Some web research suggests that random sequences are not great for indexes because of the resultant "keyspace fragmentation". I'm assuming that means a low number of entries in the btree leaves, so an increase in memory usage for the index?

Just a thought.

If you are inserting random values into a very large btree index, you dirty at least one randomly-located block (the leaf page of the index) per inserted row.  This can cause some serious IO pain when that dirty data needs to be written out without the benefit of write-combining or sequential writes.

However, this is usually only a concern for bulk loading operations, not real time operations. If you are generating money-equivalent vouchers at a rate of thousands per second, hopefully you can afford some nice hardware to run it on.

Cheers,

Jeff

Re: INSERT ... ON CONFLICT DO UPDATE

From
Francisco Olarte
Date:
Hi Alvaro.

On Mon, Jul 20, 2015 at 4:07 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> Some web research suggests that random sequences are not great for indexes
>> because of the resultant "keyspace fragmentation". I'm assuming that means
>> a low number of entries in the btree leaves, so an increase in memory usage
>> for the index?
> Not sure what type of indexes would be affected by that problem, but I
> don't think Postgres' btrees would be.

I do not know whether postgres btrees do it, but I know you can build btree
insertion code in such a way that inserting keys sequentially leads
to optimally filled leaf pages and denser trees, as an optimization for
an easy and common case, which is better than the normal trees
generated by random insertion. So it is not that random keys are bad; it is
that ordered keys are very good - or, put another way, random keys are not
hit by a problem, they just do not get the advantage.

Francisco Olarte.


Re: INSERT ... ON CONFLICT DO UPDATE

From
Francisco Olarte
Date:
Hi Igor:

On Mon, Jul 20, 2015 at 4:56 PM, Igor Neyman <ineyman@perceptron.com> wrote:
> Well, there is a caveat.
> If I create table and couple indexes like this:
..
> and populate them:
> and then check the size of the indexes:
> for "select pg_relation_size('U1')" I get  2834432
> while " select pg_relation_size('U2')" returns 2285568.
> So, index based on randomly populated column is bigger than the one based on sequentially populated.
> But, on the other hand, after:
> reindex table test_index_size;
> both indexes are of the same size: 2260992.

I would totally expect this. On reindex you get the values from a tree
walk, so both of them come in order, and being a reindex (where you
know the full set of values in advance, so you can plan ahead where to
put the leaves, how many levels you need and how many splits) you get
an even bigger advantage in the sequential-insertion case.

Francisco Olarte.


Re: INSERT ... ON CONFLICT DO UPDATE

From
Francisco Olarte
Date:
Hi Rafal:

On Mon, Jul 20, 2015 at 3:33 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
> 3. there are methods (like a cryptographic "random" sequence) which
> guarantee no conflicts, so one should resort to those.
> Regarding the last point. Usually, I implement one-time-use vouchers as
> rows in a table like:
> CREATE TABLE vouchers (voucher int not null, consumed bool,
> expire timestamp not null default now()::timestamp + '2 min'::interval,
> ..., unique (voucher, consumed));
> with the CONSUMED column set to NULL when a voucher is used. The entire
> row of a consumed voucher is purged after clearance and verification,
> which happen significantly later.

Random as a primary key is considered bad practice by many people
with much experience; nullifying it, too. Many people even frown on just
changing a primary key (and one of the reasons for using serials as
keys in many situations is to have a guaranteed not-null, unchanging
value).

> Such a short-lived (when active) voucher is usually just 6 digits long, to
> make it easy for people to enter.

Then, being random and with a narrow value domain, they make, IMNSHO, a
really bad choice for primary keys.

> I don't know much about cryptography, but would a generic encryption
> function (like the one indicated by Daniel) have the same "walking through
> the entire range-space" behavior as the original when that range-space
> is externally (by my application) truncated to those 6 digits? If not,
> would it be as efficient at conflict avoidance as when used with the
> original 32-bit range-space?

An encryption function never has collisions (do not confuse it with a
hash); if it had, you would be unable to decrypt. The problem is
the value domain for you. For your example you could choose a
bit-oriented cipher applied to a 20-bit value. This is a moderately
complex problem to find or build (from the classic cryptographic
primitives nearly every language includes). It will map every
distinct 20-bit input value to a distinct 20-bit output value, so
your value domain will be 20-bit numbers, your inputs will be the 10^6
6-digit numbers, and the outputs will be 10^6 DIFFERENT 20-bit numbers,
of which you can expect about 4.8% ((2^20-10^6)/10^6) to have
7 digits (with a leading one, in this case). To solve that problem
you could use 19-digit input/output numbers, or try to find a decimal
cipher which uses exactly the 10^6 input numbers. If you use a 32-bit block
cipher it will not have collisions, but if you TRUNCATE the 32-bit (~9.5
digit) output to 6 digits, you are no longer encrypting. You may
call it hashing or whatever, but that is NOT encryption; you would
have collisions.
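
To make the 20-bit case concrete, here is a rough sketch of a small Feistel
permutation (the round function is a toy of my own, NOT cryptographically
strong; a real deployment would want a properly keyed one):

  -- maps each 20-bit input (0..1048575) to a distinct 20-bit output;
  -- a Feistel network is invertible, hence collision-free
  CREATE OR REPLACE FUNCTION feistel20(val int) RETURNS int AS $$
  DECLARE
    l int := (val >> 10) & 1023;  -- high 10 bits
    r int := val & 1023;          -- low 10 bits
    tmp int;
  BEGIN
    FOR i IN 1..4 LOOP
      tmp := l # ((((r * 1366) + 150889) % 714025) & 1023);  -- toy round
      l := r;
      r := tmp;
    END LOOP;
    RETURN (l << 10) | r;
  END;
  $$ LANGUAGE plpgsql IMMUTABLE STRICT;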

> Then again: is it really "good practice" to rely on a programmer to
> pick a "proper/correct encryption helper", instead of providing him/her
> with a database-integrated tool for a well-defined and not-so-rare
> usage scenario such as a "random default" for a UNIQUE/PK column?

Many of us are too old to get caught by this. This question is like
asking "Is it good practice to hit a person on the head with a 10-pound
hammer instead of giving them a cookie?". There are other options.

IMO, NOT modifying a very complex chunk of code (the server code doing
the inserts, checking the collision cases and acting on them, plus the
parser for insert queries, plus ....) and not risking all the bugs such a
change may introduce, just to help with inserting random PKs, is good
practice. It doesn't matter whether the requesting programmer picks a bad
encryption method, keeps his old code for inserting random ids, or
introduces bugs in his program; the potential harm to the rest of the
users is too great.

> So my conclusion from this thread is that this usage scenario does
> not seem to be foreseen by the current implementation of ON CONFLICT,
> but a workaround exists (like a cryptographic range-walker).
> It being a workaround, I'd vote for some direct support of that
> scenario at the database level in the future.

Bear in mind your problem is totally ill-defined. I mean, you want to
insert a random 6-digit ID, and want the database to keep trying
until it finds a unique one. What should it do if there already are
10^6 records in the db? Stall until a free one is found? Abort? This
kind of use is very narrow, very difficult to get right, and
normally confined to the application domain. Even if you choose a
totally correct encryption function for collision avoidance, like the
identity, you are going to have problems in your scheme.

You are not voting for anything; you need a feature proposal to vote
upon. So far the only one I could extract from this thread is
"something which magically solves the current Rafal problem". I would
vote against that.

Francisco Olarte.


Re: INSERT ... ON CONFLICT DO UPDATE

From
John R Pierce
Date:
On 7/20/2015 7:01 AM, Geoff Winkless wrote:


Some web research suggests that random sequences are not great for indexes because of the resultant "keyspace fragmentation". I'm assuming that means a low number of entries in the btree leaves, so an increase in memory usage for the index?

That suggests some folks overthink their indexing strategies and end up 'overoptimized'.

Anyway, a simple REINDEX fixes all sorts of index fragmentation.


-- 
john r pierce, recycling bits in santa cruz

Re: INSERT ... ON CONFLICT DO UPDATE

From
Francisco Olarte
Date:
Hi Rafal:

On Mon, Jul 20, 2015 at 3:33 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
> Regarding the last point. Usually, I implement one-time-use vouchers as
> rows in a table like:
> CREATE TABLE vouchers (voucher int not null, consumed bool,
> expire timestamp not null default now()::timestamp + '2 min'::interval,
> ..., unique (voucher, consumed));
> with the CONSUMED column set to NULL when a voucher is used. The entire
> row of a consumed voucher is purged after clearance and verification,
> which happen significantly later.
> Such a short-lived (when active) voucher is usually just 6 digits long, to
> make it easy for people to enter.

In this case I think you are mixing vouchers with voucher-numbers. IMO
you could get a better design by using an auxiliary table and not
nullifying the number after it has been consumed. Having only 6 digits,
I would try the following (a rough sketch follows the list):

1.- Add a serial PK column to the voucher table if needed, to link it with
the rest of the system.
2.- Create an index on vouchers where consumed is true.
3.- Add another table, voucher_nums, with columns voucher, ord,
used. Populate it with the 10^6 voucher numbers and a random ord value.
This also lets you switch to alphanumeric vouchers, or zap the ones
with two consecutive equal digits, or whatever.
4.- Make a function to select a free voucher: you can do 'SELECT voucher
FROM voucher_nums WHERE NOT used ORDER BY ord LIMIT 1'; if you put this
into a WITH clause of an UPDATE ... RETURNING that sets used to true, you
get a one-shot way of getting a free voucher. If you add a partial
index on ord WHERE NOT used, you get a fast way of getting it.
5.- Make another function to free a voucher number, which sets consumed
to true on vouchers, and used to false and ord to a new random number on
voucher_nums.

This way you keep the old voucher numbers, and you get no collisions.
If you run it for some years, you can see which vouchers have been used,
so you can debug potential problems.
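
A rough sketch of points 3 and 4 (table and column names are my own;
"order" is a reserved word in SQL, hence "ord"):

  CREATE TABLE voucher_nums (
    voucher int PRIMARY KEY,
    ord     double precision NOT NULL,
    used    boolean NOT NULL DEFAULT false
  );

  INSERT INTO voucher_nums (voucher, ord)
    SELECT n, random() FROM generate_series(0, 999999) n;

  CREATE INDEX voucher_nums_free ON voucher_nums (ord) WHERE NOT used;

  -- one-shot allocation of a free voucher number
  WITH pick AS (
    SELECT voucher FROM voucher_nums
     WHERE NOT used
     ORDER BY ord
     LIMIT 1
     FOR UPDATE SKIP LOCKED  -- 9.5+: concurrent callers pick different rows
  )
  UPDATE voucher_nums vn
     SET used = true
    FROM pick
   WHERE vn.voucher = pick.voucher
  RETURNING vn.voucher;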

Francisco Olarte.


Re: INSERT ... ON CONFLICT DO UPDATE

From
Rafal Pietrak
Date:
Francisco,

On 21.07.2015 at 09:34, Francisco Olarte wrote:
> Hi Rafal:
>
> On Mon, Jul 20, 2015 at 3:33 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
>> Regarding the last point. Usually, I implement one-time-use vouchers as
>> rows in a table like:
>> CREATE TABLE vouchers (voucher int not null, consumed bool,
>> expire timestamp not null default now()::timestamp + '2 min'::interval,
>> ..., unique (voucher, consumed));
>> with the CONSUMED column set to NULL when a voucher is used. The entire
>> row of a consumed voucher is purged after clearance and verification,
>> which happen significantly later.
>> Such a short-lived (when active) voucher is usually just 6 digits long, to
>> make it easy for people to enter.
>
> In this case I think you are mixing vouchers with voucher-numbers. IMO
> you could get a better design by using an auxiliary table and not
> nullifying the number after it has been consumed. Having only 6 digits,

Hmmm. I don't think so.

1. I'm not nullifying the number, just the CONSUMED flag. The row otherwise
stays pretty much untouched until clearing time, when it's removed
from the table.

2. And I don't think I mix vouchers with voucher-numbers ... since there
is no distinction.

To bring up some real-life examples of "vouchers" backing that latter
statement, we have:

1) a 6-digit authorization code (a voucher) used by a payment system to
confirm payment authorization.

2) a 4-8-digit one-time PIN delivered by SMS, used to open "some accounts".

3) a 6-digit SMS confirmation code used by internet banking.

4) a 14-digit voucher used to top up mobile pre-paid accounts.

5) 4-8-digit vouchers used as lunch tickets at conferences. (This could
possibly be a printed QR code of a UUID, since cafeterias usually have
bar-code readers; but having it as a "human-size" 6-digit PIN has its
benefits too.)

In all those cases "the physical problem" needs just a single N-digit
number (a voucher), which is as short as its lifespan/population allows
while keeping it relatively safe.

The application just needs to create a unique (for a period of time)
number, and "consume" it at a certain point. Everything else is
"implementation burden", which should be kept to a minimum.

> I would try the following:
>
> 1.- Add a serial PK column to the voucher table if needed, to link it with
> the rest of the system.
> 2.- Create an index on vouchers where consumed is true.
> 3.- Add another table, voucher_nums, with columns voucher, ord,
> used. Populate it with the 10^6 voucher numbers and a random ord value.
> This also lets you switch to alphanumeric vouchers, or zap the ones
> with two consecutive equal digits, or whatever.
> 4.- Make a function to select a free voucher: you can do 'SELECT voucher
> FROM voucher_nums WHERE NOT used ORDER BY ord LIMIT 1'; if you put this
> into a WITH clause of an UPDATE ... RETURNING that sets used to true, you
> get a one-shot way of getting a free voucher. If you add a partial
> index on ord WHERE NOT used, you get a fast way of getting it.
> 5.- Make another function to free a voucher number, which sets consumed
> to true on vouchers, and used to false and ord to a new random number on
> voucher_nums.

This looks a bit like overkill for the above examples.

But I have other thoughts on the use of cryptographic sequences here. They
have the pitfall of being sensitive to out-of-sequence poisoning. I
mean: when another instance of an application starts issuing another
sequence of vouchers, at a certain point those sequences collide, and the
applications, despite using a "guaranteed lack of collisions", will have a
collision. So the application *will have to have* re-issuing of an
INSERT implemented anyway. And if so, the whole point of using a
cryptographic sequence is gone. So even though such a collision is not
statistically significant, its mere possibility means the application has
to take care of re-issuing the INSERT.

Using the database sequence() function to seed the cipher is not secure enough.

On the other hand, the "ON CONFLICT RETRY" has a nice property for an
application programmer (like myself): it leaves us free of
implementing the re-issue of an INSERT. One database-schema
designer does it for all of us.

But whether that usage scenario is too rare to justify the heavy
lifting the implementation would require is beyond my experience.

-R


Re: INSERT ... ON CONFLICT DO UPDATE

From
Geoff Winkless
Date:

On 21 July 2015 at 11:43, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
On the other hand, the "ON CONFLICT RETRY" has a nice property for an
application programmer (like myself): it leaves us free of
implementing the re-issue of an INSERT. One database-schema
designer does it for all of us.

But whether that usage scenario is too rare to justify the heavy
lifting the implementation would require is beyond my experience.

The usage scenario is rare, and I'm sure that doesn't help you, but the point is that it's very, very easy to write a function that does what you want. It's not easy at all to write a function that does UPSERT or DO NOTHING consistently and efficiently.

The fact that you refuse to use the mechanism the database developers provide doesn't invalidate the fact that it's the simplest and easiest way to achieve what you want.

I'm sorry to be harsh (again) about this, but adding extra complexity to the PG system to achieve something that is _easily_ achieved through the existing mechanisms isn't something that is likely to get widespread support.

Geoff

Re: INSERT ... ON CONFLICT DO UPDATE

From
Francisco Olarte
Date:
Hi Rafal:

On Tue, Jul 21, 2015 at 12:43 PM, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
> On 21.07.2015 at 09:34, Francisco Olarte wrote:
>> In this case I think you are mixing vouchers with voucher-numbers. IMO
>> you could get a better design by using an auxiliary table and not
>> nullifying the number after it has been consumed. Having only 6 digits,
> Hmmm. I don't think so.
....
> The application just needs to create a unique (for a period of time)
> number, and "consume" it at a certain point. Everything else is
> "implementation burden", which should be kept to a minimum.

I see your points - totally opposite opinions, so there is no point in
discussing it; discard my suggestions as not applicable, SVP.

....
> This looks a bit like overkill for the above examples.

It certainly is for your style of design and working target; discard it.


> But I have other thoughts on the use of cryptographic sequences here.

I wouldn't call it that, it's misleading. It's just encrypted sequences.

> They
> have the pitfall of being sensitive to out-of-sequence poisoning. I
> mean: when another instance of an application starts issuing another
> sequence of vouchers, at a certain point those sequences collide, and the
> applications, despite using a "guaranteed lack of collisions", will have a
> collision.

Well, if you have application-instance-specific sequences, of course
you do. But in that case even plain unencrypted sequences have collisions.

> So the application *will have to have* re-issuing of an
> INSERT implemented anyway.

Of course, because the only point of using instance-specific sequences,
instead of a serial like you normally do, must be having the possibility
of collisions, to justify the existence of re-issuing code and to
exercise it.

> And if so, the whole point of using a
> cryptographic sequence is gone.

No. The whole point of using a global sequence (in the db) is
avoiding collisions, be it encrypted or plain. The whole point of
using crypto is to make it look random. If you use an application-specific
cryptographic sequence, it is because you want collisions (success, as
told above) which look random (success too).

If you do not want collisions, use a global sequence.

> So even though such a collision is not statistically
> significant, its mere possibility means the application has
> to take care of re-issuing the INSERT.

I tell people there are three meaningful cardinalities in
computing: zero, one and many. And three probabilities (NOT
possibilities): zero, one and other==(0,1). Except in some lucky
domains, you have to treat every (0,1) probability as possible (in
fact my three probability values map nicely to impossible, possible
and certain).

> Using the database sequence() function to seed the cipher is not secure enough.

What are you talking about? Where did you get that seeding idea? You
do not seed the cipher, you use the ciphered sequence as the voucher. In
fact I've done this with session ids: I use a sequence for the ID and
send the user the ciphered value. When it comes back I just decipher it
and search. I did not have your 6-digit constraints, so I just used
128-bit blocks, and it worked nicely. And I did not have any ciphered
data in the DB.

> On the other hand, the "ON CONFLICT RETRY" has a nice property for an
> application programmer (like myself): it leaves us free of
> implementing the re-issue of an INSERT. One database-schema
> designer does it for all of us.
> But whether that usage scenario is too rare to justify the heavy
> lifting the implementation would require is beyond my experience.

Saying "ON CONFLICT RETRY" is a nice feature is like saying MAGIC RAINBOW
OVERDRIVE is a nice feature for a car. It does not exist, and nobody has
described it in enough detail for people to assess its usefulness or
implementation difficulty. A careful definition of the corner cases
would be needed. And even with one, you have the possibility of infinite
collisions, either due to generating too many 'vouchers' or to sheer
bad luck (like collisions among application instances). If you try
to write a nicely wrapped-up description of the functionality, maybe
someone will see the usefulness and implement it, but I think that is
possible but unlikely.

Francisco Olarte.