Thread: BUG #14126: INSERT ON CONFLICT DO NOTHING auto-increments serial primary key when no insert happens.

The following bug has been logged on the website:

Bug reference:      14126
Logged by:          Casey Wireman
Email address:      cwire4@gmail.com
PostgreSQL version: 9.5.2
Operating system:   Windows 7
Description:

It's unclear to me if this is desired behavior or not, but when doing the
following:

insert into table(column) values (value) on conflict DO NOTHING

the first time a new entry is inserted, it works as expected and
auto-increments the serial primary key as normal. However, if I try to
insert the same value again, ON CONFLICT correctly detects that the
value already exists, but internally the serial key seems to be
auto-incremented once for every failed insert attempt, since on the next
insert of a new value the key value is

(previous successful key value + number of failed inserts)

It seems to me that, intuitively, the key should only increment on a
successful insert, not on every insert statement that is executed. For
one thing, this unnecessarily reduces the keyspace available for the
column.
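To make the arithmetic concrete, here is a toy model of the behavior I'm
describing, written in Python purely as an illustration (the real case is a
PostgreSQL serial column; all names below are invented):

```python
# Toy model: the "serial" counter advances on every INSERT attempt,
# even when ON CONFLICT DO NOTHING suppresses the row, so the next
# successful key shows a gap.

class Table:
    def __init__(self):
        self.next_key = 1   # position of the serial sequence
        self.rows = {}      # unique value -> assigned key

    def insert_on_conflict_do_nothing(self, value):
        key = self.next_key      # the default fires before the conflict check
        self.next_key += 1
        if value in self.rows:   # conflict: DO NOTHING, but the key is burned
            return None
        self.rows[value] = key
        return key

t = Table()
assert t.insert_on_conflict_do_nothing("a") == 1   # first insert gets key 1
for _ in range(3):                                 # three conflicting attempts
    assert t.insert_on_conflict_do_nothing("a") is None
assert t.insert_on_conflict_do_nothing("b") == 5   # gap: keys 2-4 were burned
```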
On Fri, May 6, 2016 at 8:55 AM,  <cwire4@gmail.com> wrote:
> It's unclear to me if this is desired behavior or not, but when doing the
> following:
....
> (previous successful key value + number of failed inserts)
> It seems to me that intuitively the key should only increment on a
> successful insert, not just an insert statement being executed.  For one,
> this unnecessarily reduces the keyspace available for the column.

I suspect this is "working as designed". The same thing happens if you
do a lot of inserts and then roll back the transaction. This is because
serial columns use sequences, which are not meant to generate exactly
consecutive values; they are a means to generate unique keys with very
high concurrency.

What sequences do is grab a chunk of values (maybe of 1 or more, it
depends) per backend that needs them, and each time you ask for one they
give you a unique value and burn it. This is great for concurrency, and
it is fast. The fact that they normally hand out consecutive numbers is
a by-product of that being the easiest way of generating distinct ids;
doing it exactly is not their purpose.

Think about it: if they needed to generate exactly consecutive values,
then any time someone fetched a number the sequence would have to stay
locked until the operation using it committed or rolled back, and then
inform everyone. It can be done that way, but it is much slower and
normally not needed.
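If it helps, a toy sketch of that idea (Python, invented names; not how
the server really implements it, just the visible behaviour):

```python
# A sequence only moves forward and "burns" one unique value per call;
# there is no undo.

class Sequence:
    def __init__(self):
        self.value = 0

    def nextval(self):
        self.value += 1
        return self.value      # once fetched, the value is spent

seq = Sequence()
committed = []

# One transaction grabs two ids and then rolls back: no rows survive,
# but the sequence does not rewind.
seq.nextval()
seq.nextval()

# The next transaction commits and simply sees a hole.
committed.append(seq.nextval())
assert committed == [3]        # still unique, just not gapless
```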

Francisco Olarte.
On 05/06/16 13:39, Francisco Olarte wrote:
> On Fri, May 6, 2016 at 8:55 AM,  <cwire4@gmail.com> wrote:
>> It's unclear to me if this is desired behavior or not [...]
>
> I suspect this is "working as designed". [...]
>
> Francisco Olarte.

A bit more explanation: the default value for a serial column is taken
from a sequence not when the tuple is saved to storage, but _before_ the
tuple tries to pass triggers and checks (constraints, PKs, uniqueness,
NOT NULLs, etc.) and is inserted into the table.

And, as Francisco mentioned, an increase of a sequence is not reverted
if the transaction rolls back or the insert does nothing. That's why its
value increases monotonically even if the number of real insertions has
not changed.
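In other words, the flow is roughly like this toy sketch (Python, all
names invented, only to show where nextval() sits relative to the
checks):

```python
# Sketch of the ordering described above: the serial default is
# evaluated *before* any uniqueness check, so a conflicting row still
# consumes a sequence value.

class Seq:
    def __init__(self):
        self.v = 0

    def nextval(self):
        self.v += 1
        return self.v

def violates_unique(table, row):
    return any(r["val"] == row["val"] for r in table)

def insert_do_nothing(table, row, seq):
    # 1. Defaults are filled in first, so nextval() fires here...
    if "id" not in row:
        row["id"] = seq.nextval()
    # 2. ...and only then do triggers / constraint checks run.
    if violates_unique(table, row):
        return None            # DO NOTHING: row dropped, id stays consumed
    table.append(row)
    return row["id"]

seq, tbl = Seq(), []
assert insert_do_nothing(tbl, {"val": "x"}, seq) == 1
assert insert_do_nothing(tbl, {"val": "x"}, seq) is None  # id 2 burned
assert insert_do_nothing(tbl, {"val": "y"}, seq) == 3     # hole at 2
```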

--
Best regards,
Vitaly Burovoy
On Fri, May 6, 2016 at 6:39 AM, Francisco Olarte <folarte@peoplecall.com> wrote:
> I suspect this is "working as designed". [...]

It is. There have been several -bugs threads on this already, e.g. the
following two:


http://www.postgresql.org/message-id/flat/CAM3SWZQMfR6Zfe3A0Nr4ddko8xZrijAuQQ=EcGjGeJSs2piAXA@mail.gmail.com#CAM3SWZQMfR6Zfe3A0Nr4ddko8xZrijAuQQ=EcGjGeJSs2piAXA@mail.gmail.com


http://www.postgresql.org/message-id/flat/CAM3SWZS2jd1Ci2TR2bjBXqZgK3JdoFin9yGbC5TQdYdF12RBuA@mail.gmail.com#CAM3SWZS2jd1Ci2TR2bjBXqZgK3JdoFin9yGbC5TQdYdF12RBuA@mail.gmail.com

--
Peter Geoghegan
On 2016-05-06 11:18:24 -0700, Peter Geoghegan wrote:
> On Fri, May 6, 2016 at 6:39 AM, Francisco Olarte <folarte@peoplecall.com> wrote:
> > I suspect this is "working as designed". [...]
>
> It is. There have been several -bugs threads on this already. [...]

Maybe we should add a note to the docs then?

Andres
On Fri, May 6, 2016 at 11:20 AM, Andres Freund <andres@anarazel.de> wrote:
> Maybe we should add a note to the docs then?

I was just about to suggest that. It's getting out of hand.

I'll look into it.


--
Peter Geoghegan
On Fri, May 6, 2016 at 8:20 PM, Andres Freund <andres@anarazel.de> wrote:
> Maybe we should add a note to the docs then?

Well, all I know about sequences comes from there, so I think they are clear.

But I've read the docs and noticed the warning:

"Important: To avoid blocking concurrent transactions that obtain
numbers from the same sequence, a nextval operation is never rolled
back; that is, once a value has been fetched it is considered used,
even if the transaction that did the nextval later aborts. This means
that aborted transactions might leave unused "holes" in the sequence
of assigned values."

Which may lead people to think only rollbacks 'burn' numbers. So I
think some rewording would help, indicating:

1.- Once fetched, a value is considered used even if the command using
it does not complete, whether because of rollbacks, aborts, triggers,
ON CONFLICT DO NOTHING, or anything else.

2.- IIRC, using cache>1 can lead to obtaining sequence numbers out of
order when using the same sequence in two different sessions. I assume
this is normal and have no problem with it, but people may be surprised
by it, so I think a note there could help.
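A toy illustration of point 2 (Python, invented names; real caching is
per backend in the server, this only mimics the visible effect):

```python
# Toy model of CACHE > 1: each session preallocates a chunk of values,
# so interleaved sessions can hand out numbers that are globally out of
# order (though still unique).

class Sequence:
    def __init__(self, cache):
        self.cache, self.last = cache, 0

    def grab_chunk(self):
        start = self.last + 1
        self.last += self.cache          # reserve a whole chunk at once
        return list(range(start, self.last + 1))

class Session:
    def __init__(self, seq):
        self.seq, self.cached = seq, []

    def nextval(self):
        if not self.cached:              # refill from the shared sequence
            self.cached = self.seq.grab_chunk()
        return self.cached.pop(0)

seq = Sequence(cache=3)
a, b = Session(seq), Session(seq)
# a caches 1..3, b caches 4..6; interleaving them yields 1, 4, 2, 5:
order = [a.nextval(), b.nextval(), a.nextval(), b.nextval()]
assert order == [1, 4, 2, 5]             # unique, but not monotonic overall
```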

I do not trust my English enough to suggest an exact rewording, but I
think this could lead to fewer reports AND make many of them solvable by
redirecting to the manual.

Francisco Olarte.
Francisco Olarte <folarte@peoplecall.com> writes:
> But I've read the docs and noticed the warning:

> "Important: To avoid blocking concurrent transactions that obtain
> numbers from the same sequence, a nextval operation is never rolled
> back; that is, once a value has been fetched it is considered used,
> even if the transaction that did the nextval later aborts. This means
> that aborted transactions might leave unused "holes" in the sequence
> of assigned values."

> Which may lead people to think only rollbacks 'burn' numbers.

Yeah, good point.  I'll do some wordsmithing on that.

> 2.- IIRC, using cache>1 can lead to obtaining sequence numbers out of
> order when using the same sequence in two different sessions. I assume
> this is normal and have no problem with it, but people may be surprised
> by it, so I think a note there could help.

I think that's explained already, no?

            regards, tom lane
On Sat, May 7, 2016 at 5:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Francisco Olarte <folarte@peoplecall.com> writes:
>> 2.- IIRC, using cache>1 can lead to obtaining sequence numbers out of
>> order when using the same sequence in two different sessions.
......
>> so I think a note there could help.

> I think that's explained already, no?

Yes, it is. It may be a little hard to find, but it certainly is there;
my fault. I've repeated my reading of the docs (the same one I did the
first time). First I went to 8.1.3 datatypes/serial, which pointed me to
"see nextval() in section 9.16 for details." There I found "with default
parameters, successive nextval calls will return successive values
beginning with 1. Other behaviors can be obtained by using special
parameters in the CREATE SEQUENCE command; see its command reference
page for more information." Then I rescanned that page AND missed the
Notes. My fault. A link to that note, or to CREATE SEQUENCE, on the
nextval() page might be useful, but maybe I missed it because, being
familiar with the behaviour, I scanned too fast.

Francisco Olarte.
On Fri, May 6, 2016 at 11:21 AM, Peter Geoghegan <pg@heroku.com> wrote:
> I'll look into it.

Looks like Tom took care of it.

--
Peter Geoghegan