Обсуждение: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

Поиск
Список
Период
Сортировка

Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Sebastien Flaesch
Дата:
Hello!

I try to update the underlying sequence of a SERIAL column, by using a RETURNING clause in my INSERT statement, which is checking that the column value is greater than the last_value of my sequence, and reset the sequence with setval() if needed.

When running several client processes in concurrently, using this INSERT statement, I get duplicate primary key errors.

The SQL code of my INSERT looks like this:

        insert into tab1 (name) VALUES ('aaa')
            returning tab1.pkey,
                      (select case
                         when tab1.pkey>=(select last_value from public.tab1_pkey_seq)
                              then setval('public.tab1_pkey_seq',tab1.pkey,true)
                         else 0
                       end)

To me something is wrong with the behavior of PostgreSQL:

The whole INSERT statement (including the code in the RETURNING clause), should execute in a ATOMIC manner.

Obviously if several processes execute that code in parallel, and if the select last_value / setval() get mixed up, we are in trouble...

Can something confirm this is a PostgreSQL bug, or help me to find the right code to avoid the issue?

To reproduce:
  1. Create tab1 table and stored procedure for testing with insert-tab1.sql
  2. Run several psql processes in parallel, calling the stored procedure: sh ./start2.sh (Linux/bash)
Thanks!

Seb
Вложения

Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Tom Lane
Дата:
Sebastien Flaesch <sebastien.flaesch@4js.com> writes:
> I try to update the underlying sequence of a SERIAL column, by using a RETURNING clause in my INSERT statement, which
ischecking that the column value is greater than the last_value of my sequence, and reset the sequence with setval() if
needed.

It's not too surprising that that doesn't work, if you're coding it
based on this assumption:

> The whole INSERT statement (including the code in the RETURNING clause), should execute in a ATOMIC manner.

Sequence-related actions are always carried out immediately, they do
not participate in any atomicity guarantees about the calling transaction.
Without this, any sequence update would have to block all concurrent
uses of that sequence until they see whether the first update commits.

If that's the behavior you want, you can build it out of standard SQL
facilities (e.g. update a one-row table).  The point of sequence objects
is exactly to provide a feature with better concurrent performance,
at the cost of no rollback guarantees.

So, there's no bug here, and calling it one isn't going to change
anybody's mind about that.

            regards, tom lane



Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Sebastien Flaesch
Дата:
Tom,
If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table).
Can you elaborate please?

Do you mean the code should use an UPDATE on a one-row table to acquire a lock?

Seb

Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Francisco Olarte
Дата:
On Tue, 19 Jul 2022 at 18:50, Sebastien Flaesch
<sebastien.flaesch@4js.com> wrote:
>> If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table).
> Can you elaborate please?
> Do you mean the code should use an UPDATE on a one-row table to acquire a lock?

That's the usual SQL way. You can, using the appropiate isolation
level, select the max() from the table using the IDs  ( but this will
probably lock it whole ), you can use a one-row table with a current
int column as a sequence, and set current=current+1, but this will
lock everyone inserting ( and need an extra table ). You can use a
name-current unique sequences table and update that ( it saves tables
but it can worsen locking, as many engines locks whole pages ). The
consecutive gapless sequence problem surfaces a lot, but there is no
good way to solve it without locking, as once you get the value
everybody else needing it has to wait until you decide whether to
commit on rollback.

Sequences are for when you need autogenerated keys. They use
consecutive integer blocks nearly everywhere because they are easy to
code, play well with btree indexes, correlate well with insertion
times, but they give up the "no hole" approach to improve performance.
They even can return values unordered in different sessions (
sequences acquire, and burn,  blocks in a locking way, when you need
high concurrency you make them return big blocks, but them values from
different sessions can be delivered out of order ).

If you need correlative numbers, byte the bullet and code it, trying
to do it with sequences will probably end with an approach which will
fail in some corner cases and be slower than just coding it with a
single row table ( just remember to check it, so you do not have the
fun which ensues in oracle when someone puts two rows in dual in the
old days ). All the queries you have posted are probably more complex
than a couple procedures to update and read a table, even using eval
to simulate named sequences if you want ( just remember to always
pre-increment before reading, there is a reason everybody does it that
way ).

As a remark, in Spain bill numbers need to be gapless increasing. I
have done it with a sequence ( with cache 1, so I doubt it is much
more performant than a single row table, as it has a backing element
which needs to be saved ), and just used another process which swept
the values and filled the gap a posteriori ( there is a way to fill a
gap, has it legal uses, you can use it for that as long as you do not
have too many, basically insert cancelled bills in the gaps ). I
probably would do it with the one-row table now for simplicity, I
doubt I gained enough performance to justify the 20-something perl
lines plus the crontab entry to use sequences. As beard turns grey I
apreciate more the simpler ways to do things.

Francisco Olarte.



Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Gavan Schneider
Дата:

On 20 Jul 2022, at 4:08, Francisco Olarte wrote:

As a remark, in Spain bill numbers need to be gapless increasing. I
have done it with a sequence ( with cache 1, so I doubt it is much
more performant than a single row table, as it has a backing element
which needs to be saved ), and just used another process which swept
the values and filled the gap a posteriori ( there is a way to fill a
gap, has it legal uses, you can use it for that as long as you do not
have too many, basically insert cancelled bills in the gaps ). I
probably would do it with the one-row table now for simplicity, I
doubt I gained enough performance to justify the 20-something perl
lines plus the crontab entry to use sequences. As beard turns grey I
apreciate more the simpler ways to do things.

One answer to this problem has been around for a while, and my version is shown below. No extra table is needed, just a windowed look at the table where the reference is used. My usage is at human speed so performance is not an issue but it should be pretty good if there are minimal holes. What is not addressed is that a reference can be reissued upto the time the calling process commits an entry in the table and takes that reference out of circulation. There are different approaches to handling/preventing such collisions.

CREATE OR REPLACE FUNCTION accounts.next_active_reference()
RETURNS integer
LANGUAGE 'sql'
VOLATILE  LEAKPROOF STRICT
PARALLEL UNSAFE
COST 3000 -- pure guesstimate
AS $BODY$	SELECT L.transaction_ref + 1 AS start	FROM accounts.transaction_refs AS L	LEFT OUTER JOIN accounts.transaction_refs AS R	ON L.transaction_ref + 1 = R.transaction_ref	WHERE R.transaction_ref IS NULL	AND L.transaction_ref >700 -- historic reasons only, added to existing system	ORDER BY L.transaction_ref	LIMIT 1;
$BODY$;

COMMENT ON FUNCTION accounts.next_active_reference() IS
$COMMENT$
Determines the next available reference number, making sure to fill any holes.
The holes in the sequence prior to 700 are ignored (pure history), and this code
will make sure any out of sequence data blocks will not cause conflicts.
Credits:
Ref: <http://www.xaprb.com/blog/2005/12/06/find-missing-numbers-in-a-sequence-with-sql/>
2022 update: this link is now dead, only reporting "There is nothing here".
$COMMENT$;

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Sebastien Flaesch
Дата:

Changing a bit the CASE/WHEN logic in the RETURNING clause solves the issue when concurrently inserting rows without specifying explicitly a value for the serial column (INSERTs specifying values for the serial column are seldom used by concurrent programs inserting many rows):

        insert into tab1 (name) VALUES ('aaa')
            returning tab1.pkey,
                      (select case
                         when tab1.pkey>(select last_value from public.tab1_pkey_seq)
                              then setval('public.tab1_pkey_seq',tab1.pkey,true)
                         else 0
                       end)

Example (assuming there is no unique constraint!):

  1. INSERT without value for serial column => sequence=0+1 => last_value = 1
  2. INSERT without value for serial column => sequence=1+1 => last_value = 2
  3. INSERT with value 2 for serial column => tab1.pkey(2) > last_value(2) ? false => no sequence reset
  4. INSERT without value for serial column => sequence=2+1 => last_value = 3
  5. INSERT with value 4 for serial column => tab1.pkey(4) > last_value ? true => setval(seqname,4,true)
  6. INSERT without value for serial column => sequence=4+1 => last_value = 5
This will also save setval() calls for each INSERT not specifying a value explicitly for the serial column.

Stop me if I am wrong... 🙂

Seb




Seb

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 19, 2022 5:41 PM
To: Sebastien Flaesch <sebastien.flaesch@4js.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
 
EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

Sebastien Flaesch <sebastien.flaesch@4js.com> writes:
> I try to update the underlying sequence of a SERIAL column, by using a RETURNING clause in my INSERT statement, which is checking that the column value is greater than the last_value of my sequence, and reset the sequence with setval() if needed.

It's not too surprising that that doesn't work, if you're coding it
based on this assumption:

> The whole INSERT statement (including the code in the RETURNING clause), should execute in a ATOMIC manner.

Sequence-related actions are always carried out immediately, they do
not participate in any atomicity guarantees about the calling transaction.
Without this, any sequence update would have to block all concurrent
uses of that sequence until they see whether the first update commits.

If that's the behavior you want, you can build it out of standard SQL
facilities (e.g. update a one-row table).  The point of sequence objects
is exactly to provide a feature with better concurrent performance,
at the cost of no rollback guarantees.

So, there's no bug here, and calling it one isn't going to change
anybody's mind about that.

                        regards, tom lane

Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Thomas Kellerer
Дата:
Sebastien Flaesch schrieb am 19.07.2022 um 18:50:
> Tom,
>
>     /If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table).
>     /
>
> Can you elaborate please?
>
> Do you mean the code should use an UPDATE on a one-row table to acquire a lock?

I assume something like this:

https://blog.sql-workbench.eu/post/gapless-sequence/





Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Francisco Olarte
Дата:
Hi Gavan.

On Wed, 20 Jul 2022 at 00:10, Gavan Schneider <list.pg.gavan@pendari.org> wrote:
> On 20 Jul 2022, at 4:08, Francisco Olarte wrote:
> As a remark, in Spain bill numbers need to be gapless increasing. I
> have done it with ....
>
> One answer to this problem has been around for a while, and my version is shown below.

If I read it correctly, leaving the 700 stuff aside, this function
gives you the first free transaction_ref. This is not valid for my
problem, I need them increasing, 1-3-2 is not a valid sequence. The
trick I use is that I can have "dummy" records, which do not have real
data ( or dates, which is what, among other things, makes me need them
increasing ), so I generate 1-3-4 and then insert 2 in batch which
values adequate for legal (i.e., I can just use 1.date for 2.date ).

I think what you do is generate account numbers which should be
gapless in the medium term and should be increasing as needed, wherte
the gapless is more important to you than the increasing.

> No extra table is needed, just a windowed look at the table where the reference is used.

 It is normally never needed, thay are used solely for performance and
to avoid locking. A current_transaction_seq

> My usage is at human speed so performance is not an issue but it should be pretty good if there are minimal holes.
Whatis not addressed is that a reference can be reissued upto the time the calling process commits an entry in the
tableand takes that reference out of circulation. There are different approaches to handling/preventing such
collisions.

Your construct seems pretty expensive, and I'm not sure how much
locking it does at serializable. Also, given it needs recovery ( the
single row table plus back-fill does not, if you consider the
back-filling "bussines as usual" ). Also the reissuing of a number is
a no-go in my automated systems, it would need extensive recovery, in
that case I can use a gap-less approach by simpy selecting max(id) (
and given it is indexed, using a select for update on the max record
by ordering desc and limit 1, if I'm not too confused this would give
no gaps, increasing at the expense of reduced concurrency ).

...
> $COMMENT$
> Determines the next available reference number, making sure to fill any holes.

No going back allowed in my case. Simplifying it, a bill may reference
an older one in its data, and its number MUST be less. The gap problem
is there MUST be NO gaps WHEN I "print" them. And I can use dummies,
but I SHOULD not use them. In practice, you rarely use more than a
couple of dummies a year. I could strictly print dummies when I need
them, but they MUST be always the same, and the easier way is to
insert them.

Francisco Olarte.



Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Sebastien Flaesch
Дата:
Thomas, we already have a similar solution.
The idea is to use the native PostgreSQL SERIAL type.
Seb

From: Thomas Kellerer <shammat@gmx.net>
Sent: Wednesday, July 20, 2022 8:56 AM
To: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence
 
EXTERNAL: Do not click links or open attachments if you do not recognize the sender.

Sebastien Flaesch schrieb am 19.07.2022 um 18:50:
> Tom,
>
>     /If that's the behavior you want, you can build it out of standard SQL facilities (e.g. update a one-row table).
>     /
>
> Can you elaborate please?
>
> Do you mean the code should use an UPDATE on a one-row table to acquire a lock?

I assume something like this:

https://urldefense.com/v3/__https://blog.sql-workbench.eu/post/gapless-sequence/__;!!I_DbfM1H!F7_2cNahve0cmwPMP6QBBwwpyP6UAum4ukFj71_21ebcxTKXZFtU0_3O6l1lfG5jYiKjO7wEzRt_E1GbJ9Q$




Re: Concurrent INSERT statements with RETURNING clause resetting SERIAL sequence

От
Karsten Hilbert
Дата:
Am Wed, Jul 20, 2022 at 09:15:29AM +0000 schrieb Sebastien Flaesch:

> Thomas, we already have a similar solution.
> The idea is to use the native PostgreSQL SERIAL type.

Which does not guarantuee gaplessness.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B