Обсуждение: Semantics around INSERT INTO with SELECT and ORDER BY.

Поиск
Список
Период
Сортировка

Semantics around INSERT INTO with SELECT and ORDER BY.

От
Steve Krenzel
Дата:
If I insert using the results of a select statement, are the inserts guaranteed to happen in the order of the rows returned from the select?

That is, are these two equivalent:

    INSERT INTO <table2> SELECT <field> FROM <table1> ORDER BY <col> DESC;

And:

    FOR row IN SELECT <field> FROM <table1> ORDER BY <col> DESC LOOP
        INSERT INTO <table2> VALUES (row.<field>);
    END LOOP;

I read through the SQL spec on insertion but they don't address insertion ordering (perhaps, purposefully).

Any clarification here would be super helpful.

Thank you!
Steve

Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
Ravi Krishna
Дата:
Why is it even important?  Once you use ORDER BY clause, you are guaranteed to get the rows in the order.  Why do you need how it was inserted in the first place.

Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
Steve Krenzel
Дата:
This is relevant for tables that have a column with a SERIAL type, I need to guarantee that the relative ordering remains the same as the ordering of the selected result set.

More concretely, given:

    > CREATE TABLE foo (id SERIAL, val TEXT);
    > CREATE TABLE bar (id SERIAL, val TEXT);
    > INSERT INTO foo (val) VALUES ('A'), ('B'), ('C');
    > TABLE foo;
     id | val
    ----+-----
      1 | A
      2 | B
      3 | C
    (3 rows)

Then,

    > INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
    > TABLE bar;
     id | val
    ----+-----
      1 | C
      2 | B
      3 | A
    (3 rows)

The rows should be inserted in reverse. (Note: I don't care about the actual value of the id, only the relative ordering).

Inserting more values should similarly append into the table in order (where "append" is used in terms of the serial id).

    > INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
    > TABLE bar;
     id | val
    ----+-----
      1 | C
      2 | B
      3 | A
      4 | C
      5 | B
      6 | A
    (6 rows)

Or to put it another way, I want to select values from one table ordered by complex criteria and insert them into another table. I want to be able to retrieve the rows from the target table in the same order they were inserted, but I don't care about the specific ordering criteria. I only care about the order they were inserted.

On Tue, Jun 12, 2018 at 1:24 AM Ravi Krishna <sravikrishna3@gmail.com> wrote:
Why is it even important?  Once you use ORDER BY clause, you are guaranteed to get the rows in the order.  Why do you need how it was inserted in the first place.

Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
Ravi Krishna
Дата:
>Or to put it another way, I want to select values from one table ordered by 
>complex criteria and insert them into another table. I want to be able to 
>retrieve the rows from the target table in the same order they were inserted,
>but I don't care about the specific ordering criteria. I only care about the order they were inserted.

As I understand, your business requirement is to retrieve the rows from the target 
table the same way they were inserted.  The one and only way to achieve it is
to use the same ORDER by clause to SELECT from target, what it was used
to insert into target. In your case, the insert is
 INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
So the SELECT should also be ORDER BY ID desc

Just don't care on how it inserts and stores row internally.

Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
"David G. Johnston"
Дата:
On Tuesday, June 12, 2018, Steve Krenzel <sgk284@gmail.com> wrote:
This is relevant for tables that have a column with a SERIAL type, I need to guarantee that the relative ordering remains the same as the ordering of the selected result set.

The logical insertion order, and thus the sequence values, will be assigned according to the order by.

David J. 

Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
Adrian Klaver
Дата:
On 06/12/2018 01:39 AM, Steve Krenzel wrote:
> This is relevant for tables that have a column with a SERIAL type, I 
> need to guarantee that the relative ordering remains the same as the 
> ordering of the selected result set.
> 
> More concretely, given:
> 
>      > CREATE TABLE foo (id SERIAL, val TEXT);
>      > CREATE TABLE bar (id SERIAL, val TEXT);
>      > INSERT INTO foo (val) VALUES ('A'), ('B'), ('C');
>      > TABLE foo;
>       id | val
>      ----+-----
>        1 | A
>        2 | B
>        3 | C
>      (3 rows)
> 
> Then,
> 
>      > INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
>      > TABLE bar;
>       id | val
>      ----+-----
>        1 | C
>        2 | B
>        3 | A
>      (3 rows)
> 
> The rows should be inserted in reverse. (Note: I don't care about the 
> actual value of the id, only the relative ordering).
> 
> Inserting more values should similarly append into the table in order 
> (where "append" is used in terms of the serial id).
> 
>      > INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
>      > TABLE bar;
>       id | val
>      ----+-----
>        1 | C
>        2 | B
>        3 | A
>        4 | C
>        5 | B
>        6 | A
>      (6 rows)
> 
> Or to put it another way, I want to select values from one table ordered 
> by complex criteria and insert them into another table. I want to be 
> able to retrieve the rows from the target table in the same order they 
> were inserted, but I don't care about the specific ordering criteria. I 
> only care about the order they were inserted.

That will only work until some other INSERT or UPDATE occurs. Using 
table from your example:

UPDATE bar SET val = 'C1' where id = 1;

TABLE bar;
  id | val
----+-----
   2 | B
   3 | A
   4 | C
   5 | B
   6 | A
   1 | C1
(6 rows)

You can use CLUSTER:

https://www.postgresql.org/docs/10/static/sql-cluster.html

to reestablish order based on an index, though that has the same issue:

"Clustering is a one-time operation: when the table is subsequently 
updated, the changes are not clustered."

As has been pointed out order of rows is not guaranteed.


> 
> On Tue, Jun 12, 2018 at 1:24 AM Ravi Krishna <sravikrishna3@gmail.com 
> <mailto:sravikrishna3@gmail.com>> wrote:
> 
>     Why is it even important?  Once you use ORDER BY clause, you are
>     guaranteed to get the rows in the order.  Why do you need how it was
>     inserted in the first place.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, June 12, 2018, Steve Krenzel <sgk284@gmail.com> wrote:
>> This is relevant for tables that have a column with a SERIAL type, I need
>> to guarantee that the relative ordering remains the same as the ordering of
>> the selected result set.

> The logical insertion order, and thus the sequence values, will be assigned
> according to the order by.

What is actually going to happen, given say

create table targ (d text, id serial);

insert into targ select x from src order by y;

is that you're going to get a parse tree equivalent to

    select x, nextval('targ_id_seq')
    from (select x from src order by y) ss;

and then it's a question of whether the planner is capable of reordering
the steps into something you don't want.  I think that the presence of
the explicit "ORDER BY" in the sub-select will prevent flattening of the
sub-select, which is enough to make it safe.  However, if for some reason
you did not say "ORDER BY" but nonetheless expected the serial values to
get assigned in the same order that the underlying query would produce
rows natively, you might get burnt.

As of 9.6, there are more guarantees in this area than there used
to be (cf commit 9118d03a8), but I don't think it matters as long
as you write an ORDER BY.

            regards, tom lane


Re: Semantics around INSERT INTO with SELECT and ORDER BY.

От
Steve Krenzel
Дата:
Thank you all! Your answers cleared this up for me.

Tom, in particular, that reference commit removed any ambiguity for me. Thank you. It's much appreciated.

On Tue, Jun 12, 2018 at 7:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, June 12, 2018, Steve Krenzel <sgk284@gmail.com> wrote:
>> This is relevant for tables that have a column with a SERIAL type, I need
>> to guarantee that the relative ordering remains the same as the ordering of
>> the selected result set.

> The logical insertion order, and thus the sequence values, will be assigned
> according to the order by.

What is actually going to happen, given say

create table targ (d text, id serial);

insert into targ select x from src order by y;

is that you're going to get a parse tree equivalent to

        select x, nextval('targ_id_seq')
        from (select x from src order by y) ss;

and then it's a question of whether the planner is capable of reordering
the steps into something you don't want.  I think that the presence of
the explicit "ORDER BY" in the sub-select will prevent flattening of the
sub-select, which is enough to make it safe.  However, if for some reason
you did not say "ORDER BY" but nonetheless expected the serial values to
get assigned in the same order that the underlying query would produce
rows natively, you might get burnt.

As of 9.6, there are more guarantees in this area than there used
to be (cf commit 9118d03a8), but I don't think it matters as long
as you write an ORDER BY.

                        regards, tom lane