Обсуждение: Semantics around INSERT INTO with SELECT and ORDER BY.
If I insert using the results of a select statement, are the inserts guaranteed to happen in the order of the rows returned from the select?
That is, are these two equivalent:
INSERT INTO <table2> SELECT <field> FROM <table1> ORDER BY <col> DESC;
And:
FOR row IN SELECT <field> FROM <table1> ORDER BY <col> DESC LOOP
INSERT INTO <table2> VALUES (row.<field>);
END LOOP;
I read through the SQL spec on insertion but they don't address insertion ordering (perhaps, purposefully).
Any clarification here would be super helpful.
Thank you!
Steve
Why is it even important? Once you use ORDER BY clause, you are guaranteed to get the rows in the order. Why do you need how it was inserted in the first place.
This is relevant for tables that have a column with a SERIAL type, I need to guarantee that the relative ordering remains the same as the ordering of the selected result set.
More concretely, given:
> CREATE TABLE foo (id SERIAL, val TEXT);
> CREATE TABLE bar (id SERIAL, val TEXT);
> INSERT INTO foo (val) VALUES ('A'), ('B'), ('C');
> TABLE foo;
id | val
----+-----
1 | A
2 | B
3 | C
(3 rows)
Then,
> INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
> TABLE bar;
id | val
----+-----
1 | C
2 | B
3 | A
(3 rows)
The rows should be inserted in reverse. (Note: I don't care about the actual value of the id, only the relative ordering).
Inserting more values should similarly append into the table in order (where "append" is used in terms of the serial id).
> INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
> TABLE bar;
id | val
----+-----
1 | C
2 | B
3 | A
4 | C
5 | B
6 | A
(6 rows)
Or to put it another way, I want to select values from one table ordered by complex criteria and insert them into another table. I want to be able to retrieve the rows from the target table in the same order they were inserted, but I don't care about the specific ordering criteria. I only care about the order they were inserted.
On Tue, Jun 12, 2018 at 1:24 AM Ravi Krishna <sravikrishna3@gmail.com> wrote:
Why is it even important? Once you use ORDER BY clause, you are guaranteed to get the rows in the order. Why do you need how it was inserted in the first place.
>Or to put it another way, I want to select values from one table ordered by
>complex criteria and insert them into another table. I want to be able to
>retrieve the rows from the target table in the same order they were inserted,
>but I don't care about the specific ordering criteria. I only care about the order they were inserted.
As I understand, your business requirement is to retrieve the rows from the target
table the same way they were inserted. The one and only way to achieve it is
to use the same ORDER by clause to SELECT from target, what it was used
to insert into target. In your case, the insert is
INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC;
So the SELECT should also be ORDER BY ID desc
Just don't care on how it inserts and stores row internally.
On Tuesday, June 12, 2018, Steve Krenzel <sgk284@gmail.com> wrote:
This is relevant for tables that have a column with a SERIAL type, I need to guarantee that the relative ordering remains the same as the ordering of the selected result set.
The logical insertion order, and thus the sequence values, will be assigned according to the order by.
David J.
On 06/12/2018 01:39 AM, Steve Krenzel wrote: > This is relevant for tables that have a column with a SERIAL type, I > need to guarantee that the relative ordering remains the same as the > ordering of the selected result set. > > More concretely, given: > > > CREATE TABLE foo (id SERIAL, val TEXT); > > CREATE TABLE bar (id SERIAL, val TEXT); > > INSERT INTO foo (val) VALUES ('A'), ('B'), ('C'); > > TABLE foo; > id | val > ----+----- > 1 | A > 2 | B > 3 | C > (3 rows) > > Then, > > > INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC; > > TABLE bar; > id | val > ----+----- > 1 | C > 2 | B > 3 | A > (3 rows) > > The rows should be inserted in reverse. (Note: I don't care about the > actual value of the id, only the relative ordering). > > Inserting more values should similarly append into the table in order > (where "append" is used in terms of the serial id). > > > INSERT INTO bar (val) SELECT val FROM foo ORDER BY id DESC; > > TABLE bar; > id | val > ----+----- > 1 | C > 2 | B > 3 | A > 4 | C > 5 | B > 6 | A > (6 rows) > > Or to put it another way, I want to select values from one table ordered > by complex criteria and insert them into another table. I want to be > able to retrieve the rows from the target table in the same order they > were inserted, but I don't care about the specific ordering criteria. I > only care about the order they were inserted. That will only work until some other INSERT or UPDATE occurs. Using table from your example: UPDATE bar SET val = 'C1' where id = 1; TABLE bar; id | val ----+----- 2 | B 3 | A 4 | C 5 | B 6 | A 1 | C1 (6 rows) You can use CLUSTER: https://www.postgresql.org/docs/10/static/sql-cluster.html to reestablish order based on an index, though that has the same issue: "Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered." As has been pointed out order of rows is not guaranteed. > > On Tue, Jun 12, 2018 at 1:24 AM Ravi Krishna <sravikrishna3@gmail.com > <mailto:sravikrishna3@gmail.com>> wrote: > > Why is it even important? Once you use ORDER BY clause, you are > guaranteed to get the rows in the order. Why do you need how it was > inserted in the first place. > -- Adrian Klaver adrian.klaver@aklaver.com
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Tuesday, June 12, 2018, Steve Krenzel <sgk284@gmail.com> wrote: >> This is relevant for tables that have a column with a SERIAL type, I need >> to guarantee that the relative ordering remains the same as the ordering of >> the selected result set. > The logical insertion order, and thus the sequence values, will be assigned > according to the order by. What is actually going to happen, given say create table targ (d text, id serial); insert into targ select x from src order by y; is that you're going to get a parse tree equivalent to select x, nextval('targ_id_seq') from (select x from src order by y) ss; and then it's a question of whether the planner is capable of reordering the steps into something you don't want. I think that the presence of the explicit "ORDER BY" in the sub-select will prevent flattening of the sub-select, which is enough to make it safe. However, if for some reason you did not say "ORDER BY" but nonetheless expected the serial values to get assigned in the same order that the underlying query would produce rows natively, you might get burnt. As of 9.6, there are more guarantees in this area than there used to be (cf commit 9118d03a8), but I don't think it matters as long as you write an ORDER BY. regards, tom lane
Thank you all! Your answers cleared this up for me.
Tom, in particular, that reference commit removed any ambiguity for me. Thank you. It's much appreciated.
On Tue, Jun 12, 2018 at 7:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, June 12, 2018, Steve Krenzel <sgk284@gmail.com> wrote:
>> This is relevant for tables that have a column with a SERIAL type, I need
>> to guarantee that the relative ordering remains the same as the ordering of
>> the selected result set.
> The logical insertion order, and thus the sequence values, will be assigned
> according to the order by.
What is actually going to happen, given say
create table targ (d text, id serial);
insert into targ select x from src order by y;
is that you're going to get a parse tree equivalent to
select x, nextval('targ_id_seq')
from (select x from src order by y) ss;
and then it's a question of whether the planner is capable of reordering
the steps into something you don't want. I think that the presence of
the explicit "ORDER BY" in the sub-select will prevent flattening of the
sub-select, which is enough to make it safe. However, if for some reason
you did not say "ORDER BY" but nonetheless expected the serial values to
get assigned in the same order that the underlying query would produce
rows natively, you might get burnt.
As of 9.6, there are more guarantees in this area than there used
to be (cf commit 9118d03a8), but I don't think it matters as long
as you write an ORDER BY.
regards, tom lane