Обсуждение: Idea: INSERT INTO ... NATURAL SELECT ...

Поиск
Список
Период
Сортировка

Idea: INSERT INTO ... NATURAL SELECT ...

От
Sven Berkvens-Matthijsse
Дата:
Hi everyone,

I started a discussion on the hackers mailing list regarding a patch 
that was made more than two years ago but was never merged for lack of 
time then and lack of interest and time now (see 
https://www.postgresql.org/message-id/flat/709e06c0-59c9-ccec-d216-21e38cb5ed61@joh.to 
for the original thread and 
https://www.postgresql.org/message-id/840eb7b0-17ae-fe52-1643-cd7395eed5df%40berkvens.net 
for the new thread). During that discussion, I arrived at the standpoint 
that the original idea was perhaps not the best idea to start with after 
all.

I write quite some handmade SQL, and usually, SQL allows me to write 
what I want to express in a compact, clear fashion. There is one 
exception that I keep running into. I frequently need to insert manually 
entered values into various tables with many columns. Of course, INSERT 
INTO ... (...) {SELECT | VALUES} ... allows me to do just that, however, 
it is very hard to find which values will go into which column that way: 
column names and values are not near to each other in this syntax.

The patch that I initially mailed about implements a syntax that MySQL 
has had for a long time: INSERT INTO ... SET a = 1, b = 2, etc. That 
looks okay on first sight but has several drawbacks. One is that it is 
non-standard, but a more significant drawback is that it allows only one 
row to be inserted per statement.

The idea that I got was the following, and I'd like to know what other 
people think about this. If other people think it's a good idea too and 
nobody has objections against it, I'd be willing to try and create a 
patch for an implementation.

The syntax that I'm proposing is:

INSERT INTO table NATURAL query
        [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]

The idea of the NATURAL keyword is that one does not explicitly name the 
columns of the target table in parentheses, and does not use its default 
ordering of columns if one does not name any columns, but instead have 
the NATURAL keyword trigger behavior where the column names to be used 
in the target table are determined from the column names produced by the 
query instead. For example:

CREATE TABLE test_table(a INTEGER, b INTEGER, c INTEGER, d INTEGER);

These two would be equivalent:

INSERT INTO test_table (b, c, a) VALUES (3, 5, 7);

INSERT INTO test_table NATURAL SELECT 3 AS b, 5 AS c, 7 AS a;

As would these two:

INSERT INTO test_table (b, c, a)
      SELECT source.foo, source.bar, source.beep FROM source;

INSERT INTO test_table NATURAL SELECT
      source.foo AS b, source.bar AS c, source.beep AS a FROM source;

These examples are of course contrived, and the benefit only really 
shows when the table has many columns.

The query would fail to execute if one (or more) of the column names in 
the query does not exist in the target table, or if a column name is 
used more than once in the query. Everything else works just as you 
would expect, the behavior is identical to using a normal SELECT or 
VALUES where one specifies the column names by hand. So, specifically 
highlighting the differences, these queries would fail:

INSERT INTO test_table NATURAL SELECT
       1 AS does_not_exist, 2 AS also_nonexistent;

INSERT INTO test_table NATURAL SELECT 1 AS a, 2 AS a;

Anyone with any thoughts about this? An implementation would make 
inserting data into wide tables by hand very much easier. Because of the 
placement of the NATURAL keyword, I don't think this will conflict with 
any current or future proposal from the SQL standard (except maybe for 
this one :-) ).

With kind regards,
Sven Berkvens-Matthijsse

PS I have not subscribed to the SQL list yet, please copy me in explicitly.




Re: Idea: INSERT INTO ... NATURAL SELECT ...

От
Steve Midgley
Дата:
On Fri, Feb 1, 2019 at 8:07 AM Sven Berkvens-Matthijsse <sven@postgresql.berkvens.net> wrote:
<snip>
INSERT INTO test_table NATURAL SELECT
       1 AS does_not_exist, 2 AS also_nonexistent;

INSERT INTO test_table NATURAL SELECT 1 AS a, 2 AS a;

Anyone with any thoughts about this? An implementation would make
inserting data into wide tables by hand very much easier. Because of the
placement of the NATURAL keyword, I don't think this will conflict with
any current or future proposal from the SQL standard (except maybe for
this one :-) ).

Hi Sven,

I can clearly see the benefits of your proposal from a custom SQL writing perspective. Personally, I would just write this capability as a DSL in a higher level language and have it translated into the ugly SQL you're trying to avoid (that has the field values and names split far apart, visually). I have no idea what the implications of your NATURAL keyword on the implementation of Postgres itself, but I do think (as another person who writes manual SQL on occasion) that your proposal has merits for this narrow use-case. Personally I don't think the use-case is substantial enough to merit implementing a custom keyword unless there are a lot more people with use-cases that would benefit also. But if you can write an optional module that can be installed outside of core, it seems like why not? Just my two cents.

Sincerely,
Steve


Re: Idea: INSERT INTO ... NATURAL SELECT ...

От
Martin Stöcker
Дата:
Hi Sven,

in many cases I prefer "copy from" to import data.  So I can create my 
test data via spreadsheet and csv.
Sometimes it is helpfull to do some shell stuff to create it or import 
with psql.

But if you prefer to have column names and column data near to each 
other, why not using json?

postgres=#create table test ( i integer, t text);
postgres=# insert into test(select * from 
json_to_recordset('[{"i":1,"t":"foo"},{"i":"7","t":"bar"}]') as x(i int, 
t text));
INSERT 0 2
postgres=# select * from test;
  i |  t
---+-----
   1 | foo
  7 | bar
(2 rows)

Regards Martin

Am 01.02.2019 um 17:06 schrieb Sven Berkvens-Matthijsse:
> Hi everyone,
>
> I started a discussion on the hackers mailing list regarding a patch 
> that was made more than two years ago but was never merged for lack of 
> time then and lack of interest and time now (see 
> https://www.postgresql.org/message-id/flat/709e06c0-59c9-ccec-d216-21e38cb5ed61@joh.to 
> for the original thread and 
> https://www.postgresql.org/message-id/840eb7b0-17ae-fe52-1643-cd7395eed5df%40berkvens.net 
> for the new thread). During that discussion, I arrived at the 
> standpoint that the original idea was perhaps not the best idea to 
> start with after all.
>
> I write quite some handmade SQL, and usually, SQL allows me to write 
> what I want to express in a compact, clear fashion. There is one 
> exception that I keep running into. I frequently need to insert 
> manually entered values into various tables with many columns. Of 
> course, INSERT INTO ... (...) {SELECT | VALUES} ... allows me to do 
> just that, however, it is very hard to find which values will go into 
> which column that way: column names and values are not near to each 
> other in this syntax.
>
> The patch that I initially mailed about implements a syntax that MySQL 
> has had for a long time: INSERT INTO ... SET a = 1, b = 2, etc. That 
> looks okay on first sight but has several drawbacks. One is that it is 
> non-standard, but a more significant drawback is that it allows only 
> one row to be inserted per statement.
>
> The idea that I got was the following, and I'd like to know what other 
> people think about this. If other people think it's a good idea too 
> and nobody has objections against it, I'd be willing to try and create 
> a patch for an implementation.
>
> The syntax that I'm proposing is:
>
> INSERT INTO table NATURAL query
>        [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
>
> The idea of the NATURAL keyword is that one does not explicitly name 
> the columns of the target table in parentheses, and does not use its 
> default ordering of columns if one does not name any columns, but 
> instead have the NATURAL keyword trigger behavior where the column 
> names to be used in the target table are determined from the column 
> names produced by the query instead. For example:
>
> CREATE TABLE test_table(a INTEGER, b INTEGER, c INTEGER, d INTEGER);
>
> These two would be equivalent:
>
> INSERT INTO test_table (b, c, a) VALUES (3, 5, 7);
>
> INSERT INTO test_table NATURAL SELECT 3 AS b, 5 AS c, 7 AS a;
>
> As would these two:
>
> INSERT INTO test_table (b, c, a)
>      SELECT source.foo, source.bar, source.beep FROM source;
>
> INSERT INTO test_table NATURAL SELECT
>      source.foo AS b, source.bar AS c, source.beep AS a FROM source;
>
> These examples are of course contrived, and the benefit only really 
> shows when the table has many columns.
>
> The query would fail to execute if one (or more) of the column names 
> in the query does not exist in the target table, or if a column name 
> is used more than once in the query. Everything else works just as you 
> would expect, the behavior is identical to using a normal SELECT or 
> VALUES where one specifies the column names by hand. So, specifically 
> highlighting the differences, these queries would fail:
>
> INSERT INTO test_table NATURAL SELECT
>       1 AS does_not_exist, 2 AS also_nonexistent;
>
> INSERT INTO test_table NATURAL SELECT 1 AS a, 2 AS a;
>
> Anyone with any thoughts about this? An implementation would make 
> inserting data into wide tables by hand very much easier. Because of 
> the placement of the NATURAL keyword, I don't think this will conflict 
> with any current or future proposal from the SQL standard (except 
> maybe for this one :-) ).
>
> With kind regards,
> Sven Berkvens-Matthijsse
>
> PS I have not subscribed to the SQL list yet, please copy me in 
> explicitly.
>
>
>



Re: Idea: INSERT INTO ... NATURAL SELECT ...

От
Sven Berkvens-Matthijsse
Дата:
Hi Steve,

On 01/02/2019 18.32, Steve Midgley wrote:
> On Fri, Feb 1, 2019 at 8:07 AM Sven Berkvens-Matthijsse 
> <sven@postgresql.berkvens.net <mailto:sven@postgresql.berkvens.net>> 
> wrote:
>
>     <snip>
>     INSERT INTO test_table NATURAL SELECT
>            1 AS does_not_exist, 2 AS also_nonexistent;
>
>     INSERT INTO test_table NATURAL SELECT 1 AS a, 2 AS a;
>
>     Anyone with any thoughts about this? An implementation would make
>     inserting data into wide tables by hand very much easier. Because
>     of the
>     placement of the NATURAL keyword, I don't think this will conflict
>     with
>     any current or future proposal from the SQL standard (except maybe
>     for
>     this one :-) ).
>
>
> Hi Sven,
>
> I can clearly see the benefits of your proposal from a custom SQL 
> writing perspective. Personally, I would just write this capability as 
> a DSL in a higher level language and have it translated into the ugly 
> SQL you're trying to avoid (that has the field values and names split 
> far apart, visually).

Sure, that's definitely an option! The INSERT statements I have are 
usually part of larger SQL files that do other things as well, but 
there's no reason I couldn't make a preprocessor or something that 
expands something custom into the usually formatted INSERT statements.

> I have no idea what the implications of your NATURAL keyword on the 
> implementation of Postgres itself, but I do think (as another person 
> who writes manual SQL on occasion) that your proposal has merits for 
> this narrow use-case.

Maybe it's just not useful that often because people don't write much 
SQL by hand these days (maybe they never did, I don't know).

> Personally I don't think the use-case is substantial enough to merit 
> implementing a custom keyword unless there are a lot more people with 
> use-cases that would benefit also. But if you can write an optional 
> module that can be installed outside of core, it seems like why not? 
> Just my two cents.

I'm afraid the plug-in architecture of PostgreSQL wouldn't allow my 
proposal to be implemented that way (unless I'm mistaken and I didn't 
read and/or understand correctly).

> Sincerely,
> Steve


Thanks for your response and thoughts!
Sven



Re: Idea: INSERT INTO ... NATURAL SELECT ...

От
Sven Berkvens-Matthijsse
Дата:
Hi Martin,

On 01/02/2019 19.41, Martin Stöcker wrote:
> Hi Sven,
>
> in many cases I prefer "copy from" to import data.  So I can create my 
> test data via spreadsheet and csv.

Sure, that works, but it would then be useful if the COPY command would 
actually read the first line of a CSV file and use it to find the target 
columns in the table, but it does not.

> Sometimes it is helpfull to do some shell stuff to create it or import 
> with psql.

Yes, that definitely works.

> But if you prefer to have column names and column data near to each 
> other, why not using json?
>
> postgres=#create table test ( i integer, t text);
> postgres=# insert into test(select * from 
> json_to_recordset('[{"i":1,"t":"foo"},{"i":"7","t":"bar"}]') as x(i 
> int, t text));

That also works, but requires one to name all the columns and their 
types in the "AS x" part. That makes the statement very verbose. My 
proposal would not require the types to be stated and would only require 
the column names in the "data part" of the statement.

> INSERT 0 2
> postgres=# select * from test;
>  i |  t
> ---+-----
>   1 | foo
>  7 | bar
> (2 rows)
>
> Regards Martin

Thanks for your thoughts!

With kind regards,
Sven