Re: Guidance on INSERT RETURNING order

Поиск
Список
Период
Сортировка
От Kirk Wolak
Тема Re: Guidance on INSERT RETURNING order
Дата
Msg-id CACLU5mT5x-76uy_Rh821-WfnD3Z2iuMsZL4y1dsSBK2ypERcmA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Guidance on INSERT RETURNING order  (Federico <cfederico87@gmail.com>)
Ответы Re: Guidance on INSERT RETURNING order  (Adrian Klaver <adrian.klaver@aklaver.com>)
Список pgsql-general
On Tue, Apr 11, 2023 at 4:38 PM Federico <cfederico87@gmail.com> wrote:
Thanks for the ansers

> 2) What would you order by, id or data or both?

by values order, (that incidentally seems to be what PG does)

> with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
> returning id)
> select i.id from i order by id;

The problem here is not having the auto increment id in a particular
order, is that there
is apparently no correlation with the position of an element in the
values clause with the
id generated. That's the reason for using the sentinel column in the
general solution in the previous message.

The extend on the use case, SQLAlchemy has 3 objects T that have
T(data='a'), T(data='b'), T(data='c') but no
value for the id column. The objective is to insert the 3 data values,
get back the ids and correctly match them with
the correct 3 objects.

> No.  Sadly, adding that ORDER BY is just voodoo programming, because
> it applies to the result of the SELECT while promising nothing about
> the order in which INSERT/RETURNING will act on those rows.

I wasn't probably clear, it's fine if INSERT/RETURNING order is
arbitrary, what matters is that the
autoincementing values is executed in the same order as select, like
mentioned in this
previous message
https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us

Is that not the case?

> Re-reading that 2012 thread, the main new observation I'd make today
> is that parallel operation is a thing now, and it's not hard to foresee
> that sometime soon we'll want to parallelize INSERTs.  Which'd make it
> *really* hard to promise anything about the order of RETURNING output.

I think it's fine not promising anything about the order of RETURNING, but
it would be very helpful having a way of tracking what input row
generated a particular
output row. Basically the sentinel case in the original post,
without actually having to insert the sentinel into the table.

> I think if you want to use RETURNING with multi-row inserts, the
> thing to do is more like
>
>   INSERT INTO t(data) VALUES ('a'), ('b'), ('c') RETURNING data, id
>
> and then explicitly match up the returned "data" values rather than
> presuming they appear in the same order you wrote them in in VALUES.
> Admittedly this might be problematic if some of the VALUES rows
> are identical, but how much should you care?

Well, the example is very easy, but it's hard to generalize when
inserting multiple columns
with possible complex values in them, since it would mean matching on
possibly large json values,
arrays, etc. So definitely not ideal

Thanks,
Federico

On Tue, 11 Apr 2023 at 22:06, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 4/11/23 12:47, Federico wrote:
> > Hello list,
> >
> https://www.sqlite.org/lang_returning.html#limitations_and_caveats
> >
> > Searching the archive seems that a using the INSERT SELECT ORDER BY
> > form should be a better solution,
> > so the above insert should be rewritten as
> >
> >      INSERT INTO t(data)
> >      SELECT data FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) as vv(data,
> > num) ORDER BY num
> >      RETURNING id
>
> Or
>
> with i as (INSERT INTO t(data) VALUES ('a', 1), ('b', 2), ('c', 3)
> returning id)
> select i.id from i order by id;
>
> > Sorry for the long email,
> > Thanks
> >
> >      Federico
> >
> >
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>

A couple of comments.  For the more generic, I prefer RETURNING *
you get back all the columns for matching.  To me, this solves the problem in a very generic way.
But SQL (and SET THEORY) basically imply you cannot trust the sequencing of a set of transactions.  Parallel execution is just a great simple example.

Secondarily, many frameworks I've worked with (and custom ones developed) would actually call the SEQUENCE.NEXTVAL, and assign the IDs, in memory, accepting that we would have gaping holes if some transactions were never actually sent to the server.  We did this a lot in master-detail GUI type stuff.  It's just easier.  The children knew their parent ID, and all the children ID's were effectively known before committing.  It made for simple code that never failed. 
(for large datasets we would want one query that returned a set of IDs, we could order that.  And apply it to the records we were about to insert). [Be Careful with GENERATED ALWAYS pks to OVERRIDE]

HTH 

В списке pgsql-general по дате отправления:

Предыдущее
От: Rob Sargent
Дата:
Сообщение: Re: TEXT column > 1Gb
Следующее
От: Kirk Wolak
Дата:
Сообщение: Re: FW: Error!