Re: Separate the attribute physical order from logical order

Поиск
Список
Период
Сортировка
От Julien Rouhaud
Тема Re: Separate the attribute physical order from logical order
Дата
Msg-id 20220628093212.a4ipowowtmpzmsyh@jrouhaud
обсуждение исходный текст
Ответ на Re: Separate the attribute physical order from logical order  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: Separate the attribute physical order from logical order  (Isaac Morland <isaac.morland@gmail.com>)
Re: Separate the attribute physical order from logical order  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
Hi,

On Tue, Jun 28, 2022 at 10:53:14AM +0200, Alvaro Herrera wrote:
> On 2022-Jun-28, Julien Rouhaud wrote:
>
> > So, assuming that the current JOIN expansion order shouldn't be
> > changed, I implemented the last approach I mentioned.
>
> Yeah, I'm not sure that this is a good assumption.  I mean, if logical
> order is the order in which users see the table columns, then why
> shouldn't JOIN expand in the same way?  My feeling is that every aspect
> of user interaction should show columns ordered in logical order.  When
> I said that "only star expansion changes" upthread, what I meant is that
> there was no need to support any additional functionality such as
> letting the column order be changed or the server changing things
> underneath to avoid alignment padding, etc.

I'm not entirely sure of what you meant.  Assuming tables a(a, z) and b(b, z),
what do you think those queries should return?

SELECT * FROM a JOIN b on a.z = b.z
Currently it returns (a.a, a.z, b.b, b.z)

SELECT * FROM a JOIN b USING (z)
Currently it returns a.z, a.a, b.b.

Should it now return (a.a, z, b.b) as long as the tables have that logical
order, whether or not any other position (attnum / attphysnum) is different or
stay the same as now?

> Anyway, I think your 0001 is not a good first step.

FWIW this is just what you were previously suggesting at [1].

> I think a better
> first step is a patch that adds two more columns to pg_attribute:
> attphysnum and attlognum (or something like that.  This is the name I
> used years ago, but if you want to choose different, it's okay.)  In
> 0001, these columns would all be always identical, and there's no
> functionality to handle the case where they differ (probably even add
> some Assert that they are identical).  The idea behind these three
> columns is: attnum is a column identity and it never changes from the
> first value that is assigned to the column.  attphysnum represents the
> physical position of the table.  attlognum is the position where the
> column appears for user interaction.

I'm not following.  If we keep attnum as the official identity position and
use attlognum as the position that should be used in any interactive command,
wouldn't that risk to break every single client?

Imagine you have some framework that automatically generates queries based on
the catalog, if it sees table abc with:
c: attnum 1, attphysnum 1, attlognum 3
b: attnum 2, attphysnum 2, attlognum 2
a: attnum 3, attphysnum 3, attlognum 1

and you ask that layer to generate an insert with something like {'a': 'a',
'b': 'b', 'c': 'c'}, what would prevent it from generating:

INSERT INTO abc VALUES ('c', 'b', 'a');

while attlognum says it should have been

INSERT INTO abc VALUES ('a', 'b', 'c');

> In a 0002 patch, you would introduce backend support for the case where
> attlognum differs from the other two; but the other two are always the
> same and it's okay if the server misbehaves or crashes if attphysnum is
> different from attnum (best: keep the asserts that they are always the
> same).  Doing it this way limits the number of cases that you have to
> deal with, because there will be enough difficulty already.  You need to
> change RTE expansion everywhere: *-expansion, COPY, JOIN, expansion of
> SQL function results, etc ...  even psql \d ;-)  But, again: the
> physical position is always column identity and there's no way to
> reorder the columns physically for storage efficiency.

Just to clarify my understanding, apart from the fact that I'm only using
attphysnum (for your attnum and attphysnum) and attnum (for your attlognum), is
there any difference in the behavior with what I started to implement (if what
I started to implement was finished of course) and what you're saying here?

Also, about the default values evaluation (see [2]), should it be tied to your
attnum, attphysnum or attlognum?

> You could put ALTER TABLE support for moving columns as 0003.  (So
> testing for 0002 would just be some UPDATE sentences or some hack that
> lets you test various cases.)
>
> In a 0004 patch, you would introduce backend support for attphysnum to
> be different.  Probably no DDL support yet, since maybe we don't want
> that, but instead we would like the server to figure out the best
> possible packing based on alignment padding, nullability varlenability.
> So testing for this part is again just some UPDATEs.
>
> I think 0001+0002 are already a submittable patchset.

I think that supporting at least a way to specify the logical order during the
table creation should be easy to implement (there shouldn't be any
question on whether it needs to invalidate any cache or what lock level to
use), and could also be added in the initial submission without much extra
efforts, which could help with the testing.

[1] https://www.postgresql.org/message-id/202108181639.xjuovrpwgkr2@alvherre.pgsql
[2] https://www.postgresql.org/message-id/20220626024824.qnlpp6vikzjvuxs3%40jrouhaud



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: Logging query parmeters in auto_explain
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Allowing REINDEX to have an optional name