Обсуждение: [ADMIN] pg_dump making schema output consistent.

Поиск
Список
Период
Сортировка

[ADMIN] pg_dump making schema output consistent.

От
James Lawrence
Дата:
Hello,

first off I realize my approach may not be the best solution to my problem and I am open to other approaches. I'm looking for suggestions/feedback on the issue below, and if confirmation of if the collation order is what dictates the column order for table output.

The main goal is I'd like our CI system to create a new database, run all the migrations, dump the database schema and then compare it to what we have recorded and error out if there is a difference. We've had issues in the past due to a developer's database having left bad changes in their local DB which then get pushed into the schema, but not into the migrations, not super critical to the system but causes confusion.

unfortunately pg_dump has two problems for this usecase.
1) schema-only generates superfluous comments, which are inconsistent between version. easily fixed post dump, but would be nice to add a option to not dump these comments.
2) column ordering within a table is inconsistent. couple of our tables columns get reordered (in a consistent manner) depending on which developer generates the schema.

The inconsistent columns is the big problem, I suspect it is somehow related to the collation order of the database. What is particularly interesting are the differences between each OS's collation order for the same collation value, a great post about it.

Beyond that, a quick search shows this to be a fairly common question, with no real good solutions.

--
James Lawrence

Re: [ADMIN] pg_dump making schema output consistent.

От
Tom Lane
Дата:
James Lawrence <jljatone@gmail.com> writes:
> unfortunately pg_dump has two problems for this usecase.
> ...
> 2) column ordering within a table is inconsistent. couple of our tables
> columns get reordered (in a consistent manner) depending on which developer
> generates the schema.

> The inconsistent columns is the big problem, I suspect it is somehow
> related to the collation order of the database.

No, certainly not.  Ordinarily PG will preserve column order faithfully.
The only case I know of where it will not is if you have an inheritance
relationship and the child table's columns are out of order with respect
to the parent, which generally requires having done some sort of after-
the-fact DDL on the parent.  For instance

create table parent (a int, b int, c int);

create table child (d int, e int) inherits (parent);

-- at this point child has columns a, b, c, d, e

alter table parent add column f int;

-- at this point child has columns a, b, c, d, e, f


If you dump at this point, what'll be emitted is

create table parent (a int, b int, c int, f int);

create table child (d int, e int) inherits (parent);

If you restore that, the child has columns a, b, c, f, d, e, in that
order.  It'd be possible for pg_dump to preserve the original ordering,
but historically it hasn't done so, and from the aesthetic standpoint
it's not clear that'd be better.  After all, if you now do

create table child2 (d int, e int) inherits (parent);

then child2 is going to have a, b, c, f, d, e in either database.

I don't have enough info to say exactly how this is manifesting
as your problem, but I think you'll find it's something like that.
        regards, tom lane


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: [ADMIN] pg_dump making schema output consistent.

От
James Lawrence
Дата:
When you said 'certainly not' you're referring to the collation order impacting the output order of columns within a table correct? which would be great I can stop exploring that avenue.

Your explaination is also a little confusing, just for clarity going to ask some explicit questions.

1) 'Ordinarily PG will preserve column order faithfully.' what is column order? do you just mean the order in which columns were added to a table? your example implies this.
2) what is the difference between 'PG' and pg_dump? (I'm reading it as PG = postgresql server process, pg_dump as the dump tool).

atm I'm reading your explaination as:
postgresql the server process oridinarily preserves columns by their creation order, pg_dump will reflect that order, but if you dump/restore then the relationship of the ordering changes for parent/child tables.

given that I've correctly understood your email, it doesn't directly address the issue I'm pointing out with column order in the schema dump.

the example you gave would be perfectly acceptable. I'm going to assume you left out the fact pg_dump -s also emits the parent columns + the inherit clause within the child table for brevity.

Given the fact without any schema details you hit on the fact these tables happen to use table inheritance, I'm inclined to lean towards that being the root cause of the differences in the schema dumps between machines.


On Mon, Sep 11, 2017 at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Lawrence <jljatone@gmail.com> writes:
> unfortunately pg_dump has two problems for this usecase.
> ...
> 2) column ordering within a table is inconsistent. couple of our tables
> columns get reordered (in a consistent manner) depending on which developer
> generates the schema.

> The inconsistent columns is the big problem, I suspect it is somehow
> related to the collation order of the database.

No, certainly not.  Ordinarily PG will preserve column order faithfully.
The only case I know of where it will not is if you have an inheritance
relationship and the child table's columns are out of order with respect
to the parent, which generally requires having done some sort of after-
the-fact DDL on the parent.  For instance

create table parent (a int, b int, c int);

create table child (d int, e int) inherits (parent);

-- at this point child has columns a, b, c, d, e

alter table parent add column f int;

-- at this point child has columns a, b, c, d, e, f


If you dump at this point, what'll be emitted is

create table parent (a int, b int, c int, f int);

create table child (d int, e int) inherits (parent);

If you restore that, the child has columns a, b, c, f, d, e, in that
order.  It'd be possible for pg_dump to preserve the original ordering,
but historically it hasn't done so, and from the aesthetic standpoint
it's not clear that'd be better.  After all, if you now do

create table child2 (d int, e int) inherits (parent);

then child2 is going to have a, b, c, f, d, e in either database.

I don't have enough info to say exactly how this is manifesting
as your problem, but I think you'll find it's something like that.

                        regards, tom lane



--
James Lawrence

Re: [ADMIN] pg_dump making schema output consistent.

От
Tom Lane
Дата:
James Lawrence <jljatone@gmail.com> writes:
> Your explaination is also a little confusing, just for clarity going to ask
> some explicit questions.

> 1) 'Ordinarily PG will preserve column order faithfully.' what is column
> order? do you just mean the order in which columns were added to a table?
> your example implies this.

Whatever the table's column order is, as shown by e.g. "SELECT * FROM tab".

> 2) what is the difference between 'PG' and pg_dump? (I'm reading it as PG =
> postgresql server process, pg_dump as the dump tool).

In this case I was using it to mean the whole system, server and pg_dump
both.

> given that I've correctly understood your email, it doesn't directly
> address the issue I'm pointing out with column order in the schema dump.

Well, then I'm confused.  But you certainly haven't given enough
information for anyone to do more than guess at what your issue is.

> Given the fact without any schema details you hit on the fact these tables
> happen to use table inheritance, I'm inclined to lean towards that being
> the root cause of the differences in the schema dumps between machines.

It seems likely to me that it's something like this type of difference
having bit you when these databases were set up, ie the column orders
are already different on your two source databases.  Perhaps one was
made by dump/restore from the other, and the original had already had
some order-affecting DDL done on it.
        regards, tom lane


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin