Re: pg_dump --split patch

From Joel Jacobson
Subject Re: pg_dump --split patch
Date
Msg-id AANLkTim+sFO7N539V5C+yZFx7_fTFQxdHxtCUyhPD-3V@mail.gmail.com
In reply to Re: pg_dump --split patch  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_dump --split patch  (Andrew Dunstan <andrew@dunslane.net>)
Re: pg_dump --split patch  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-hackers
2010/12/29 Tom Lane <tgl@sss.pgh.pa.us>

> If you've solved the deterministic-ordering problem, then this entire
> patch is quite useless.  You can just run a normal dump and diff it.


No, that's only half true.

Diff will do a good job minimizing the "size" of the diff output, yes, but such a diff is still quite useless if you want to quickly grasp the context of the change.

If you have hundreds of functions, just looking at the changed source code is not enough to figure out which functions were modified, unless you have the brain power to memorize every single line of code and can work out the function name just by looking at the old and new lines of code.

To understand a change to my database functions, I would start by looking at the top level, focusing only on the names of the functions modified/added/removed.
At this stage, you want as little information as possible about each change, such as only the names of the functions.
To get such a list of changed functions, you cannot compare two full plain-text schema dumps using diff, as it would only reveal the changed lines, not the names of the functions, unless you are lucky enough to get the name of the function within the (by default) 3 lines of copied context.

While you could increase the number of copied lines of context to a value which would ensure you would see the name of the function in the diff, that is not feasible if you want to quickly "get a picture" of the code areas modified, since you would then need to read through even more lines of diff output.
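To illustrate the context problem, here is a minimal sketch using Python's difflib instead of the diff command. The function name calc_fee and its body are made up for the example. A one-line change deep inside a function body produces a hunk that never mentions the function name at 3 lines of context, and only shows it once the context is large enough to reach back to the CREATE FUNCTION line.

```python
import difflib

# Hypothetical function name, used only for this illustration.
FUNCTION_NAME = "calc_fee"


def build_dump(changed):
    """Return a fake one-function schema dump as a list of lines."""
    body = [f"    -- step {i}" for i in range(1, 11)]
    if changed:
        # Modify a single line deep inside the function body.
        body[7] = "    -- step 8 (changed)"
    header = [
        f"CREATE FUNCTION {FUNCTION_NAME}(amount numeric) RETURNS numeric AS $$",
        "BEGIN",
    ]
    footer = ["END;", "$$ LANGUAGE plpgsql;"]
    return header + body + footer


def diff_text(context_lines):
    """Unified diff of the old and new dump with the given context size."""
    return "\n".join(
        difflib.unified_diff(
            build_dump(False),
            build_dump(True),
            fromfile="old.sql",
            tofile="new.sql",
            n=context_lines,
            lineterm="",
        )
    )


def name_visible(context_lines):
    """True if the function name appears anywhere in the diff output."""
    return FUNCTION_NAME in diff_text(context_lines)


if __name__ == "__main__":
    print(diff_text(3))
    print("name visible with 3 lines of context:", name_visible(3))
    print("name visible with 10 lines of context:", name_visible(10))
```

With real functions of varying length there is no single context size that is both small enough to skim and large enough to always include the name.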

For a less database-centric system where you don't have hundreds of stored procedures, I would agree it's not an issue to keep track of changes by diffing entire schema files, but for extremely database-centric systems, such as the one we have developed at my company, it's not possible to "get the whole picture" of a change by analyzing diffs of entire schema dumps.

The patch has been updated:

*) Only split objects with a non-null namespace (schema)
*) Append all objects of same tag (name) of same type (desc) of same namespace (schema) to the same file (i.e., do not append -2, -3, like before) (Suggested by David Wilson, thanks.)
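The two rules above can be sketched roughly as follows. This is not the patch code (which is C inside pg_dump); the schema/desc/tag.sql directory layout and the helper names split_path and group_objects are assumptions made for the illustration.

```python
import posixpath
from collections import defaultdict


def split_path(namespace, desc, tag):
    """Map an object to a file path, or None if it has no namespace.

    The namespace/desc/tag.sql layout is a hypothetical choice for this
    sketch, not necessarily what the patch writes.
    """
    if namespace is None:
        return None  # rule 1: objects without a schema are not split out
    return posixpath.join(namespace, desc, tag + ".sql")


def group_objects(objects):
    """Collect definitions per file.

    objects is an iterable of (namespace, desc, tag, definition) tuples.
    Objects that share namespace, desc and tag end up in the same file,
    in dump order, instead of getting -2, -3 suffixes (rule 2).
    """
    files = defaultdict(list)
    for namespace, desc, tag, definition in objects:
        path = split_path(namespace, desc, tag)
        if path is not None:
            files[path].append(definition)
    return dict(files)


if __name__ == "__main__":
    example = [
        ("public", "FUNCTION", "calc_fee", "CREATE FUNCTION calc_fee(numeric) ..."),
        ("public", "FUNCTION", "calc_fee", "CREATE FUNCTION calc_fee(numeric, text) ..."),
        (None, "SCHEMA", "public", "CREATE SCHEMA public;"),
    ]
    for path, definitions in group_objects(example).items():
        print(path, len(definitions))
```

Both overloads of the hypothetical calc_fee land in one file, so a change to either shows up under a single name.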

I also tried adding "ORDER BY pronargs" and "ORDER BY pronargs DESC" to the queries in getFuncs() in pg_dump.c, but it had no effect on the order in which functions of the same name but with different numbers of arguments were dumped.
Perhaps functions are already sorted?
Anyway, it doesn't matter that much, keeping all functions of the same name in the same file is a fair trade-off I think. The main advantage is the ability to quickly get a picture of the names of all changed functions, secondly to optimize the actual diff output.
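As a rough sketch of that first step, given two directories produced by splitting an old and a new dump, the names of the changed objects fall out of a plain file comparison. The helper names list_files and changed_objects and the directory arguments are assumptions for the example; "diff -qr old new" or a version control status command gives much the same overview.

```python
import os


def list_files(root):
    """Return {relative path: file contents} for every file under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as handle:
                found[rel] = handle.read()
    return found


def changed_objects(old_root, new_root):
    """Compare two split-dump directories by file name and contents."""
    old_files = list_files(old_root)
    new_files = list_files(new_root)
    common = set(old_files) & set(new_files)
    return {
        "added": sorted(set(new_files) - set(old_files)),
        "removed": sorted(set(old_files) - set(new_files)),
        "modified": sorted(p for p in common if old_files[p] != new_files[p]),
    }


if __name__ == "__main__":
    import sys

    # Usage: python changed.py OLD_SPLIT_DIR NEW_SPLIT_DIR
    result = changed_objects(sys.argv[1], sys.argv[2])
    for kind in ("added", "removed", "modified"):
        for path in result[kind]:
            print(kind, path)
```

Only after seeing this list would I open the per-file diffs for the functions I care about.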


--
Best regards,

Joel Jacobson
Glue Finance

E: jj@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden