Обсуждение: WIP: a way forward on bootstrap data

Поиск
Список
Период
Сортировка

WIP: a way forward on bootstrap data

От
John Naylor
Дата:
Hi,

I was looking through the archives one day for a few topics that
interest me, and saw there was continued interest in making bootstrap
data less painful [1] [2]. There were quite a few good ideas thrown
around in those threads, but not much in the way of concrete results.
I took a few of them as a starting point and threw together the
attached WIP patchset.

==
An overview (warning: long):

1 through 3 are small tweaks worth doing even without a data format
change. 4 through 7 are separated for readability and flexibility, but
should be understood as one big patch. I tried to credit as many
people's ideas as possible. Some things are left undone until basic
agreement is reached.

--
Patch 1 - Minor corrections

--
Patch 2 - Minor cleanup

Be more consistent style-wise, change a confusing variable name, fix
perltidy junk.

--
Patch 3 - Remove hard-coded schema information about pg_attribute from
genbki.pl.

This means slightly less code maintenance, but more importantly it's a
proving ground for mechanisms used in later patches.

1. Label a column's default value in the catalog header [3].

Note: I implemented it in the simplest way possible for now, which
happens to prevents a column from having both FORCE_(NOT_)NULL and a
default at the same time, but I think in practice that would almost
never matter. If more column options are added in the future, this
will have to be rewritten.

2. Add a new function to Catalog.pm to fill in a tuple with default
values. It will complain loudly if it can't find either a default or a
given value, so change the signature of emit_pgattr_row() so we can
pass a partially built tuple to it.

3. Format the schema macro entries according to their types.

4. Commit 8137f2c32322c624e0431fac1621e8e9315202f9 labeled which
columns are variable length. Expose that label so we can exclude those
columns from schema macros in a general fashion.

--
Patch 4 - Infrastructure for the data conversion

1. Copy a modified version of Catalogs() from Catalog.pm to a new
script that turns DATA()/(SH)DESCR() statements into serialized Perl
data structures in pg_*.dat files, preserving comments along the way.
Toast and index statements are unaffected. Although it's a one-off as
far as the core project is concerned, I imagine third-parties might
want access to this tool, so it's in the patch and not separate.

2. Remove data parsing from the original Catalogs() function and
rename it to reflect its new, limited role of extracting the schema
info from a header. The data files are handled by a new function.

3. Introduce a script to rewrite pg_*.dat files in a standard format
[4], strip default values, preserve comments and normalize blank
lines. It can also change default values on the fly. It is intended to
be run when adding new data.

4. Add default values to a few catalog headers for demonstration
purposes. There might be some questionable choices here. Review is
appreciated.

Note: At this point, the build is broken.

TODO: See what pgindent does to the the modified headers.

--
Patch 5 - Mechanical data conversion

This is the result of:

cd src/include/catalog
perl convert_header2dat.pl  list-of-catalog-headers-from-the-Makefile
perl -I ../../backend/catalog  rewrite_dat.pl  *.dat
rm *.bak

Note: The data has been stripped of all double-quotes for readability,
since the Perl hash literals have single quotes everywhere. Patches 6
and 7 restore them where needed.

--
Patch 6 - Hand edits

Re-doublequote values that are macros expanded by initdb.c, remove
stray comments, fix up whitespace, and do a minimum of comment editing
to reflect the new data format.

At this point the files are ready to take a look at. Format is the
central issue, of course. I tried to structure things so it wouldn't
be a huge lift to bikeshed on the format. See pg_authid.dat for a
conveniently small example. Each entry is 1 or 2 lines long, depending
on whether oid or (sh)descr is present.

Regarding pg_proc.dat, I think readability is improved, and to some
extent line length, although it's not great:

pg_proc.h:   avg=175, stdev=25
pg_proc.dat: avg=159, stdev=43

(grep -E '^DATA' pg_proc.h | awk '{print length}'
grep prosrc pg_proc.dat | awk '{print length}')

Many lines now fit in an editor window, but the longest line has
ballooned from 576 chars to 692. I made proparallel default to 'u' for
safety, but the vast majority are 's'. If we risked making 's' the
default, most entries would shrink by 20 chars. On the other hand,
human-readable types would inflate that again, but maybe patch 8 below
can help with editing. There was some discussion about abbreviated
labels that were mapped to column names - I haven't thought about that
yet.

--
Patch 7 - Distprep scripts

1. Teach genbki.pl and Gen_fmgrtab.pl to read the data files, and
arrange for the former to double-quote certain values so bootscanner.l
can read them.

2. Introduce Makefile dependencies on the data files.

The build works now.

Note: Since the original DATA() entries contained some (as far as I
can tell) useless quotes, the postgres.bki diff won't be zero, but it
will be close.

Performance note: On my laptop, running Make in the catalog dir with
no compilation goes from ~700ms on master to ~1000ms with the new data
files.

--
Patch 8 - SQL output

1. Write out postgres.sql, which allows you to insert all the source
BKI data into a development catalog schema for viewing and possibly
bulk-editing [5]. It retains oids, and (sh)descr fields in their own
columns, and implements default values.

2. Make it a distclean target.

TODO: Find a way to dump dev catalog tuples into the canonical format.

--
Not implemented yet:

-Gut the header files of DATA() statements. I'm thinking we should
keep the #defines in the headers, but see below:
-Update README and documentation

--
Future work:
-More lookups for human-readable types, operators, etc.
-Automate pg_type #defines a la pg_proc [6], which could also be used
to maintain ecpg's copy of pg_type #defines.

--
[1] https://www.postgresql.org/message-id/flat/20150220234142.GH12653%40awork2.anarazel.de

[2]
https://www.postgresql.org/message-id/flat/CAGoxFiFeW64k4t95Ez2udXZmKA%2BtazUFAaSTtYQLM4Jhzw%2B-pg%40mail.gmail.com

[3] https://www.postgresql.org/message-id/20161113171017.7sgaqdeq6jslmsr3%40alap3.anarazel.de

[4] https://www.postgresql.org/message-id/D8F1D509-6498-43AC-BEFC-052DFE16847A%402ndquadrant.com

[5] https://www.postgresql.org/message-id/20150304150712.GV29780%40tamriel.snowman.net

[6] https://www.postgresql.org/message-id/15697.1479161432%40sss.pgh.pa.us

==

I'll add this to the next commitfest soon.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
There doesn't seem to be any interest in bootstrap data at the moment,
but rather than give up just yet, I've added a couple features to make
a data migration more compelling:

1. Human-readable types, operators, opfamilies, and access methods
2. Column abbreviations

For an example of both practices, an entry from pg_amop changes from

DATA(insert (   1976   21 21 1 s    95  403 0 ));

to

{ opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1',
oper => '<(int2,int2)', am => 'btree' },


3. Reduce redundancy in pg_proc data by
-computing pronargs from proargtypes and
-leaving out prosrc if it's just a copy of proname.

This, plus a few column abbreviations drastically shrinks pg_proc.dat
line length, even with human-readable types:

pg_proc.h:   avg=175, stdev=25
pg_proc.dat: avg=92,  stdev=43

An example before:

DATA(insert OID = 300 (  float48ne         PGNSP PGUID 12 1 0 0 0 f f
f t t f i s 2 0 16 "700 701" _null_ _null_ _null_ _null_ _null_
float48ne _null_ _null_ _null_ ));

and after:

{ oid => '300',
  n => 'float48ne', lp => 't', p => 's', rt => 'bool', at => 'float4 float8' },

--
I've changed the numbering so that patches with the same number should
be taken as unit, separated only for readability. When referring to
the previous email overview, they map like this:

1-3 : unchanged
4-7 : 4A-4D
8   : N/A - I've left out the SQL generation for now, but I can add it later.

New in this patch set:
Patch 5 rips out the DATA() and DESCR() lines from the headers and
updates the comments to reflect that.
Patches 6A and 6B implement human-readable types etc. as described above.


-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Peter Eisentraut
Дата:
On 12/13/17 04:06, John Naylor wrote:
> There doesn't seem to be any interest in bootstrap data at the moment,
> but rather than give up just yet, I've added a couple features to make
> a data migration more compelling:

I took a brief look at your patches, and there appear to be a number of
good cleanups in there at least.  But could you please send patches in
git format-patch format with commit messages, so we don't have to guess
what each patch does?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 12/13/17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 12/13/17 04:06, John Naylor wrote:
>> There doesn't seem to be any interest in bootstrap data at the moment,
>> but rather than give up just yet, I've added a couple features to make
>> a data migration more compelling:
>
> I took a brief look at your patches, and there appear to be a number of
> good cleanups in there at least.  But could you please send patches in
> git format-patch format with commit messages, so we don't have to guess
> what each patch does?

Thanks for taking a look and for pointing me to git format-patch.
That's much nicer than trying to keep emails straight. I've attached a
new patchset.

Note that 4-7 and 9-10 are units as far as the build is concerned.
Meaning, once 4 is applied, the build is broken until 7 is applied.
Also, postgres.bki won't diff 100% clean with the master branch
because of some useless quotes in the latter.

One thing that occured to me while looking over patch 0004 again: It's
now a bit uglier to handle indexing.h and toasting.h. I think it might
be cleaner to keep those statements in the header of the catalog they
refer to. That has the additional benefit of making the headers the
Single Point of Truth for a catalog schema.

TODO:
-Docs and README
-Finish SQL generation patch
-Consider generating pg_type #defines

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
David Fetter
Дата:
On Thu, Dec 14, 2017 at 05:59:12PM +0700, John Naylor wrote:
> On 12/13/17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > On 12/13/17 04:06, John Naylor wrote:
> >> There doesn't seem to be any interest in bootstrap data at the moment,
> >> but rather than give up just yet, I've added a couple features to make
> >> a data migration more compelling:
> >
> > I took a brief look at your patches, and there appear to be a number of
> > good cleanups in there at least.  But could you please send patches in
> > git format-patch format with commit messages, so we don't have to guess
> > what each patch does?
> 
> Thanks for taking a look and for pointing me to git format-patch.
> That's much nicer than trying to keep emails straight. I've attached a
> new patchset.

Thanks for your hard work on this.  It'll really make developing this
part of the code a lot more pleasant.

Unfortunately, it no longer applies to master.  Can we get a rebase?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 12/20/17, David Fetter <david@fetter.org> wrote:
> Thanks for your hard work on this.  It'll really make developing this
> part of the code a lot more pleasant.

I hope so, thanks.

> Unfortunately, it no longer applies to master.  Can we get a rebase?

Hmm, it still applied for me, except when I forgot to gunzip the
larger patches. In any case, I've attached version 4 which contains
some recent improvements. It was rebased against master as of
6719b238e8f0ea. If it doesn't apply for you please let me know the
details.
--

New in this patch set:
-Remove code duplication and improve modularity in data parsing
-Update README at appropriate points
-Shift some hunks around to make patches more focused and readable
-Comment and commit message editing
-Patch 11 reduces indentation
-Patch 12 moves all the toast/index commands into the individual
catalog headers and simplifies some #includes (Note: I failed to shave
the yak of selinux, so the #include changes for contrib/sepgsql are
untested)

At this point, I think it's no longer a WIP, and will only make
further changes based on review or if I find a mistake.

Left for future projects:
-SQL generation for querying source bootstrap data
-Generating pg_type #defines

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Pushed 0001 and 0002.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Hmm, patch 0008 removes data lines from the .h but leaves the dependent
OID define lines around:

#define BTREE_AM_OID 403

This is not good, because then the define depends on data that appears
in a distant file.  Another consideration is that the current system has
the property that these OIDs are discoverable by a hacker by navigating
to the containing .h file; and a missing symbol is easily fixable if
they need to hardcode the OID for which there isn't a symbol yet.

Maybe a generated .h file containing defines for OIDs from all catalogs
is okay.  Of course, not all symbols are to be listed -- we should have
a special marker in the data lines for those that are.  Maybe something
like this

{ oid => '403', descr => 'b-tree index access method',
  amname => 'btree', amhandler => 'bthandler', amtype => 'i',
  cpp_symbol => 'BTREE_AM_OID' },

(where 'cpp_symbol' would be skipped by genbki explicitly).

Any better ideas?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Robert Haas
Дата:
On Thu, Dec 21, 2017 at 5:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Hmm, patch 0008 removes data lines from the .h but leaves the dependent
> OID define lines around:

Just a question here -- do we actually have consensus on doing the
stuff that these patches do?  Because I'm not sure we do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 12/22/17, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Hmm, patch 0008 removes data lines from the .h but leaves the dependent
> OID define lines around:
>
> #define BTREE_AM_OID 403
>
> This is not good, because then the define depends on data that appears
> in a distant file.

I see what you mean.

> Another consideration is that the current system has
> the property that these OIDs are discoverable by a hacker by navigating
> to the containing .h file; and a missing symbol is easily fixable if
> they need to hardcode the OID for which there isn't a symbol yet.

I'm not sure I follow you here.

> Maybe a generated .h file containing defines for OIDs from all catalogs
> is okay.  Of course, not all symbols are to be listed -- we should have
> a special marker in the data lines for those that are.  Maybe something
> like this
>
> { oid => '403', descr => 'b-tree index access method',
>   amname => 'btree', amhandler => 'bthandler', amtype => 'i',
>   cpp_symbol => 'BTREE_AM_OID' },
>
> (where 'cpp_symbol' would be skipped by genbki explicitly).

The last part makes sense and would be a fairly mechanical change. I'm
not sure of the best way to include those generated symbols back in
the code again, though. I think a single file might not be desirable
because of namespace pollution. The alternative would be to have, say,
pg_am.h include pg_am_sym.h. More complex but doable. Also, no need to
skip non-data values explicitly. The code knows where to find the
schema. :-)

Thanks for pushing 1 and 2, BTW.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Robert Haas wrote:
> On Thu, Dec 21, 2017 at 5:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Hmm, patch 0008 removes data lines from the .h but leaves the dependent
> > OID define lines around:
> 
> Just a question here -- do we actually have consensus on doing the
> stuff that these patches do?  Because I'm not sure we do.

My reading of the old threads (linked provided by John in his initial
email in this thread) is that we have a consensus that we want to get
rid of the old data representation, because it causes endless amount of
pain.  The proposed format seems to satisfy the constraints that we all
discussed, namely

1. be easier to modify than the current format,
2. in particular, allow for default values on certain columns
3. not cause git merge problems because of too similar lines in every
record
4. not require onerous Perl modules

The one thing we seem to lack is a tool to edit the data files, as you
suggested[1].  Stephen Frost mentioned[2] that we could do this by
allowing the .data files be loaded in a database table, have the changes
made via SQL, then have a way to create an updated .data file.  Tom
said[3] he didn't like that particular choice.

So we already have Catalog.pm that (after these patches) knows how to
load .data files; we could use that as a basis to enable easy oneliners
to do whatever editing is needed.

Also, the CPP symbols remaining in the pg_*.h that I commented yesterday
was already mentioned[4] before.

[1] https://www.postgresql.org/message-id/CA%2BTgmoa4%3D5oz7wSMcLNLh8h6cXzPoxxNJKckkdSQA%2BzpUG0%2B0A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20150304150712.GV29780%40tamriel.snowman.net
[3] https://www.postgresql.org/message-id/24766.1478821303%40sss.pgh.pa.us
[4] https://www.postgresql.org/message-id/15697.1479161432@sss.pgh.pa.us

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
I  wrote:

> On 12/22/17, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Maybe a generated .h file containing defines for OIDs from all catalogs
>> is okay.  Of course, not all symbols are to be listed -- we should have
>> a special marker in the data lines for those that are.  Maybe something
>> like this
>>
>> { oid => '403', descr => 'b-tree index access method',
>>   amname => 'btree', amhandler => 'bthandler', amtype => 'i',
>>   cpp_symbol => 'BTREE_AM_OID' },
>>
>> (where 'cpp_symbol' would be skipped by genbki explicitly).
>
> The last part makes sense and would be a fairly mechanical change. I'm
> not sure of the best way to include those generated symbols back in
> the code again, though. I think a single file might not be desirable
> because of namespace pollution.
[snip]

I've attached version 5 (rebased onto be2343221fb), which removes OID
#define symbols from the headers and stores them with the records they
refer to.

I went ahead with your suggestion to try a single generated header. I
believe there is no chance of namespace pollution since the symbols
already have a nomenclature that reflects what catalog they belong to.
This required some additional Makefile changes, since some code
outside the backend needs certain OID symbols to be visible. There are
probably bugs, but I wanted to share the initial design. The MSVC
changes are untested. In addition, FindDefinedSymbol() now doesn't
work for catalog headers, so I added a new function to search within
the data.

On the plus side, there is now a mechanism to generate pg_type OID
symbols, and ECPG's knowledge of types is now maintained
automatically.

Since I had to rebase over recent additions to SP-GiST opclasses
anyway, I restructured the patches to have a clean separation between
data migration and data compaction. This makes the patches easier to
follow.

The pg_proc defaults have been tweaked slightly to match those
suggested by Andrew Dunstan [1].

There are now human-readable entries for oprcom and oprnegate in
pg_operator.dat.

Finally, assorted cosmetic improvements and README/comment editing.

[1] https://www.postgresql.org/message-id/b76d153a-33d7-7827-746c-1109f7bf529d%40dunslane.net
--

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Pushed 0001 with some changes of my own.  I'm afraid I created a few
conflicts for the later patches in your series; please rebase.

I don't think we introduced anything that would fail on old Perls, but
let's see what buildfarm has to say.

Others: Now is the time to raise concerns related to the proposed file
formats and tooling, so please do have a look when you have a moment.
At this stage, the proposed data format seems a good choice to me.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Others: Now is the time to raise concerns related to the proposed file
> formats and tooling, so please do have a look when you have a moment.
> At this stage, the proposed data format seems a good choice to me.

It's not very clear to me what the proposed data format actually is,
and I don't really want to read several hundred KB worth of patches
in order to reverse-engineer that information.  Nor do I see
anything in the patch list that obviously looks like it updates
doc/src/sgml/bki.sgml to explain things.

So could we have an explanation of what it is we're agreeing to?

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
David Fetter
Дата:
On Fri, Jan 12, 2018 at 11:38:54AM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Others: Now is the time to raise concerns related to the proposed
> > file formats and tooling, so please do have a look when you have a
> > moment.  At this stage, the proposed data format seems a good
> > choice to me.
> 
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information.  Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.
> 
> So could we have an explanation of what it is we're agreeing to?

That would be awesome.  A walk-through example or two would also help.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Others: Now is the time to raise concerns related to the proposed file
> > formats and tooling, so please do have a look when you have a moment.
> > At this stage, the proposed data format seems a good choice to me.
> 
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information.  Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.
> 
> So could we have an explanation of what it is we're agreeing to?

Here's a small sample pg_proc entry:

 { oid => '2147', descr => 'number of input rows for which the input expression is not null',
  n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', s => 'aggregate_dummy' },

An pg_amop entry:
{ opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => '<(int2,int2)', am => 'btree' },

Notes:
1. this is Perl data; it is read with 'eval' without any external modules.
2. the pg_proc entry has been compressed to two lines, to avoid
   content-free lines that would easily confuse git merge, but keep line
   length reasonable.
3. references to objects in other catalogs are by name, such as "int8"
   or "btree/integer_ops" rather than OID.
4. for each attribute, an abbreviation can be declared.  In the
   pg_proc sample we have "n" which stands for proname, because we have
   this line:
   +   NameData    proname BKI_ABBREV(n);

I think John has gone overboard with some of these choices, but we can
argue the specific choices once we decide that abbreviation is a good
idea.  (Prior discussion seems to suggest we already agreed on that.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> So could we have an explanation of what it is we're agreeing to?

> Here's a small sample pg_proc entry:

>  { oid => '2147', descr => 'number of input rows for which the input expression is not null',
>   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', s => 'aggregate_dummy' },

> An pg_amop entry:
> { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => '<(int2,int2)', am => 'btree' },

> Notes:
> 1. this is Perl data; it is read with 'eval' without any external modules.

Check.  Where is it coming from --- I suppose we aren't going to try to
store this in the existing .h files?  What provisions will there be for
comments?

> 2. the pg_proc entry has been compressed to two lines, to avoid
>    content-free lines that would easily confuse git merge, but keep line
>    length reasonable.

Seems like we would almost need a per-catalog convention on how to lay out
the entries, or else we're going to end up (over time) with lots of cowboy
coding leading to entries that look randomly different from the ones
around them.

> 3. references to objects in other catalogs are by name, such as "int8"
>    or "btree/integer_ops" rather than OID.

+1

> 4. for each attribute, an abbreviation can be declared.  In the
>    pg_proc sample we have "n" which stands for proname, because we have
>    this line:
>    +   NameData    proname BKI_ABBREV(n);

I think single-letter abbreviations here are a pretty bad space vs
readability tradeoff, particularly for wider catalogs where it risks
ambiguity.  The pg_amop sample you show looks noticeably more legible than
the other one.  Still, this is something we can debate on a case-by-case
basis, it's not a defect in the mechanism.

One other question is how we'll verify the conversion.  Is there an
expectation that the .bki file immediately after the conversion will
be identical to immediately before?

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
Tom, everyone,
It's getting late in my timezone, but I wanted to give a few quick
answers. I'll follow up tomorrow. Thanks Alvaro for committing my
refactoring of pg_attribute data creation. I think your modifications
are sensible and I'll rebase soon.

On 1/13/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information.  Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.

Alvaro gave a good overview, so I'll just point out a few things.

-Patches 0002 through 0007 represent a complete one-to-one migration
of data entries. I didn't see much in bki.sgml specific to the current
format, so my documentation changes are confined largely to the
README, in patch 0005.
-Patches 0008 and 0009 implement techniques to make the data lines
shorter. My choices are certainly debatable. There is a brief addition
to the README in patch 0008. The abbreviation technique was only used
in three catalogs to demonstrate.
-Patches 0010 and 0011 implement human-readable OID references.
-Patches 0012 and 0013 are cosmetic, but invasive.

> Seems like we would almost need a per-catalog convention on how to lay out
> the entries, or else we're going to end up (over time) with lots of cowboy
> coding leading to entries that look randomly different from the ones
> around them.

If I understand your concern correctly, the convention is enforced by
a script (rewrite_dat.pl). At the very least this would be done at the
same time as pg_indent and perltidy. To be sure, because of default
values many entries will look randomly different from the ones around
them regardless. I have a draft patch to load the source data into
tables for viewing, but it's difficult to rebase, so I thought I'd
offer that enhancement later.

> One other question is how we'll verify the conversion.  Is there an
> expectation that the .bki file immediately after the conversion will
> be identical to immediately before?

Not identical. First, as part of the base migration, I stripped almost
all double quotes from the data entries since the new Perl hash values
are already single-quoted. (The exception is macros expanded by
initdb.c) I made genbki.pl add quotes on output to match what
bootscanner.l expects. Where a simple rule made it possible, it also
matches the original .bki. The new .bki will only diff where the
current data has superfluous quotes. (ie. "0", "sql"). Second, if the
optional cosmetic patch 0013 is applied, the individual index and
toast commands will be in a different order.

> Check.  Where is it coming from --- I suppose we aren't going to try to
> store this in the existing .h files?  What provisions will there be for
> comments?

Yes, they're in ".dat" files. Perl comments (#) on their own line are
supported. I migrated all existing comments from the header files as
part of the conversion. This is scripted, so I can rebase over new
catalog entries that get committed.

> I think single-letter abbreviations here are a pretty bad space vs
> readability tradeoff, particularly for wider catalogs where it risks
> ambiguity.

Ironically, I got that one from you [1] ;-), but if you have a
different opinion upon seeing concrete, explicit examples, I think
that's to be expected.

--
Now is probably a good time to disclose concerns of my own:
1. MSVC dependency tracking is certainly broken until such time as I
can shave that yak and test.
2. Keeping the oid symbols with the data entries required some
Makefile trickery to make them visible to .c files outside the backend
(patch 0007). It builds fine, but the dependency tracking might have
bugs.

--
[1] https://www.postgresql.org/message-id/15697.1479161432%40sss.pgh.pa.us

Thanks,
John Naylor


Re: WIP: a way forward on bootstrap data

От
Peter Eisentraut
Дата:
On 1/12/18 12:24, Alvaro Herrera wrote:
> Here's a small sample pg_proc entry:
> 
>  { oid => '2147', descr => 'number of input rows for which the input expression is not null',
>   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', s => 'aggregate_dummy' },
> 
> An pg_amop entry:
> { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => '<(int2,int2)', am => 'btree' },
> 
> Notes:
> 1. this is Perl data; it is read with 'eval' without any external modules.
> 2. the pg_proc entry has been compressed to two lines, to avoid
>    content-free lines that would easily confuse git merge, but keep line
>    length reasonable.

I don't think I like this.  I know pg_proc.h is a pain to manage, but at
least right now it's approachable programmatically.  I recently proposed
to patch to replace the columns proisagg and proiswindow with a combined
column prokind.  I could easily write a small Perl script to make that
change in pg_proc.h, because the format is easy to parse and has one
line per entry.  With this new format, that approach would no longer
work, and I don't know what would replace it.

> 3. references to objects in other catalogs are by name, such as "int8"
>    or "btree/integer_ops" rather than OID.

I think we could already do this by making more use of things like
regtype and regproc.  That should be an easy change to make.

> 4. for each attribute, an abbreviation can be declared.  In the
>    pg_proc sample we have "n" which stands for proname, because we have
>    this line:
>    +   NameData    proname BKI_ABBREV(n);

I'm afraid a key value system would invite writing the attributes in
random order and create a mess over time.

But if we want to do it, I think we could also add it to the current BKI
format.  The same goes for defining default values for some columns.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
David Fetter
Дата:
On Fri, Jan 12, 2018 at 04:22:26PM -0500, Peter Eisentraut wrote:
> On 1/12/18 12:24, Alvaro Herrera wrote:
> > Here's a small sample pg_proc entry:
> > 
> >  { oid => '2147', descr => 'number of input rows for which the input expression is not null',
> >   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', s => 'aggregate_dummy' },
> > 
> > An pg_amop entry:
> > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => '<(int2,int2)', am => 'btree' },
> > 
> > Notes:
> > 1. this is Perl data; it is read with 'eval' without any external modules.
> > 2. the pg_proc entry has been compressed to two lines, to avoid
> >    content-free lines that would easily confuse git merge, but keep line
> >    length reasonable.
> 
> I don't think I like this.  I know pg_proc.h is a pain to manage,
> but at least right now it's approachable programmatically.  I
> recently proposed to patch to replace the columns proisagg and
> proiswindow with a combined column prokind.  I could easily write a
> small Perl script to make that change in pg_proc.h, because the
> format is easy to parse and has one line per entry.  With this new
> format, that approach would no longer work, and I don't know what
> would replace it.

How about ingesting with Perl, manipulating there, and spitting back
out as Perl data structures?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Peter Eisentraut wrote:
> On 1/12/18 12:24, Alvaro Herrera wrote:
> > Here's a small sample pg_proc entry:
> > 
> >  { oid => '2147', descr => 'number of input rows for which the input expression is not null',
> >   n => 'count', proisagg => 't', v => 'i', p => 's', rt => 'int8', at => 'any', s => 'aggregate_dummy' },
> > 
> > An pg_amop entry:
> > { opf => 'btree/integer_ops', lt => 'int2', rt => 'int2', str => '1', oper => '<(int2,int2)', am => 'btree' },
> > 
> > Notes:
> > 1. this is Perl data; it is read with 'eval' without any external modules.
> > 2. the pg_proc entry has been compressed to two lines, to avoid
> >    content-free lines that would easily confuse git merge, but keep line
> >    length reasonable.
> 
> I don't think I like this.  I know pg_proc.h is a pain to manage, but at
> least right now it's approachable programmatically.  I recently proposed
> to patch to replace the columns proisagg and proiswindow with a combined
> column prokind.  I could easily write a small Perl script to make that
> change in pg_proc.h, because the format is easy to parse and has one
> line per entry.  With this new format, that approach would no longer
> work, and I don't know what would replace it.

The idea in my mind is that you'd write a Perl program to do such
changes, yeah.  If the code we supply contains enough helpers and a few
samples, it should be reasonably simple for people that don't do much
Perl.

The patch series does contain a few helper programs to write the data
files.  I haven't looked in detail what can they do and what they cannot.

> > 3. references to objects in other catalogs are by name, such as "int8"
> >    or "btree/integer_ops" rather than OID.
> 
> I think we could already do this by making more use of things like
> regtype and regproc.  That should be an easy change to make.

Well, that assumes we *like* the current format, which I think is not a
given ... more the opposite.

> > 4. for each attribute, an abbreviation can be declared.  In the
> >    pg_proc sample we have "n" which stands for proname, because we have
> >    this line:
> >    +   NameData    proname BKI_ABBREV(n);
> 
> I'm afraid a key value system would invite writing the attributes in
> random order and create a mess over time.


Yeah, I share this concern.  But you could fix it if the Perl tooling to
write these files had a hardcoded list to work with.  Maybe we could put
it in a declaration of sorts at the start of each data file.

> But if we want to do it, I think we could also add it to the current BKI
> format.  The same goes for defining default values for some columns.

As above -- do we really like our current format so much that we're
satisfied with minor tweaks?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Andres Freund
Дата:
On 2018-01-12 18:36:14 -0300, Alvaro Herrera wrote:
> As above -- do we really like our current format so much that we're
> satisfied with minor tweaks?

A definite no from here.  I think especially pg_proc desperately needs
something key=value like to be understandable, and that very clearly
seems to be something we can't do in the current format.

- Andres


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Peter Eisentraut wrote:
>> I don't think I like this.  I know pg_proc.h is a pain to manage, but at
>> least right now it's approachable programmatically.  I recently proposed
>> to patch to replace the columns proisagg and proiswindow with a combined
>> column prokind.  I could easily write a small Perl script to make that
>> change in pg_proc.h, because the format is easy to parse and has one
>> line per entry.  With this new format, that approach would no longer
>> work, and I don't know what would replace it.

> The idea in my mind is that you'd write a Perl program to do such
> changes, yeah.  If the code we supply contains enough helpers and a few
> samples, it should be reasonably simple for people that don't do much
> Perl.

It would be good to see a sample --- for a concrete example, how about
creating a Perl script to do the conversion Peter mentions?

>>> 3. references to objects in other catalogs are by name, such as "int8"
>>> or "btree/integer_ops" rather than OID.

>> I think we could already do this by making more use of things like
>> regtype and regproc.  That should be an easy change to make.

> Well, that assumes we *like* the current format, which I think is not a
> given ... more the opposite.

Note that we *can't* easily improve that given the current tooling,
mainly because the bootstrap-time capabilities of regproc_in et al are
so limited.  We don't even have regxxx types for many of the other
cross-reference columns like opclass references, and I don't think
I want to build them because they'd also have bootstrapping issues.

According to my understanding, part of what's going on here is that
we're going to teach genbki.pl to parse these object references and
convert them to hard-coded OIDs in the emitted BKI file.  That seems
good to me, but one thing we're going to need is a spec for how
genbki.pl knows what to do.

>> I'm afraid a key value system would invite writing the attributes in
>> random order and create a mess over time.

> Yeah, I share this concern.  But you could fix it if the Perl tooling to
> write these files had a hardcoded list to work with.  Maybe we could put
> it in a declaration of sorts at the start of each data file.

This is more or less the same concern I stated upthread.  But the
impression I'm getting is that we expect these files to often be written
out from a Perl script, so it's mostly a question of how we teach the
Perl scripts to emit stylistically consistent data.  Then we can use the
Perl scripts as a kind of pgindent for this data.

>> But if we want to do it, I think we could also add it to the current BKI
>> format.  The same goes for defining default values for some columns.

> As above -- do we really like our current format so much that we're
> satisfied with minor tweaks?

I'm sure not.  This will be a big change, without a doubt, but I think
we'll end up in a better place.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 1/13/18, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> I'm afraid a key value system would invite writing the attributes in
> random order and create a mess over time.

A developer can certainly write them in random order, and it will
still work. However, in patch 0002 I have a script to enforce a
standard appearance. Of course, for it to work, you have to run it. I
describe it, if rather tersely, in the README changes in patch 0008.
Since several people have raised this concern, I will go into a bit
more depth here. Perhaps I should reuse some of this language for the
README to improve it.

src/include/catalog/rewrite_dat.pl knows where to find the schema of
each catalog, namely the pg_*.h header, accessed via ParseHeader() in
Catalog.pm. It writes key/value pairs in the order found in the
schema:

{ key_1 => 'value_1', key_2 => 'value_2', ..., key_n => 'value_n' }

The script also has an array of four hard-coded metadata fields: oid,
oid_symbol, descr, and shdescr. If any of these are present, they will
go on their own line first, in the order given:

{ oid => 9999, oid_symbol => 'FOO_OID', descr => 'comment on foo',
  key_1 => 'value_1', key_2 => 'value_2', ..., key_n => 'value_n' }

> I don't think I like this.  I know pg_proc.h is a pain to manage, but at
> least right now it's approachable programmatically.  I recently proposed
> to patch to replace the columns proisagg and proiswindow with a combined
> column prokind.  I could easily write a small Perl script to make that
> change in pg_proc.h, because the format is easy to parse and has one
> line per entry.  With this new format, that approach would no longer
> work, and I don't know what would replace it.

I've attached four diffs/patches to walk through how you would replace
the columns proisagg and proiswindow with a combined column prokind.

Patch 01: Add new prokind column to pg_proc.h, with a default of 'n'.
In many cases, this is all you would have to do, as far as
bootstrapping is concerned.

Diff 02: This is a one-off script diffed against rewrite_dat.pl. In
rewrite_dat.pl, I have a section with this comment, and this is where
I put the one-off code:

# Note: This is also a convenient place to do one-off
# bulk-editing.

(I haven't documented this with explicit examples, so I'll have to remedy that)

You would run it like this:

cd src/include/catalog
perl -I ../../backend/catalog/  rewrite_dat_with_prokind.pl  pg_proc.dat

While reading pg_proc.dat, the default value for prokind is added
automatically. We inspect proisagg and proiswindow, and change prokind
accordingly. pg_proc.dat now has all three columns, prokind, proisagg,
and proiswindow.

Patch 03: Remove old columns from pg_proc.h

Now we run the standard rewrite:

perl -I ../../backend/catalog/ rewrite_dat.pl pg_proc.dat

Any values not found in the schema will simply not be written to
pg_proc.dat, so the old columns are now gone.

The result is found in patch 04.
--

Note: You could theoretically also load the source data into tables,
do the updates with SQL, and dump back out again. I made some progress
with this method, but it's not complete. I think the load and dump
steps add too much complexity for most use cases, but it's a
possibility.


-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 1/13/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> According to my understanding, part of what's going on here is that
> we're going to teach genbki.pl to parse these object references and
> convert them to hard-coded OIDs in the emitted BKI file.  That seems
> good to me, but one thing we're going to need is a spec for how
> genbki.pl knows what to do.

I don't know if it qualifies as a spec, but here's my implementation:

Use dummy type aliases in the header files: regam, regoper, regopf, and regtype
These are #defined away in genbki.h:

+/* ----------------
+ *    Some columns of type Oid have human-readable entries that are
+ *    resolved when creating postgres.bki.
+ * ----------------
+ */
+#define regam Oid
+#define regoper Oid
+#define regopf Oid
+#define regtype Oid

Likewise, in genbki.pl (and I just noticed a typo, s/names/types/):

+# We use OID aliases to indicate when to do OID lookups, so column names
+# have to be turned back into 'oid' before writing the CREATE command.
+my %RENAME_REGOID = (
+    regam => 'oid',
+    regoper => 'oid',
+    regopf => 'oid',
+    regtype => 'oid');
+

When genbki.pl sees one of these type aliases, it consults the
appropriate lookup table, exactly how we do now for regproc. One
possibly dubious design point is that I declined to teach the
pg_attribute logic about this, so doing lookups in tables with schema
macros has to be done explicitly. There is only one case of this right
now, and I noted the tradeoff:

+                # prorettype
+                # Note: We could handle this automatically by using the
+                # 'regtype' alias, but then we would have to teach
+                # morph_row_for_pgattr() to change the attribute type back to
+                # oid. Since we have to treat pg_proc differently anyway,
+                # just do the type lookup manually here.
+                my $rettypeoid = $regtypeoids{ $bki_values{prorettype}};
+                $bki_values{prorettype} = $rettypeoid
+                  if defined($rettypeoid);

This is all in patch 0011.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 1/12/18, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Pushed 0001 with some changes of my own.  I'm afraid I created a few
> conflicts for the later patches in your series; please rebase.

Attached, rebased over 255f14183ac. This time it's a single .tar.gz.
Let me know if single files are better. Here's the renumbered overview
copied from one of my emails responding to Tom:

-Patches 0001 through 0006 represent a complete one-to-one migration
of data entries. I didn't see much in bki.sgml specific to the current
format, so my documentation changes are confined largely to the
README, in patch 0004.
-Patches 0007 and 0008 implement techniques to make the data lines
shorter. My choices are certainly debatable. There is a brief addition
to the README in patch 0007. The abbreviation technique was only used
in three catalogs to demonstrate.
-Patches 0009 and 0010 implement human-readable OID references.
-Patches 0011 and 0012 are cosmetic, but invasive.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 1/12/18, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Pushed 0001 with some changes of my own.  I'm afraid I created a few
>> conflicts for the later patches in your series; please rebase.

> Attached, rebased over 255f14183ac.

I decided that I wasn't going to get answers about the things I cared
about without looking through the patchset, so I've now done so,
in a once-over-lightly fashion.  Here's a quick summary of what it
does, for those who may be equally confused, and then some comments
(not really a review).

The patch removes DATA and (SH)DESCR lines from all the catalog/pg_*.h
files, as well as the #defines for OID-value macros, and puts that
information into pg_*.dat files corresponding to the .h files, in a form
that is easily readable and writable by Perl scripts.  Comments associated
with this info are also transferred to the .dat files, and the scripts
that can rewrite the .dat files are able to preserve the comments.

genbki.pl is able to generate postgres.bki and other initdb input files
directly from the .dat files.  It also creates a single header file
src/include/catalog/oid_symbols.h that contains all of the OID-value
macros that were formerly in the pg_*.h files.

The patch gets rid of the need to write hard-wired OIDs when referencing
operators, opfamilies, etc in the .dat files; now you can write their
names instead.  genbki.pl will look up the names and substitute numeric
OIDs in the emitted postgres.bki file.  There are also provisions to
shorten the .dat files through the use of abbreviated field names,
default values for fields, and some other less-general techniques.

--

OK, now some comments:

I'm not sure about the decision to move all the OID macros into
one file; that seems like namespace pollution.  It's especially
odd that you did that but did not consolidate fmgroids.h with
the macros for symbols from other catalogs.  Now it's true that
we need all those symbols to be distinct, since it needs to be
possible for .c files to include any combination of pg_*.h headers,
but I don't think it's an especially good idea that you have to
include all of them or none.  Even if we're willing to put up with
that namespace pollution for backend code, there are clients that
currently include pg_*.h headers to get some of those macros, and
they're likely to be less happy about it.

The design I'd kind of imagined was one generated file of #define's
per pg_*.h file, not just one giant one.

It would be really nice, also, if the attribute number macros
(Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated.
Manually renumbering those is one of the bigger pains in the rear
when adding catalog columns.  It was less of a pain than adjusting
the DATA lines of course, so I never figured it was worth doing
something about in isolation --- but with this infrastructure in
place, that's manual work we shouldn't have to do anymore.

Another thing that I'd sort of hoped might happen from this patchset
is to cure the problem of keeping some catalog headers safe for
client-side inclusion, because some clients want the OID value macros
and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
currently have to #include those headers or else hard-code the values.
We've worked around that to date with ad-hoc solutions like splitting
function declarations out to pg_*_fn.h files, but I never liked that
much.  With the OID value macros being moved out to separate generated
file(s), there's now a possibility that we could fix this once and for all
by making client-side code include those file(s) not pg_type.h et al
themselves.  But we'd need a way to put the column-value macros into
those files too; maybe that's too messy to make it practical.

The .dat files need to have header comments that follow project
conventions, in particular they need to contain copyright statements.
Likewise for generated files.

I've got zero faith that the .h files will hold still long enough
for these patches to be reviewed and applied.  The ones that touch
significant amounts of data need to be explained as "run this script
on the current data", rather than presented as static diffs.

I'm not really thrilled by the single-purpose "magic" behaviors added
in 0007, such as computing prosrc from proname.  I think those will
add more confusion than they're worth.

In 0010, you relabel the types of some OID columns so that genbki.pl
will know which lookup to apply to them.  That's not such a problem for
the relabelings that are just macros and genbki.pl converts back to
type OID in the .bki file.  But you also did things like s/Oid/regtype/,
and that IS a problem because it will affect what client code sees in
those catalog columns.  We've discussed changing those columns to
regfoo types in the past, and decided not to, because of the likelihood
of breaking client queries.  I do not think this patch gets to change
that policy.  So the way to identify the lookup rule needs to be
independent of whether the column is declared as Oid or an Oid alias type.
Perhaps an explicit marker telling what transformation to make, like

    Oid            rngsubtype BKI_LOOKUP(pg_type);

would work for that.

I'm not really on board at all with 0012, which AFAICS moves the indexing
and toast-table information out of indexing.h and toasting.h for no good
reason whatever.  We'll have quite enough code thrash and pending-patch
breakage from this patch set; we don't need to take on rearrangements that
aren't buying anything.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Greg Stark
Дата:
I'm 1000% on board with replacing oid constants with symbolic names
that get substituted programmatically.

However I wonder why we're bothering inventing a new syntax that
doesn't actually do much more than present static tabular data. If
things like magic proname->prosrc behaviour are not valuable then
we're not getting much out of this perl-friendly syntax that a simpler
more standard format wouldn't get us.

So just as a straw man proposal.... What if we just replaced the data
file with a csv file that could be maintained in a spreadsheet. It
could easily be parsed by perl and we could even have perl scripts
that load the records into memory and modify them. You could even
imagine writing a postgres script that loaded the csv file into a
temporary table, did complex SQL updates or other DML, then wrote it
back out to a csv file.


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> I'm 1000% on board with replacing oid constants with symbolic names
> that get substituted programmatically.

Yeah, that's almost an independent feature --- we could do that without
any of this other stuff, if we wanted.

> However I wonder why we're bothering inventing a new syntax that
> doesn't actually do much more than present static tabular data. If
> things like magic proname->prosrc behaviour are not valuable then
> we're not getting much out of this perl-friendly syntax that a simpler
> more standard format wouldn't get us.

TBH, the thing that was really drawing my ire about that was that John was
inventing random special rules and documenting them *noplace* except for
the guts of some perl code.  If I have to read perl code to find out what
the catalog data means, I'm going to be bitching loudly.  That could be
done better --- one obvious idea is to add a comment to the relevant .h
file, next to the field whose value will be implicitly calculated.

> So just as a straw man proposal.... What if we just replaced the data
> file with a csv file that could be maintained in a spreadsheet.

Bleah --- that's no better than what we have today, just different.
And "maintained in a spreadsheet" doesn't sound attractive to me;
you'd almost certainly lose comments, for instance.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
I wrote:
> Another thing that I'd sort of hoped might happen from this patchset
> is to cure the problem of keeping some catalog headers safe for
> client-side inclusion, because some clients want the OID value macros
> and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
> currently have to #include those headers or else hard-code the values.
> We've worked around that to date with ad-hoc solutions like splitting
> function declarations out to pg_*_fn.h files, but I never liked that
> much.  With the OID value macros being moved out to separate generated
> file(s), there's now a possibility that we could fix this once and for all
> by making client-side code include those file(s) not pg_type.h et al
> themselves.  But we'd need a way to put the column-value macros into
> those files too; maybe that's too messy to make it practical.

I had a thought about how to do that.  It's clearly desirable that that
sort of material remain in the manually-maintained pg_*.h files, because
that's basically where you look to find out C-level details of what's
in a particular catalog.  However, that doesn't mean that that's where
the compiler has to find it.  Imagine that we write such sections of the
catalog .h files like

#ifdef EXPOSE_TO_CLIENT_CODE

/*
 * ... comment here ...
 */
#define PROVOLATILE_IMMUTABLE   'i' /* never changes for given input */
#define PROVOLATILE_STABLE      's' /* does not change within a scan */
#define PROVOLATILE_VOLATILE    'v' /* can change even within a scan */

#endif /* EXPOSE_TO_CLIENT_CODE */

Like CATALOG_VARLEN, the symbol EXPOSE_TO_CLIENT_CODE is never actually
defined to the compiler.  What it does is to instruct genbki.pl to copy
the material up to the matching #endif into the generated file for this
catalog.  So, for each catalog header pg_foo.h, there would be a
generated file, say pg_foo_d.h, containing:

* Natts_ and Anum_ macros for pg_foo

* Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h

* Any OID-value macros for entries in that catalog

pg_foo.h would contain a #include for pg_foo_d.h, so that backend-side
code would obtain all these values the same as it did before.  But the
new policy for client code would be to include pg_foo_d.h *not* pg_foo.h,
and so we are freed of any worry about whether pg_foo.h has to be clean
for clients to include.  We could re-merge the various pg_foo_fn.h files
back into the main files, if we wanted.

The contents of EXPOSE_TO_CLIENT_CODE sections wouldn't necessarily
have to be just macros --- they could be anything that's safe and
useful for client code.  But that's probably the main usage.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 1/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not sure about the decision to move all the OID macros into
> one file; that seems like namespace pollution.
<snip>
> The design I'd kind of imagined was one generated file of #define's
> per pg_*.h file, not just one giant one.

First, thanks for the comments. I will start incorporating them in a
few days to give others the chance to offer theirs.

I'm inclined to agree about namespace pollution. One stumbling block
is the makefile changes to allow OID symbols to be visible to
non-backend code. Assuming I took the correct approach for the single
file case, adapting that to multiple files would require some
rethinking.

> It's especially
> odd that you did that but did not consolidate fmgroids.h with
> the macros for symbols from other catalogs.

Point taken.

> It would be really nice, also, if the attribute number macros
> (Natts_pg_proc, Anum_pg_proc_proname, etc) could be autogenerated.
> Manually renumbering those is one of the bigger pains in the rear
> when adding catalog columns.  It was less of a pain than adjusting
> the DATA lines of course, so I never figured it was worth doing
> something about in isolation --- but with this infrastructure in
> place, that's manual work we shouldn't have to do anymore.

Searching the archives for discussion of Anum_* constants [1], I
prefer your one-time suggestion to use enums instead. I'd do that, and
then have Catalog.pm throw an error if Natts_* doesn't match the
number of columns. That's outside the scope of this patch, however.

> Another thing that I'd sort of hoped might happen from this patchset
> is to cure the problem of keeping some catalog headers safe for
> client-side inclusion, because some clients want the OID value macros
> and/or macros for column values (eg PROVOLATILE_IMMUTABLE), so they
> currently have to #include those headers or else hard-code the values.
> We've worked around that to date with ad-hoc solutions like splitting
> function declarations out to pg_*_fn.h files, but I never liked that
> much.  With the OID value macros being moved out to separate generated
> file(s), there's now a possibility that we could fix this once and for all
> by making client-side code include those file(s) not pg_type.h et al
> themselves.  But we'd need a way to put the column-value macros into
> those files too; maybe that's too messy to make it practical.

To make sure I understand you correctly:
Currently we have

pg_type.h          oid symbols, column symbols
pg_type_fn.h     function decls

And assuming we go with one generated oid symbol file per header, my
patch would end up with something like

pg_type.h          column symbols (#includes pg_type_sym.h)
pg_type_fn.h     function decls
pg_type_sym.h oid symbols (generated)

And you're saying you'd prefer

pg_type.h          function decls (#includes pg_type_sym.h)
pg_type_sym.h oid symbols, column symbols (generated)

I agree that it'd be messy to drive the generation of the column
symbols. I'll think about it. What about

pg_type.h          function decls (#includes pg_type_sym.h)
pg_type_sym.h column symbols (static, #includes pg_type_oids.h)
pg_type_oids.h oid symbols (generated)

It's complicated, but arguably no more so than finding someplace more
distant to stick the column symbols and writing code to copy them.
It'd be about than 20 *_sym.h headers and 10 *_oids.h headers.

> The .dat files need to have header comments that follow project
> conventions, in particular they need to contain copyright statements.
> Likewise for generated files.

Okay.

> I've got zero faith that the .h files will hold still long enough
> for these patches to be reviewed and applied.  The ones that touch
> significant amounts of data need to be explained as "run this script
> on the current data", rather than presented as static diffs.

I've already rebased over a catalog change and it was not much fun, so
I'd be happy to do it this way.

> I'm not really thrilled by the single-purpose "magic" behaviors added
> in 0007, such as computing prosrc from proname.  I think those will
> add more confusion than they're worth.

Okay. I still think generating pg_type OID symbols is worth doing, but
I no longer think this is a good place to do it.

> In 0010, you relabel the types of some OID columns so that genbki.pl
> will know which lookup to apply to them.  That's not such a problem for
> the relabelings that are just macros and genbki.pl converts back to
> type OID in the .bki file.  But you also did things like s/Oid/regtype/,
> and that IS a problem because it will affect what client code sees in
> those catalog columns.  We've discussed changing those columns to
> regfoo types in the past, and decided not to, because of the likelihood
> of breaking client queries.  I do not think this patch gets to change
> that policy.  So the way to identify the lookup rule needs to be
> independent of whether the column is declared as Oid or an Oid alias type.
> Perhaps an explicit marker telling what transformation to make, like
>
>     Oid            rngsubtype BKI_LOOKUP(pg_type);
>
> would work for that.

Okay. I fail to see how client queries are affected, since I change
everything back to oid, but I think your design is cleaner anyway.

> I'm not really on board at all with 0012, which AFAICS moves the indexing
> and toast-table information out of indexing.h and toasting.h for no good
> reason whatever.  We'll have quite enough code thrash and pending-patch
> breakage from this patch set; we don't need to take on rearrangements that
> aren't buying anything.

I don't have a convincing rebuttal, so I'll withdraw it.
--

[1] https://www.postgresql.org/message-id/25254.1248533810%40sss.pgh.pa.us


-John Naylor


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> I had a thought about how to do that.  It's clearly desirable that that
> sort of material remain in the manually-maintained pg_*.h files, because
> that's basically where you look to find out C-level details of what's
> in a particular catalog.  However, that doesn't mean that that's where
> the compiler has to find it.
>
> [ elided explanation of pg_foo_d.h and pg_foo.h ]

Sounds good to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 1/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  So, for each catalog header pg_foo.h, there would be a
> generated file, say pg_foo_d.h, containing:
>
> * Natts_ and Anum_ macros for pg_foo
>
> * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h
>
> * Any OID-value macros for entries in that catalog

I'm on board in principle, but I have some questions:

How do we have the makefiles gracefully handle 62 generated headers
which need to be visible outside the backend? Can I generalize the
approach I took for the single OIDs file I had, or is that not even
the right way to go? (In short, I used a new backend make target that
was invoked in src/common/Makefile - the details are in patch v6-0006)

If we move fmgr oid generation here as you suggested earlier, I
imagine we don't want to create a lot of #include churn. My idea is to
turn src/include/utils/fmgroids.h into a static file that just
#includes catalog/pg_proc_d.h. Thoughts?

And I'm curious, what is "_d" intended to convey?

(While I'm thinking outloud, I'm beginning to think that these headers
lie outside the scope of genbki.pl, and belong in a separate script.)

-John Naylor


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 1/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So, for each catalog header pg_foo.h, there would be a
>> generated file, say pg_foo_d.h, containing:
>> * Natts_ and Anum_ macros for pg_foo
>> * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h
>> * Any OID-value macros for entries in that catalog

> I'm on board in principle, but I have some questions:

> How do we have the makefiles gracefully handle 62 generated headers
> which need to be visible outside the backend?

There are other people around here who are better make wizards than I, but
I'm sure this is soluble.  A substitution like $(CATALOG_HEADERS:_d.h=.h)
might get you started.  (It looks like CATALOG_HEADERS would have to be
separated out of POSTGRES_BKI_SRCS, but that's easy.)

> If we move fmgr oid generation here as you suggested earlier, I
> imagine we don't want to create a lot of #include churn. My idea is to
> turn src/include/utils/fmgroids.h into a static file that just
> #includes catalog/pg_proc_d.h. Thoughts?

Yeah ... or vice versa.  I don't know if touching the way fmgroids.h is
built is worthwhile.  Certainly, if we'd built all this to begin with
we'd have unified pg_proc.h's OID macro handling with the other catalogs,
but we didn't and that might not be worth changing.  I'm not strongly
convinced either way.

> And I'm curious, what is "_d" intended to convey?

I was thinking "#define" or "data".  You could make as good a case for
"_g" for "generated", or probably some other choices.  I don't have a
strong preference; but I didn't especially like your original suggestion
of "_sym", because that seemed like it would invite confusion with
possible actual names for catalogs.  A one-letter suffix seems less
likely to conflict with anything anybody would think was a good choice
of catalog name.

> (While I'm thinking outloud, I'm beginning to think that these headers
> lie outside the scope of genbki.pl, and belong in a separate script.)

Maybe, but the conditions for regenerating these files would be exactly
the same as for the .bki file, no?  So we might as well just have one
script do both, rather than writing duplicative rules in the Makefiles
and the MSVC scripts.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 1/14/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've got zero faith that the .h files will hold still long enough
> for these patches to be reviewed and applied.  The ones that touch
> significant amounts of data need to be explained as "run this script
> on the current data", rather than presented as static diffs.

For version 7, I've attached a bash script along with the patches
(against master a044378ce2f) that does exactly this. To run, one would
save the patches somewhere and change the PATCH_DIR and REPO_DIR
variables to match.

-I've added MSVC changes, but they are untested.
-I've moved a cosmetic patch up in the series to reduce rebasing
effort. There are some additional comment and style changes.

Not done in this version:

-For the time being I've left out human-readable OIDs and data file
compaction strategies. This is just to reduce effort in rebasing. I'll
add some form of those back after the basics have had serious review.
-Change who is responsible for fmgroids.h. It's debatable whether that
would be a gain anyway.

The README might need to be fleshed out further, possibly with a
separate README for working with the new data format.

>  So, for each catalog header pg_foo.h, there would be a
> generated file, say pg_foo_d.h, containing:
>
> * Natts_ and Anum_ macros for pg_foo
>
> * Any EXPOSE_TO_CLIENT_CODE sections copied from pg_foo.h
>
> * Any OID-value macros for entries in that catalog

This is done (patch 0006). As I mentioned earlier, the sticking point
is the makefiles. I have a working build, but it's not up to project
standards. In particular, for the first attempt I've resorted to
discarding conventions for parallel make safety, so if anyone can
review and offer improvements, I'd be grateful.

> The .dat files need to have header comments that follow project
> conventions, in particular they need to contain copyright statements.
> Likewise for generated files.

Done.

I'll also go ahead and move this to next commitfest.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> Version 8, rebased against 76b6aa41f41d.

I took a preliminary look through this, without yet attempting to execute
the script against HEAD.  I have a few thoughts:

* I'm inclined not to commit the conversion scripts to the repo.  I doubt
there are third parties out there with a need for them, and if they do
need them they can get 'em out of this thread in the mailing list
archives.  (If anyone has a different opinion about that, speak up!)

* I notice you have a few "preliminary cleanup" changes here and there
in the series, such as fixing the inconsistent spelling of
Anum_pg_init_privs_initprivs.  These could be applied before we start
the main conversion process, and I'm inclined to do that since we could
get 'em out of the way early.  Ideally, the main conversion would only
touch the header files and related scripts/makefiles.

* I'm a little disturbed by the fact that 0002 has to "re-doublequote
values that are macros expanded by initdb.c".  I see that there are only
a small number of affected places, so maybe it's not really worth working
harder, but I worry that something might get missed.  Is there any way to
include this consideration in the automated conversion, or at least to
verify that we found all the places to quote?  Or, seeing that 0004 seems
to be introducing some quoting-related hacks to genbki.pl anyway, maybe
we could take care of the issue there?

* In 0003, I'd recommend leaving the re-indentation to happen in the next
perltidy run (assuming perltidy would fix that, which I think is true but
I might be wrong).  It's just creating more review work to do it here.
In any case, the patch summary line is pretty misleading since it's
*not* just reindenting, but also refactoring genbki.pl.  (BTW, if that
refactoring would work on the script as it is, maybe that's another
thing we could do early?  The more we can do before "flag day", the
better IMO.)

* In 0006, I'm not very pleased with the introduction of
"Makefile.headers".  I'd keep those macros where they are in
catalog/Makefile.  backend/Makefile doesn't need to know about that,
especially since it's doing an unconditional invocation of
catalog/Makefile anyway.  It could just do something like

submake-schemapg:
    $(MAKE) -C catalog generated-headers

and leave it to catalog/Makefile to know what needs to happen for
both schemapg.h and the other generated files.

Overall, though, this is looking pretty promising.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
Thanks for taking a look.

On 3/3/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <jcnaylor@gmail.com> writes:
>> Version 8, rebased against 76b6aa41f41d.
>
> I took a preliminary look through this, without yet attempting to execute
> the script against HEAD.  I have a few thoughts:
>
> * I'm inclined not to commit the conversion scripts to the repo.  I doubt
> there are third parties out there with a need for them, and if they do
> need them they can get 'em out of this thread in the mailing list
> archives.  (If anyone has a different opinion about that, speak up!)

If no one feels strongly otherwise, I'll just attach the conversion
script along with the other patches for next version. To be clear, the
rewrite script is intended be committed, both to enforce formatting
and as a springboard for bulk editing. Now, whether that belongs in
/src/include/catalog or /src/tools is debatable.

> * I'm a little disturbed by the fact that 0002 has to "re-doublequote
> values that are macros expanded by initdb.c".  I see that there are only
> a small number of affected places, so maybe it's not really worth working
> harder, but I worry that something might get missed.  Is there any way to
> include this consideration in the automated conversion, or at least to
> verify that we found all the places to quote?  Or, seeing that 0004 seems
> to be introducing some quoting-related hacks to genbki.pl anyway, maybe
> we could take care of the issue there?

The quoting hacks are really to keep the postgres.bki diff as small as
possible (attached). The simplest and most air-tight way to address
your concern would be to double-quote everything when writing the bki
file. That could be done last as a follow-up.

> * I notice you have a few "preliminary cleanup" changes here and there
> in the series, such as fixing the inconsistent spelling of
> Anum_pg_init_privs_initprivs.  These could be applied before we start
> the main conversion process, and I'm inclined to do that since we could
> get 'em out of the way early.  Ideally, the main conversion would only
> touch the header files and related scripts/makefiles.
...
> * In 0003, I'd recommend leaving the re-indentation to happen in the next
> perltidy run (assuming perltidy would fix that, which I think is true but
> I might be wrong).  It's just creating more review work to do it here.
> In any case, the patch summary line is pretty misleading since it's
> *not* just reindenting, but also refactoring genbki.pl.  (BTW, if that
> refactoring would work on the script as it is, maybe that's another
> thing we could do early?  The more we can do before "flag day", the
> better IMO.)

I tested perltidy 20090616 and it handles it fine. I'll submit a
preliminary patch soon to get some of those items out of the way.

> * In 0006, I'm not very pleased with the introduction of
> "Makefile.headers".  I'd keep those macros where they are in
> catalog/Makefile.  backend/Makefile doesn't need to know about that,
> especially since it's doing an unconditional invocation of
> catalog/Makefile anyway.  It could just do something like
>
> submake-schemapg:
>     $(MAKE) -C catalog generated-headers
>
> and leave it to catalog/Makefile to know what needs to happen for
> both schemapg.h and the other generated files.

I wasn't happy with it either, but I couldn't get it to build
otherwise. The sticking point was the symlinks in
$(builddir)/src/include/catalog.  $(MAKE) -C catalog doesn't handle
that. The makefile in /src/common relies on the backend makefile to
know what to invoke for a given header. IIRC, relpath.c includes
pg_tablespace.h, which now requires pg_tablespace_d.h to be built.

Perhaps /src/common/Makefile could invoke the catalog makefile
directly, and the pg_*_d.h headers could be written to
$(builddir)/src/include/catalog directly? I'll hack on it some more.

> Overall, though, this is looking pretty promising.
>
>             regards, tom lane
>

Glad to hear it.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
I wrote:

> I'll submit a
> preliminary patch soon to get some of those items out of the way.

I've attached a patch that takes care of these cleanups so they don't
clutter the patch set.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 3/3/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * I'm a little disturbed by the fact that 0002 has to "re-doublequote
>> values that are macros expanded by initdb.c".  I see that there are only
>> a small number of affected places, so maybe it's not really worth working
>> harder, but I worry that something might get missed.  Is there any way to
>> include this consideration in the automated conversion, or at least to
>> verify that we found all the places to quote?  Or, seeing that 0004 seems
>> to be introducing some quoting-related hacks to genbki.pl anyway, maybe
>> we could take care of the issue there?

> The quoting hacks are really to keep the postgres.bki diff as small as
> possible (attached). The simplest and most air-tight way to address
> your concern would be to double-quote everything when writing the bki
> file. That could be done last as a follow-up.

Oh, if you're cross-checking by diff'ing the produced .bki file, then
that's sufficient to address my concern here.  No need to do more.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> I've attached a patch that takes care of these cleanups so they don't
> clutter the patch set.

Pushed.  I made a couple of cosmetic changes in genbki.pl.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 3/3/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * In 0006, I'm not very pleased with the introduction of
>> "Makefile.headers".

> I wasn't happy with it either, but I couldn't get it to build
> otherwise. The sticking point was the symlinks in
> $(builddir)/src/include/catalog.  $(MAKE) -C catalog doesn't handle
> that. The makefile in /src/common relies on the backend makefile to
> know what to invoke for a given header. IIRC, relpath.c includes
> pg_tablespace.h, which now requires pg_tablespace_d.h to be built.

I'm not following.  AFAICS, what you put in src/common/Makefile was just

+.PHONY: generated-headers
+
+generated-headers:
+    $(MAKE) -C ../backend generated-headers

which doesn't appear to care whether backend/Makefile knows anything
about specific generated headers or not.  I think all we need to do
is consider that the *_d.h files ought to be built as another consequence
of invoking the generated-headers target.

BTW, there's already a submake-generated-headers target in
Makefile.global, which you should use in preference to rolling your own.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/4/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <jcnaylor@gmail.com> writes:
>> On 3/3/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> * In 0006, I'm not very pleased with the introduction of
>>> "Makefile.headers".
>
>> I wasn't happy with it either, but I couldn't get it to build
>> otherwise. The sticking point was the symlinks in
>> $(builddir)/src/include/catalog.  $(MAKE) -C catalog doesn't handle
>> that. The makefile in /src/common relies on the backend makefile to
>> know what to invoke for a given header. IIRC, relpath.c includes
>> pg_tablespace.h, which now requires pg_tablespace_d.h to be built.
>
> I'm not following.  AFAICS, what you put in src/common/Makefile was just
>
> +.PHONY: generated-headers
> +
> +generated-headers:
> +    $(MAKE) -C ../backend generated-headers
>
> which doesn't appear to care whether backend/Makefile knows anything
> about specific generated headers or not.  I think all we need to do
> is consider that the *_d.h files ought to be built as another consequence
> of invoking the generated-headers target.
>
> BTW, there's already a submake-generated-headers target in
> Makefile.global, which you should use in preference to rolling your own.

I've attached version 9, whose biggest change is to address the above
points of review. I pushed all of the catalog header build logic into
catalog Makefile to avoid creating a separate symbol file. This
involved putting the distprep logic there as well. Enough of the
structure changed that one or two names didn't make sense anymore, so
I changed them.

As suggested, the conversion script is now part of the patchset and
not committed to the repo. To run the conversion, save everything to a
directory and update the dir vars at the top of
apply-bootstrap-data-patches.sh accordingly.

A couple things to note that I didn't do:
-With all the new generated headers, the message "Writing ..." is now
quite verbose. It might be worth changing that.
-I'm not sure if I need to change anything involving "make install".
-I haven't tested the MSVC changes.
-I didn't change any clients to actually use the new headers directly.
That might be too ambitious for this cycle anyway.

While this goes through review, I'll get a head start rebasing the
human readable OIDs and data compaction patches.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
I wrote:

> I've attached version 9, whose biggest change is to address the above
> points of review. I pushed all of the catalog header build logic into
> catalog Makefile to avoid creating a separate symbol file. This
> involved putting the distprep logic there as well. Enough of the
> structure changed that one or two names didn't make sense anymore, so
> I changed them.
>
> As suggested, the conversion script is now part of the patchset and
> not committed to the repo. To run the conversion, save everything to a
> directory and update the dir vars at the top of
> apply-bootstrap-data-patches.sh accordingly.
>
> A couple things to note that I didn't do:
> -With all the new generated headers, the message "Writing ..." is now
> quite verbose. It might be worth changing that.
> -I'm not sure if I need to change anything involving "make install".
> -I haven't tested the MSVC changes.
> -I didn't change any clients to actually use the new headers directly.
> That might be too ambitious for this cycle anyway.
>
> While this goes through review, I'll get a head start rebasing the
> human readable OIDs and data compaction patches.

It didn't take that long to rebase the remaining parts of the
patchset, so despite what I said above I went ahead and put them in
version 10 (attached), this time via scripted bulk editing rather than
as large patches. Changes since the last patchset that contained these
parts:

-Split out the generation of pg_type OID symbols into its own patch.
-Remove single-purpose magic behaviors.
-Ditto for the ability to abbreviate attribute names. I decided the
added complexity and possible confusion wasn't worth the space
savings.
-Add some more OID macros for pg_aggregate and pg_range that I missed before.

Also, more generally, I cleaned up the apply-patches script and edited
its comments and commit messages.

Tom Lane wrote:
> In 0010, you relabel the types of some OID columns so that genbki.pl
> will know which lookup to apply to them.  That's not such a problem for
> the relabelings that are just macros and genbki.pl converts back to
> type OID in the .bki file.  But you also did things like s/Oid/regtype/,
> and that IS a problem because it will affect what client code sees in
> those catalog columns.  We've discussed changing those columns to
> regfoo types in the past, and decided not to, because of the likelihood
> of breaking client queries.  I do not think this patch gets to change
> that policy.  So the way to identify the lookup rule needs to be
> independent of whether the column is declared as Oid or an Oid alias type.
> Perhaps an explicit marker telling what transformation to make, like
>
>     Oid            rngsubtype BKI_LOOKUP(pg_type);
>
> would work for that.

This is also done (now in 0007).

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> It didn't take that long to rebase the remaining parts of the
> patchset, so despite what I said above I went ahead and put them in
> version 10 (attached), this time via scripted bulk editing rather than
> as large patches.

Starting to look into this version now, but a small suggestion while
it's still fresh in mind: it might be easier, in future rounds, to
put all the files in a tarball and attach 'em as one big attachment.
At least with my mail setup, it's way easier to save off a tarball
and "tar xf" it than it is to individually save a dozen attachments.
I suspect that way might be easier on your end, too.

There's some value in posting a patchset as separate attachments
when it's possible to just apply the patches in series; Munro's patch
tester knows what to do with that, but not with a tarball AFAIK.
But in this case, there's little hope that the patch tester would
get it right anyhow.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/15/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <jcnaylor@gmail.com> writes:
>> It didn't take that long to rebase the remaining parts of the
>> patchset, so despite what I said above I went ahead and put them in
>> version 10 (attached), this time via scripted bulk editing rather than
>> as large patches.
>
> Starting to look into this version now, but a small suggestion while
> it's still fresh in mind: it might be easier, in future rounds, to
> put all the files in a tarball and attach 'em as one big attachment.

Sure thing. I've done so here for version 11, which is just a rebase
over the removal of pg_class.relhaspkey.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> [ v11-bootstrap-data-conversion.tar.gz ]

I've done a round of review work on this, focusing on the Makefile
infrastructure.  I found a bunch of problems with parallel builds and
VPATH builds, which are addressed in the attached incremental patch.

The parallel-build issues are a bit of a mess really: it's surprising
we've not had problems there earlier.  For instance, catalog/Makefile has

postgres.description: postgres.bki ;
postgres.shdescription: postgres.bki ;
schemapg.h: postgres.bki ;

However, genbki.pl doesn't make any particular guarantee that postgres.bki
will be written sooner than its other output files, which means that in
principle make might think it needs to rebuild these other files on every
subsequent check.  That was somewhat harmless given the empty update rule,
but it's not really the right thing.  Your patch extended this to make
all the generated headers dependent on postgres.bki, and those are
definitely written before postgres.bki, meaning make *will* think they
are out of date.  Worse, it'll also think the symlinks to them are out
of date.  So I was running into problems with different parallel make
sub-tasks removing and recreating the symlinks.  VPATH builds didn't
work well either, because out-of-date-ness ties into whether make will
accept a file in the source dir as a valid replacement target.

I resolved this mess by setting up a couple of stamp files, which is
a technique we also use elsewhere.  bki-stamp is a single file
representing the action of having run genbki.pl, and header-stamp
likewise represents the action of having made the symlinks to the
generated headers.  By depending on those rather than individual
files, we avoid questions of exactly what the timestamps on the
individual output files are.

In the attached, I've also done some more work on the README file
and cleaned up a few other little things.

I've not really looked at the MSVC build code at all.  Personally,
I'm willing to just commit this (when the time comes) and let the
buildfarm see if the MSVC code works ... but if anyone else wants
to check that part beforehand, please do.

I also have not spent much time yet looking at the end-product .h and .dat
files.  I did note a bit of distressing inconsistency in the formatting of
the catalog struct declarations, some of which predates this patch but it
seems like you've introduced more.  I think what we ought to standardize
on is a format similar to this in pg_opclass.h:

CATALOG(pg_opclass,2616)
{
    /* index access method opclass is for */
    Oid         opcmethod       BKI_LOOKUP(pg_am);

    /* name of this opclass */
    NameData    opcname;

    /* namespace of this opclass */
    Oid         opcnamespace    BKI_DEFAULT(PGNSP);

    /* opclass owner */
    Oid         opcowner        BKI_DEFAULT(PGUID);

The former convention used in some places, of field descriptions in
same-line comments, clearly won't work anymore if we're sticking
BKI_DEFAULT annotations there.  I also don't like the format used in, eg,
pg_aggregate.h of putting field descriptions in a separate comment block
before the struct proper.  Bitter experience has shown that there are a
lot of people on this project who won't update comments that are more than
about two lines away from the code they change; so the style in
pg_aggregate.h is just inviting maintenance oversights.

I've got mixed feelings about the whitespace lines between fields.  They
seem like they are mostly bulking up the code and we could do without 'em.
On the other hand, pgindent will insist on putting one before any
multi-line field comment, and so that would create inconsistent formatting
if we don't use 'em elsewhere.  Thoughts?

Speaking of pgindent, those prettily aligned BKI annotations are a waste
of effort, because when pgindent gets done with the code it will look
like

    regproc        aggfnoid;
    char        aggkind BKI_DEFAULT(n);
    int16        aggnumdirectargs BKI_DEFAULT(0);
    regproc        aggtransfn BKI_LOOKUP(pg_proc);
    regproc        aggfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
    regproc        aggcombinefn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
    regproc        aggserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
    regproc        aggdeserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
    regproc        aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
    regproc        aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
    regproc        aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

I'm not sure if there's anything much to be done about this.  BKI_DEFAULT
isn't so bad, but additional annotations get unreadable fast.  Maybe
BKI_LOOKUP was a bad idea after all, and we should just invent more
Oid-equivalent typedef names.

The attached is just one incremental patch on top of your v11 series.
I couldn't think of an easy way to migrate the changes back into the
most relevant diffs of your series, so I didn't try.

            regards, tom lane

diff --git a/src/Makefile b/src/Makefile
index febbced..433d00b 100644
*** a/src/Makefile
--- b/src/Makefile
*************** SUBDIRS = \
*** 37,42 ****
--- 37,47 ----

  $(recurse)

+ # Update the commonly used headers before building the subdirectories;
+ # otherwise, in a parallel build, several different sub-jobs will try to
+ # remake them concurrently
+ $(SUBDIRS:%=all-%-recurse): | submake-generated-headers
+
  install: install-local

  install-local: installdirs-local
diff --git a/src/backend/Makefile b/src/backend/Makefile
index ef00b38..775f7a3 100644
*** a/src/backend/Makefile
--- b/src/backend/Makefile
*************** utils/probes.h: utils/probes.d
*** 150,156 ****

  # run this unconditionally to avoid needing to know its dependencies here:
  submake-catalog-headers:
!     $(MAKE) -C catalog distprep builddir-headers

  .PHONY: submake-catalog-headers

--- 150,156 ----

  # run this unconditionally to avoid needing to know its dependencies here:
  submake-catalog-headers:
!     $(MAKE) -C catalog distprep generated-header-symlinks

  .PHONY: submake-catalog-headers

*************** endif
*** 299,312 ****
  ##########################################################################

  clean:
!     rm -f $(LOCALOBJS) postgres$(X) $(POSTGRES_IMP) \
!         $(top_builddir)/src/include/parser/gram.h \
!         $(top_builddir)/src/include/catalog/pg_*_d.h \
!         $(top_builddir)/src/include/catalog/schemapg.h \
!         $(top_builddir)/src/include/storage/lwlocknames.h \
!         $(top_builddir)/src/include/utils/fmgroids.h \
!         $(top_builddir)/src/include/utils/fmgrprotos.h \
!         $(top_builddir)/src/include/utils/probes.h
  ifeq ($(PORTNAME), cygwin)
      rm -f postgres.dll libpostgres.a
  endif
--- 299,305 ----
  ##########################################################################

  clean:
!     rm -f $(LOCALOBJS) postgres$(X) $(POSTGRES_IMP)
  ifeq ($(PORTNAME), cygwin)
      rm -f postgres.dll libpostgres.a
  endif
*************** distclean: clean
*** 318,333 ****
      rm -f port/tas.s port/dynloader.c port/pg_sema.c port/pg_shmem.c

  maintainer-clean: distclean
      rm -f bootstrap/bootparse.c \
            bootstrap/bootscanner.c \
            parser/gram.c \
            parser/gram.h \
            parser/scan.c \
-           catalog/pg_*_d.h \
-           catalog/schemapg.h \
-           catalog/postgres.bki \
-           catalog/postgres.description \
-           catalog/postgres.shdescription \
            replication/repl_gram.c \
            replication/repl_scanner.c \
            replication/syncrep_gram.c \
--- 311,322 ----
      rm -f port/tas.s port/dynloader.c port/pg_sema.c port/pg_shmem.c

  maintainer-clean: distclean
+     $(MAKE) -C catalog $@
      rm -f bootstrap/bootparse.c \
            bootstrap/bootscanner.c \
            parser/gram.c \
            parser/gram.h \
            parser/scan.c \
            replication/repl_gram.c \
            replication/repl_scanner.c \
            replication/syncrep_gram.c \
diff --git a/src/backend/catalog/.gitignore b/src/backend/catalog/.gitignore
index 1044a1c..9abe91d 100644
*** a/src/backend/catalog/.gitignore
--- b/src/backend/catalog/.gitignore
***************
*** 3,5 ****
--- 3,6 ----
  /postgres.shdescription
  /schemapg.h
  /pg_*_d.h
+ /bki-stamp
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 6fe5566..39dae86 100644
*** a/src/backend/catalog/Catalog.pm
--- b/src/backend/catalog/Catalog.pm
*************** sub ParseHeader
*** 201,207 ****
  #
  # The parameter $preserve_formatting needs to be set for callers that want
  # to work with non-data lines in the data files, such as comments and blank
! # lines. If a caller just wants consume the data, leave it unset.
  sub ParseData
  {
      my ($input_file, $schema, $preserve_formatting) = @_;
--- 201,207 ----
  #
  # The parameter $preserve_formatting needs to be set for callers that want
  # to work with non-data lines in the data files, such as comments and blank
! # lines. If a caller just wants to consume the data, leave it unset.
  sub ParseData
  {
      my ($input_file, $schema, $preserve_formatting) = @_;
*************** sub ParseData
*** 299,304 ****
--- 299,305 ----

  # Fill in default values of a record using the given schema. It's the
  # caller's responsibility to specify other values beforehand.
+ # If we fail to fill all fields, return a nonempty error message.
  sub AddDefaultValues
  {
      my ($row, $schema) = @_;
*************** sub FindDefinedSymbolFromData
*** 391,398 ****
          {
              return $row->{oid};
          }
-         die "no definition found for $symbol\n";
      }
  }

  1;
--- 392,399 ----
          {
              return $row->{oid};
          }
      }
+     die "no definition found for $symbol\n";
  }

  1;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 9b87f85..17213d4 100644
*** a/src/backend/catalog/Makefile
--- b/src/backend/catalog/Makefile
*************** CATALOG_HEADERS := \
*** 44,68 ****
      pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
      pg_collation.h pg_partitioned_table.h pg_range.h pg_transform.h \
      pg_sequence.h pg_publication.h pg_publication_rel.h pg_subscription.h \
!     pg_subscription_rel.h \

  GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h

- distprep: $(BKIFILES) $(GENERATED_HEADERS)
-
- .PHONY: builddir-headers
-
- builddir-headers: $(addprefix $(top_builddir)/src/include/catalog/, $(GENERATED_HEADERS))
-
- all: distprep builddir-headers
-
  # In the list of headers used to assemble postgres.bki, indexing.h needs
  # be last, and toasting.h just before it.
! POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/, $(CATALOG_HEADERS) toasting.h indexing.h)

  POSTGRES_BKI_DATA = $(addprefix $(top_srcdir)/src/include/catalog/,\
      pg_aggregate.dat pg_am.dat pg_amop.dat pg_amproc.dat pg_authid.dat \
!     pg_cast.dat pg_class.dat pg_collation.dat pg_database.dat pg_language.dat \
      pg_namespace.dat pg_opclass.dat pg_operator.dat pg_opfamily.dat \
      pg_pltemplate.dat pg_proc.dat pg_range.dat pg_tablespace.dat \
      pg_ts_config.dat pg_ts_config_map.dat pg_ts_dict.dat pg_ts_parser.dat \
--- 44,64 ----
      pg_default_acl.h pg_init_privs.h pg_seclabel.h pg_shseclabel.h \
      pg_collation.h pg_partitioned_table.h pg_range.h pg_transform.h \
      pg_sequence.h pg_publication.h pg_publication_rel.h pg_subscription.h \
!     pg_subscription_rel.h

  GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h

  # In the list of headers used to assemble postgres.bki, indexing.h needs
  # be last, and toasting.h just before it.
! POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
!     $(CATALOG_HEADERS) toasting.h indexing.h \
!     )

+ # The .dat files we need can just be listed alphabetically.
  POSTGRES_BKI_DATA = $(addprefix $(top_srcdir)/src/include/catalog/,\
      pg_aggregate.dat pg_am.dat pg_amop.dat pg_amproc.dat pg_authid.dat \
!     pg_cast.dat pg_class.dat pg_collation.dat \
!     pg_database.dat pg_language.dat \
      pg_namespace.dat pg_opclass.dat pg_operator.dat pg_opfamily.dat \
      pg_pltemplate.dat pg_proc.dat pg_range.dat pg_tablespace.dat \
      pg_ts_config.dat pg_ts_config_map.dat pg_ts_dict.dat pg_ts_parser.dat \
*************** POSTGRES_BKI_DATA = $(addprefix $(top_sr
*** 72,104 ****
  # location of Catalog.pm
  catalogdir = $(top_srcdir)/src/backend/catalog

! # see explanation in ../parser/Makefile
! postgres.description: postgres.bki ;
! postgres.shdescription: postgres.bki ;
! $(GENERATED_HEADERS): postgres.bki ;

! # Technically, this should depend on Makefile.global, but then
! # postgres.bki would need to be rebuilt after every configure run,
! # even in distribution tarballs.  So this is cheating a bit, but it
! # will achieve the goal of updating the version number when it
! # changes.
! postgres.bki: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure
$(top_srcdir)/src/include/catalog/duplicate_oids
      cd $(top_srcdir)/src/include/catalog && $(PERL) ./duplicate_oids
      $(PERL) -I $(catalogdir) $< --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)

! # see explanation in src/backend/Makefile
! $(top_builddir)/src/include/catalog/%_d.h: %_d.h
!     prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
!       cd '$(dir $@)' && rm -f $(notdir $@) && \
!       $(LN_S) "$$prereqdir/$(notdir $<)" .
!
! $(top_builddir)/src/include/catalog/schemapg.h: schemapg.h
      prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
!       cd '$(dir $@)' && rm -f $(notdir $@) && \
!       $(LN_S) "$$prereqdir/$(notdir $<)" .

  .PHONY: install-data
! install-data: $(BKIFILES) installdirs
      $(INSTALL_DATA) $(call vpathsearch,postgres.bki) '$(DESTDIR)$(datadir)/postgres.bki'
      $(INSTALL_DATA) $(call vpathsearch,postgres.description) '$(DESTDIR)$(datadir)/postgres.description'
      $(INSTALL_DATA) $(call vpathsearch,postgres.shdescription) '$(DESTDIR)$(datadir)/postgres.shdescription'
--- 68,105 ----
  # location of Catalog.pm
  catalogdir = $(top_srcdir)/src/backend/catalog

! all: distprep generated-header-symlinks

! distprep: bki-stamp
!
! .PHONY: generated-header-symlinks
!
! generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
!
! # Technically, this should depend on Makefile.global which supplies
! # $(MAJORVERSION); but then postgres.bki would need to be rebuilt after every
! # configure run, even in distribution tarballs.  So depending on configure.in
! # instead is cheating a bit, but it will achieve the goal of updating the
! # version number when it changes.
! bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in
$(top_srcdir)/src/include/catalog/duplicate_oids
      cd $(top_srcdir)/src/include/catalog && $(PERL) ./duplicate_oids
      $(PERL) -I $(catalogdir) $< --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
+     touch $@

! # The generated headers must all be symlinked into builddir/src/include/,
! # using absolute links for the reasons explained in src/backend/Makefile.
! # We use header-stamp to record that we've done this because the symlinks
! # themselves may appear older than bki-stamp.
! $(top_builddir)/src/include/catalog/header-stamp: bki-stamp
      prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
!     cd '$(dir $@)' && for file in $(GENERATED_HEADERS); do \
!       rm -f $$file && $(LN_S) "$$prereqdir/$$file" . ; \
!     done
!     touch $@

+ # Note: installation of generated headers is handled elsewhere
  .PHONY: install-data
! install-data: bki-stamp installdirs
      $(INSTALL_DATA) $(call vpathsearch,postgres.bki) '$(DESTDIR)$(datadir)/postgres.bki'
      $(INSTALL_DATA) $(call vpathsearch,postgres.description) '$(DESTDIR)$(datadir)/postgres.description'
      $(INSTALL_DATA) $(call vpathsearch,postgres.shdescription) '$(DESTDIR)$(datadir)/postgres.shdescription'
*************** installdirs:
*** 113,121 ****
  uninstall-data:
      rm -f $(addprefix '$(DESTDIR)$(datadir)'/, $(BKIFILES) system_views.sql information_schema.sql sql_features.txt)

! # postgres.bki, postgres.description, postgres.shdescription, and the generated headers
! # are in the distribution tarball, so they are not cleaned here.
  clean:

  maintainer-clean: clean
!     rm -f $(BKIFILES)
--- 114,123 ----
  uninstall-data:
      rm -f $(addprefix '$(DESTDIR)$(datadir)'/, $(BKIFILES) system_views.sql information_schema.sql sql_features.txt)

! # postgres.bki, postgres.description, postgres.shdescription,
! # and the generated headers are in the distribution tarball,
! # so they are not cleaned here.
  clean:

  maintainer-clean: clean
!     rm -f bki-stamp $(BKIFILES) $(GENERATED_HEADERS)
diff --git a/src/backend/catalog/README b/src/backend/catalog/README
index 447fce6..4aa2adb 100644
*** a/src/backend/catalog/README
--- b/src/backend/catalog/README
*************** src/backend/catalog/README
*** 3,20 ****
  System Catalog
  ==============

! This directory contains .c files that manipulate the system catalogs;
! src/include/catalog contains the .h files that define the structure
! of the system catalogs.

! When the compile-time script genbki.pl executes, it parses the .h files
  and .dat files in order to generate the postgres.* files.  These are then
  used as input to initdb (which is just a wrapper around postgres
  running single-user in bootstrapping mode) in order to generate the
  initial (template) system catalog relation files.

! backend/utils/Gen_fmgrtab.pl uses the same mechanism to genarate .c and
! .h files used by the function manager.

  -----------------------------------------------------------------

--- 3,27 ----
  System Catalog
  ==============

! This directory contains .c files that manipulate the system catalogs.
! src/include/catalog contains the pg_XXX.h files that define the structure
! of the system catalogs, as well as pg_XXX.dat files that define the
! initial contents of the catalogs.

! When the compile-time script genbki.pl executes, it parses the pg_XXX.h files
  and .dat files in order to generate the postgres.* files.  These are then
  used as input to initdb (which is just a wrapper around postgres
  running single-user in bootstrapping mode) in order to generate the
  initial (template) system catalog relation files.

! genbki.pl also produces some generated header files that are used in
! compiling the C code.  These include a pg_XXX_d.h file corresponding
! to each pg_XXX.h catalog header file, which contains #define's extracted
! from the corresponding .dat file as well as some automatically-generated
! symbols.
!
! backend/utils/Gen_fmgrtab.pl uses the same catalog-data-reading code
! to generate .c and .h files used by the function manager.

  -----------------------------------------------------------------

*************** The data file format and bootstrap data
*** 22,33 ****

  - As far as the bootstrap code is concerned, it is very important
  that the insert statements in postgres.bki be properly formatted
! (e.g., no broken lines, proper use of white-space and _null_).  The
! scripts are line-oriented and break easily.  In addition, the only
! documentation on the proper format for them is the code in the
! bootstrap/ directory.  Fortunately, the source bootstrap data is much
! more tolerant with respect to formatting, but it still pays to be
! careful when adding new data.

  - The .dat files contain Perl data structure literals that are simply
  eval'd to produce in-memory data structures.  As such, the code reading
--- 29,39 ----

  - As far as the bootstrap code is concerned, it is very important
  that the insert statements in postgres.bki be properly formatted
! (e.g., no broken lines, proper use of white-space and _null_).
! In addition, the only documentation on the proper format for them
! is the code in the bootstrap/ directory.  Fortunately, the source
! bootstrap data is much more tolerant with respect to formatting,
! but it still pays to be careful when adding new data.

  - The .dat files contain Perl data structure literals that are simply
  eval'd to produce in-memory data structures.  As such, the code reading
*************** demonstrate the key features:
*** 43,54 ****

  # a comment
  { oid => '1', oid_symbol => 'TemplateDbOid', shdescr => 'default template',
!   datname => 'Berkely\'s DB', datcollate => '"LC_COLLATE"', datacl => '_null_' },

  ]

! -The layout is: open bracket, one or more sets of curly brackets containing
! comma-separated key-value pairs, close bracket.
  -All values are single-quoted.
  -Single quotes within values must be escaped.
  -If a value is a macro to be expanded by initdb.c, it must also have double-
--- 49,61 ----

  # a comment
  { oid => '1', oid_symbol => 'TemplateDbOid', shdescr => 'default template',
!   datname => 'Berkeley\'s DB', datcollate => '"LC_COLLATE"',
!   datacl => '_null_' },

  ]

! -The overall file layout is: open bracket, one or more sets of curly brackets
! containing comma-separated key-value pairs, close bracket.
  -All values are single-quoted.
  -Single quotes within values must be escaped.
  -If a value is a macro to be expanded by initdb.c, it must also have double-
*************** the catalog's data file, and use the #de
*** 91,100 ****
  the actual numeric value of any OID in C code is considered very bad form.
  Direct references to pg_proc OIDs are common enough that there's a special
  mechanism to create the necessary #define's automatically: see
! backend/utils/Gen_fmgrtab.pl.  We also have standard conventions for setting
! up #define's for the pg_class OIDs of system catalogs and indexes.  For all
! the other system catalogs, you have to manually create any #define's you
! need.

  - If you need to find a valid OID for a new predefined tuple, use the
  script src/include/catalog/unused_oids.  It generates inclusive ranges of
--- 98,109 ----
  the actual numeric value of any OID in C code is considered very bad form.
  Direct references to pg_proc OIDs are common enough that there's a special
  mechanism to create the necessary #define's automatically: see
! backend/utils/Gen_fmgrtab.pl.  Similarly (but, for historical reasons, not
! done the same way), there's an automatic method for creating #define's
! for pg_type OIDs.  We also have standard conventions for setting up #define's
! for the pg_class OIDs of system catalogs and indexes.  For all the other
! system catalogs, you have to manually specify any #define's you need, via
! oid_symbol entries.

  - If you need to find a valid OID for a new predefined tuple, use the
  script src/include/catalog/unused_oids.  It generates inclusive ranges of
*************** script src/include/catalog/unused_oids.
*** 102,111 ****
  not been allocated yet).  Currently, OIDs 1-9999 are reserved for manual
  assignment; the unused_oids script simply looks through the include/catalog
  headers and .dat files to see which ones do not appear.
! (As of Postgres 8.1, it also looks at CATALOG and DECLARE_INDEX lines.)
! You can use the duplicate_oids script to check for mistakes.  This script
! is also run at compile time, and will stop the build if a duplicate is
! found.

  - The OID counter starts at 10000 at bootstrap.  If a catalog row is in a
  table that requires OIDs, but no OID was preassigned by an "OID =" clause,
--- 111,119 ----
  not been allocated yet).  Currently, OIDs 1-9999 are reserved for manual
  assignment; the unused_oids script simply looks through the include/catalog
  headers and .dat files to see which ones do not appear.
! You can also use the duplicate_oids script to check for mistakes.
! (This script is run automatically at compile time, and will stop the build
! if a duplicate is found.)

  - The OID counter starts at 10000 at bootstrap.  If a catalog row is in a
  table that requires OIDs, but no OID was preassigned by an "OID =" clause,
*************** the tables are in place, and toasting.h
*** 134,144 ****
  (or at least after all the tables that need toast tables).  There are
  reputedly some other order dependencies in the .bki list, too.

! -Client code should not include the catalog headers directly.  Instead, it
! should include the corresponding generated pg_*_d.h header.  If you want
! macros or other code in the catalog headers to be visible to clients, use
! the undefined macro EXPOSE_TO_CLIENT_CODE to instruct genbki.pl to copy
! that section to the pg_*_d.h header.

  -----------------------------------------------------------------

--- 142,155 ----
  (or at least after all the tables that need toast tables).  There are
  reputedly some other order dependencies in the .bki list, too.

! - Frontend code should not include any pg_XXX.h header directly, as these
! files may contain C code that won't compile outside the backend.  Instead,
! frontend code may include the corresponding generated pg_*_d.h header, which
! will contain OID #defines and any other data that might be of use on the
! client side.  If you want macros or other code in a catalog header to be
! visible to frontend code, write "#ifdef EXPOSE_TO_CLIENT_CODE" ... "#endif"
! around that section to instruct genbki.pl to copy that section to the
! pg_*_d.h header.

  -----------------------------------------------------------------

*************** piece of code will likely perform "typet
*** 155,163 ****
  random errors or even segmentation violations.  Hence, do NOT insert
  catalog tuples that contain NULL attributes except in their
  variable-length portions!  (The bootstrapping code is fairly good about
! marking NOT NULL each of the columns that can legally be referenced via
! C struct declarations ... but those markings won't be enforced against
! insert commands, so you must get it right in the data files.)

  - Modification of the catalogs must be performed with the proper
  updating of catalog indexes!  That is, most catalogs have indexes
--- 166,176 ----
  random errors or even segmentation violations.  Hence, do NOT insert
  catalog tuples that contain NULL attributes except in their
  variable-length portions!  (The bootstrapping code is fairly good about
! automatically marking NOT NULL each of the columns that can legally be
! referenced via C struct declarations, and it can be helped along with
! BKI_FORCE_NOT_NULL and BKI_FORCE_NULL annotations where needed.  But
! attnotnull constraints are only enforced in the executor, not against
! tuples that are generated by random C code.)

  - Modification of the catalogs must be performed with the proper
  updating of catalog indexes!  That is, most catalogs have indexes
*************** method calls to insert new or modified t
*** 167,170 ****
  also make the calls to insert the tuple into ALL of its indexes!  If
  not, the new tuple will generally be "invisible" to the system because
  most of the accesses to the catalogs in question will be through the
! associated indexes.
--- 180,185 ----
  also make the calls to insert the tuple into ALL of its indexes!  If
  not, the new tuple will generally be "invisible" to the system because
  most of the accesses to the catalogs in question will be through the
! associated indexes.  Current practice is to always use CatalogTupleInsert,
! CatalogTupleUpdate, CatalogTupleDelete, or one of their sibling functions
! when updating the system catalogs, so that this is handled automatically.
diff --git a/src/common/Makefile b/src/common/Makefile
index f28c136..80e78d7 100644
*** a/src/common/Makefile
--- b/src/common/Makefile
*************** libpgcommon_srv.a: $(OBJS_SRV)
*** 88,95 ****
  %_srv.o: %.c %.o
      $(CC) $(CFLAGS) $(subst -DFRONTEND ,, $(CPPFLAGS)) -c $< -o $@

- $(OBJS_COMMON): | submake-generated-headers
-
  $(OBJS_SRV): | submake-errcodes

  .PHONY: submake-errcodes
--- 88,93 ----
diff --git a/src/include/Makefile b/src/include/Makefile
index 7fe4d45..59e18c7 100644
*** a/src/include/Makefile
--- b/src/include/Makefile
*************** install: all installdirs
*** 54,60 ****
        chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$dir/*.h  || exit; \
      done
  ifeq ($(vpath_build),yes)
!     for file in dynloader.h catalog/schemapg.h parser/gram.h storage/lwlocknames.h utils/probes.h; do \
        cp $$file '$(DESTDIR)$(includedir_server)'/$$file || exit; \
        chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$file || exit; \
      done
--- 54,60 ----
        chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$dir/*.h  || exit; \
      done
  ifeq ($(vpath_build),yes)
!     for file in dynloader.h catalog/schemapg.h catalog/pg_*_d.h parser/gram.h storage/lwlocknames.h utils/probes.h;
do\ 
        cp $$file '$(DESTDIR)$(includedir_server)'/$$file || exit; \
        chmod $(INSTALL_DATA_MODE) '$(DESTDIR)$(includedir_server)'/$$file || exit; \
      done
*************** uninstall:
*** 73,79 ****


  clean:
!     rm -f utils/fmgroids.h utils/fmgrprotos.h utils/errcodes.h parser/gram.h utils/probes.h catalog/schemapg.h
catalog/pg_*_d.h

  distclean maintainer-clean: clean
      rm -f pg_config.h pg_config_ext.h pg_config_os.h dynloader.h stamp-h stamp-ext-h
--- 73,81 ----


  clean:
!     rm -f utils/fmgroids.h utils/fmgrprotos.h utils/errcodes.h
!     rm -f parser/gram.h storage/lwlocknames.h utils/probes.h
!     rm -f catalog/schemapg.h catalog/pg_*_d.h catalog/header-stamp

  distclean maintainer-clean: clean
      rm -f pg_config.h pg_config_ext.h pg_config_os.h dynloader.h stamp-h stamp-ext-h
diff --git a/src/include/catalog/.gitignore b/src/include/catalog/.gitignore
index dfd5616..6c8da54 100644
*** a/src/include/catalog/.gitignore
--- b/src/include/catalog/.gitignore
***************
*** 1,2 ****
--- 1,3 ----
  /schemapg.h
  /pg_*_d.h
+ /header-stamp

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/21/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> John Naylor <jcnaylor@gmail.com> writes:
>> [ v11-bootstrap-data-conversion.tar.gz ]
>
> I've done a round of review work on this, focusing on the Makefile
> infrastructure.  I found a bunch of problems with parallel builds and
> VPATH builds, which are addressed in the attached incremental patch.
>
[explanation of Make issues and stamp files]

> In the attached, I've also done some more work on the README file
> and cleaned up a few other little things.

Thanks for pulling my attempt at makefile hackery across the finish
line. It sounds like there are no more obvious structural issues
remaining (fingers crossed). Your other improvements make sense.

>  I did note a bit of distressing inconsistency in the formatting of
> the catalog struct declarations, some of which predates this patch but it
> seems like you've introduced more.  I think what we ought to standardize
> on is a format similar to this in pg_opclass.h:
>
> CATALOG(pg_opclass,2616)
> {
>     /* index access method opclass is for */
>     Oid         opcmethod       BKI_LOOKUP(pg_am);
>
[snip]

That is the most sensible format. Did you mean all 62 catalog headers
for future-proofing, or just the ones with annotations now?

> The former convention used in some places, of field descriptions in
> same-line comments, clearly won't work anymore if we're sticking
> BKI_DEFAULT annotations there.

Yeah.

> I also don't like the format used in, eg,
> pg_aggregate.h of putting field descriptions in a separate comment block
> before the struct proper.  Bitter experience has shown that there are a
> lot of people on this project who won't update comments that are more than
> about two lines away from the code they change; so the style in
> pg_aggregate.h is just inviting maintenance oversights.

Okay.

> I've got mixed feelings about the whitespace lines between fields.  They
> seem like they are mostly bulking up the code and we could do without 'em.
> On the other hand, pgindent will insist on putting one before any
> multi-line field comment, and so that would create inconsistent formatting
> if we don't use 'em elsewhere.  Thoughts?

I'll do it both ways for one header and post the results for people to look at.

> Speaking of pgindent, those prettily aligned BKI annotations are a waste
> of effort, because when pgindent gets done with the code it will look
> like
>
[snip]
>     regproc        aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>     regproc        aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>     regproc        aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

> I'm not sure if there's anything much to be done about this.  BKI_DEFAULT
> isn't so bad, but additional annotations get unreadable fast.  Maybe
> BKI_LOOKUP was a bad idea after all, and we should just invent more
> Oid-equivalent typedef names.

Well, until version 7, I used "fake" type aliases that only genbki.pl
could see. The C compiler and postgres.bki could only see the actual
Oid/oid type. Perhaps it was a mistake to model their appearance after
regproc (regtype, etc), because that was misleading. Maybe something
more obviously transient like 'lookup_typeoid? I'm leaning towards
this idea.

Another possibility is to teach the pgindent pre_/post_indent
functions to preserve annotation formatting, but I'd rather not add
yet another regex to that script. Plus, over the next 10+ years, I
could see people adding several more BKI_* macros, leading to
readability issues regardless of formatting, so maybe we should nip
this one in the bud.

> The attached is just one incremental patch on top of your v11 series.
> I couldn't think of an easy way to migrate the changes back into the
> most relevant diffs of your series, so I didn't try.

I've done that quite a few times while developing this patch series,
so I'm used to it. I'll incorporate your changes soon and also rebase
over the new pg_class column that landed recently. I'll have a new
version by this weekend, assuming we conclude the formatting
discussion, so if you or others have any more comments by then, I'll
include them.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 3/21/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>     regproc     aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>>     regproc     aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
>>     regproc     aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

>> I'm not sure if there's anything much to be done about this.  BKI_DEFAULT
>> isn't so bad, but additional annotations get unreadable fast.  Maybe
>> BKI_LOOKUP was a bad idea after all, and we should just invent more
>> Oid-equivalent typedef names.

> Well, until version 7, I used "fake" type aliases that only genbki.pl
> could see. The C compiler and postgres.bki could only see the actual
> Oid/oid type. Perhaps it was a mistake to model their appearance after
> regproc (regtype, etc), because that was misleading. Maybe something
> more obviously transient like 'lookup_typeoid? I'm leaning towards
> this idea.

Looking at this again, I think a big chunk of the readability problem here
is just from the fact that we have long, similar-looking lines tightly
packed.  If it were reformatted to have comment lines and whitespace
between, it might not look nearly as bad.

> Another possibility is to teach the pgindent pre_/post_indent
> functions to preserve annotation formatting, but I'd rather not add
> yet another regex to that script. Plus, over the next 10+ years, I
> could see people adding several more BKI_* macros, leading to
> readability issues regardless of formatting, so maybe we should nip
> this one in the bud.

Well, whether or not we invent BKI_LOOKUP, the need for other kinds
of annotations isn't likely to be lessened.

I wondered whether we could somehow convert the format into multiple
lines, say

    regproc     aggmfinalfn
        BKI_DEFAULT(-)
        BKI_LOOKUP(pg_proc);

but some quick experimentation was discouraging: either Emacs' C
syntax mode, or pgindent, or both would make a hash of it.  It
wasn't great from the vertical-space-consumption standpoint either.

Anyway, for the moment I'd stick with BKI_LOOKUP rather than undoing
that work.  I think it's a more transparent way of saying what we
want than the magic-OID-typedefs approach.  The formatting issue is
just a mild annoyance, and it's not really BKI_LOOKUP's fault anyway.

While I'm thinking of it --- I noticed one or two places where you
had "BKI_DEFAULT(\0)".  That coding scares me a bit --- gcc seems to
tolerate it, but other C compilers might feel that \0 is not a valid
preprocessing token, or it might confuse some editors' syntax highlight
rules.  I'd rather write cases like this as "BKI_DEFAULT('\0')".  IOW,
the argument should be a valid C identifier, number, or string literal.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
I wrote:

> On 3/21/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've got mixed feelings about the whitespace lines between fields.  They
>> seem like they are mostly bulking up the code and we could do without
>> 'em.
>> On the other hand, pgindent will insist on putting one before any
>> multi-line field comment, and so that would create inconsistent
>> formatting
>> if we don't use 'em elsewhere.  Thoughts?
>
> I'll do it both ways for one header and post the results for people to look
> at.

I've attached an earlier version of pg_proc.h with both formats as I
understand them. I turned a couple comments into multi-line comments
to demonstrate. I think without spaces it's just as hard to read as
with multiple annotations. I'd vote for spaces, but then again I'm not
the one who has to read these things very often.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/22/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Looking at this again, I think a big chunk of the readability problem here
> is just from the fact that we have long, similar-looking lines tightly
> packed.  If it were reformatted to have comment lines and whitespace
> between, it might not look nearly as bad.
>
...
> Anyway, for the moment I'd stick with BKI_LOOKUP rather than undoing
> that work.  I think it's a more transparent way of saying what we
> want than the magic-OID-typedefs approach.  The formatting issue is
> just a mild annoyance, and it's not really BKI_LOOKUP's fault anyway.

Okay, I'll do it with comments and whitespace.

> While I'm thinking of it --- I noticed one or two places where you
> had "BKI_DEFAULT(\0)".  That coding scares me a bit --- gcc seems to
> tolerate it, but other C compilers might feel that \0 is not a valid
> preprocessing token, or it might confuse some editors' syntax highlight
> rules.  I'd rather write cases like this as "BKI_DEFAULT('\0')".  IOW,
> the argument should be a valid C identifier, number, or string literal.

Hmm, I only see this octal in pg_type.h

char        typdelim BKI_DEFAULT(\054);

Which I hope is fine. Were you thinking of this comment in
pg_attribute.h? We use the double-quoted empty string for postgres.bki
and change it to  '\0' for schemapg.h.

/* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
char        attidentity BKI_DEFAULT("");


-John Naylor


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
John Naylor wrote:

> I've attached an earlier version of pg_proc.h with both formats as I
> understand them. I turned a couple comments into multi-line comments
> to demonstrate. I think without spaces it's just as hard to read as
> with multiple annotations. I'd vote for spaces, but then again I'm not
> the one who has to read these things very often.

how about letting the line go long, with the comment at the right of
each definition, with one blank line between struct members, as in the
sample below?  You normally don't care that these lines are too long
since you seldom edit them -- one mostly adds or remove entire lines
instead, so there's not as much need for side-by-side diffs as with
regular code.  (One issue with this proposal is how to convince pgindent
to leave the long lines alone.)

To me, an important property of these structs is fitting as much as
possible (vertically) in a screenful; the other proposed modes end up
with too many lines.

CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
{
    NameData    proname;        /* procedure name */

    Oid         pronamespace BKI_DEFAULT(PGNSP); /* OID of namespace containing this proc */

    Oid         proowner BKI_DEFAULT(PGUID); /* procedure owner */

    Oid         prolang BKI_DEFAULT(12); /* OID of pg_language entry */

    float4      procost BKI_DEFAULT(1); /* estimated execution cost */

    float4      prorows BKI_DEFAULT(0); /* estimated # of rows out (if proretset) */

    Oid         provariadic BKI_DEFAULT(0); /* element type of variadic array, or 0 */

    regproc     protransform BKI_DEFAULT(0); /* transforms calls to it during planning */

    bool        proisagg BKI_DEFAULT(f); /* is it an aggregate? */

    bool        proiswindow BKI_DEFAULT(f); /* is it a window function? */

    bool        prosecdef BKI_DEFAULT(f); /* security definer */

    bool        proleakproof BKI_DEFAULT(f); /* is it a leak-proof function? */

    bool        proisstrict BKI_DEFAULT(f); /* strict with respect to NULLs? */

    bool        proretset BKI_DEFAULT(f); /* returns a set? */

    char        provolatile BKI_DEFAULT(v); /* see PROVOLATILE_ categories below */

    char        proparallel BKI_DEFAULT(u); /* see PROPARALLEL_ categories below */

    int16       pronargs; /* number of arguments */

    int16       pronargdefaults BKI_DEFAULT(0); /* number of arguments with defaults */

    Oid         prorettype; /* OID of result type */

    /*
     * variable-length fields start here, but we allow direct access to
     * proargtypes
     */

    oidvector    proargtypes; /* parameter types (excludes OUT params) */

#ifdef CATALOG_VARLEN

    Oid         proallargtypes[1] BKI_DEFAULT(_null_); /* all param types (NULL if IN only) */

    char        proargmodes[1] BKI_DEFAULT(_null_); /* parameter modes (NULL if IN only) */

    text        proargnames[1] BKI_DEFAULT(_null_); /* parameter names (NULL if no names) */

    pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* list of expression trees for argument defaults (NULL if none)
*/

    Oid         protrftypes[1] BKI_DEFAULT(_null_); /* types for which to apply transforms */

    text        prosrc BKI_FORCE_NOT_NULL; /* procedure source text */

    text        probin BKI_DEFAULT(_null_); /* secondary procedure info (can be NULL) */

    text        proconfig[1] BKI_DEFAULT(_null_); /* procedure-local GUC settings */

    aclitem     proacl[1] BKI_DEFAULT(_null_); /* access permissions */
#endif
} FormData_pg_proc;

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 3/22/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> While I'm thinking of it --- I noticed one or two places where you
>> had "BKI_DEFAULT(\0)".

> Hmm, I only see this octal in pg_type.h
> char        typdelim BKI_DEFAULT(\054);

Sorry, I was going by memory rather than looking at the code.

> Which I hope is fine.

I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054').

> Were you thinking of this comment in
> pg_attribute.h? We use the double-quoted empty string for postgres.bki
> and change it to  '\0' for schemapg.h.

> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
> char        attidentity BKI_DEFAULT("");

That definitely seems like a hack --- why not BKI_DEFAULT('\0') ?

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
>> On 3/21/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I've got mixed feelings about the whitespace lines between fields.

> I've attached an earlier version of pg_proc.h with both formats as I
> understand them. I turned a couple comments into multi-line comments
> to demonstrate. I think without spaces it's just as hard to read as
> with multiple annotations. I'd vote for spaces, but then again I'm not
> the one who has to read these things very often.

Yeah, after looking at this example I agree --- it's too tight without
the blank lines.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> how about letting the line go long, with the comment at the right of
> each definition, with one blank line between struct members, as in the
> sample below?

>     NameData    proname;        /* procedure name */

>     Oid         pronamespace BKI_DEFAULT(PGNSP); /* OID of namespace containing this proc */

>     Oid         proowner BKI_DEFAULT(PGUID); /* procedure owner */

I don't think this is going to work: pgindent is going to wrap most of
these comments, ending up with something that's ugly *and* consumes
just as much vertical space as if we'd given the comments their own
lines.  The problem is that in the headers where we were using
same-line comments, the comments were written to fit in the space
available without this extra annotation.  (For my money, having spent
lots of time shaving a character or two off such comments to make 'em
fit, I'd much prefer the luxury of having a whole line to write in.)

We could go with some scheme that preserves the old formatting of the
struct definition proper and puts the added info somewhere else, ie

    Oid         pronamespace;  /* OID of namespace containing this proc */

    Oid         prolang;       /* OID of pg_language entry */

then after the struct:

BKI_DEFAULT(pronamespace, PGNSP);
BKI_DEFAULT(prolang, 12);

but on the whole I don't think that's an improvement.  I'd rather keep
the info about a field together.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/22/18, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> how about letting the line go long, with the comment at the right of
> each definition, with one blank line between struct members, as in the
> sample below?  You normally don't care that these lines are too long
> since you seldom edit them -- one mostly adds or remove entire lines
> instead, so there's not as much need for side-by-side diffs as with
> regular code.  (One issue with this proposal is how to convince pgindent
> to leave the long lines alone.)

Yeah, it seems when perltidy or pgindent mangle things badly, it's to
try and shoehorn a long line into a smaller number of characters.  If
memory serves, I've come across things like this:

pg_node_tree proargdefaults BKI_DEFAULT(_null_); /* list of expression

                  trees for argument

                  defaults (NULL if none) */

And thought "only a machine could be so precisely awkward"

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/22/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054').

Okay, I'll do that.

>> Were you thinking of this comment in
>> pg_attribute.h? We use the double-quoted empty string for postgres.bki
>> and change it to  '\0' for schemapg.h.
>
>> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
>> char        attidentity BKI_DEFAULT("");
>
> That definitely seems like a hack --- why not BKI_DEFAULT('\0') ?

Hmm, yes, the way I had it, the comment is a mystery. I'll switch it around.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/21/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The attached is just one incremental patch on top of your v11 series.
> I couldn't think of an easy way to migrate the changes back into the
> most relevant diffs of your series, so I didn't try.

I've applied your changes to the v12 patch series (attached), but I
hope you'll allow two nit-picky adjustments:

-s/pg_XXX.h/pg_xxx.h/ in the README. There seems to be greater
precedent for the lower-case spelling if the rest of the word is lower
case.
-I shortened the data example in the README so it would comfortably
fit on two lines. Spreading it out over three lines doesn't match
what's in the data files. It's valid syntax, but real data is
formatted to at most two lines (See rewrite_dat.pl. Hmm, maybe I
should make that more explicit elsewhere in the README)

> I also have not spent much time yet looking at the end-product .h and .dat
> files.  I did note a bit of distressing inconsistency in the formatting of
> the catalog struct declarations, some of which predates this patch but it
> seems like you've introduced more.  I think what we ought to standardize
> on is a format similar to this in pg_opclass.h:
>
> CATALOG(pg_opclass,2616)
> {
>     /* index access method opclass is for */
>     Oid         opcmethod       BKI_LOOKUP(pg_am);
>

Done, with blank lines interspersed. I put most changes of this sort
in with the other cleanups in patch 0004. I neglected to do this
separately for couple of tiny tables that have lookups, but no default
values. I don't think it impacts the readability of patch 0007 much.

On 3/22/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't really think it's legal C; I'd rather write BKI_DEFAULT('\054').
[snip]
>> /* One of the ATTRIBUTE_IDENTITY_* constants below, or '\0' */
>> char         attidentity BKI_DEFAULT("");
>
> That definitely seems like a hack --- why not BKI_DEFAULT('\0') ?

Done (patch 0006).


Other changes:
-A README note about OID macros (patch 0007).
-A couple minor cosmetic rearrangements and comment/commit message edits.

Open items:
-Test MSVC.
-Arrange for rewrite_dat.pl to run when perltidy does.
-I was a bit cavalier about when to use =/:= in the Makefiles. Not
sure if there's a preferred project style for when the choice doesn't
really matter.
-Maybe document examples of how to do bulk-editing of data files?


-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> -I shortened the data example in the README so it would comfortably
> fit on two lines. Spreading it out over three lines doesn't match
> what's in the data files. It's valid syntax, but real data is
> formatted to at most two lines (See rewrite_dat.pl. Hmm, maybe I
> should make that more explicit elsewhere in the README)

Well, as I said, I hadn't really reviewed the .dat files, but if that's
what you're doing I'm going to request a change.  Project style is to
fit in 80 columns as much as possible.  I do not see a reason to exempt
the .dat files from that, especially not since it would presumably be a
trivial change in rewrite_dat.pl to insert extra newlines between fields
when needed.  (Obviously, if a field value is so wide it runs past 80
columns on its own, it's not rewrite_dat.pl's charter to fix that.)

> Open items:
> -Test MSVC.

Again, while I'd be happy if someone did that manually, I'm prepared
to let the buildfarm do it.

> -Arrange for rewrite_dat.pl to run when perltidy does.

What I was thinking we should have is a convenience target in
include/Makefile to do this, say "make reformat-dat-files".
I'm not that excited about bundling it into pgindent runs.

> -Maybe document examples of how to do bulk-editing of data files?

+1.  In the end, that's the reason we're doing all this work, so showing
people how to benefit seems like a good thing.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 3/25/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, as I said, I hadn't really reviewed the .dat files, but if that's
> what you're doing I'm going to request a change.  Project style is to
> fit in 80 columns as much as possible.  I do not see a reason to exempt
> the .dat files from that, especially not since it would presumably be a
> trivial change in rewrite_dat.pl to insert extra newlines between fields
> when needed.  (Obviously, if a field value is so wide it runs past 80
> columns on its own, it's not rewrite_dat.pl's charter to fix that.)

This feature is working now. I've attached a 100-line sample of all
the catalogs' files for viewing. Note, this is pretty raw output,
without the clean-up step from patch 0002. In the most of the original
DATA() lines, there was no spacing between entries except in some
cases to separate groups (often with a comment to describe the group).
My clean-up patch tried to make that more consistent. For this sample,
it would add blank lines before the comments in pg_amop, and remove
blank lines from the first few entries in pg_type. If you wanted to
opine on that before I rework that patch, I'd be grateful.

Also, these data entries have default values removed, but they don't
have human-readable OID macros. (I'll have to adjust that script to
the 80-column limit as well).

>> -Arrange for rewrite_dat.pl to run when perltidy does.
>
> What I was thinking we should have is a convenience target in
> include/Makefile to do this, say "make reformat-dat-files".
> I'm not that excited about bundling it into pgindent runs.

I've attached a draft patch for this. If it's okay, I'll incorporate
it into the series. I think reformat_dat_files.pl also works as a
better script name.

>> -Maybe document examples of how to do bulk-editing of data files?
>
> +1.  In the end, that's the reason we're doing all this work, so showing
> people how to benefit seems like a good thing.

It seems like with that, it'd be good to split off the data-format
section of the README into a new file, maybe README.data, which will
contain code snippets and some example scenarios. I'll include the
example pg_proc.prokind merger among those.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
With the attachments this time.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> With the attachments this time.

Layout of .dat files seems generally reasonable, but I don't understand
the proposed make rule:

+reformat-dat-files:
+    $(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog catalog/pg_*.dat

This rule has no prerequisite, so what's $< supposed to be?  Also, I think
the rule probably ought to be located in src/include/catalog/Makefile,
because that's typically where you'd be cd'd to when messing with the
.dat files, I'd think.  (Hm, I see no such makefile, but maybe it's time
for one.  A convenience rule located one level up doesn't seem very
convenient.)

> My clean-up patch tried to make that more consistent. For this sample,
> it would add blank lines before the comments in pg_amop, and remove
> blank lines from the first few entries in pg_type. If you wanted to
> opine on that before I rework that patch, I'd be grateful.

No particular objection to either.

>>> -Maybe document examples of how to do bulk-editing of data files?

>> +1.  In the end, that's the reason we're doing all this work, so showing
>> people how to benefit seems like a good thing.

> It seems like with that, it'd be good to split off the data-format
> section of the README into a new file, maybe README.data, which will
> contain code snippets and some example scenarios. I'll include the
> example pg_proc.prokind merger among those.

It would be more work, but maybe we should move this into the main
SGML docs.  It seems rather silly to have SGML documentation for the
.BKI file format, which now will be an internal matter that hardly
any developers need worry about, but not for the .DAT file format.
But I understand if that seems a bridge too far for today --- certainly
a README file is way better than nothing.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:

> On Mar 26, 2018, at 10:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote

> Layout of .dat files seems generally reasonable, but I don't understand
> the proposed make rule:
>
> +reformat-dat-files:
> +    $(PERL) -I $(catalogdir) $< catalog/rewrite_dat.pl -o catalog catalog/pg_*.dat
>
> This rule has no prerequisite, so what's $< supposed to be?  Also, I think
> the rule probably ought to be located in src/include/catalog/Makefile,
> because that's typically where you'd be cd'd to when messing with the
> .dat files, I'd think.  (Hm, I see no such makefile, but maybe it's time
> for one.  A convenience rule located one level up doesn't seem very
> convenient.)
>

Oops, copy-pasto. And I’ll see about a new Makefile.

>> It seems like with that, it'd be good to split off the data-format
>> section of the README into a new file, maybe README.data, which will
>> contain code snippets and some example scenarios. I'll include the
>> example pg_proc.prokind merger among those.
>
> It would be more work, but maybe we should move this into the main
> SGML docs.  It seems rather silly to have SGML documentation for the
> .BKI file format, which now will be an internal matter that hardly
> any developers need worry about, but not for the .DAT file format.
> But I understand if that seems a bridge too far for today --- certainly
> a README file is way better than nothing.

Makes sense on all points. I’m not optimistic about creating a new sgml doc on time, but I’ll keep it in mind.

-John Naylor

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
Tom Lane wrote:

>> -Maybe document examples of how to do bulk-editing of data files?
>
> +1.  In the end, that's the reason we're doing all this work, so showing
> people how to benefit seems like a good thing.

I'll hold off on posting a new patchset until I add this to the
documentation, but I wanted to report on a couple of other things:

While adjusting to the 80-column limit, I encountered a separation of
concerns violation between Catalog.pm and reformat_dat_files.pl that I
hadn't noticed before. Fixing that made things easier to read, with
fewer lines of code.

Speaking of bulk editing, that would be done via adopting
reformat_dat_files.pl to the task at hand. I did this myself for two
of the conversion helper scripts. However, enough bitrot has now
occurred that to make the relationship murky. Since I had to adopt
them to the 80-column limit as well, I shaved all the irrelevant
differences away, and now they're just a small diff away from the
reformat script. I also added block comments to help developers find
where they need to edit the script. Since reformat_dat_files.pl has
been substantially altered, I'll attach it here, along with the diffs
to the the helper scripts.

I wrote:

> I’ll see about a new Makefile.

I've attached a draft of this. I thought about adding a call to
duplicate_oids here, but this won't run unless you've run configure
first, and if you've done that, you've likely built already, running
duplicate_oids in the process.

I think I'll consolidate all documentation patches into one, at the
end of the series for maximum flexibility. I liked the idea of
spreading the doc changes over the patches, but there is not a huge
amount of time left.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
Attached is v13, rebased against b0c90c85fc.

Patch 0001:
-The data files are formatted to at most 80 columns wide.
-Rename rewrite_dat.pl to reformat_dat_file.pl.
-Refactor Catalog.pm and reformat_dat_file.pl to have better
separation of concerns.
-Add src/include/catalog/Makefile with convenience targets for
rewriting data files.

Patch 0002:
-Some adjustments to the post-conversion cleanup of data files.

Patch 0005:
-I made a stub version of Solution.pm to simulate testing the MSVC
build. This found one bug, and also allowed me to bring in some of the
more pedantic dependencies I added to utils/Makefile.

Patch 0009:
-New patch that puts all doc changes in one patch, for flexibility.
-Split the parts of catalog/README having to do with data into a new
README.data file. Add recipes for how to edit data, with code
examples.

On 3/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It would be more work, but maybe we should move this into the main
> SGML docs.  It seems rather silly to have SGML documentation for the
> .BKI file format, which now will be an internal matter that hardly
> any developers need worry about, but not for the .DAT file format.
> But I understand if that seems a bridge too far for today --- certainly
> a README file is way better than nothing.

I had an idea to make it less silly without doing as much work: Get
rid of the SGML docs for the BKI format, and turn them into
bootstrap/README. Thoughts?

And in the department of second thoughts, it occurred to me that the
only reason that the .dat files are in include/catalog is because
that's where the DATA() statements were. Since they are separate now,
one could make the case that they actually belong in backend/catalog.
One trivial advantage here is that there is already an existing
Makefile in which to put convenience targets for formatting. On the
other hand, it kind of makes sense to have the files describing the
schema (.h) and the contents (.dat) in the same directory. I'm
inclined to leave things as they are for that reason.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> And in the department of second thoughts, it occurred to me that the
> only reason that the .dat files are in include/catalog is because
> that's where the DATA() statements were. Since they are separate now,
> one could make the case that they actually belong in backend/catalog.
> One trivial advantage here is that there is already an existing
> Makefile in which to put convenience targets for formatting. On the
> other hand, it kind of makes sense to have the files describing the
> schema (.h) and the contents (.dat) in the same directory. I'm
> inclined to leave things as they are for that reason.

Yeah.  The fact that, eg, both the .h and .dat files are inputs to
duplicate_oids and unused_oids makes me think it's better to keep
them together.

I'd actually been thinking of something that's about the reverse:
instead of building the derived .h files in backend/catalog and
then symlinking them into include/catalog, it'd be saner to build
them in include/catalog to begin with.  However, that would mean
that the Perl scripts need to produce output in two different
places, so maybe it'd end up more complicated not less so.
In any case, that seems like something to leave for another day.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
I'm starting to look through v13 seriously, and one thing struck
me that could use some general discussion: what is our policy
going to be for choosing the default values for catalog columns?
In particular, I noticed that you have for pg_proc

    bool        proisstrict BKI_DEFAULT(f);

    char        provolatile BKI_DEFAULT(v);

    char        proparallel BKI_DEFAULT(u);

which do not comport at all with the most common values in those
columns.  As of HEAD, I see

postgres=# select proisstrict, count(*) from pg_proc group by 1;
 proisstrict | count 
-------------+-------
 f           |   312
 t           |  2640
(2 rows)

postgres=# select provolatile, count(*) from pg_proc group by 1;
 provolatile | count 
-------------+-------
 i           |  2080
 s           |   570
 v           |   302
(3 rows)

postgres=# select proparallel, count(*) from pg_proc group by 1;
 proparallel | count 
-------------+-------
 r           |   139
 s           |  2722
 u           |    91
(3 rows)

(Since this is from the final initdb state, this overstates the number
of .bki entries for pg_proc a bit, but not by much.)

I think there's no question that the default for proisstrict ought
to be "true" --- not only is that by far the more common choice,
but it's actually the safer choice.  A C function that needs to be
marked strict and isn't will at best do the wrong thing, and quite
likely will crash, if passed a NULL value.

The defaults for provolatile and proparallel maybe require more thought
though.  What you've chosen corresponds to the default assumptions of
CREATE FUNCTION, which are what we need for user-defined functions that
we don't know anything about; but I'm not sure that makes them the best
defaults for built-in functions.  I'm inclined to go with the majority
values here, in part because that will make the outliers stand out when
looking at pg_proc.dat.  I don't think it's great that we'll have 2800+
entries explicitly marked with proparallel 'i' or 's', but the less-than-
100 with proparallel 'u' will be so only implicitly because the rewrite
script will strip out any field entries that match the default.  That's
really the worst of all worlds: it'd be better to have no default
in this column at all, I think, than to behave like that.

In short, I'm tempted to say that when there's a clear majority of
entries that would use a particular default, that's the default we
should use, whether or not it's "surprising" or "unsafe" according
to the semantics.  It's clearly not "surprising" for a C function
to be marked proparallel 's'; the other cases are more so.

I'm not seeing any other BKI_DEFAULT choices that I'm inclined to
question, so maybe it's a mistake to try to derive any general
policy choices from such a small number of cases.  But anyway
I'm inclined to change these cases.

Comments anyone?

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Andres Freund
Дата:
Hi,

On 2018-04-04 18:29:31 -0400, Tom Lane wrote:
> I'm starting to look through v13 seriously, and one thing struck
> me that could use some general discussion: what is our policy
> going to be for choosing the default values for catalog columns?
>
> [...]
>
> In short, I'm tempted to say that when there's a clear majority of
> entries that would use a particular default, that's the default we
> should use, whether or not it's "surprising" or "unsafe" according
> to the semantics.  It's clearly not "surprising" for a C function
> to be marked proparallel 's'; the other cases are more so.
>
> [...]
>
> I'm not seeing any other BKI_DEFAULT choices that I'm inclined to
> question, so maybe it's a mistake to try to derive any general
> policy choices from such a small number of cases.  But anyway
> I'm inclined to change these cases.
> 
> Comments anyone?

I think choosing SQL defaults is defensible, but so is choosing the most
common value as default to make uncommon stand out more, and so is
choosing the safest values. In short, I don't think it matters terribly
much, we just should try to be reasonably consistent about.

Greetings,

Andres Freund


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Here are the results of an evening's desultory hacking on v13.

I was dissatisfied with the fact that we still had several
function-referencing columns that had numeric instead of symbolic
contents, for instance pg_aggregate.aggfnoid.  Of course, the main reason
is that those are declared regproc but reference functions with overloaded
names, which regproc can't handle.  Now that the lookups are being done in
genbki.pl there's no reason why we have to live with that limitation.
In the attached, I've generalized the BKI_LOOKUP(pg_proc) code so that
you can use either regproc-like or regprocedure-like notation, and then
applied that to relevant columns.

I did not like the hard-wired handling of proargtypes and proallargtypes
in genbki.pl; it hardly seems impossible that we'll want similar
conversions for other array-of-OID columns in future.  After a bit of
thought, it seemed like we could allow

    oidvector    proargtypes BKI_LOOKUP(pg_type);

    Oid          proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);

and just teach genbki.pl that if a lookup rule is attached to
an oidvector or Oid[] column, it means to apply the rule to
each array element individually.

I also changed genbki.pl so that it'd warn about entries that aren't
recognized by the lookup rules.  This seems like a good idea for
catching errors, such as (ahem) applying BKI_LOOKUP to a column
that isn't even an OID.

bootstrap-v13-delta.patch is a diff atop your patch series for the
in-tree files, and convert_oid2name.patch adjusts that script to
make use of the additional conversion capability.

            regards, tom lane

diff --git a/src/backend/catalog/README.data b/src/backend/catalog/README.data
index b7c680c..22ad0f2 100644
*** a/src/backend/catalog/README.data
--- b/src/backend/catalog/README.data
*************** teach Catalog::ParseData() how to expand
*** 62,71 ****
  representation.

  - To aid readability, some values that are references to other catalog
! entries are represented by macros rather than numeric OIDs. This is
! the case for index access methods, opclasses, operators, opfamilies,
! and types. This is done for functions as well, but only if the proname
! is unique.

  Bootstrap Data Conventions
  ==========================
--- 62,103 ----
  representation.

  - To aid readability, some values that are references to other catalog
! entries are represented by names rather than numeric OIDs.  Currently
! this is the case for access methods, functions, operators, opclasses,
! opfamilies, and types.  The rules are as follows:
!
! * Use of names rather than numbers is enabled for a particular catalog
! column by attaching BKI_LOOKUP(lookuprule) to the column's definition,
! where "lookuprule" is pg_am, pg_proc, pg_operator, pg_opclass,
! pg_opfamily, or pg_type.
!
! * In a name-lookup column, all entries must use the name format except
! when writing "0" for InvalidOid.  (If the column is declared regproc,
! you can optionally write "-" instead of "0".)  genbki.pl will warn
! about unrecognized names.
!
! * Access methods are just represented by their names, as are types.
! Type names must match the referenced pg_type entry's typname; you
! do not get to use any aliases such as "integer" for "int4".
!
! * A function can be represented by its proname, if that is unique among
! the pg_proc.dat entries (this works like regproc input).  Otherwise,
! write it as "proname(argtypename,argtypename,...)", like regprocedure.
! The argument type names must be spelled exactly as they are in the
! pg_proc.dat entry's proargtypes field.  Do not insert any spaces.
!
! * Operators are represented by "oprname(lefttype,righttype)", writing the
! type names exactly as they appear in the pg_operator.dat entry's oprleft
! and oprright fields.  (Write 0 for the omitted operand of a unary
! operator.)
!
! * The names of opclasses and opfamilies are only unique within an access
! method, so they are represented by "access_method_name/object_name".
!
! In none of these cases is there any provision for schema-qualification;
! all objects created during bootstrap are expected to be in the pg_catalog
! schema.
!

  Bootstrap Data Conventions
  ==========================
*************** You can also use the duplicate_oids scri
*** 105,111 ****
  build if a duplicate is found.)

  - The OID counter starts at 10000 at bootstrap.  If a catalog row is
! in a table that requires OIDs, but no OID was preassigned by an "OID ="
  clause, then it will receive an OID of 10000 or above.


--- 137,143 ----
  build if a duplicate is found.)

  - The OID counter starts at 10000 at bootstrap.  If a catalog row is
! in a table that requires OIDs, but no OID was preassigned by an "oid =>"
  clause, then it will receive an OID of 10000 or above.


diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 27494d9..f6be50a 100644
*** a/src/backend/catalog/genbki.pl
--- b/src/backend/catalog/genbki.pl
*************** foreach my $row (@{ $catalog_data{pg_opf
*** 169,181 ****
  my %procoids;
  foreach my $row (@{ $catalog_data{pg_proc} })
  {
!     if (defined($procoids{ $row->{proname} }))
      {
!         $procoids{ $row->{proname} } = 'MULTIPLE';
      }
      else
      {
!         $procoids{ $row->{proname} } = $row->{oid};
      }
  }

--- 169,197 ----
  my %procoids;
  foreach my $row (@{ $catalog_data{pg_proc} })
  {
!     # Generate an entry under just the proname (corresponds to regproc lookup)
!     my $prokey = $row->{proname};
!     if (defined($procoids{ $prokey }))
      {
!         $procoids{ $prokey } = 'MULTIPLE';
      }
      else
      {
!         $procoids{ $prokey } = $row->{oid};
!     }
!     # Also generate an entry using proname(proargtypes).  This is not quite
!     # identical to regprocedure lookup because we don't worry much about
!     # special SQL names for types etc; we just use the names in the source
!     # proargtypes field.  These *should* be unique, but do a multiplicity
!     # check anyway.
!     $prokey .= '(' . join(',', split(/\s+/, $row->{proargtypes})) . ')';
!     if (defined($procoids{ $prokey }))
!     {
!         $procoids{ $prokey } = 'MULTIPLE';
!     }
!     else
!     {
!         $procoids{ $prokey } = $row->{oid};
      }
  }

*************** EOM
*** 294,300 ****
          print $def $line;
      }

!     # Open it, unless bootstrap case (create bootstrap does this
      # automatically)
      if (!$catalog->{bootstrap})
      {
--- 310,316 ----
          print $def $line;
      }

!     # Open it, unless it's a bootstrap catalog (create bootstrap does this
      # automatically)
      if (!$catalog->{bootstrap})
      {
*************** EOM
*** 323,375 ****
              $bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
              $bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;

!             # Replace OID macros with OIDs.
!             # If we don't have a unique value to substitute, just do
!             # nothing. This should only happen in the case for functions,
!             # in which case regprocin will complain.
              if ($column->{lookup})
              {
                  my $lookup = $lookup_kind{ $column->{lookup} };
-                 my $lookupoid = $lookup->{ $bki_values{$attname} };
-                 $bki_values{$attname} = $lookupoid
-                   if defined($lookupoid) && $lookupoid ne 'MULTIPLE';
-             }
-         }

!         # Some pg_proc columns contain lists of types, so we must unpack
!         # these and do the lookups on each element in turn.
!         if ($catname eq 'pg_proc')
!         {

!             # proargtypes
!             if ($bki_values{proargtypes})
!             {
!                 my @argtypenames = split /\s+/, $bki_values{proargtypes};
!                 my @argtypeoids;
!                 foreach my $argtypename (@argtypenames)
                  {
!                     my $argtypeoid  = $typeoids{$argtypename};
!                     push @argtypeoids, $argtypeoid;
                  }
!                 $bki_values{proargtypes} = join(' ', @argtypeoids);
!             }
!
!             # proallargtypes
!             if ($bki_values{proallargtypes} ne '_null_')
!             {
!                 $bki_values{proallargtypes} =~ s/[{}]//g;
!                 my @argtypenames = split /,/, $bki_values{proallargtypes};
!                 my @argtypeoids;
!                 foreach my $argtypename (@argtypenames)
                  {
!                     my $argtypeoid  = $typeoids{$argtypename};
!                     push @argtypeoids, $argtypeoid;
                  }
-                 $bki_values{proallargtypes} =
-                     sprintf "{%s}", join(',', @argtypeoids);
              }
          }
!         elsif ($catname eq 'pg_type' and !exists $bki_values{oid_symbol})
          {
              my $symbol = form_pg_type_symbol($bki_values{typname});
              $bki_values{oid_symbol} = $symbol
--- 339,423 ----
              $bki_values{$attname} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
              $bki_values{$attname} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;

!             # Replace OID synonyms with OIDs per the appropriate lookup rule.
!             #
!             # If the column type is oidvector or oid[], we have to replace
!             # each element of the array as per the lookup rule.
!             #
!             # If we don't have a unique value to substitute, warn and
!             # leave the entry unchanged.
              if ($column->{lookup})
              {
                  my $lookup = $lookup_kind{ $column->{lookup} };

!                 die "unrecognized BKI_LOOKUP type " . $column->{lookup}
!                   if !defined($lookup);

!                 if ($atttype eq 'oidvector')
                  {
!                     my @lookupnames = split /\s+/, $bki_values{$attname};
!                     my @lookupoids;
!                     foreach my $lookupname (@lookupnames)
!                     {
!                         my $lookupoid = $lookup->{ $lookupname };
!                         if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
!                         {
!                             $lookupname = $lookupoid;
!                         }
!                         else
!                         {
!                             warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',',
values(%bki_values))
!                                 if $lookupname ne '-' && $lookupname ne '0';
!                         }
!                         push @lookupoids, $lookupname;
!                     }
!                     $bki_values{$attname} = join(' ', @lookupoids);
                  }
!                 elsif ($atttype eq 'oid[]')
                  {
!                     if ($bki_values{$attname} ne '_null_')
!                     {
!                         $bki_values{$attname} =~ s/[{}]//g;
!                         my @lookupnames = split /,/, $bki_values{$attname};
!                         my @lookupoids;
!                         foreach my $lookupname (@lookupnames)
!                         {
!                             my $lookupoid = $lookup->{ $lookupname };
!                             if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
!                             {
!                                 $lookupname = $lookupoid;
!                             }
!                             else
!                             {
!                                 warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',',
values(%bki_values))
!                                     if $lookupname ne '-' && $lookupname ne '0';
!                             }
!                             push @lookupoids, $lookupname;
!                         }
!                         $bki_values{$attname} =
!                             sprintf "{%s}", join(',', @lookupoids);
!                     }
!                 }
!                 else
!                 {
!                     my $lookupname = $bki_values{$attname};
!                     my $lookupoid = $lookup->{ $lookupname };
!                     if (defined($lookupoid) && $lookupoid ne 'MULTIPLE')
!                     {
!                         $bki_values{$attname} = $lookupoid;
!                     }
!                     else
!                     {
!                         warn "unresolved OID reference \"$lookupname\" in $catname row " . join(',',
values(%bki_values))
!                             if $lookupname ne '-' && $lookupname ne '0';
!                     }
                  }
              }
          }
!
!         # Special hack to generate OID symbols for pg_type entries
!         # that lack one.
!         if ($catname eq 'pg_type' and !exists $bki_values{oid_symbol})
          {
              my $symbol = form_pg_type_symbol($bki_values{typname});
              $bki_values{oid_symbol} = $symbol
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 226bb07..767bab5 100644
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
***************
*** 31,37 ****
  CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
  {
      /* pg_proc OID of the aggregate itself */
!     regproc        aggfnoid;

      /* aggregate kind, see AGGKIND_ categories below */
      char        aggkind BKI_DEFAULT(n);
--- 31,37 ----
  CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
  {
      /* pg_proc OID of the aggregate itself */
!     regproc        aggfnoid BKI_LOOKUP(pg_proc);

      /* aggregate kind, see AGGKIND_ categories below */
      char        aggkind BKI_DEFAULT(n);
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index f045bfa..accbe83 100644
*** a/src/include/catalog/pg_amproc.h
--- b/src/include/catalog/pg_amproc.h
*************** CATALOG(pg_amproc,2603)
*** 54,63 ****
      Oid            amprocrighttype BKI_LOOKUP(pg_type);

      /* support procedure index */
!     int16        amprocnum BKI_LOOKUP(pg_type);

      /* OID of the proc */
!     regproc        amproc;
  } FormData_pg_amproc;

  /* ----------------
--- 54,63 ----
      Oid            amprocrighttype BKI_LOOKUP(pg_type);

      /* support procedure index */
!     int16        amprocnum;

      /* OID of the proc */
!     regproc        amproc BKI_LOOKUP(pg_proc);
  } FormData_pg_amproc;

  /* ----------------
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 2674701..f3bc3c0 100644
*** a/src/include/catalog/pg_cast.h
--- b/src/include/catalog/pg_cast.h
*************** CATALOG(pg_cast,2605)
*** 39,45 ****
      Oid            casttarget BKI_LOOKUP(pg_type);

      /* cast function; 0 = binary coercible */
!     Oid            castfunc;

      /* contexts in which cast can be used */
      char        castcontext;
--- 39,45 ----
      Oid            casttarget BKI_LOOKUP(pg_type);

      /* cast function; 0 = binary coercible */
!     Oid            castfunc BKI_LOOKUP(pg_proc);

      /* contexts in which cast can be used */
      char        castcontext;
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 8d5d044..b9d9cfd 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_
*** 49,55 ****
      float4        prorows BKI_DEFAULT(0);

      /* element type of variadic array, or 0 */
!     Oid            provariadic BKI_DEFAULT(0);

      /* transforms calls to it during planning */
      regproc        protransform BKI_DEFAULT(0) BKI_LOOKUP(pg_proc);
--- 49,55 ----
      float4        prorows BKI_DEFAULT(0);

      /* element type of variadic array, or 0 */
!     Oid            provariadic BKI_DEFAULT(0) BKI_LOOKUP(pg_type);

      /* transforms calls to it during planning */
      regproc        protransform BKI_DEFAULT(0) BKI_LOOKUP(pg_proc);
*************** CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_
*** 90,101 ****
       */

      /* parameter types (excludes OUT params) */
!     oidvector    proargtypes;

  #ifdef CATALOG_VARLEN

      /* all param types (NULL if IN only) */
!     Oid            proallargtypes[1] BKI_DEFAULT(_null_);

      /* parameter modes (NULL if IN only) */
      char        proargmodes[1] BKI_DEFAULT(_null_);
--- 90,101 ----
       */

      /* parameter types (excludes OUT params) */
!     oidvector    proargtypes BKI_LOOKUP(pg_type);

  #ifdef CATALOG_VARLEN

      /* all param types (NULL if IN only) */
!     Oid            proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);

      /* parameter modes (NULL if IN only) */
      char        proargmodes[1] BKI_DEFAULT(_null_);
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index 381da18..8992fcd 100644
*** a/src/include/catalog/pg_type.h
--- b/src/include/catalog/pg_type.h
*************** CATALOG(pg_type,1247) BKI_BOOTSTRAP BKI_
*** 109,121 ****
       *
       * typelem != 0 and typlen == -1.
       */
!     Oid            typelem BKI_DEFAULT(0);

      /*
       * If there is a "true" array type having this type as element type,
       * typarray links to it.  Zero if no associated "true" array type.
       */
!     Oid            typarray;

      /*
       * I/O conversion procedures for the datatype.
--- 109,121 ----
       *
       * typelem != 0 and typlen == -1.
       */
!     Oid            typelem BKI_DEFAULT(0) BKI_LOOKUP(pg_type);

      /*
       * If there is a "true" array type having this type as element type,
       * typarray links to it.  Zero if no associated "true" array type.
       */
!     Oid            typarray BKI_DEFAULT(0) BKI_LOOKUP(pg_type);

      /*
       * I/O conversion procedures for the datatype.
*** boot-13/convert_oid2name.pl~    Sat Mar 31 07:53:29 2018
--- boot-13/convert_oid2name.pl    Wed Apr  4 21:45:05 2018
***************
*** 129,146 ****
        $row->{oprname}, $typenames{$row->{oprleft}}, $typenames{$row->{oprright}};
  }

! # Proc oid lookup.
  my %procoids;
  foreach my $row (@{ $catalog_data{pg_proc} })
  {
      next if !ref $row;
      if (defined($procoids{ $row->{proname} }))
      {
          $procoids{ $row->{proname} } = 'MULTIPLE';
      }
      else
      {
!         $procoids{ $row->{oid} } = $row->{proname};
      }
  }

--- 129,153 ----
        $row->{oprname}, $typenames{$row->{oprleft}}, $typenames{$row->{oprright}};
  }

! # Proc oid lookup (see lookup_procname).
! my %procshortnames;
! my %proclongnames;
  my %procoids;
  foreach my $row (@{ $catalog_data{pg_proc} })
  {
      next if !ref $row;
+     $procshortnames{ $row->{oid} } = $row->{proname};
+     $proclongnames{ $row->{oid} } = sprintf "%s(%s)",
+       $row->{proname},
+       join(',', map($typenames{$_}, split(/\s+/, $row->{proargtypes})));
+     # We use this to track whether a proname is duplicated.
      if (defined($procoids{ $row->{proname} }))
      {
          $procoids{ $row->{proname} } = 'MULTIPLE';
      }
      else
      {
!         $procoids{ $row->{proname} } = $row->{oid};
      }
  }

***************
*** 186,191 ****
--- 193,200 ----

              if ($catname eq 'pg_proc')
              {
+                 $values{provariadic} = $typenames{$values{provariadic}}
+                   if exists $values{provariadic};
                  $values{prorettype} = $typenames{$values{prorettype}};
                  if ($values{proargtypes})
                  {
***************
*** 211,222 ****
--- 220,236 ----
              }
              elsif ($catname eq 'pg_aggregate')
              {
+                 $values{aggfnoid}     = lookup_procname($values{aggfnoid});
                  $values{aggsortop}     = $opernames{$values{aggsortop}}
                    if exists $values{aggsortop};
                  $values{aggtranstype}  = $typenames{$values{aggtranstype}};
                  $values{aggmtranstype} = $typenames{$values{aggmtranstype}}
                    if exists $values{aggmtranstype};
              }
+             elsif ($catname eq 'pg_am')
+             {
+                 $values{aggfnoid}     = lookup_procname($values{aggfnoid});
+             }
              elsif ($catname eq 'pg_amop')
              {
                  $values{amoplefttype}   = $typenames{$values{amoplefttype}};
***************
*** 232,242 ****
--- 246,258 ----
                  $values{amprocfamily}    = $opfnames{$values{amprocfamily}};
                  $values{amproclefttype}  = $typenames{$values{amproclefttype}};
                  $values{amprocrighttype} = $typenames{$values{amprocrighttype}};
+                 $values{amproc}          = lookup_procname($values{amproc});
              }
              elsif ($catname eq 'pg_cast')
              {
                  $values{castsource} = $typenames{$values{castsource}};
                  $values{casttarget} = $typenames{$values{casttarget}};
+                 $values{castfunc}   = lookup_procname($values{castfunc});
              }
              elsif ($catname eq 'pg_opclass')
              {
***************
*** 255,260 ****
--- 271,277 ----
                    if exists $values{oprcom};
                  $values{oprnegate} = $opernames{$values{oprnegate}}
                    if exists $values{oprnegate};
+                 $values{oprcode}   = lookup_procname($values{oprcode});
              }
              elsif ($catname eq 'pg_opfamily')
              {
***************
*** 266,271 ****
--- 283,295 ----
                  $values{rngsubtype} = $typenames{$values{rngsubtype}};
                  $values{rngsubopc}  = $opcnames{$values{rngsubopc}};
              }
+             elsif ($catname eq 'pg_type')
+             {
+                 $values{typelem}    = $typenames{$values{typelem}}
+                   if exists $values{typelem};
+                 $values{typarray}   = $typenames{$values{typarray}}
+                   if exists $values{typarray};
+             }

              ############################################################

***************
*** 412,417 ****
--- 436,454 ----
      return $hash_str;
  }

+ sub lookup_procname
+ {
+     my $oid = shift;
+     return $oid if !defined($oid) || $oid eq '-' || $oid eq '0';
+     my $shortname = $procshortnames{$oid};
+     return $shortname if defined($shortname) &&
+         defined($procoids{$shortname}) &&
+         $procoids{$shortname} eq $oid;
+     my $longname = $proclongnames{$oid};
+     return $longname if defined($longname);
+     return $oid;
+ }
+
  sub usage
  {
      die <<EOM;

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/5/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Here are the results of an evening's desultory hacking on v13.
>
> [numeric function oids with overloaded name]

Thank you for the detailed review and for improving the function
references (not to mention the type references I somehow left on the
table). I was also not quite satisfied with just the regproc columns.

> I did not like the hard-wired handling of proargtypes and proallargtypes
> in genbki.pl; it hardly seems impossible that we'll want similar
> conversions for other array-of-OID columns in future.  After a bit of
> thought, it seemed like we could allow
>
>     oidvector    proargtypes BKI_LOOKUP(pg_type);
>
>     Oid          proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);
>
> and just teach genbki.pl that if a lookup rule is attached to
> an oidvector or Oid[] column, it means to apply the rule to
> each array element individually.

I think that's a good idea. I went an extra step and extracted the
common logic into a function (attached draft patch to be applied on
top of yours). It treats all lookups as operating on arrays. The
common case is that we pass a single-element array. That may seem
awkward, but I think it's clear. The code is slimmer, and the lines
now fit within 80 characters.

> I also changed genbki.pl so that it'd warn about entries that aren't
> recognized by the lookup rules.  This seems like a good idea for
> catching errors, such as (ahem) applying BKI_LOOKUP to a column
> that isn't even an OID.

Yikes, I must have fat-fingered that during the comment reformatting.

Unrelated, I noticed my quoting of defaults that contain back-slashes
was half-baked, so I'll include that fix in the next patchset. I'll
put out a new one in a couple days, to give a chance for further
review and discussion of the defaults. I didn't feel the need to
respond to the other messages, but yours and Andres' points are well
taken.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 4/5/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I did not like the hard-wired handling of proargtypes and proallargtypes
>> in genbki.pl; it hardly seems impossible that we'll want similar
>> conversions for other array-of-OID columns in future.  After a bit of
>> thought, it seemed like we could allow
>>     oidvector    proargtypes BKI_LOOKUP(pg_type);
>>     Oid          proallargtypes[1] BKI_DEFAULT(_null_) BKI_LOOKUP(pg_type);
>> and just teach genbki.pl that if a lookup rule is attached to
>> an oidvector or Oid[] column, it means to apply the rule to
>> each array element individually.

> I think that's a good idea. I went an extra step and extracted the
> common logic into a function (attached draft patch to be applied on
> top of yours). It treats all lookups as operating on arrays. The
> common case is that we pass a single-element array. That may seem
> awkward, but I think it's clear. The code is slimmer, and the lines
> now fit within 80 characters.

Sounds good.  I too was bothered by the duplication of code, but
I'm not a good enough Perl programmer to have thought of that solution.

Something that bothered me a bit while writing the warning-producing code
is that showing %bki_values isn't actually that great a way of identifying
the trouble spot.  By this point we've expanded out defaults and possibly
replaced some other macros, so it doesn't look that much like what was
in the .dat file.  I think what would be ideal, both here and in some
other places like AddDefaultValues, is to be able to finger the location
of the bad tuple by filename and line number, but I have no idea whether
it's practical to annotate the tuples with that while reading the .dat
files.  Any thoughts?

(Obviously, better error messages could be a future improvement; it's not
something we have to get done before the conversion.)

> Unrelated, I noticed my quoting of defaults that contain back-slashes
> was half-baked, so I'll include that fix in the next patchset. I'll
> put out a new one in a couple days, to give a chance for further
> review and discussion of the defaults. I didn't feel the need to
> respond to the other messages, but yours and Andres' points are well
> taken.

We're getting down to the wire here --- I think the plan is to close
the CF on Saturday or Sunday, and then push the bootstrap changes right
after that.  So please turn around whatever you're planning to do ASAP.
I'm buckling down to a final review today and tomorrow.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
I experimented with converting all frontend code to include just the
catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
proposed new policy.  I soon found that we'd overlooked one thing:
some clients expect to see the relation OID macros, eg
LargeObjectRelationId.  Attached is a patch that changes things around
so that those appear in the _d files instead of the master files.
This is cleaner anyway because it removes duplication of the OIDs in
the master files, with attendant risk of error.  For example we
have this change in pg_aggregate.h:

-#define AggregateRelationId  2600
-
-CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
+CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS

Some of the CATALOG lines spill well past 80 characters with this,
although many of the affected ones already were overlength, eg

-#define DatabaseRelationId    1262
-#define DatabaseRelation_Rowtype_Id  1248
-
-CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) BKI_SCHEMA_MACRO
+CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id)
BKI_SCHEMA_MACRO

I thought about improving that by removing the restriction that these
BKI annotations appear on the same line as the CATALOG macro, so that
we could break the above into several lines.  I think the original key
reason for the restriction was to avoid accidentally taking some bit
of a DATA line as a BKI annotation.  With the DATA lines gone from these
files, that's no longer a significant hazard (although passing references
to BKI keywords in comments might still be hazards for the Perl scripts).
However, if we try to format things like

CATALOG(pg_database,1262,DatabaseRelationId)
    BKI_SHARED_RELATION
    BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id)
    BKI_SCHEMA_MACRO
{
    fields...
}

I'm afraid that neither pgindent nor a lot of common editors would indent
that very nicely.  So at least for the moment I'm inclined to just keep
it all on one line ... we know how that behaves, anyway.

            regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 519247e..fb3d62a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -104,18 +104,29 @@ sub ParseHeader
             {
                 push @{ $catalog{indexing} }, "build indices\n";
             }
-            elsif (/^CATALOG\(([^,]*),(\d+)\)/)
+            elsif (/^CATALOG\(([^,]*),(\d+),(\w+)\)/)
             {
                 $catalog{catname} = $1;
                 $catalog{relation_oid} = $2;
+                $catalog{relation_oid_macro} = $3;

                 $catalog{bootstrap} = /BKI_BOOTSTRAP/ ? ' bootstrap' : '';
                 $catalog{shared_relation} =
                   /BKI_SHARED_RELATION/ ? ' shared_relation' : '';
                 $catalog{without_oids} =
                   /BKI_WITHOUT_OIDS/ ? ' without_oids' : '';
-                $catalog{rowtype_oid} =
-                  /BKI_ROWTYPE_OID\((\d+)\)/ ? " rowtype_oid $1" : '';
+                if (/BKI_ROWTYPE_OID\((\d+),(\w+)\)/)
+                {
+                    $catalog{rowtype_oid} = $1;
+                    $catalog{rowtype_oid_clause} = " rowtype_oid $1";
+                    $catalog{rowtype_oid_macro} = $2;
+                }
+                else
+                {
+                    $catalog{rowtype_oid} = '';
+                    $catalog{rowtype_oid_clause} = '';
+                    $catalog{rowtype_oid_macro} = '';
+                }
                 $catalog{schema_macro} = /BKI_SCHEMA_MACRO/ ? 1 : 0;
                 $declaring_attributes = 1;
             }
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index f6be50a..fe8c3ca 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -228,6 +228,7 @@ my @tables_needing_macros;
 # produce output, one catalog at a time
 foreach my $catname (@catnames)
 {
+    my $catalog = $catalogs{$catname};

     # Create one definition header with macro definitions for each catalog.
     my $def_file = $output_path . $catname . '_d.h';
@@ -258,13 +259,21 @@ foreach my $catname (@catnames)

 EOM

+    # Emit OID macros for catalog's OID and rowtype OID, if wanted
+    print $def
+      sprintf("#define %s %s\n", $catalog->{relation_oid_macro}, $catalog->{relation_oid})
+      if $catalog->{relation_oid_macro} ne '';
+    print $def
+      sprintf("#define %s %s\n", $catalog->{rowtype_oid_macro}, $catalog->{rowtype_oid})
+      if $catalog->{rowtype_oid_macro} ne '';
+    print $def "\n";
+
     # .bki CREATE command for this catalog
-    my $catalog = $catalogs{$catname};
     print $bki "create $catname $catalog->{relation_oid}"
       . $catalog->{shared_relation}
       . $catalog->{bootstrap}
       . $catalog->{without_oids}
-      . $catalog->{rowtype_oid} . "\n";
+      . $catalog->{rowtype_oid_clause} . "\n";

     my $first = 1;

diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
index 9732f61..0e6285f 100755
--- a/src/include/catalog/duplicate_oids
+++ b/src/include/catalog/duplicate_oids
@@ -15,7 +15,7 @@ while (<>)
     next if /^CATALOG\(.*BKI_BOOTSTRAP/;
     next
       unless /\boid *=> *'(\d+)'/
-          || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+)\)/
+          || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+),/
           || /^CATALOG\([^,]*, *(\d+)/
           || /^DECLARE_INDEX\([^,]*, *(\d+)/
           || /^DECLARE_UNIQUE_INDEX\([^,]*, *(\d+)/
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index af064fc..02c38c4 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -20,13 +20,13 @@
 #define GENBKI_H

 /* Introduces a catalog's structure definition */
-#define CATALOG(name,oid)    typedef struct CppConcat(FormData_,name)
+#define CATALOG(name,oid,oidmacro)    typedef struct CppConcat(FormData_,name)

 /* Options that may appear after CATALOG (on the same line) */
 #define BKI_BOOTSTRAP
 #define BKI_SHARED_RELATION
 #define BKI_WITHOUT_OIDS
-#define BKI_ROWTYPE_OID(oid)
+#define BKI_ROWTYPE_OID(oid,oidmacro)
 #define BKI_SCHEMA_MACRO

 /* Options that may appear after an attribute (on the same line) */
@@ -34,7 +34,7 @@
 #define BKI_FORCE_NOT_NULL
 /* Specifies a default value for a catalog field */
 #define BKI_DEFAULT(value)
-/* Indicates where to perform lookups for OID macros */
+/* Indicates how to perform name lookups for OID fields */
 #define BKI_LOOKUP(catalog)

 /* The following are never defined; they are here only for documentation. */
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 767bab5..4d0ec01 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -26,9 +26,7 @@
  *        cpp turns this into typedef struct FormData_pg_aggregate
  * ----------------------------------------------------------------
  */
-#define AggregateRelationId  2600
-
-CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
+CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS
 {
     /* pg_proc OID of the aggregate itself */
     regproc        aggfnoid BKI_LOOKUP(pg_proc);
diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index d6454c5..5aa2bac 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_am
  * ----------------
  */
-#define AccessMethodRelationId    2601
-
-CATALOG(pg_am,2601)
+CATALOG(pg_am,2601,AccessMethodRelationId)
 {
     /* access method name */
     NameData    amname;
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index 59842a6..a481691 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -51,9 +51,7 @@
  *        typedef struct FormData_pg_amop
  * ----------------
  */
-#define AccessMethodOperatorRelationId    2602
-
-CATALOG(pg_amop,2602)
+CATALOG(pg_amop,2602,AccessMethodOperatorRelationId)
 {
     /* the index opfamily this entry is for */
     Oid            amopfamily BKI_LOOKUP(pg_opfamily);
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index accbe83..d638e0c 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -40,9 +40,7 @@
  *        typedef struct FormData_pg_amproc
  * ----------------
  */
-#define AccessMethodProcedureRelationId  2603
-
-CATALOG(pg_amproc,2603)
+CATALOG(pg_amproc,2603,AccessMethodProcedureRelationId)
 {
     /* the index opfamily this entry is for */
     Oid            amprocfamily BKI_LOOKUP(pg_opfamily);
diff --git a/src/include/catalog/pg_attrdef.h b/src/include/catalog/pg_attrdef.h
index 068ab64..16b106d 100644
--- a/src/include/catalog/pg_attrdef.h
+++ b/src/include/catalog/pg_attrdef.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_attrdef
  * ----------------
  */
-#define AttrDefaultRelationId  2604
-
-CATALOG(pg_attrdef,2604)
+CATALOG(pg_attrdef,2604,AttrDefaultRelationId)
 {
     Oid            adrelid;        /* OID of table containing attribute */
     int16        adnum;            /* attnum of attribute */
diff --git a/src/include/catalog/pg_attribute.h b/src/include/catalog/pg_attribute.h
index 1b3f306..69b651a 100644
--- a/src/include/catalog/pg_attribute.h
+++ b/src/include/catalog/pg_attribute.h
@@ -34,10 +34,7 @@
  *        You may need to change catalog/genbki.pl as well.
  * ----------------
  */
-#define AttributeRelationId  1249
-#define AttributeRelation_Rowtype_Id  75
-
-CATALOG(pg_attribute,1249) BKI_BOOTSTRAP BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(75) BKI_SCHEMA_MACRO
+CATALOG(pg_attribute,1249,AttributeRelationId) BKI_BOOTSTRAP BKI_WITHOUT_OIDS
BKI_ROWTYPE_OID(75,AttributeRelation_Rowtype_Id)BKI_SCHEMA_MACRO 
 {
     Oid            attrelid;        /* OID of relation containing this attribute */
     NameData    attname;        /* name of attribute */
diff --git a/src/include/catalog/pg_auth_members.h b/src/include/catalog/pg_auth_members.h
index b8ac653..75bc2ba 100644
--- a/src/include/catalog/pg_auth_members.h
+++ b/src/include/catalog/pg_auth_members.h
@@ -27,10 +27,7 @@
  *        typedef struct FormData_pg_auth_members
  * ----------------
  */
-#define AuthMemRelationId    1261
-#define AuthMemRelation_Rowtype_Id    2843
-
-CATALOG(pg_auth_members,1261) BKI_SHARED_RELATION BKI_WITHOUT_OIDS BKI_ROWTYPE_OID(2843) BKI_SCHEMA_MACRO
+CATALOG(pg_auth_members,1261,AuthMemRelationId) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
BKI_ROWTYPE_OID(2843,AuthMemRelation_Rowtype_Id)BKI_SCHEMA_MACRO 
 {
     Oid            roleid;            /* ID of a role */
     Oid            member;            /* ID of a member of that role */
diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
index f27906f..863ef65 100644
--- a/src/include/catalog/pg_authid.h
+++ b/src/include/catalog/pg_authid.h
@@ -28,10 +28,7 @@
  *        typedef struct FormData_pg_authid
  * ----------------
  */
-#define AuthIdRelationId    1260
-#define AuthIdRelation_Rowtype_Id    2842
-
-CATALOG(pg_authid,1260) BKI_SHARED_RELATION BKI_ROWTYPE_OID(2842) BKI_SCHEMA_MACRO
+CATALOG(pg_authid,1260,AuthIdRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(2842,AuthIdRelation_Rowtype_Id)
BKI_SCHEMA_MACRO
 {
     NameData    rolname;        /* name of role */
     bool        rolsuper;        /* read this field via superuser() only! */
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 8a5b7f9..a9e7e2b 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -28,9 +28,7 @@
  *        typedef struct FormData_pg_cast
  * ----------------
  */
-#define CastRelationId    2605
-
-CATALOG(pg_cast,2605)
+CATALOG(pg_cast,2605,CastRelationId)
 {
     /* source datatype for cast */
     Oid            castsource BKI_LOOKUP(pg_type);
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 914aa63..28d939d 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -26,10 +26,7 @@
  *        typedef struct FormData_pg_class
  * ----------------
  */
-#define RelationRelationId    1259
-#define RelationRelation_Rowtype_Id  83
-
-CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
+CATALOG(pg_class,1259,RelationRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83,RelationRelation_Rowtype_Id)
BKI_SCHEMA_MACRO
 {
     NameData    relname;        /* class name */
     Oid            relnamespace;    /* OID of namespace containing this class */
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index 0c6d47f..9c643cf 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -27,9 +27,7 @@
  *        typedef struct FormData_pg_collation
  * ----------------
  */
-#define CollationRelationId  3456
-
-CATALOG(pg_collation,3456)
+CATALOG(pg_collation,3456,CollationRelationId)
 {
     NameData    collname;        /* collation name */
     Oid            collnamespace;    /* OID of namespace containing collation */
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 2024c45..e1ef9cc 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_constraint
  * ----------------
  */
-#define ConstraintRelationId  2606
-
-CATALOG(pg_constraint,2606)
+CATALOG(pg_constraint,2606,ConstraintRelationId)
 {
     /*
      * conname + connamespace is deliberately not unique; we allow, for
diff --git a/src/include/catalog/pg_conversion.h b/src/include/catalog/pg_conversion.h
index eacc09a..2a38d71 100644
--- a/src/include/catalog/pg_conversion.h
+++ b/src/include/catalog/pg_conversion.h
@@ -35,9 +35,7 @@
  *    condefault            true if this is a default conversion
  * ----------------------------------------------------------------
  */
-#define ConversionRelationId  2607
-
-CATALOG(pg_conversion,2607)
+CATALOG(pg_conversion,2607,ConversionRelationId)
 {
     NameData    conname;
     Oid            connamespace;
diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index 9435f24..7f03d24 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -26,10 +26,7 @@
  *        typedef struct FormData_pg_database
  * ----------------
  */
-#define DatabaseRelationId    1262
-#define DatabaseRelation_Rowtype_Id  1248
-
-CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248) BKI_SCHEMA_MACRO
+CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id)
BKI_SCHEMA_MACRO
 {
     NameData    datname;        /* database name */
     Oid            datdba;            /* owner of database */
diff --git a/src/include/catalog/pg_db_role_setting.h b/src/include/catalog/pg_db_role_setting.h
index 9793c69..cccb28a 100644
--- a/src/include/catalog/pg_db_role_setting.h
+++ b/src/include/catalog/pg_db_role_setting.h
@@ -29,9 +29,7 @@
  *        typedef struct FormData_pg_db_role_setting
  * ----------------
  */
-#define DbRoleSettingRelationId 2964
-
-CATALOG(pg_db_role_setting,2964) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
+CATALOG(pg_db_role_setting,2964,DbRoleSettingRelationId) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 {
     Oid            setdatabase;    /* database */
     Oid            setrole;        /* role */
diff --git a/src/include/catalog/pg_default_acl.h b/src/include/catalog/pg_default_acl.h
index 868ac0c..ac81df1 100644
--- a/src/include/catalog/pg_default_acl.h
+++ b/src/include/catalog/pg_default_acl.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_default_acl
  * ----------------
  */
-#define DefaultAclRelationId    826
-
-CATALOG(pg_default_acl,826)
+CATALOG(pg_default_acl,826,DefaultAclRelationId)
 {
     Oid            defaclrole;        /* OID of role owning this ACL */
     Oid            defaclnamespace;    /* OID of namespace, or 0 for all */
diff --git a/src/include/catalog/pg_depend.h b/src/include/catalog/pg_depend.h
index 030f655..bf31c1a 100644
--- a/src/include/catalog/pg_depend.h
+++ b/src/include/catalog/pg_depend.h
@@ -38,9 +38,7 @@
  *        typedef struct FormData_pg_depend
  * ----------------
  */
-#define DependRelationId  2608
-
-CATALOG(pg_depend,2608) BKI_WITHOUT_OIDS
+CATALOG(pg_depend,2608,DependRelationId) BKI_WITHOUT_OIDS
 {
     /*
      * Identification of the dependent (referencing) object.
diff --git a/src/include/catalog/pg_description.h b/src/include/catalog/pg_description.h
index d3c8644..b95b188 100644
--- a/src/include/catalog/pg_description.h
+++ b/src/include/catalog/pg_description.h
@@ -45,9 +45,7 @@
  *        typedef struct FormData_pg_description
  * ----------------
  */
-#define DescriptionRelationId  2609
-
-CATALOG(pg_description,2609) BKI_WITHOUT_OIDS
+CATALOG(pg_description,2609,DescriptionRelationId) BKI_WITHOUT_OIDS
 {
     Oid            objoid;            /* OID of object itself */
     Oid            classoid;        /* OID of table containing object */
diff --git a/src/include/catalog/pg_enum.h b/src/include/catalog/pg_enum.h
index edea5e3..a0922be 100644
--- a/src/include/catalog/pg_enum.h
+++ b/src/include/catalog/pg_enum.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_enum
  * ----------------
  */
-#define EnumRelationId    3501
-
-CATALOG(pg_enum,3501)
+CATALOG(pg_enum,3501,EnumRelationId)
 {
     Oid            enumtypid;        /* OID of owning enum type */
     float4        enumsortorder;    /* sort position of this enum value */
diff --git a/src/include/catalog/pg_event_trigger.h b/src/include/catalog/pg_event_trigger.h
index 3ca0a88..f06cbe0 100644
--- a/src/include/catalog/pg_event_trigger.h
+++ b/src/include/catalog/pg_event_trigger.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_event_trigger
  * ----------------
  */
-#define EventTriggerRelationId    3466
-
-CATALOG(pg_event_trigger,3466)
+CATALOG(pg_event_trigger,3466,EventTriggerRelationId)
 {
     NameData    evtname;        /* trigger's name */
     NameData    evtevent;        /* trigger's event */
diff --git a/src/include/catalog/pg_extension.h b/src/include/catalog/pg_extension.h
index a60bd44..10bbb69 100644
--- a/src/include/catalog/pg_extension.h
+++ b/src/include/catalog/pg_extension.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_extension
  * ----------------
  */
-#define ExtensionRelationId 3079
-
-CATALOG(pg_extension,3079)
+CATALOG(pg_extension,3079,ExtensionRelationId)
 {
     NameData    extname;        /* extension name */
     Oid            extowner;        /* extension owner */
diff --git a/src/include/catalog/pg_foreign_data_wrapper.h b/src/include/catalog/pg_foreign_data_wrapper.h
index ae9b0be..67e3319 100644
--- a/src/include/catalog/pg_foreign_data_wrapper.h
+++ b/src/include/catalog/pg_foreign_data_wrapper.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_foreign_data_wrapper
  * ----------------
  */
-#define ForeignDataWrapperRelationId    2328
-
-CATALOG(pg_foreign_data_wrapper,2328)
+CATALOG(pg_foreign_data_wrapper,2328,ForeignDataWrapperRelationId)
 {
     NameData    fdwname;        /* foreign-data wrapper name */
     Oid            fdwowner;        /* FDW owner */
diff --git a/src/include/catalog/pg_foreign_server.h b/src/include/catalog/pg_foreign_server.h
index 34fc827..0d25839 100644
--- a/src/include/catalog/pg_foreign_server.h
+++ b/src/include/catalog/pg_foreign_server.h
@@ -25,9 +25,7 @@
  *        typedef struct FormData_pg_foreign_server
  * ----------------
  */
-#define ForeignServerRelationId 1417
-
-CATALOG(pg_foreign_server,1417)
+CATALOG(pg_foreign_server,1417,ForeignServerRelationId)
 {
     NameData    srvname;        /* foreign server name */
     Oid            srvowner;        /* server owner */
diff --git a/src/include/catalog/pg_foreign_table.h b/src/include/catalog/pg_foreign_table.h
index 1a1fefc..13de918 100644
--- a/src/include/catalog/pg_foreign_table.h
+++ b/src/include/catalog/pg_foreign_table.h
@@ -25,9 +25,7 @@
  *        typedef struct FormData_pg_foreign_table
  * ----------------
  */
-#define ForeignTableRelationId 3118
-
-CATALOG(pg_foreign_table,3118) BKI_WITHOUT_OIDS
+CATALOG(pg_foreign_table,3118,ForeignTableRelationId) BKI_WITHOUT_OIDS
 {
     Oid            ftrelid;        /* OID of foreign table */
     Oid            ftserver;        /* OID of foreign server */
diff --git a/src/include/catalog/pg_index.h b/src/include/catalog/pg_index.h
index ecae0db..b70ad73 100644
--- a/src/include/catalog/pg_index.h
+++ b/src/include/catalog/pg_index.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_index.
  * ----------------
  */
-#define IndexRelationId  2610
-
-CATALOG(pg_index,2610) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
+CATALOG(pg_index,2610,IndexRelationId) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
 {
     Oid            indexrelid;        /* OID of the index */
     Oid            indrelid;        /* OID of the relation it indexes */
diff --git a/src/include/catalog/pg_inherits.h b/src/include/catalog/pg_inherits.h
index 478a587..3b2e03c 100644
--- a/src/include/catalog/pg_inherits.h
+++ b/src/include/catalog/pg_inherits.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_inherits
  * ----------------
  */
-#define InheritsRelationId    2611
-
-CATALOG(pg_inherits,2611) BKI_WITHOUT_OIDS
+CATALOG(pg_inherits,2611,InheritsRelationId) BKI_WITHOUT_OIDS
 {
     Oid            inhrelid;
     Oid            inhparent;
diff --git a/src/include/catalog/pg_init_privs.h b/src/include/catalog/pg_init_privs.h
index 7dcb70c..6ce2646 100644
--- a/src/include/catalog/pg_init_privs.h
+++ b/src/include/catalog/pg_init_privs.h
@@ -43,9 +43,7 @@
  *        typedef struct FormData_pg_init_privs
  * ----------------
  */
-#define InitPrivsRelationId  3394
-
-CATALOG(pg_init_privs,3394) BKI_WITHOUT_OIDS
+CATALOG(pg_init_privs,3394,InitPrivsRelationId) BKI_WITHOUT_OIDS
 {
     Oid            objoid;            /* OID of object itself */
     Oid            classoid;        /* OID of table containing object */
diff --git a/src/include/catalog/pg_language.h b/src/include/catalog/pg_language.h
index d2d878c..e2d8d15 100644
--- a/src/include/catalog/pg_language.h
+++ b/src/include/catalog/pg_language.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_language
  * ----------------
  */
-#define LanguageRelationId    2612
-
-CATALOG(pg_language,2612)
+CATALOG(pg_language,2612,LanguageRelationId)
 {
     NameData    lanname;        /* Language name */
     Oid            lanowner;        /* Language's owner */
diff --git a/src/include/catalog/pg_largeobject.h b/src/include/catalog/pg_largeobject.h
index 2157bab..07adca0 100644
--- a/src/include/catalog/pg_largeobject.h
+++ b/src/include/catalog/pg_largeobject.h
@@ -24,9 +24,7 @@
  *        typedef struct FormData_pg_largeobject
  * ----------------
  */
-#define LargeObjectRelationId  2613
-
-CATALOG(pg_largeobject,2613) BKI_WITHOUT_OIDS
+CATALOG(pg_largeobject,2613,LargeObjectRelationId) BKI_WITHOUT_OIDS
 {
     Oid            loid;            /* Identifier of large object */
     int32        pageno;            /* Page number (starting from 0) */
diff --git a/src/include/catalog/pg_largeobject_metadata.h b/src/include/catalog/pg_largeobject_metadata.h
index 3d5e0cd..a8732bc 100644
--- a/src/include/catalog/pg_largeobject_metadata.h
+++ b/src/include/catalog/pg_largeobject_metadata.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_largeobject_metadata
  * ----------------
  */
-#define LargeObjectMetadataRelationId  2995
-
-CATALOG(pg_largeobject_metadata,2995)
+CATALOG(pg_largeobject_metadata,2995,LargeObjectMetadataRelationId)
 {
     Oid            lomowner;        /* OID of the largeobject owner */

diff --git a/src/include/catalog/pg_namespace.h b/src/include/catalog/pg_namespace.h
index 5f80e86..0d9cada 100644
--- a/src/include/catalog/pg_namespace.h
+++ b/src/include/catalog/pg_namespace.h
@@ -31,9 +31,7 @@
  *    nspacl                access privilege list
  * ----------------------------------------------------------------
  */
-#define NamespaceRelationId  2615
-
-CATALOG(pg_namespace,2615)
+CATALOG(pg_namespace,2615,NamespaceRelationId)
 {
     NameData    nspname;
     Oid            nspowner;
diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index 1b20d0d..16c3875 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -46,9 +46,7 @@
  *        typedef struct FormData_pg_opclass
  * ----------------
  */
-#define OperatorClassRelationId  2616
-
-CATALOG(pg_opclass,2616)
+CATALOG(pg_opclass,2616,OperatorClassRelationId)
 {
     /* index access method opclass is for */
     Oid            opcmethod BKI_LOOKUP(pg_am);
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 9c829d0..4950d28 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_operator
  * ----------------
  */
-#define OperatorRelationId    2617
-
-CATALOG(pg_operator,2617)
+CATALOG(pg_operator,2617,OperatorRelationId)
 {
     /* name of operator */
     NameData    oprname;
diff --git a/src/include/catalog/pg_opfamily.h b/src/include/catalog/pg_opfamily.h
index 598e546..4bedc9a 100644
--- a/src/include/catalog/pg_opfamily.h
+++ b/src/include/catalog/pg_opfamily.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_opfamily
  * ----------------
  */
-#define OperatorFamilyRelationId  2753
-
-CATALOG(pg_opfamily,2753)
+CATALOG(pg_opfamily,2753,OperatorFamilyRelationId)
 {
     /* index access method opfamily is for */
     Oid            opfmethod BKI_LOOKUP(pg_am);
diff --git a/src/include/catalog/pg_partitioned_table.h b/src/include/catalog/pg_partitioned_table.h
index 39ee67e..676532a 100644
--- a/src/include/catalog/pg_partitioned_table.h
+++ b/src/include/catalog/pg_partitioned_table.h
@@ -25,9 +25,7 @@
  *        typedef struct FormData_pg_partitioned_table
  * ----------------
  */
-#define PartitionedRelationId 3350
-
-CATALOG(pg_partitioned_table,3350) BKI_WITHOUT_OIDS
+CATALOG(pg_partitioned_table,3350,PartitionedRelationId) BKI_WITHOUT_OIDS
 {
     Oid            partrelid;        /* partitioned table oid */
     char        partstrat;        /* partitioning strategy */
diff --git a/src/include/catalog/pg_pltemplate.h b/src/include/catalog/pg_pltemplate.h
index 116a4a0..d84c86b 100644
--- a/src/include/catalog/pg_pltemplate.h
+++ b/src/include/catalog/pg_pltemplate.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_pltemplate
  * ----------------
  */
-#define PLTemplateRelationId    1136
-
-CATALOG(pg_pltemplate,1136) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
+CATALOG(pg_pltemplate,1136,PLTemplateRelationId) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 {
     NameData    tmplname;        /* name of PL */
     bool        tmpltrusted;    /* PL is trusted? */
diff --git a/src/include/catalog/pg_policy.h b/src/include/catalog/pg_policy.h
index 543077c..7ad0cde 100644
--- a/src/include/catalog/pg_policy.h
+++ b/src/include/catalog/pg_policy.h
@@ -20,9 +20,7 @@
  *        typedef struct FormData_pg_policy
  * ----------------
  */
-#define PolicyRelationId    3256
-
-CATALOG(pg_policy,3256)
+CATALOG(pg_policy,3256,PolicyRelationId)
 {
     NameData    polname;        /* Policy name. */
     Oid            polrelid;        /* Oid of the relation with policy. */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index b9d9cfd..fd0b909 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -25,10 +25,7 @@
  *        typedef struct FormData_pg_proc
  * ----------------
  */
-#define ProcedureRelationId  1255
-#define ProcedureRelation_Rowtype_Id  81
-
-CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
+CATALOG(pg_proc,1255,ProcedureRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81,ProcedureRelation_Rowtype_Id)
BKI_SCHEMA_MACRO
 {
     /* procedure name */
     NameData    proname;
diff --git a/src/include/catalog/pg_publication.h b/src/include/catalog/pg_publication.h
index 92cdcf1..9a6a64d 100644
--- a/src/include/catalog/pg_publication.h
+++ b/src/include/catalog/pg_publication.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_publication
  * ----------------
  */
-#define PublicationRelationId            6104
-
-CATALOG(pg_publication,6104)
+CATALOG(pg_publication,6104,PublicationRelationId)
 {
     NameData    pubname;        /* name of the publication */

diff --git a/src/include/catalog/pg_publication_rel.h b/src/include/catalog/pg_publication_rel.h
index 864d6ca..2208e42 100644
--- a/src/include/catalog/pg_publication_rel.h
+++ b/src/include/catalog/pg_publication_rel.h
@@ -25,9 +25,7 @@
  *        typedef struct FormData_pg_publication_rel
  * ----------------
  */
-#define PublicationRelRelationId                6106
-
-CATALOG(pg_publication_rel,6106)
+CATALOG(pg_publication_rel,6106,PublicationRelRelationId)
 {
     Oid            prpubid;        /* Oid of the publication */
     Oid            prrelid;        /* Oid of the relation */
diff --git a/src/include/catalog/pg_range.h b/src/include/catalog/pg_range.h
index d507e4e..3762b3e 100644
--- a/src/include/catalog/pg_range.h
+++ b/src/include/catalog/pg_range.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_range
  * ----------------
  */
-#define RangeRelationId 3541
-
-CATALOG(pg_range,3541) BKI_WITHOUT_OIDS
+CATALOG(pg_range,3541,RangeRelationId) BKI_WITHOUT_OIDS
 {
     /* OID of owning range type */
     Oid            rngtypid BKI_LOOKUP(pg_type);
diff --git a/src/include/catalog/pg_replication_origin.h b/src/include/catalog/pg_replication_origin.h
index 02856dd..1adc3f7 100644
--- a/src/include/catalog/pg_replication_origin.h
+++ b/src/include/catalog/pg_replication_origin.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_replication_origin
  * ----------------
  */
-#define ReplicationOriginRelationId 6000
-
-CATALOG(pg_replication_origin,6000) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
+CATALOG(pg_replication_origin,6000,ReplicationOriginRelationId) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 {
     /*
      * Locally known id that get included into WAL.
diff --git a/src/include/catalog/pg_rewrite.h b/src/include/catalog/pg_rewrite.h
index d656990..7712586 100644
--- a/src/include/catalog/pg_rewrite.h
+++ b/src/include/catalog/pg_rewrite.h
@@ -29,9 +29,7 @@
  *        typedef struct FormData_pg_rewrite
  * ----------------
  */
-#define RewriteRelationId  2618
-
-CATALOG(pg_rewrite,2618)
+CATALOG(pg_rewrite,2618,RewriteRelationId)
 {
     NameData    rulename;
     Oid            ev_class;
diff --git a/src/include/catalog/pg_seclabel.h b/src/include/catalog/pg_seclabel.h
index d6d2f97..48d4548 100644
--- a/src/include/catalog/pg_seclabel.h
+++ b/src/include/catalog/pg_seclabel.h
@@ -19,9 +19,7 @@
  *        typedef struct FormData_pg_seclabel
  * ----------------
  */
-#define SecLabelRelationId        3596
-
-CATALOG(pg_seclabel,3596) BKI_WITHOUT_OIDS
+CATALOG(pg_seclabel,3596,SecLabelRelationId) BKI_WITHOUT_OIDS
 {
     Oid            objoid;            /* OID of the object itself */
     Oid            classoid;        /* OID of table containing the object */
diff --git a/src/include/catalog/pg_sequence.h b/src/include/catalog/pg_sequence.h
index de6ed1a..a13b05e 100644
--- a/src/include/catalog/pg_sequence.h
+++ b/src/include/catalog/pg_sequence.h
@@ -14,9 +14,7 @@
 #include "catalog/genbki.h"
 #include "catalog/pg_sequence_d.h"

-#define SequenceRelationId    2224
-
-CATALOG(pg_sequence,2224) BKI_WITHOUT_OIDS
+CATALOG(pg_sequence,2224,SequenceRelationId) BKI_WITHOUT_OIDS
 {
     Oid            seqrelid;
     Oid            seqtypid;
diff --git a/src/include/catalog/pg_shdepend.h b/src/include/catalog/pg_shdepend.h
index 708980b..9f4dcb9 100644
--- a/src/include/catalog/pg_shdepend.h
+++ b/src/include/catalog/pg_shdepend.h
@@ -35,9 +35,7 @@
  *        typedef struct FormData_pg_shdepend
  * ----------------
  */
-#define SharedDependRelationId    1214
-
-CATALOG(pg_shdepend,1214) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
+CATALOG(pg_shdepend,1214,SharedDependRelationId) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 {
     /*
      * Identification of the dependent (referencing) object.
diff --git a/src/include/catalog/pg_shdescription.h b/src/include/catalog/pg_shdescription.h
index 1777144..00fd0e0 100644
--- a/src/include/catalog/pg_shdescription.h
+++ b/src/include/catalog/pg_shdescription.h
@@ -38,9 +38,7 @@
  *        typedef struct FormData_pg_shdescription
  * ----------------
  */
-#define SharedDescriptionRelationId  2396
-
-CATALOG(pg_shdescription,2396) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
+CATALOG(pg_shdescription,2396,SharedDescriptionRelationId) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
 {
     Oid            objoid;            /* OID of object itself */
     Oid            classoid;        /* OID of table containing object */
diff --git a/src/include/catalog/pg_shseclabel.h b/src/include/catalog/pg_shseclabel.h
index 9fceeee..22ecf98 100644
--- a/src/include/catalog/pg_shseclabel.h
+++ b/src/include/catalog/pg_shseclabel.h
@@ -19,10 +19,7 @@
  *        typedef struct FormData_pg_shseclabel
  * ----------------
  */
-#define SharedSecLabelRelationId            3592
-#define SharedSecLabelRelation_Rowtype_Id    4066
-
-CATALOG(pg_shseclabel,3592) BKI_SHARED_RELATION BKI_ROWTYPE_OID(4066) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
+CATALOG(pg_shseclabel,3592,SharedSecLabelRelationId) BKI_SHARED_RELATION
BKI_ROWTYPE_OID(4066,SharedSecLabelRelation_Rowtype_Id)BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO 
 {
     Oid            objoid;            /* OID of the shared object itself */
     Oid            classoid;        /* OID of table containing the shared object */
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index ea4d1be..c0ab74b 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_statistic
  * ----------------
  */
-#define StatisticRelationId  2619
-
-CATALOG(pg_statistic,2619) BKI_WITHOUT_OIDS
+CATALOG(pg_statistic,2619,StatisticRelationId) BKI_WITHOUT_OIDS
 {
     /* These fields form the unique key for the entry: */
     Oid            starelid;        /* relation containing attribute */
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index 3c6be71..5d57b81 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_statistic_ext
  * ----------------
  */
-#define StatisticExtRelationId    3381
-
-CATALOG(pg_statistic_ext,3381)
+CATALOG(pg_statistic_ext,3381,StatisticExtRelationId)
 {
     Oid            stxrelid;        /* relation containing attributes */

diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 1b2981f..93b249f 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -20,8 +20,6 @@
  *        typedef struct FormData_pg_subscription
  * ----------------
  */
-#define SubscriptionRelationId            6100
-#define SubscriptionRelation_Rowtype_Id 6101

 /*
  * Technically, the subscriptions live inside the database, so a shared catalog
@@ -31,7 +29,7 @@
  *
  * NOTE:  When adding a column, also update system_views.sql.
  */
-CATALOG(pg_subscription,6100) BKI_SHARED_RELATION BKI_ROWTYPE_OID(6101) BKI_SCHEMA_MACRO
+CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION
BKI_ROWTYPE_OID(6101,SubscriptionRelation_Rowtype_Id)BKI_SCHEMA_MACRO 
 {
     Oid            subdbid;        /* Database the subscription is in. */
     NameData    subname;        /* Name of the subscription */
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index 64aa121..d82b262 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -22,9 +22,7 @@
  *        typedef struct FormData_pg_subscription_rel
  * ----------------
  */
-#define SubscriptionRelRelationId            6102
-
-CATALOG(pg_subscription_rel,6102) BKI_WITHOUT_OIDS
+CATALOG(pg_subscription_rel,6102,SubscriptionRelRelationId) BKI_WITHOUT_OIDS
 {
     Oid            srsubid;        /* Oid of subscription */
     Oid            srrelid;        /* Oid of relation */
diff --git a/src/include/catalog/pg_tablespace.h b/src/include/catalog/pg_tablespace.h
index bd9c118..4782e78 100644
--- a/src/include/catalog/pg_tablespace.h
+++ b/src/include/catalog/pg_tablespace.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_tablespace
  * ----------------
  */
-#define TableSpaceRelationId  1213
-
-CATALOG(pg_tablespace,1213) BKI_SHARED_RELATION
+CATALOG(pg_tablespace,1213,TableSpaceRelationId) BKI_SHARED_RELATION
 {
     NameData    spcname;        /* tablespace name */
     Oid            spcowner;        /* owner of tablespace */
diff --git a/src/include/catalog/pg_transform.h b/src/include/catalog/pg_transform.h
index c571fb5..6059b89 100644
--- a/src/include/catalog/pg_transform.h
+++ b/src/include/catalog/pg_transform.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_transform
  * ----------------
  */
-#define TransformRelationId 3576
-
-CATALOG(pg_transform,3576)
+CATALOG(pg_transform,3576,TransformRelationId)
 {
     Oid            trftype;
     Oid            trflang;
diff --git a/src/include/catalog/pg_trigger.h b/src/include/catalog/pg_trigger.h
index eabd301..7d60861 100644
--- a/src/include/catalog/pg_trigger.h
+++ b/src/include/catalog/pg_trigger.h
@@ -31,9 +31,7 @@
  * to be associated with a deferrable constraint.
  * ----------------
  */
-#define TriggerRelationId  2620
-
-CATALOG(pg_trigger,2620)
+CATALOG(pg_trigger,2620,TriggerRelationId)
 {
     Oid            tgrelid;        /* relation trigger is attached to */
     NameData    tgname;            /* trigger's name */
diff --git a/src/include/catalog/pg_ts_config.h b/src/include/catalog/pg_ts_config.h
index d0b7aa9..d344bb7 100644
--- a/src/include/catalog/pg_ts_config.h
+++ b/src/include/catalog/pg_ts_config.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_ts_config
  * ----------------
  */
-#define TSConfigRelationId    3602
-
-CATALOG(pg_ts_config,3602)
+CATALOG(pg_ts_config,3602,TSConfigRelationId)
 {
     NameData    cfgname;        /* name of configuration */
     Oid            cfgnamespace;    /* name space */
diff --git a/src/include/catalog/pg_ts_config_map.h b/src/include/catalog/pg_ts_config_map.h
index cdee4b4..2120021 100644
--- a/src/include/catalog/pg_ts_config_map.h
+++ b/src/include/catalog/pg_ts_config_map.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_ts_config_map
  * ----------------
  */
-#define TSConfigMapRelationId    3603
-
-CATALOG(pg_ts_config_map,3603) BKI_WITHOUT_OIDS
+CATALOG(pg_ts_config_map,3603,TSConfigMapRelationId) BKI_WITHOUT_OIDS
 {
     Oid            mapcfg;            /* OID of configuration owning this entry */
     int32        maptokentype;    /* token type from parser */
diff --git a/src/include/catalog/pg_ts_dict.h b/src/include/catalog/pg_ts_dict.h
index 58af179..1e285ad 100644
--- a/src/include/catalog/pg_ts_dict.h
+++ b/src/include/catalog/pg_ts_dict.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_ts_dict
  * ----------------
  */
-#define TSDictionaryRelationId    3600
-
-CATALOG(pg_ts_dict,3600)
+CATALOG(pg_ts_dict,3600,TSDictionaryRelationId)
 {
     NameData    dictname;        /* dictionary name */
     Oid            dictnamespace;    /* name space */
diff --git a/src/include/catalog/pg_ts_parser.h b/src/include/catalog/pg_ts_parser.h
index 51b70ae..ccaf40b 100644
--- a/src/include/catalog/pg_ts_parser.h
+++ b/src/include/catalog/pg_ts_parser.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_ts_parser
  * ----------------
  */
-#define TSParserRelationId    3601
-
-CATALOG(pg_ts_parser,3601)
+CATALOG(pg_ts_parser,3601,TSParserRelationId)
 {
     /* parser's name */
     NameData    prsname;
diff --git a/src/include/catalog/pg_ts_template.h b/src/include/catalog/pg_ts_template.h
index cfb97925..5e66e02 100644
--- a/src/include/catalog/pg_ts_template.h
+++ b/src/include/catalog/pg_ts_template.h
@@ -26,9 +26,7 @@
  *        typedef struct FormData_pg_ts_template
  * ----------------
  */
-#define TSTemplateRelationId    3764
-
-CATALOG(pg_ts_template,3764)
+CATALOG(pg_ts_template,3764,TSTemplateRelationId)
 {
     /* template name */
     NameData    tmplname;
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index 8992fcd..4ddc09a 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -31,10 +31,7 @@
  *        See struct FormData_pg_attribute for details.
  * ----------------
  */
-#define TypeRelationId    1247
-#define TypeRelation_Rowtype_Id  71
-
-CATALOG(pg_type,1247) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71) BKI_SCHEMA_MACRO
+CATALOG(pg_type,1247,TypeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71,TypeRelation_Rowtype_Id) BKI_SCHEMA_MACRO
 {
     /* type name */
     NameData    typname;
diff --git a/src/include/catalog/pg_user_mapping.h b/src/include/catalog/pg_user_mapping.h
index ec62ee2..6efbed0 100644
--- a/src/include/catalog/pg_user_mapping.h
+++ b/src/include/catalog/pg_user_mapping.h
@@ -25,9 +25,7 @@
  *        typedef struct FormData_pg_user_mapping
  * ----------------
  */
-#define UserMappingRelationId    1418
-
-CATALOG(pg_user_mapping,1418)
+CATALOG(pg_user_mapping,1418,UserMappingRelationId)
 {
     Oid            umuser;            /* Id of the user, InvalidOid if PUBLIC is
                                  * wanted */
diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index e8be34c..f71222d 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -30,7 +30,7 @@ export FIRSTOBJECTID
 cat pg_*.h pg_*.dat toasting.h indexing.h |
 egrep -v -e '^CATALOG\(.*BKI_BOOTSTRAP' | \
 sed -n    -e 's/.*\boid *=> *'\''\([0-9][0-9]*\)'\''.*$/\1/p' \
-    -e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*BKI_ROWTYPE_OID(\([0-9][0-9]*\)).*$/\1,\2/p' \
+    -e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*BKI_ROWTYPE_OID(\([0-9][0-9]*\),.*$/\1,\2/p' \
     -e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
     -e 's/^DECLARE_INDEX([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
     -e 's/^DECLARE_UNIQUE_INDEX([^,]*, *\([0-9][0-9]*\).*$/\1/p' \

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
BTW, I experimented with adding blank lines between the hash items in the
.dat files, and that seemed to make a nice improvement in readability,
converting masses of rather gray text into visibly distinct stanzas.
I'm not dead set on that, but try it and see what you think.

A small example from pg_aggregate.dat:

{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },
{ aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },
{ aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },
{ aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => 'internal',
  aggtransspace => '128', aggmtranstype => 'internal',
  aggmtransspace => '128' },
{ aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum',
  aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine',
  aggtranstype => '_float8', agginitval => '{0,0,0}' },

versus

{ aggfnoid => 'avg(int8)', aggtransfn => 'int8_avg_accum',
  aggfinalfn => 'numeric_poly_avg', aggcombinefn => 'int8_avg_combine',
  aggserialfn => 'int8_avg_serialize', aggdeserialfn => 'int8_avg_deserialize',
  aggmtransfn => 'int8_avg_accum', aggminvtransfn => 'int8_avg_accum_inv',
  aggmfinalfn => 'numeric_poly_avg', aggtranstype => 'internal',
  aggtransspace => '48', aggmtranstype => 'internal', aggmtransspace => '48' },

{ aggfnoid => 'avg(int4)', aggtransfn => 'int4_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int4_avg_accum', aggminvtransfn => 'int4_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },

{ aggfnoid => 'avg(int2)', aggtransfn => 'int2_avg_accum',
  aggfinalfn => 'int8_avg', aggcombinefn => 'int4_avg_combine',
  aggmtransfn => 'int2_avg_accum', aggminvtransfn => 'int2_avg_accum_inv',
  aggmfinalfn => 'int8_avg', aggtranstype => '_int8', aggmtranstype => '_int8',
  agginitval => '{0,0}', aggminitval => '{0,0}' },

{ aggfnoid => 'avg(numeric)', aggtransfn => 'numeric_avg_accum',
  aggfinalfn => 'numeric_avg', aggcombinefn => 'numeric_avg_combine',
  aggserialfn => 'numeric_avg_serialize',
  aggdeserialfn => 'numeric_avg_deserialize',
  aggmtransfn => 'numeric_avg_accum', aggminvtransfn => 'numeric_accum_inv',
  aggmfinalfn => 'numeric_avg', aggtranstype => 'internal',
  aggtransspace => '128', aggmtranstype => 'internal',
  aggmtransspace => '128' },

{ aggfnoid => 'avg(float4)', aggtransfn => 'float4_accum',
  aggfinalfn => 'float8_avg', aggcombinefn => 'float8_combine',
  aggtranstype => '_float8', agginitval => '{0,0,0}' },

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I experimented with converting all frontend code to include just the
> catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
> proposed new policy.  I soon found that we'd overlooked one thing:
> some clients expect to see the relation OID macros, eg
> LargeObjectRelationId.  Attached is a patch that changes things around
> so that those appear in the _d files instead of the master files.
> This is cleaner anyway because it removes duplication of the OIDs in
> the master files, with attendant risk of error.  For example we
> have this change in pg_aggregate.h:
>
> -#define AggregateRelationId  2600
> -
> -CATALOG(pg_aggregate,2600) BKI_WITHOUT_OIDS
> +CATALOG(pg_aggregate,2600,AggregateRelationId) BKI_WITHOUT_OIDS
>
> Some of the CATALOG lines spill well past 80 characters with this,
> although many of the affected ones already were overlength, eg
>
> -#define DatabaseRelationId    1262
> -#define DatabaseRelation_Rowtype_Id  1248
> -
> -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248)
> BKI_SCHEMA_MACRO
> +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION
> BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

It seems most of the time the FooRelationId labels are predictable,
although not as pristine as the Anum_* constants. One possibility that
came to mind is to treat these like pg_type OID #defines -- have a
simple rule that can be overridden for historical reasons. In this
case the pg_database change would simply be:

-#define DatabaseRelationId    1262
-#define DatabaseRelation_Rowtype_Id  1248
-

and genbki.pl would know what to do. But for pg_am:

-#define AccessMethodRelationId    2601
-
-CATALOG(pg_am,2601)
+CATALOG(pg_am,2601) BKI_REL_LABEL(AccessMethod)

I haven't thought this through yet. I imagine it will add as well as
remove a bit of complexity, code-wise. The upside is most CATALOG
lines will remain unchanged, and those that do won't end up quite as
long. I can try a draft tomorrow to see how it looks, unless you see
an obvious downside.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/5/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've generalized the BKI_LOOKUP(pg_proc) code so that
> you can use either regproc-like or regprocedure-like notation, and then
> applied that to relevant columns.
> [...]
> bootstrap-v13-delta.patch is a diff atop your patch series for the
> in-tree files, and convert_oid2name.patch adjusts that script to
> make use of the additional conversion capability.

Looking at convert_oid2name.patch again, I see this:

+             elsif ($catname eq 'pg_am')
+             {
+                 $values{aggfnoid}     = lookup_procname($values{aggfnoid});
+             }

aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup.
Do you remember the intent here?

-John Naylor


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> It seems most of the time the FooRelationId labels are predictable,
> although not as pristine as the Anum_* constants. One possibility that
> came to mind is to treat these like pg_type OID #defines -- have a
> simple rule that can be overridden for historical reasons.

Meh.  I'd just as soon avoid having some catalogs done one way and
some another.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> Looking at convert_oid2name.patch again, I see this:

> +             elsif ($catname eq 'pg_am')
> +             {
> +                 $values{aggfnoid}     = lookup_procname($values{aggfnoid});
> +             }

> aggfnoid is in pg_aggregate, and pg_am already had a regproc lookup.
> Do you remember the intent here?

Ugh, copy-and-pasteo.  I intended to have it lookup pg_am.amhandler, but
must have missed changing the field name after copying code from the
pg_aggregate stanza.  Seems to have been unnecessary anyway, since all the
entries in the column are already symbolic.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Just had another thought about this business: if practical, we should
remove the distinction between "descr" and "shdescr" and just use the
former name in .dat files.  genbki.pl knows which catalogs are shared,
so it ought to be able to figure out where to route the descriptions.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Just had another thought about this business: if practical, we should
> remove the distinction between "descr" and "shdescr" and just use the
> former name in .dat files.  genbki.pl knows which catalogs are shared,
> so it ought to be able to figure out where to route the descriptions.

Fairly trivial (attached), and shouldn't be too hard to integrate into
the series.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, I experimented with adding blank lines between the hash items in the
> .dat files, and that seemed to make a nice improvement in readability,
> converting masses of rather gray text into visibly distinct stanzas.
> I'm not dead set on that, but try it and see what you think.

Narrow entries with natural whitespace might be okay as is. The
pg_aggregate example is better with blank lines, but another thing to
consider is that a comment that hugs a block is clear on which entries
it's referring to (pg_amop):

# btree integer_ops

# default operators int2
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '1', amopopr => '<(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '2', amopopr => '<=(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '3', amopopr => '=(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '4', amopopr => '>=(int2,int2)',
  amopmethod => 'btree' },
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
  amoprighttype => 'int2', amopstrategy => '5', amopopr => '>(int2,int2)',
  amopmethod => 'btree' },

# crosstype operators int24
{ amopfamily => 'btree/integer_ops', amoplefttype => 'int2',
...
[more blocks of integer ops
...]
  amopmethod => 'btree' },

# btree oid_ops


With a blank line beween every entry, the comments would "float" more,
and the scope is not as clear. I'm okay with whatever the community
thinks, but at this point I'm inclined to leave things as they are and
focus on the other points of review for the next patchset.

While on the subject of viewing, I do have a badly outdated patch that
would create a postgres.sql file which would load into a development
schema so one could query the bootstrap data in a database without
running initdb. I could update it at a future point.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/5/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Something that bothered me a bit while writing the warning-producing code
> is that showing %bki_values isn't actually that great a way of identifying
> the trouble spot.  By this point we've expanded out defaults and possibly
> replaced some other macros, so it doesn't look that much like what was
> in the .dat file.  I think what would be ideal, both here and in some
> other places like AddDefaultValues, is to be able to finger the location
> of the bad tuple by filename and line number, but I have no idea whether
> it's practical to annotate the tuples with that while reading the .dat
> files.  Any thoughts?

We could use the $. variable to save the line number, which is what
the old code had. AddDefaultValues will report the line number on
failure, so I left out explicit line number reporting. If memory
serves, Perl is sensitive to how you format the "die" message. If I
delete a default value from the header, I get this, reporting line 16:

Failed to form full tuple for pg_opfamily
Missing values for: opfnamespace
Showing other values for context:
oid => 421, opfmethod => 403, opfowner => PGUID, opfname =>
abstime_ops,  at ../../../src/backend/catalog/Catalog.pm line 259,
<$ifd> line 16.
Makefile:23: recipe for target 'reformat-dat-files' failed
make: *** [reformat-dat-files] Error 25

I think the context is good for pg_attribute, because there is no file
to read from.
I'll think about the lookup code.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
For version 14, diffed against f1464c53804:

-Use majority values for proisstrict, provolatile, proparallel (patch 0006)
-Use valid C string for multi-char defaults containing a backslash (patch 0006)
-Apply Tom's patch for additional lookups, slightly modified by me
(convert_oid2name.pl, patch 0007)
-Apply Tom's patch for relation/rowtype OID macros (patch 0005)

On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Just had another thought about this business: if practical, we should
> remove the distinction between "descr" and "shdescr" and just use the
> former name in .dat files.  genbki.pl knows which catalogs are shared,
> so it ought to be able to figure out where to route the descriptions.

Done (convert_header2dat.pl, patches 0001, 0003, 0009)

On 4/5/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think what would be ideal, both here and in some
> other places like AddDefaultValues, is to be able to finger the location
> of the bad tuple by filename and line number, but I have no idea whether
> it's practical to annotate the tuples with that while reading the .dat
> files.  Any thoughts?

Done (patch 0007). So far only lookup_oids() uses it.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, I experimented with adding blank lines between the hash items in the
>> .dat files, and that seemed to make a nice improvement in readability,
>> converting masses of rather gray text into visibly distinct stanzas.
>> I'm not dead set on that, but try it and see what you think.

> Narrow entries with natural whitespace might be okay as is. The
> pg_aggregate example is better with blank lines,

Yeah, it's somewhat of a worst-case example, because (more or less by
chance) most of the entries have last lines that are mostly full.
In other places it often happens that the last line of an entry is
much shorter than average, and that provides enough of a visual break,
as in your example from pg_amop.

> but another thing to
> consider is that a comment that hugs a block is clear on which entries
> it's referring to (pg_amop):

True, we'd need to rethink vertical spacing for comments.  I don't offhand
see why we couldn't keep comments that apply to a specific entry directly
adjacent to that entry, though.

Anyway, as I said, I'm not set on this change.  If you're unexcited by
the idea then let's drop it.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
I wrote:
> John Naylor <jcnaylor@gmail.com> writes:
>> On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> BTW, I experimented with adding blank lines between the hash items in the
>>> .dat files, and that seemed to make a nice improvement in readability,

> Anyway, as I said, I'm not set on this change.  If you're unexcited by
> the idea then let's drop it.

On third thought, there's actually a positive reason not to do that.
The more vertical whitespace we have, the less traction there is for
context diffs to find the right place to change, so that we'd be
increasing the risk of patch misapplication.  That was one of the
worries that we set out to avoid with this whole design.

So nevermind ... I'll get on with looking at v14.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Attached is a round of minor review fixes.  Much of this is cleaning
up existing mistakes or out-of-date comments, rather than anything
introduced by this patchset, but I noticed it while going through the
patch.

The additional EXPOSE_TO_CLIENT_CODE bits actually are necessary,
in some cases, as I had compile failures without them after changing
client include lines.  (I'll post that separately.)

I added a couple more BKI_DEFAULT markers here, but omitted the ensuing
.dat file updates, because they're just mechanical.

I don't think there's any great need to incorporate this into your patch
set.  As far as I'm concerned, v14 is ready as-is, and I'll just apply
this over the top of it.  (Note that I'll probably smash the whole thing
to one commit when the time comes.)

I have some other work pending on the documentation aspect, but that's
not quite ready for public consumption yet.

            regards, tom lane

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 8729ccd..1626999 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3566,7 +3566,7 @@ Oid PQftype(const PGresult *res,
        You can query the system table <literal>pg_type</literal> to
        obtain the names and properties of the various data types. The
        <acronym>OID</acronym>s of the built-in data types are defined
-       in the file <filename>src/include/catalog/pg_type.h</filename>
+       in the file <filename>src/include/catalog/pg_type_d.h</filename>
        in the source tree.
       </para>
      </listitem>
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index 17213d4..d25d98a 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -25,8 +25,10 @@ BKIFILES = postgres.bki postgres.description postgres.shdescription

 include $(top_srcdir)/src/backend/common.mk

-# Note: there are some undocumented dependencies on the ordering in which
-# the catalog header files are assembled into postgres.bki.
+# Note: the order of this list determines the order in which the catalog
+# header files are assembled into postgres.bki.  BKI_BOOTSTRAP catalogs
+# must appear first, and there are reputedly other, undocumented ordering
+# dependencies.
 CATALOG_HEADERS := \
     pg_proc.h pg_type.h pg_attribute.h pg_class.h \
     pg_attrdef.h pg_constraint.h pg_inherits.h pg_index.h pg_operator.h \
@@ -49,7 +51,8 @@ CATALOG_HEADERS := \
 GENERATED_HEADERS := $(CATALOG_HEADERS:%.h=%_d.h) schemapg.h

 # In the list of headers used to assemble postgres.bki, indexing.h needs
-# be last, and toasting.h just before it.
+# be last, and toasting.h just before it.  This ensures we don't try to
+# create indexes or toast tables before their catalogs exist.
 POSTGRES_BKI_SRCS := $(addprefix $(top_srcdir)/src/include/catalog/,\
     $(CATALOG_HEADERS) toasting.h indexing.h \
     )
diff --git a/src/include/catalog/Makefile b/src/include/catalog/Makefile
index d84a572..1da3ea7 100644
--- a/src/include/catalog/Makefile
+++ b/src/include/catalog/Makefile
@@ -2,9 +2,6 @@
 #
 # Makefile for src/include/catalog
 #
-# 'make reformat-dat-files' is a convenience target for rewriting the
-# catalog data files in a standard format.
-#
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 #
@@ -19,10 +16,16 @@ include $(top_builddir)/src/Makefile.global
 # location of Catalog.pm
 catalogdir = $(top_srcdir)/src/backend/catalog

+# 'make reformat-dat-files' is a convenience target for rewriting the
+# catalog data files in our standard format.  This includes collapsing
+# out any entries that are redundant with a BKI_DEFAULT annotation.
 reformat-dat-files:
     $(PERL) -I $(catalogdir) reformat_dat_file.pl pg_*.dat

+# 'make expand-dat-files' is a convenience target for expanding out all
+# default values in the catalog data files.  This should be run before
+# altering or removing any BKI_DEFAULT annotation.
 expand-dat-files:
     $(PERL) -I $(catalogdir) reformat_dat_file.pl pg_*.dat --full-tuples

-.PHONY: reformat-dat-files
+.PHONY: reformat-dat-files expand-dat-files
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
index 0e6285f..8c143cf 100755
--- a/src/include/catalog/duplicate_oids
+++ b/src/include/catalog/duplicate_oids
@@ -30,7 +30,7 @@ foreach my $oid (sort { $a <=> $b } keys %oidcounts)
 {
     next unless $oidcounts{$oid} > 1;
     $found = 1;
-    print "***Duplicate OID: $oid\n";
+    print "$oid\n";
 }

 exit $found;
diff --git a/src/include/catalog/genbki.h b/src/include/catalog/genbki.h
index 02c38c4..b1e2cbd 100644
--- a/src/include/catalog/genbki.h
+++ b/src/include/catalog/genbki.h
@@ -34,7 +34,7 @@
 #define BKI_FORCE_NOT_NULL
 /* Specifies a default value for a catalog field */
 #define BKI_DEFAULT(value)
-/* Indicates how to perform name lookups for OID fields */
+/* Indicates how to perform name lookups for an OID or OID-array field */
 #define BKI_LOOKUP(catalog)

 /* The following are never defined; they are here only for documentation. */
@@ -48,11 +48,12 @@
 #undef CATALOG_VARLEN

 /*
- * There is code in the catalog headers that needs to be visible to clients,
- * but we don't want them to include the full header because of safety issues
- * with other code in the header. This symbol instructs genbki.pl to copy this
- * section when generating the corresponding definition header, where it can
- * be included by both client and backend code.
+ * There is code in some catalog headers that needs to be visible to clients,
+ * but we don't want clients to include the full header because of safety
+ * issues with other code in the header.  To handle that, surround code that
+ * should be visible to clients with "#ifdef EXPOSE_TO_CLIENT_CODE".  That
+ * instructs genbki.pl to copy the section when generating the corresponding
+ * "_d" header, which can be included by both client and backend code.
  */
 #undef EXPOSE_TO_CLIENT_CODE

diff --git a/src/include/catalog/pg_am.h b/src/include/catalog/pg_am.h
index 5aa2bac..9620bcd 100644
--- a/src/include/catalog/pg_am.h
+++ b/src/include/catalog/pg_am.h
@@ -47,9 +47,8 @@ typedef FormData_pg_am *Form_pg_am;

 #ifdef EXPOSE_TO_CLIENT_CODE

-/* ----------------
- *        compiler constant for amtype
- * ----------------
+/*
+ * Allowed values for amtype
  */
 #define AMTYPE_INDEX                    'i' /* index access method */

diff --git a/src/include/catalog/pg_authid.h b/src/include/catalog/pg_authid.h
index c410b99..863ef65 100644
--- a/src/include/catalog/pg_authid.h
+++ b/src/include/catalog/pg_authid.h
@@ -23,17 +23,6 @@
 #include "catalog/genbki.h"
 #include "catalog/pg_authid_d.h"

-/*
- * The CATALOG definition has to refer to the type of rolvaliduntil as
- * "timestamptz" (lower case) so that bootstrap mode recognizes it.  But
- * the C header files define this type as TimestampTz.  Since the field is
- * potentially-null and therefore can't be accessed directly from C code,
- * there is no particular need for the C struct definition to show the
- * field type as TimestampTz --- instead we just make it int.
- */
-#define timestamptz int
-
-
 /* ----------------
  *        pg_authid definition.  cpp turns this into
  *        typedef struct FormData_pg_authid
@@ -58,8 +47,6 @@ CATALOG(pg_authid,1260,AuthIdRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID(284
 #endif
 } FormData_pg_authid;

-#undef timestamptz
-
 /* ----------------
  *        Form_pg_authid corresponds to a pointer to a tuple with
  *        the format of pg_authid relation.
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index e58eb8d..a9e7e2b 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -53,6 +53,8 @@ CATALOG(pg_cast,2605,CastRelationId)
  */
 typedef FormData_pg_cast *Form_pg_cast;

+#ifdef EXPOSE_TO_CLIENT_CODE
+
 /*
  * The allowable values for pg_cast.castcontext are specified by this enum.
  * Since castcontext is stored as a "char", we use ASCII codes for human
@@ -81,4 +83,6 @@ typedef enum CoercionMethod
     COERCION_METHOD_INOUT = 'i' /* use input/output functions */
 } CoercionMethod;

+#endif /* EXPOSE_TO_CLIENT_CODE */
+
 #endif                            /* PG_CAST_H */
diff --git a/src/include/catalog/pg_class.dat b/src/include/catalog/pg_class.dat
index ff972c0..e1450e3 100644
--- a/src/include/catalog/pg_class.dat
+++ b/src/include/catalog/pg_class.dat
@@ -12,9 +12,10 @@

 [

-# Note: only "bootstrapped" relations need to be declared here.  Be sure that
-# the OIDs listed here match those given in their CATALOG macros, and that
-# the relnatts values are correct.
+# Note: only "bootstrapped" relations, ie those marked BKI_BOOTSTRAP, need to
+# have entries here.  Be sure that the OIDs listed here match those given in
+# their CATALOG and BKI_ROWTYPE_OID macros, and that the relnatts values are
+# correct.

 # Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
 # similarly, "1" in relminmxid stands for FirstMultiXactId
diff --git a/src/include/catalog/pg_db_role_setting.h b/src/include/catalog/pg_db_role_setting.h
index 0faa81f..896c410 100644
--- a/src/include/catalog/pg_db_role_setting.h
+++ b/src/include/catalog/pg_db_role_setting.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * pg_db_role_setting.h
- *    definition of configuration settings
+ *      definition of per-database/per-user configuration settings relation
  *
  *
  * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
@@ -18,6 +18,7 @@
 #ifndef PG_DB_ROLE_SETTING_H
 #define PG_DB_ROLE_SETTING_H

+#include "catalog/genbki.h"
 #include "catalog/pg_db_role_setting_d.h"
 #include "utils/guc.h"
 #include "utils/relcache.h"
diff --git a/src/include/catalog/pg_index.h b/src/include/catalog/pg_index.h
index 65c1110..b70ad73 100644
--- a/src/include/catalog/pg_index.h
+++ b/src/include/catalog/pg_index.h
@@ -64,6 +64,8 @@ CATALOG(pg_index,2610,IndexRelationId) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO
  */
 typedef FormData_pg_index *Form_pg_index;

+#ifdef EXPOSE_TO_CLIENT_CODE
+
 /*
  * Index AMs that support ordered scans must support these two indoption
  * bits.  Otherwise, the content of the per-column indoption fields is
@@ -72,6 +74,8 @@ typedef FormData_pg_index *Form_pg_index;
 #define INDOPTION_DESC            0x0001    /* values are in reverse order */
 #define INDOPTION_NULLS_FIRST    0x0002    /* NULLs are first instead of last */

+#endif /* EXPOSE_TO_CLIENT_CODE */
+
 /*
  * Use of these macros is recommended over direct examination of the state
  * flag columns where possible; this allows source code compatibility with
diff --git a/src/include/catalog/pg_largeobject.h b/src/include/catalog/pg_largeobject.h
index 07adca0..481d2ff 100644
--- a/src/include/catalog/pg_largeobject.h
+++ b/src/include/catalog/pg_largeobject.h
@@ -10,6 +10,8 @@
  * src/include/catalog/pg_largeobject.h
  *
  * NOTES
+ *      The Catalog.pm module reads this file and derives schema
+ *      information.
  *
  *-------------------------------------------------------------------------
  */
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index f8118ff..4382738 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3054,8 +3054,6 @@
 { oid => '3681', descr => 'OR-concatenate',
   oprname => '||', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'tsquery', oprcode => 'tsquery_or' },
-
-# <-> operation calls tsquery_phrase, but function is polymorphic. So, point to OID of the tsquery_phrase
 { oid => '5005', descr => 'phrase-concatenate',
   oprname => '<->', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'tsquery', oprcode => 'tsquery_phrase(tsquery,tsquery)' },
diff --git a/src/include/catalog/pg_policy.h b/src/include/catalog/pg_policy.h
index 7ad0cde..45fdc28 100644
--- a/src/include/catalog/pg_policy.h
+++ b/src/include/catalog/pg_policy.h
@@ -7,6 +7,12 @@
  * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
+ * src/include/catalog/pg_policy.h
+ *
+ * NOTES
+ *      The Catalog.pm module reads this file and derives schema
+ *      information.
+ *
  *-------------------------------------------------------------------------
  */
 #ifndef PG_POLICY_H
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 43e24d2..5b5a1c0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -30,7 +30,9 @@
 # "convert srctypename to desttypename" for cast functions
 # "less-equal-greater" for B-tree comparison functions

-# Keep the following ordered by OID so that later changes can be made easier.
+# Once upon a time these entries were ordered by OID.  Lately it's often
+# been the custom to insert new entries adjacent to related older entries.
+# Try to do one or the other though, don't just insert entries at random.

 # OIDS 1 - 99

diff --git a/src/include/catalog/pg_range.h b/src/include/catalog/pg_range.h
index 3762b3e..d8e16cc 100644
--- a/src/include/catalog/pg_range.h
+++ b/src/include/catalog/pg_range.h
@@ -35,7 +35,7 @@ CATALOG(pg_range,3541,RangeRelationId) BKI_WITHOUT_OIDS
     Oid            rngsubtype BKI_LOOKUP(pg_type);

     /* collation for this range type, or 0 */
-    Oid            rngcollation;
+    Oid            rngcollation BKI_DEFAULT(0);

     /* subtype's btree opclass */
     Oid            rngsubopc BKI_LOOKUP(pg_opclass);
diff --git a/src/include/catalog/pg_shdepend.h b/src/include/catalog/pg_shdepend.h
index 9f4dcb9..0f8508c 100644
--- a/src/include/catalog/pg_shdepend.h
+++ b/src/include/catalog/pg_shdepend.h
@@ -11,7 +11,6 @@
  * for example, there's not much value in creating an explicit dependency
  * from a relation to its database.  Currently, only dependencies on roles
  * are explicitly stored in pg_shdepend.
-
  *
  * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 30d6868..c0ab74b 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -126,6 +126,8 @@ CATALOG(pg_statistic,2619,StatisticRelationId) BKI_WITHOUT_OIDS
  */
 typedef FormData_pg_statistic *Form_pg_statistic;

+#ifdef EXPOSE_TO_CLIENT_CODE
+
 /*
  * Several statistical slot "kinds" are defined by core PostgreSQL, as
  * documented below.  Also, custom data types can define their own "kind"
@@ -255,4 +257,6 @@ typedef FormData_pg_statistic *Form_pg_statistic;
  */
 #define STATISTIC_KIND_BOUNDS_HISTOGRAM  7

+#endif /* EXPOSE_TO_CLIENT_CODE */
+
 #endif                            /* PG_STATISTIC_H */
diff --git a/src/include/catalog/pg_statistic_ext.h b/src/include/catalog/pg_statistic_ext.h
index 24e4bd8..5d57b81 100644
--- a/src/include/catalog/pg_statistic_ext.h
+++ b/src/include/catalog/pg_statistic_ext.h
@@ -58,7 +58,11 @@ CATALOG(pg_statistic_ext,3381,StatisticExtRelationId)
  */
 typedef FormData_pg_statistic_ext *Form_pg_statistic_ext;

+#ifdef EXPOSE_TO_CLIENT_CODE
+
 #define STATS_EXT_NDISTINCT            'd'
 #define STATS_EXT_DEPENDENCIES        'f'

+#endif /* EXPOSE_TO_CLIENT_CODE */
+
 #endif                            /* PG_STATISTIC_EXT_H */
diff --git a/src/include/catalog/pg_subscription_rel.h b/src/include/catalog/pg_subscription_rel.h
index d82b262..033f5a1 100644
--- a/src/include/catalog/pg_subscription_rel.h
+++ b/src/include/catalog/pg_subscription_rel.h
@@ -12,9 +12,9 @@
 #ifndef PG_SUBSCRIPTION_REL_H
 #define PG_SUBSCRIPTION_REL_H

-#include "access/xlogdefs.h"
 #include "catalog/genbki.h"
 #include "catalog/pg_subscription_rel_d.h"
+#include "access/xlogdefs.h"
 #include "nodes/pg_list.h"

 /* ----------------
diff --git a/src/include/catalog/pg_trigger.h b/src/include/catalog/pg_trigger.h
index ee64bb4..7d60861 100644
--- a/src/include/catalog/pg_trigger.h
+++ b/src/include/catalog/pg_trigger.h
@@ -69,6 +69,8 @@ CATALOG(pg_trigger,2620,TriggerRelationId)
  */
 typedef FormData_pg_trigger *Form_pg_trigger;

+#ifdef EXPOSE_TO_CLIENT_CODE
+
 /* Bits within tgtype */
 #define TRIGGER_TYPE_ROW                (1 << 0)
 #define TRIGGER_TYPE_BEFORE                (1 << 1)
@@ -128,4 +130,6 @@ typedef FormData_pg_trigger *Form_pg_trigger;
 #define TRIGGER_USES_TRANSITION_TABLE(namepointer) \
     ((namepointer) != (char *) NULL)

+#endif /* EXPOSE_TO_CLIENT_CODE */
+
 #endif                            /* PG_TRIGGER_H */
diff --git a/src/include/catalog/pg_ts_parser.h b/src/include/catalog/pg_ts_parser.h
index 20b3a4b..ccaf40b 100644
--- a/src/include/catalog/pg_ts_parser.h
+++ b/src/include/catalog/pg_ts_parser.h
@@ -32,7 +32,7 @@ CATALOG(pg_ts_parser,3601,TSParserRelationId)
     NameData    prsname;

     /* name space */
-    Oid            prsnamespace;
+    Oid            prsnamespace BKI_DEFAULT(PGNSP);

     /* init parsing session */
     regproc        prsstart BKI_LOOKUP(pg_proc);
diff --git a/src/include/catalog/pg_ts_template.h b/src/include/catalog/pg_ts_template.h
index 5f5b580..5e66e02 100644
--- a/src/include/catalog/pg_ts_template.h
+++ b/src/include/catalog/pg_ts_template.h
@@ -32,7 +32,7 @@ CATALOG(pg_ts_template,3764,TSTemplateRelationId)
     NameData    tmplname;

     /* name space */
-    Oid            tmplnamespace;
+    Oid            tmplnamespace BKI_DEFAULT(PGNSP);

     /* initialization method of dict (may be 0) */
     regproc        tmplinit BKI_LOOKUP(pg_proc);
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index a430c79..3c2c813 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -12,20 +12,21 @@

 [

-# Keep the following ordered by OID so that later changes can be made more
-# easily.
-
 # For types used in the system catalogs, make sure the values here match
 # TypInfo[] in bootstrap.c.

-# The defined symbols for pg_type OIDs are generated by genbki.pl
+# OID symbol macro names for pg_type OIDs are generated by genbki.pl
 # according to the following rule, so you don't need to specify them
 # here:
 #  foo_bar  ->  FOO_BAROID
 # _foo_bar  ->  FOO_BARARRAYOID
 #
-# The only symbols in this file are ones that don't match this rule, and
-# are grandfathered in.
+# The only oid_symbol entries in this file are for names that don't match
+# this rule, and are grandfathered in.
+
+# Once upon a time these entries were ordered by OID.  Lately it's often
+# been the custom to insert new entries adjacent to related older entries.
+# Try to do one or the other though, don't just insert entries at random.

 # OIDS 1 - 99

diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index 695ed4e..4bcfac9 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -92,7 +92,7 @@ CATALOG(pg_type,1247,TypeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71,TypeRelati
     /* delimiter for arrays of this type */
     char        typdelim BKI_DEFAULT("\054");

-    /* 0 if not a composite type */
+    /* associated pg_class OID if a composite type, else 0 */
     Oid            typrelid BKI_DEFAULT(0);

     /*
@@ -246,7 +246,7 @@ typedef FormData_pg_type *Form_pg_type;
 #ifdef EXPOSE_TO_CLIENT_CODE

 /*
- * macros
+ * macros for values of poor-mans-enumerated-type columns
  */
 #define  TYPTYPE_BASE        'b' /* base type (ordinary scalar type) */
 #define  TYPTYPE_COMPOSITE    'c' /* composite (e.g., table's rowtype) */
diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl
index f748a45..cb7a020 100644
--- a/src/include/catalog/reformat_dat_file.pl
+++ b/src/include/catalog/reformat_dat_file.pl
@@ -13,7 +13,7 @@
 # Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
 # Portions Copyright (c) 1994, Regents of the University of California
 #
-# /src/include/catalog/reformat_dat_file.pl
+# src/include/catalog/reformat_dat_file.pl
 #
 #----------------------------------------------------------------------


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Oh, one more thing: looking again at the contents of pg_proc.dat,
I find myself annoyed at the need to specify pronargs.  That's
entirely derivable from proargtypes, and if we did so, we'd get
down to this for the first few pg_proc entries:

{ oid => '1242', descr => 'I/O',
  proname => 'boolin', prorettype => 'bool', proargtypes => 'cstring',
  prosrc => 'boolin' },
{ oid => '1243', descr => 'I/O',
  proname => 'boolout', prorettype => 'cstring', proargtypes => 'bool',
  prosrc => 'boolout' },
{ oid => '1244', descr => 'I/O',
  proname => 'byteain', prorettype => 'bytea', proargtypes => 'cstring',
  prosrc => 'byteain' },

which seems pretty close to the minimum amount of stuff you could expect
to need to write.

I recall that this and some other data compression methods were on the
table awhile back, and I expressed doubt about putting very much magic
knowledge in genbki.pl, but this case seems like a no-brainer.  genbki
can always get the right answer, and expecting people to do it just
creates another way to make a mistake.

So attached is an incremental patch to do that.  I had to adjust
AddDefaultValues's argument list to tell it the catalog name,
and once I did that, I saw that the API arrangement whereby the
caller throws error was just useless complexity, so I got rid of it.

            regards, tom lane

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 70aea89..3b3bb6b 100644
*** a/src/backend/catalog/Catalog.pm
--- b/src/backend/catalog/Catalog.pm
*************** sub ParseData
*** 254,265 ****
                  }

                  # Expand tuples to their full representation.
!                 my $error = AddDefaultValues($hash_ref, $schema);
!                 if ($error)
!                 {
!                     print "Failed to form full tuple for $catname\n";
!                     die $error;
!                 }
              }
              else
              {
--- 254,260 ----
                  }

                  # Expand tuples to their full representation.
!                 AddDefaultValues($hash_ref, $schema, $catname);
              }
              else
              {
*************** sub ParseData
*** 289,302 ****
      return $data;
  }

! # Fill in default values of a record using the given schema. It's the
! # caller's responsibility to specify other values beforehand.
! # If we fail to fill all fields, return a nonempty error message.
  sub AddDefaultValues
  {
!     my ($row, $schema) = @_;
      my @missing_fields;
-     my $msg;

      foreach my $column (@$schema)
      {
--- 284,295 ----
      return $data;
  }

! # Fill in default values of a record using the given schema.
! # It's the caller's responsibility to specify other values beforehand.
  sub AddDefaultValues
  {
!     my ($row, $schema, $catname) = @_;
      my @missing_fields;

      foreach my $column (@$schema)
      {
*************** sub AddDefaultValues
*** 311,316 ****
--- 304,316 ----
          {
              $row->{$attname} = $column->{default};
          }
+         elsif ($catname eq 'pg_proc' && $attname eq 'pronargs' &&
+                defined($row->{proargtypes}))
+         {
+             # pg_proc.pronargs can be derived from proargtypes.
+             my @proargtypes = split /\s+/, $row->{proargtypes};
+             $row->{$attname} = scalar(@proargtypes);
+         }
          else
          {
              # Failed to find a value.
*************** sub AddDefaultValues
*** 320,333 ****

      if (@missing_fields)
      {
!         $msg = "Missing values for: " . join(', ', @missing_fields);
!         $msg .= "\nShowing other values for context:\n";
          while (my($key, $value) = each %$row)
          {
              $msg .= "$key => $value, ";
          }
      }
-     return $msg;
  }

  # Rename temporary files to final names.
--- 320,334 ----

      if (@missing_fields)
      {
!         my $msg = "Failed to form full tuple for $catname\n";
!         $msg .= "Missing values for: " . join(', ', @missing_fields);
!         $msg .= "\nOther values for row:\n";
          while (my($key, $value) = each %$row)
          {
              $msg .= "$key => $value, ";
          }
+         die $msg;
      }
  }

  # Rename temporary files to final names.
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index d4c2ddf..3512952 100644
*** a/src/backend/catalog/genbki.pl
--- b/src/backend/catalog/genbki.pl
*************** sub morph_row_for_pgattr
*** 641,651 ****
          $row->{attnotnull} = 'f';
      }

!     my $error = Catalog::AddDefaultValues($row, $pgattr_schema);
!     if ($error)
!     {
!         die "Failed to form full tuple for pg_attribute: ", $error;
!     }
  }

  # Write an entry to postgres.bki. Adding quotes here allows us to keep
--- 641,647 ----
          $row->{attnotnull} = 'f';
      }

!     Catalog::AddDefaultValues($row, $pgattr_schema, 'pg_attribute');
  }

  # Write an entry to postgres.bki. Adding quotes here allows us to keep
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 2c42b5c..b25546e 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** CATALOG(pg_proc,1255,ProcedureRelationId
*** 73,78 ****
--- 73,79 ----
      char        proparallel BKI_DEFAULT(s);

      /* number of arguments */
+     /* Note: need not be given in pg_proc.dat; genbki.pl will compute it */
      int16        pronargs;

      /* number of arguments with defaults */
diff --git a/src/include/catalog/reformat_dat_file.pl b/src/include/catalog/reformat_dat_file.pl
index b20d236..bbceb16 100644
*** a/src/include/catalog/reformat_dat_file.pl
--- b/src/include/catalog/reformat_dat_file.pl
*************** sub strip_default_values
*** 201,206 ****
--- 201,213 ----
          {
              delete $row->{$attname};
          }
+
+         # Also delete pg_proc.pronargs, since that can be recomputed.
+         if ($catname eq 'pg_proc' && $attname eq 'pronargs' &&
+             defined($row->{proargtypes}))
+         {
+             delete $row->{$attname};
+         }
      }
  }


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I don't think there's any great need to incorporate this into your patch
> set.  As far as I'm concerned, v14 is ready as-is, and I'll just apply
> this over the top of it.  (Note that I'll probably smash the whole thing
> to one commit when the time comes.)

Glad to hear it. A couple recent commits added #define symbols to
headers, which broke the patchset, so I've attached v15, diff'd
against 4f813c7203e. Commit 9fdb675fc added a symbol to pg_opfamily.h
where there were none before, so I went ahead and wrapped it with an
EXPOSE_TO_CLIENT_CODE macro.

All your additional patches apply still apply over it. Your SGML patch
can only apply if my doc patch is not applied, but I've included it
anyway for the sake of no surprises.

I'll check back in 24 hours to see if everything still applies.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
John Naylor wrote:
> On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > I don't think there's any great need to incorporate this into your patch
> > set.  As far as I'm concerned, v14 is ready as-is, and I'll just apply
> > this over the top of it.  (Note that I'll probably smash the whole thing
> > to one commit when the time comes.)
> 
> Glad to hear it. A couple recent commits added #define symbols to
> headers, which broke the patchset, so I've attached v15, diff'd
> against 4f813c7203e. Commit 9fdb675fc added a symbol to pg_opfamily.h
> where there were none before, so I went ahead and wrapped it with an
> EXPOSE_TO_CLIENT_CODE macro.

Actually, after pushing that, I was thinking maybe it's better to remove
that #define from there and put it in each of the two .c files that need
it.  I don't think it makes sense to expose this macro any further, and
before that commit it was localized to a single file.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> John Naylor wrote:
>> Commit 9fdb675fc added a symbol to pg_opfamily.h
>> where there were none before, so I went ahead and wrapped it with an
>> EXPOSE_TO_CLIENT_CODE macro.

> Actually, after pushing that, I was thinking maybe it's better to remove
> that #define from there and put it in each of the two .c files that need
> it.  I don't think it makes sense to expose this macro any further, and
> before that commit it was localized to a single file.

We're speaking of IsBooleanOpfamily, right?  Think I'd leave it where it
is.  As soon as you have more than one place using a macro like that,
there's room for maintenance mistakes.

Now it could also be argued that indxpath.c and partprune.c don't
necessarily have the same idea of "boolean opfamily" anyway, in which case
giving them separate copies might be better.  Not sure about that.

Anyway, now that John and I have each (separately) rebased the bootstrap
patch over that, I'd appreciate it if you hold off cosmetic refactoring
till said patch goes in, which I expect to do in ~ 24 hours.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > John Naylor wrote:
> >> Commit 9fdb675fc added a symbol to pg_opfamily.h
> >> where there were none before, so I went ahead and wrapped it with an
> >> EXPOSE_TO_CLIENT_CODE macro.
> 
> > Actually, after pushing that, I was thinking maybe it's better to remove
> > that #define from there and put it in each of the two .c files that need
> > it.  I don't think it makes sense to expose this macro any further, and
> > before that commit it was localized to a single file.
> 
> We're speaking of IsBooleanOpfamily, right?

Yeah, that's the one.

> Think I'd leave it where it is.  As soon as you have more than one
> place using a macro like that, there's room for maintenance mistakes.

Yeah, that's why I thought it'd be better to have it somewhere central
(the originally submitted patch just added it to partprune.c).

> Anyway, now that John and I have each (separately) rebased the bootstrap
> patch over that, I'd appreciate it if you hold off cosmetic refactoring
> till said patch goes in, which I expect to do in ~ 24 hours.

Understood.  I'm going over David Rowley's runtime pruning patch now
(doesn't touch any catalog files), which I intend to be my last
functional commit this cycle, and won't be doing any other commits till
after bootstrap rework has landed.  (As I mentioned elsewhere, I intend
to propose some restructuring of partitioning code, without any
functional changes, during next week.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
I wrote:

> I'll check back in 24 hours to see if everything still applies.

There were a couple more catalog changes that broke patch context, so
attached is version 16.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> There were a couple more catalog changes that broke patch context, so
> attached is version 16.

Pushed with a few more adjustments (mostly, another round of copy-editing
on bki.sgml).  Now we wait to see what the buildfarm thinks, particularly
about the MSVC build ...

Congratulations, and many thanks, to John for seeing this through!
I know it's been a huge amount of work, but we've needed this for years.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/9/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Congratulations, and many thanks, to John for seeing this through!
> I know it's been a huge amount of work, but we've needed this for years.

Many thanks for your review, advice, and additional hacking. Thanks
also to Álvaro for review and initial commits, and to all who
participated in previous discussion.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
Andres Freund
Дата:

On April 8, 2018 10:19:59 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>John Naylor <jcnaylor@gmail.com> writes:
>> There were a couple more catalog changes that broke patch context, so
>> attached is version 16.
>
>Pushed with a few more adjustments (mostly, another round of
>copy-editing
>on bki.sgml).  Now we wait to see what the buildfarm thinks,
>particularly
>about the MSVC build ...
>
>Congratulations, and many thanks, to John for seeing this through!
>I know it's been a huge amount of work, but we've needed this for
>years.

Seconded and thirded and fourthed (?)!
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I experimented with converting all frontend code to include just the
> catalog/pg_foo_d.h files instead of catalog/pg_foo.h, as per the
> proposed new policy.  I soon found that we'd overlooked one thing:
> some clients expect to see the relation OID macros, eg
> LargeObjectRelationId.  Attached is a patch that changes things around
> so that those appear in the _d files instead of the master files.
[...]
> Some of the CATALOG lines spill well past 80 characters with this,
> although many of the affected ones already were overlength, eg
>
> -#define DatabaseRelationId    1262
> -#define DatabaseRelation_Rowtype_Id  1248
> -
> -CATALOG(pg_database,1262) BKI_SHARED_RELATION BKI_ROWTYPE_OID(1248)
> BKI_SCHEMA_MACRO
> +CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION
> BKI_ROWTYPE_OID(1248,DatabaseRelation_Rowtype_Id) BKI_SCHEMA_MACRO

[ to which I responded with an inadequate alternative ]

Thinking about this some more, a way occurred to me to shorten the
CATALOG lines while still treating all headers the same, and with very
little code (Patch 0001). What we do is automate the use of
'RelationId' and 'Relation_Rowtype_Id' so that the CATALOG macro only
needs the part pertaining to the table name, and the BKI_ROWTYPE_OID
macro can go back to just having the OID, eg:

CATALOG(pg_database,1262,Database) BKI_SHARED_RELATION
BKI_ROWTYPE_OID(1248) BKI_SCHEMA_MACRO

This is shorter, but not quite as short as before the conversion. If
we really wanted to, we could also leave off the BKI_ prefix from the
CATALOG options (Patch 0002), eg:

CATALOG(pg_database,1262,Database) SHARED_RELATION ROWTYPE_OID(1248)
SCHEMA_MACRO

CATALOG itself lacks a prefix, and its options can only go in one
place, so we'd lose some consistency but perhaps we don't lose any
clarity. (This isn't true for the attribute-level options for
nullability etc, which can appear all over the place.)

Results below - number of CATALOG lines with more than 80 characters,
and the longest line:

grep -E '^CATALOG\(' src/include/catalog/pg_*.h  | sed 's/.*://' | awk
'{ print length, $0 }' | sort -n -r

--
before conversion: 6 lines > 80 chars
105 CATALOG(pg_auth_members,1261) BKI_SHARED_RELATION BKI_WITHOUT_OIDS
BKI_ROWTYPE_OID(2843) BKI_SCHEMA_MACRO

--
after conversion: 14 lines > 80 chars
162 CATALOG(pg_shseclabel,3592,SharedSecLabelRelationId)
BKI_SHARED_RELATION
BKI_ROWTYPE_OID(4066,SharedSecLabelRelation_Rowtype_Id)
BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO

--
with Patch 0001: 11 lines > 80 chars
118 CATALOG(pg_shseclabel,3592,SharedSecLabel) BKI_SHARED_RELATION
BKI_ROWTYPE_OID(4066) BKI_WITHOUT_OIDS BKI_SCHEMA_MACRO

--
with Patch 0002: 5 lines > 80 chars
102 CATALOG(pg_shseclabel,3592,SharedSecLabel) SHARED_RELATION
ROWTYPE_OID(4066) WITHOUT_OIDS SCHEMA_MACRO


-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 4/6/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Some of the CATALOG lines spill well past 80 characters with this,
>> although many of the affected ones already were overlength, eg ...

> Thinking about this some more, a way occurred to me to shorten the
> CATALOG lines while still treating all headers the same, and with very
> little code (Patch 0001). What we do is automate the use of
> 'RelationId' and 'Relation_Rowtype_Id' so that the CATALOG macro only
> needs the part pertaining to the table name, and the BKI_ROWTYPE_OID
> macro can go back to just having the OID, eg:

Hm ... I don't like this too much, because it means that grepping for
those macros will no longer turn up the source of their definition.
Yeah, if you already know how Relation_Rowtype_Id macros are created,
you might not be confused, but I think it'd be problematic for
newcomers.  Essentially we'd be shortening these lines by obfuscating,
which doesn't seem like a good tradeoff.

It might be all right to drop the BKI_ prefixes as per your other
suggestion, but I'm worried about possible symbol conflicts.  It's
probably not really worth changing that by itself.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Mark Dilger
Дата:
There still seems to be a lot of boilerplate in the .dat files
that could be eliminated.  Tom mentioned upthread that he did
not want too much magic in genbki.pl or Catalog.pm, but I think
I can propose putting some magic in the header files themselves.

Take, for example, some of the fields in pg_type.dat.  I'll elide
the ones I'm not talking about with ...:


{... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
 typreceive => 'Xrecv', typsend => 'Xsend', ... },


If we changed pg_type.h:

        /* text format (required) */
-       regproc         typinput BKI_LOOKUP(pg_proc);
-       regproc         typoutput BKI_LOOKUP(pg_proc);
+       regproc         typinput BKI_DEFAULT("${typname}in") BKI_LOOKUP(pg_proc);
+       regproc         typoutput BKI_DEFAULT("${typname}out") BKI_LOOKUP(pg_proc);

        /* binary format (optional) */
-       regproc         typreceive BKI_LOOKUP(pg_proc);
-       regproc         typsend BKI_LOOKUP(pg_proc);
+       regproc         typreceive BKI_DEFAULT("${typname}recv") BKI_LOOKUP(pg_proc);
+       regproc         typsend BKI_DEFAULT("${typname}send") BKI_LOOKUP(pg_proc);

we could remove the typinput, typoutput, typreceive, and typsend
fields from many of the records.  The logic for how to derive these
fields would not be hardcoded into genbki.pl or Catalog.pm, but rather
would be in pg_type.h, where it belongs.  The pattern "${typname}in",
for example, is not hardcoded in perl, but is rather specified
in pg_type.h, making it obvious for those reading pg_type.h what the
pattern will be for generating the default value.

For those types where the typreceive and/or typsend values are not defined,
owing to receive and send functionality not being implemented, the user
would need to specify something like:

{... typname => 'X', typreceive => '-', typsend => '-'},

but that seems desirable anyway, as it helps to document the lacking
functionality.  For types with associated functions named 'X_in', 'X_out',
'X_recv' and 'X_send' rather than 'Xin', 'Xout', 'Xrecv' and 'Xsend'
the user would have to specify those function names.  That's not a
regression from how things are now, as all function names are currently
required.  Perhaps after this change is applied (if it is) there will be
some pressure to standardize the naming of these functions.  I'd consider
that a good thing.

If we changed pg_proc.h:

        /* procedure source text */
-       text            prosrc BKI_FORCE_NOT_NULL;
+       text            prosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;

we could remove the prosrc field from many of the records, which would
do a better job of calling attention to the remaining records where the
C function name differs from the SQL function name.

These two changes have been made in the patch I am submitting, with the
consequence that pg_type.dat drops from 52954 bytes to 49106 bytes, and
pg_proc.dat drops from 521302 bytes to 464554 bytes.  Since postgres.bki
is unchanged, I don't mean to suggest any runtime or initdb time performance
improvement; I only mention the size reduction to emphasize that there
is less text for human programmers to review.

There are further changes possible along these lines, where instead of
specifying s/FOO/BAR/ type substitition, we have something more like
s/FOO/BAR/e and s/FOO/BAR/ee type symantics, but before proposing
them, I'd like to see if the community likes the direction I am going
with this patch.  If so, we can debate whether, for example, the default
alignment requirements of a type can be derived from the type's size
rather than having to be specified for every row in pg_type.dat.

mark




Вложения

Re: WIP: a way forward on bootstrap data

От
Robert Haas
Дата:
On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger <hornschnorter@gmail.com> wrote:
> There still seems to be a lot of boilerplate in the .dat files
> that could be eliminated.  Tom mentioned upthread that he did
> not want too much magic in genbki.pl or Catalog.pm, but I think
> I can propose putting some magic in the header files themselves.
>
> Take, for example, some of the fields in pg_type.dat.  I'll elide
> the ones I'm not talking about with ...:
>
>
> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>  typreceive => 'Xrecv', typsend => 'Xsend', ... },

-1 for trying to automate this.  It varies between fooin and foo_in,
and it'll be annoying to have to remember which one happens
automatically and which one needs an override.

> If we changed pg_proc.h:
>
>         /* procedure source text */
> -       text            prosrc BKI_FORCE_NOT_NULL;
> +       text            prosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
>
> we could remove the prosrc field from many of the records, which would
> do a better job of calling attention to the remaining records where the
> C function name differs from the SQL function name.

That one I kinda like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: WIP: a way forward on bootstrap data

От
Mark Dilger
Дата:
> On Apr 25, 2018, at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger <hornschnorter@gmail.com> wrote:
>> There still seems to be a lot of boilerplate in the .dat files
>> that could be eliminated.  Tom mentioned upthread that he did
>> not want too much magic in genbki.pl or Catalog.pm, but I think
>> I can propose putting some magic in the header files themselves.
>>
>> Take, for example, some of the fields in pg_type.dat.  I'll elide
>> the ones I'm not talking about with ...:
>>
>>
>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>> typreceive => 'Xrecv', typsend => 'Xsend', ... },
>
> -1 for trying to automate this.  It varies between fooin and foo_in,
> and it'll be annoying to have to remember which one happens
> automatically and which one needs an override.

That may be a good argument.  I was on the fence about it, because I
like the idea that the project might do some cleanup and standardize
the names of all in/out/send/recv functions so that no overrides would
be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
be named in a standard way.)

Would that be an API break and hence a non-starter?

>
>> If we changed pg_proc.h:
>>
>>        /* procedure source text */
>> -       text            prosrc BKI_FORCE_NOT_NULL;
>> +       text            prosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
>>
>> we could remove the prosrc field from many of the records, which would
>> do a better job of calling attention to the remaining records where the
>> C function name differs from the SQL function name.
>
> That one I kinda like.

Yeah, that one is simpler, and it is the one that makes the majority of
the difference in bringing down the size of the .dat files.


mark



Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger <hornschnorter@gmail.com> wrote:
>> There still seems to be a lot of boilerplate in the .dat files
>> that could be eliminated. ...

>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>> typreceive => 'Xrecv', typsend => 'Xsend', ... },

> -1 for trying to automate this.  It varies between fooin and foo_in,
> and it'll be annoying to have to remember which one happens
> automatically and which one needs an override.

Yeah, that was my first reaction to that example as well.  If it
weren't so nearly fifty-fifty then we might have more traction there;
but it's pretty close, and actually the foo_in cases outnumber fooin
if I counted correctly.  (Array entries should be ignored for this
purpose; maybe we'll autogenerate them someday.)

>> +       text            prosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;

> That one I kinda like.

Agreed, this seems more compelling.  However, I think we need more than
one compelling example to justify the additional infrastructure.  There
aren't that many places where there's obvious internal redundancy in
single catalog rows, so I'm not sure that we're going to find a lot of win
here.  (The prosrc-from-proname case, in isolation, could be handled about
as well by adding a hardwired rule like we have for pronargs.)

I don't especially like the idea of trying to compute, for instance,
typalign from typlen.  That's mostly going to encourage people to overlook
it, which isn't a good thing, because there are plenty of places where
genbki.pl couldn't be expected to get it right.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Mark Dilger <hornschnorter@gmail.com> writes:
>> On Apr 25, 2018, at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> -1 for trying to automate this.  It varies between fooin and foo_in,
>> and it'll be annoying to have to remember which one happens
>> automatically and which one needs an override.

> That may be a good argument.  I was on the fence about it, because I
> like the idea that the project might do some cleanup and standardize
> the names of all in/out/send/recv functions so that no overrides would
> be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
> be named in a standard way.)

> Would that be an API break and hence a non-starter?

Yeah, I'm afraid so.  I'm pretty sure I recall cases where people
invoked I/O functions by name, and I definitely recall cases where
operator-implementation functions were invoked by name.  Those might
not be very bright things to do, but people do 'em.

We'd also be risking creating problems for individual apps by conflicting
with user-defined functions that we didn't use to conflict with.  We take
that chance every time we add functionality, of course, but usually we can
soothe complaints by pointing out that they got some new functionality in
return.  I don't think "we made I/O function names more uniform" is going
to be a very satisfactory excuse for breakage.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Mark Dilger
Дата:
> On Apr 25, 2018, at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger <hornschnorter@gmail.com> wrote:
>>> There still seems to be a lot of boilerplate in the .dat files
>>> that could be eliminated. ...
>
>>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>>> typreceive => 'Xrecv', typsend => 'Xsend', ... },
>
>> -1 for trying to automate this.  It varies between fooin and foo_in,
>> and it'll be annoying to have to remember which one happens
>> automatically and which one needs an override.
>
> Yeah, that was my first reaction to that example as well.  If it
> weren't so nearly fifty-fifty then we might have more traction there;
> but it's pretty close, and actually the foo_in cases outnumber fooin
> if I counted correctly.  (Array entries should be ignored for this
> purpose; maybe we'll autogenerate them someday.)

Part of the problem right now is that nothing really encourages new
functions to be named foo_in vs. fooin, so the nearly 50/50 split will
continue when new code is contributed.  If the system forced you to
specify the name when you did it in a nonstandard way, and not otherwise,
that would have the effect of documenting which way is now considered
standard.

>
>>> +       text            prosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
>
>> That one I kinda like.
>
> Agreed, this seems more compelling.  However, I think we need more than
> one compelling example to justify the additional infrastructure.

I'll look for more.

>  There
> aren't that many places where there's obvious internal redundancy in
> single catalog rows, so I'm not sure that we're going to find a lot of win
> here.  (The prosrc-from-proname case, in isolation, could be handled about
> as well by adding a hardwired rule like we have for pronargs.)

Right, but doing that in genbki.pl or Catalog.pm is an obfuscation.  Doing
it in pg_proc.h makes is much more reasonable, I think.  Note that my
patch does not hardcode the logic in the perl code.  It teaches the perl
code how to do variable substitution, but then allows the actual rules
for each row to be written in the header file itself.

>
> I don't especially like the idea of trying to compute, for instance,
> typalign from typlen.  That's mostly going to encourage people to overlook
> it, which isn't a good thing, because there are plenty of places where
> genbki.pl couldn't be expected to get it right.

Well, it wouldn't be in genbki.pl, it would be in pg_type.h, but I take
your broader point that you don't want this calculated.



Re: WIP: a way forward on bootstrap data

От
Mark Dilger
Дата:
>>
>>>> +       text            prosrc BKI_DEFAULT("${proname}") BKI_FORCE_NOT_NULL;
>>
>>> That one I kinda like.
>>
>> Agreed, this seems more compelling.  However, I think we need more than
>> one compelling example to justify the additional infrastructure.
>
> I'll look for more.

There are two more that seem reasonable optimizations in pg_cast.h:

diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 7f4a25b2da..98794df7f8 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -37,13 +37,13 @@ CATALOG(pg_cast,2605,CastRelationId)
        Oid                     casttarget BKI_LOOKUP(pg_type);

        /* cast function; 0 = binary coercible */
-       Oid                     castfunc BKI_LOOKUP(pg_proc);
+       Oid                     castfunc BKI_DEFAULT("${casttarget}(${castsource})") BKI_LOOKUP(pg_proc);

        /* contexts in which cast can be used */
        char            castcontext;

        /* cast method */
-       char            castmethod;
+       char            castmethod BKI_DEFAULT("/${castfunc} == 0 ? 'b' : 'f'/e");
 } FormData_pg_cast;

 /* ----------------

Which would convert numerous lines like:

{ castsource => 'money', casttarget => 'numeric', castfunc => 'numeric(money)',
  castcontext => 'a', castmethod => 'f' },

To shorter lines like:

{ castsource => 'money', casttarget => 'numeric', castcontext => 'a' },

which is great, because all you really are trying to tell the postgres system is
that when you cast from numeric to money it has to be an explicit assignment and
not an implicit cast.  The extra stuff about castfunc and castmethod just gets in
the way of understanding what is being done.



There are another two in pg_opclass.h:

diff --git a/src/include/catalog/pg_opclass.h b/src/include/catalog/pg_opclass.h
index b980327fc0..9f528f97c0 100644
--- a/src/include/catalog/pg_opclass.h
+++ b/src/include/catalog/pg_opclass.h
@@ -52,7 +52,7 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId)
        Oid                     opcmethod BKI_LOOKUP(pg_am);

        /* name of this opclass */
-       NameData        opcname;
+       NameData        opcname BKI_DEFAULT("${opcintype}_ops");

        /* namespace of this opclass */
        Oid                     opcnamespace BKI_DEFAULT(PGNSP);
@@ -61,7 +61,7 @@ CATALOG(pg_opclass,2616,OperatorClassRelationId)
        Oid                     opcowner BKI_DEFAULT(PGUID);

        /* containing operator family */
-       Oid                     opcfamily BKI_LOOKUP(pg_opfamily);
+       Oid                     opcfamily BKI_DEFAULT("${opcmethod}/${opcintype}_ops") BKI_LOOKUP(pg_opfamily);

        /* type of data indexed by opclass */
        Oid                     opcintype BKI_LOOKUP(pg_type);

Which would convert numerous lines like the following line from pg_opclass.dat:

{ opcmethod => 'btree', opcname => 'bytea_ops', opcfamily => 'btree/bytea_ops',
  opcintype => 'bytea' },

to much easier to read lines like:

{ opcmethod => 'btree', opcintype => 'bytea' },

which is also great, because you're really just declaring that type bytea has a
btree opclass and letting the system do the rest of the work.  Having to manually
specify opfamily and the opname just clutters the row and makes it less intuitive.


There are a bunch more of varying quality, depending on which automations
you like or don't like.


mark

Re: WIP: a way forward on bootstrap data

От
Ashutosh Bapat
Дата:
On Thu, Apr 26, 2018 at 2:11 AM, Mark Dilger <hornschnorter@gmail.com> wrote:
>
>> On Apr 25, 2018, at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Wed, Apr 25, 2018 at 3:44 PM, Mark Dilger <hornschnorter@gmail.com> wrote:
>>>> There still seems to be a lot of boilerplate in the .dat files
>>>> that could be eliminated. ...
>>
>>>> {... typname => 'X', ... typinput => 'Xin', typoutput => 'Xout',
>>>> typreceive => 'Xrecv', typsend => 'Xsend', ... },
>>
>>> -1 for trying to automate this.  It varies between fooin and foo_in,
>>> and it'll be annoying to have to remember which one happens
>>> automatically and which one needs an override.
>>
>> Yeah, that was my first reaction to that example as well.  If it
>> weren't so nearly fifty-fifty then we might have more traction there;
>> but it's pretty close, and actually the foo_in cases outnumber fooin
>> if I counted correctly.  (Array entries should be ignored for this
>> purpose; maybe we'll autogenerate them someday.)
>
> Part of the problem right now is that nothing really encourages new
> functions to be named foo_in vs. fooin, so the nearly 50/50 split will
> continue when new code is contributed.  If the system forced you to
> specify the name when you did it in a nonstandard way, and not otherwise,
> that would have the effect of documenting which way is now considered
> standard.
>

FWIW, I would like some standard naming convention one way or the
other for in/out function names. Looking up pg_type.h for in/out
functions when you know the built-in type name isn't great. But that
itself may not be enough reason to change it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: WIP: a way forward on bootstrap data

От
David Fetter
Дата:
On Wed, Apr 25, 2018 at 04:39:42PM -0400, Tom Lane wrote:
> Mark Dilger <hornschnorter@gmail.com> writes:
> >> On Apr 25, 2018, at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> -1 for trying to automate this.  It varies between fooin and foo_in,
> >> and it'll be annoying to have to remember which one happens
> >> automatically and which one needs an override.
> 
> > That may be a good argument.  I was on the fence about it, because I
> > like the idea that the project might do some cleanup and standardize
> > the names of all in/out/send/recv functions so that no overrides would
> > be required.  (Likewise, I'd like the eq,ne,lt,le,gt,ge functions to
> > be named in a standard way.)
> 
> > Would that be an API break and hence a non-starter?
> 
> Yeah, I'm afraid so.  I'm pretty sure I recall cases where people
> invoked I/O functions by name, and I definitely recall cases where
> operator-implementation functions were invoked by name.  Those might
> not be very bright things to do, but people do 'em.
> 
> We'd also be risking creating problems for individual apps by conflicting
> with user-defined functions that we didn't use to conflict with.  We take
> that chance every time we add functionality, of course, but usually we can
> soothe complaints by pointing out that they got some new functionality in
> return.  I don't think "we made I/O function names more uniform" is going
> to be a very satisfactory excuse for breakage.

What do you rate the chances that someone created a foo_in when
there's already a built-in fooin?  If it's low enough, we could add
all the foo_ins as aliases to fooins and then mandate that new ones be
called foo_in for future foos.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 4/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> (The prosrc-from-proname case, in isolation, could be handled about
> as well by adding a hardwired rule like we have for pronargs.)

If we think we might go in the direction of more special-case
behavior, the attached patch more fully documents what we already do,
and does some minor refactoring to make future additions more
straightforward. I think this would even be good to do for v11.

> Robert Haas <robertmhaas@gmail.com> writes:
>> -1 for trying to automate this.  It varies between fooin and foo_in,
>> and it'll be annoying to have to remember which one happens
>> automatically and which one needs an override.
>
> Yeah, that was my first reaction to that example as well.  If it
> weren't so nearly fifty-fifty then we might have more traction there;
> but it's pretty close, and actually the foo_in cases outnumber fooin
> if I counted correctly.  (Array entries should be ignored for this
> purpose; maybe we'll autogenerate them someday.)

Hmm, that wouldn't be too hard. Add a new metadata field called
'array_type_oid', then if it finds such an OID, teach genbki.pl to
construct a tuple for the corresponding array type by consulting
something like

char    typcategory    BKI_ARRAY(A);
char    typstorage      BKI_ARRAY(x);
...etc

in the header file, plus copying typalign from the element type. I'll
whip this up sometime and add it to the next commitfest.

On 4/26/18, Mark Dilger <hornschnorter@gmail.com> wrote:
>         /* cast method */
> -       char            castmethod;
> +       char            castmethod BKI_DEFAULT("/${castfunc} == 0 ? 'b' : 'f'/e");
>  } FormData_pg_cast;

I don't have a strong opinion about your simple substitution
mechanism, but I find the above a bit harder to reason about. It also
has the added disadvantage that there is whitespace in the BKI
annotation, so it can no longer be parsed with the current setup. That
could be overcome with additional complexity, of course, but then
you're back in the position of advocating that for a single use case.

-John Naylor

Вложения

Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
John Naylor <jcnaylor@gmail.com> writes:
> On 4/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (The prosrc-from-proname case, in isolation, could be handled about
>> as well by adding a hardwired rule like we have for pronargs.)

> If we think we might go in the direction of more special-case
> behavior, the attached patch more fully documents what we already do,
> and does some minor refactoring to make future additions more
> straightforward. I think this would even be good to do for v11.

Agreed, pushed.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
Mark Dilger
Дата:
Hackers,

Have you already considered and rejected the idea of having
genbki.pl/Catalog.pm define constants that can be used in
the catalog .dat files?  I'm mostly curious if people think
the resulting .dat files are better or worse using constants
of this sort.  For example:

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 7497d9cd9f..58ce24adf0 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -250,6 +250,21 @@ sub ParseData

            if ($lcnt == $rcnt)
            {
+               # pg_cast constants for castcontext
+               use constant IMPLICIT => 'i';
+               use constant ASSIGNMENT => 'a';
+               use constant EXPLICIT => 'e';
+
+               # pg_cast constants for castmethod
+               use constant FUNCTION => 'f';
+               use constant BINARY => 'b';
+               use constant INOUT => 'i';
+
+               # pg_proc constants for provolatile
+               use constant IMMUTABLE => 'i';
+               use constant STABLE => 's';
+               use constant VOLATILE => 'v';
+
                eval '$hash_ref = ' . $_;
                if (!ref $hash_ref)
                {
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index cf007528fd..a4ceceb652 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -19,79 +19,79 @@
 # int2->int4->int8->numeric->float4->float8, while casts in the
 # reverse direction are assignment-only.
 { castsource => 'int8', casttarget => 'int2', castfunc => 'int2(int8)',
-  castcontext => 'a', castmethod => 'f' },
+  castcontext => ASSIGNMENT, castmethod => FUNCTION },
 { castsource => 'int8', casttarget => 'int4', castfunc => 'int4(int8)',
-  castcontext => 'a', castmethod => 'f' },
+  castcontext => ASSIGNMENT, castmethod => FUNCTION },
 { castsource => 'int8', casttarget => 'float4', castfunc => 'float4(int8)',



Re: WIP: a way forward on bootstrap data

От
Tom Lane
Дата:
Mark Dilger <hornschnorter@gmail.com> writes:
> Have you already considered and rejected the idea of having
> genbki.pl/Catalog.pm define constants that can be used in 
> the catalog .dat files?  I'm mostly curious if people think
> the resulting .dat files are better or worse using constants
> of this sort.  For example:

Hm, I don't think it's been debated in exactly these terms; but I find
myself a bit skeptical of the proposal as-is.  At least going by your
examples, it'd make the .dat files a good bit more verbose, which doesn't
seem like what we want.  Also it creates a bigger impedance mismatch
between what you see in the .dat files and what you see in the actual
catalogs, inviting confusion.

Also, with the particular implementation technique you suggest here,
there'd be a single namespace for the constants across all catalogs,
which creates a lot of possibilities for conflicts and mistakes.
We could fix that by instituting some sort of unique-ifying naming
convention, say CAST_IMPLICIT not just IMPLICIT, but that makes the
verbosity problem worse.

I think if we wanted to have something along this line, my preferred
approach would be to continue to write the entries as string literals:

    castcontext => 'ASSIGNMENT', castmethod => 'FUNCTION' },

and make it the responsibility of genbki.pl to convert to the values
recognized by the backend.  That way the conversion could happen on
a per-column basis, eliminating the conflict issue.  But I'm not
really convinced that this'd be an improvement over where we are now.

            regards, tom lane


Re: WIP: a way forward on bootstrap data

От
John Naylor
Дата:
On 5/7/18, Mark Dilger <hornschnorter@gmail.com> wrote:
> Hackers,
>
> Have you already considered and rejected the idea of having
> genbki.pl/Catalog.pm define constants that can be used in
> the catalog .dat files?  I'm mostly curious if people think
> the resulting .dat files are better or worse using constants
> of this sort.  For example:
...
> +               # pg_cast constants for castcontext
> +               use constant IMPLICIT => 'i';
> +               use constant ASSIGNMENT => 'a';
> +               use constant EXPLICIT => 'e';

The comment refers to pg_cast, but these constants apply globally.
It's also not the right place from a maintainability perspective, and
if it was, now these values have different macros defined in two
places. This is not good.

> -  castcontext => 'a', castmethod => 'f' },
> +  castcontext => ASSIGNMENT, castmethod => FUNCTION },

For one, this breaks convention that the values are always
single-quoted. If you had a use case for something like this, I would
instead use the existing lookup infrastructure and teach genbki.pl to
parse the enums (or #defines as the case might be) in the relevant
header file. You'd need some improvement in readability to justify
that additional code, though. I don't think this example quite passes
(it's pretty obvious locally what the letters refer to), but others
may feel differently.

-John Naylor


Re: WIP: a way forward on bootstrap data

От
Mark Dilger
Дата:
> On May 6, 2018, at 12:08 PM, John Naylor <jcnaylor@gmail.com> wrote:
> 
> On 5/7/18, Mark Dilger <hornschnorter@gmail.com> wrote:
>> Hackers,
>> 
>> Have you already considered and rejected the idea of having
>> genbki.pl/Catalog.pm define constants that can be used in
>> the catalog .dat files?  I'm mostly curious if people think
>> the resulting .dat files are better or worse using constants
>> of this sort.  For example:
> ...
>> +               # pg_cast constants for castcontext
>> +               use constant IMPLICIT => 'i';
>> +               use constant ASSIGNMENT => 'a';
>> +               use constant EXPLICIT => 'e';
> 
> The comment refers to pg_cast, but these constants apply globally.
> It's also not the right place from a maintainability perspective, and
> if it was, now these values have different macros defined in two
> places. This is not good.
> 
>> -  castcontext => 'a', castmethod => 'f' },
>> +  castcontext => ASSIGNMENT, castmethod => FUNCTION },
> 
> For one, this breaks convention that the values are always
> single-quoted. If you had a use case for something like this, I would
> instead use the existing lookup infrastructure and teach genbki.pl to
> parse the enums (or #defines as the case might be) in the relevant
> header file. You'd need some improvement in readability to justify
> that additional code, though. I don't think this example quite passes
> (it's pretty obvious locally what the letters refer to), but others
> may feel differently.


John, Tom, thanks for the feedback.  I share your concerns about my
straw-man proposal. But....

In the catalogs, 'f' usually means 'false', not 'function'.  A person
reading pg_cast.dat could see:

  castmethod => 'f'

and think that meant a binary conversion, since castmethod is false.
That's almost exactly wrong.  Hence my desire to write

  castmethod => FUNCTION

I don't have any better proposal, though.

mark