Re: Bootstrap DATA is a pita

Поиск
Список
Период
Сортировка
От Caleb Welton
Тема Re: Bootstrap DATA is a pita
Дата
Msg-id CAOjayEdTiViVianMEsXCZ2w9a+DYi40Jch66887+v9=TuLukLg@mail.gmail.com
обсуждение исходный текст
Ответ на Bootstrap DATA is a pita  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
Hello Hackers,
  
  Reviving an old thread on simplifying the bootstrap process.

  I'm a developer from the GPDB / HAWQ side of the world where we did some work a while back to enable catalog definition via SQL files and we have found it valuable from a dev perspective.  The mechanism currently in those products is a bit.. convoluted where SQL is processed in perl to create the existing DATA statements, which are then processed as they are today in Postgres... I wouldn't suggest this route, but having worked with both the DATA mechanism and the SQL based one I've certainly found SQL to be a more convenient way of interacting with the catalog.

  I'd propose:
     - Keep enough of the existing bootstrap mechanism functional to get a small tidy core, essentially you need enough of pg_type, pg_proc, pg_class, pg_attribute to support the 25 types used by catalog tables and most everything else can be moved into SQL processing like how system_views.sql is handled today.

  The above was largely proposed back in March and rejected based on concerns that
 
  1. initdb would be slower.
  2. It would introduce too much special purpose bootstrap cruft into the code.
  3. Editing SQL commands is not comfortable in bulk

On 1.

I have a prototype that handles about 1000 functions (all the functions in pg_proc.h that are not used by other catalog tables, e.g. pg_type, pg_language, pg_range, pg_aggregate, window functions, pg_ts_parser, etc).  

All of initdb can be processed in 1.53s. This compares to 1.37s with the current bootstrap approach.  So yes, this is slower, but not 'noticeably slower' - I certainly didn't notice the 0.16s until I saw the concern and then timed it.

On 2.

So far the amount of cruft has been:
  - Enabling adding functions with specific OIDs when creating functions.
    1 line changes in pg_aggregate.c, proclang.c, typecmds.c
    about dozen lines of code in functioncmds.c
    3 lines changed in pg_proc.c
  - Update the fmgr_internal_validator for builtin functions while the catalog is mutable
    3 lines changed in pg_proc.c
  - Update how the builtin function cache is built
    Some significant work in fmgr.c that honestly still needs cleanup before it would be ready to propose as a patch that would be worthy of committing.
  - Update how builtin functions are resolved outside of bootstrap
    Minor updates to dynloader for lookup of symbols within the current executable, so far I've only done darwin.c for my prototype, this would need to be extended to the other ports.
  - Initializitation of the builtin cache
    2 line change in postinit.c
  - Addition of a stage in initdb to process the sql directives similar in scope to the processing of system_views.sql.

No changes needed in the parser, planner, etc.  My assessment is that this worry is not a major concern in practice with the right implementation.

On 3.

Having worked with both SQL and bki DATA directives I have personally found the convenience of SQL outweighs the pain.  In many cases changes, such as adding a new column to pg_proc, have minimal impact on the SQL representation and what changes are needed are often simple to implement.  E.g. accounting for COST only needs to be done for the functions that need something other than the default value.  This however is somewhat subjective.

On the Pros side:

  a. Debugging bootstrap is extremely painful, debugging once initdb has gotten to 'postgres --single' is way easier.

  b. It is easier to introduce minor issues with DATA directives than it is when using the SQL processing used for all other user objects.
  
   Example: currently in Postgres all builtin functions default to COST 1, and all SQL functions default to cost 100. However the following SQL functions included in bootstrap inexplicably are initialized with a COST of 1:  
   age(timestamp with time zone)
   age(timestamp without time zone)
   bit_length(bytea)
   bit_length(text)
   bit_length(bit)
   date_part(text, abstime)
   date_part(text, reltime)
   date_part(text, date)
   ... and 26 other examples

  c. SQL files are significantly less of a PITA (subjective opinion, but I can say this from a perspective of experience working with both DATA directives and SQL driven catalog definition).

If people have interest I can share my patch so far if that helps address concerns, but if there is not interest then I'll probably leave my prototype where it is rather than investing more effort in the proof of concept.

Thanks,
  Caleb


On Sat, Mar 7, 2015 at 5:20 PM, <pgsql-hackers-owner@postgresql.org> wrote:
Date: Sat, 7 Mar 2015 23:46:54 +0100
From: Andres Freund <andres@2ndquadrant.com>
To: Jim Nasby <Jim.Nasby@BlueTreble.com>
Cc: Stephen Frost <sfrost@snowman.net>, Robert Haas <robertmhaas@gmail.com>,
        Tom Lane <tgl@sss.pgh.pa.us>, Peter Eisentraut <peter_e@gmx.net>,
        Josh Berkus <josh@agliodbs.com>,
        "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Subject: Re: Bootstrap DATA is a pita
Message-ID: <20150307224654.GC12213@awork2.anarazel.de>

On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
> Semi-related... if we put some special handling in some places for bootstrap
> mode, couldn't most catalog objects be created using SQL, once we got
> pg_class, pg_attributes and pg_type created? That would theoretically allow
> us to drive much more of initdb with plain SQL (possibly created via
> pg_dump).

Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.

That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: Speedup twophase transactions
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Inaccurate results from numeric ln(), log(), exp() and pow()