Re: Make deparsing of column defaults faster

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Make deparsing of column defaults faster
Дата
Msg-id CAMkU=1w+GM3D0H45mEp_A1xZs=tECD03syC8bSY0O6SrsNqeoA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Make deparsing of column defaults faster  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Ответы Re: Make deparsing of column defaults faster  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Список pgsql-hackers
On Mon, Jun 4, 2018 at 10:00 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 6/4/18 20:55, Jeff Janes wrote:
> The user docs say about column defaults: "The value is any variable-free
> expression (subqueries and cross-references to other columns in the
> current table are not allowed)"
>
> And also say about pg_get_expr "If the expression might contain Vars,
> specify the OID of the relation they refer to as the second parameter;
> if no Vars are expected, zero is sufficient"
>
> Since defaults can't contain Vars, this patch converts the second
> parameter to zero in places where pg_get_expr is invoked on column
> defaults.

My in-progress generated columns patch removes that assumption (since a
generated column generally refers to another column of the same table).

Will your patch put the generated columns data into pg_attrdef, or create a new catalog to hold them?  I guess it would make sense to re-use the catalog since you can't have both a default value and be a generated column.


> Doing this drops the time needed to run `pg_dump -s` on a
> 1600 column table with defaults on all columns by a factor of 40.  So
> pg_upgrade users with such tables should see a big win (cc Justin).

That's impressive but also a bit disturbing.  Where does this
performance change come from?  Can we harness it in different ways?

The most fundamental way would be to change make_colname_unique to use a hash table, rather than an N^2 algorithm.  It would probably be similar to Tom's commit 8004953b5a2449c26c4e "Speed up ruleutils' name de-duplication code" but he does warn in the commit message that his method would be trickier to use for column aliases than table aliases.  That would speed up a lot more cases than just DEFAULTS (check constraints, maybe triggers) , but would be harder to do.  

Even once we fix this, it still wouldn't make sense to create a deparse context which cannot be necessary.  Perhaps the call to deparse_context_for could be made lazy, so that it is delayed until a Var is actually found in the expression node tree, rather than being done preemptively.  Then all cases without Vars would be improved, not just cases where it could be proved by static code analysis that Vars are not possible.

When setting up the deparse context for a single table, the column names already need to be unique (I think).  So another possibility is to pass a flag down from set_simple_column_names to set_relation_column_names to indicate no unique checking is needed.

Since pg_dump calls pg_get_expr once over and over again on the same table consecutively, perhaps we could cache the column alias assignments in a single-entry cache, so if it is called on the same table as last time it just re-uses the aliases from last time.  I am not planning on working on that, I don't know where such a cache could be stored such that is freed and invalidated at the appropriate times.

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: SCRAM with channel binding downgrade attack
Следующее
От: Amit Langote
Дата:
Сообщение: Re: partition tree inspection functions