Extracting only the columns needed for a query

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Extracting only the columns needed for a query
Дата
Msg-id CAAKRu_Yj=Q_ZxiGX+pgstNWMbUJApEJX-imvAEwryCk5SLUebg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Extracting only the columns needed for a query  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Extracting only the columns needed for a query  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
While hacking on zedstore, we needed to get a list of the columns to
be projected--basically all of the columns needed to satisfy the
query. The two use cases we have for this is
1) to pass this column list down to the AM layer for the AM to leverage it
2) for use during planning to improving costing
In other threads, such as [1], there has been discussion about the
possible benefits for all table types of having access to this set of
columns.

Focusing on how to get this used cols list (as opposed to how to pass
it down to the AM), we have tried a few approaches to constructing it
and wanted to get some ideas on how best to do it.

We are trying to determine which phase to get the columns -- after
parsing, after planning, or during execution right before calling the
AM.

Approach A: right before calling AM

    Leverage expression_tree_walker() right before calling beginscan()
    and collecting the columns into a needed columns context. This
    approach is what is currently in the zedstore patch mentioned in
    this thread [2].

    The benefit of this approach is that it walks the tree right
    before the used column set will be used--which makes it easy to
    skip this walk for queries or AMs that don't benefit from this
    used columns list.

Approach B: after parsing and/or after planning

    Add a new member 'used_cols' to PlannedStmt which contains the
    attributes for each relation present in the query. Construct
    'used_cols' at the end of planning using the PathTargets in the
    RelOptInfos in the PlannerInfo->simple_rel_array and the
    RangeTblEntries in PlannerInfo->simple_rte_array.

    The nice thing about this is that it does not require a full walk
    of the plan tree. Approach A could be more expensive if the tree
    is quite large. I'm not sure, however, if just getting the
    PathTargets from the RelOptInfos is sufficient for obtaining the
    whole set of columns used in the query.

    Approach B, however, does not work for utility statements which do
    not go through planning.

    One potential solution to this that we tried was getting the
    columns from the query tree after parse analyze and then in
    exec_simple_query() adding the column list to the PlannedStmt.

    This turned out to be as messy or more than Approach A because
    each kind of utility statement has its own data structure that is
    composed of elements taken from the Query tree but does not
    directly include the original PlannedStmt created for the query
    (the PlannedStmt doesn't contain anything except the query tree
    for utility statements since they do not go through planning). So,
    for each type of utility statement, we would have to separately
    copy over the column list from the PlannedStmt in its own way.

    It is worth noting that Approach A also requires special handling
    for each type of utility statement.

We are wondering about specific benefits of Approach B--that is, is
there some use case (maybe plan re-use) for having the column set
accessible in the PlannedStmt itself?

One potential benefit of Approach B could be for scans of partition
tables. Collecting the used column list could be done once for the
query instead of once for each partition.

Both approaches, however, do not address our second use case, as we
would not have the column list during planning for non-utility
statements. To satisfy this, we would likely have to extract the
columns from the query tree after parse analyze for non-utility
statements as well.

An approach which extracted this list before planning and saved it
somewhere would help avoid having to do the same walk during planning
and then again during execution. Though, using the list constructed
after parsing may not be ideal when some columns were able to be
eliminated during planning.

Melanie & Ashwin

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Следующее
От: David Rowley
Дата:
Сообщение: Re: Speed up transaction completion faster after many relations areaccessed in a transaction