Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Andy Fan
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date:
Message-ID: CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
In reply to: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey  (Andy Fan <zhihui.fan1213@gmail.com>)
Responses: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List: pgsql-hackers
I just uploaded the v7 version and split it into smaller commits for easier
review/merge. I also maintain an up-to-date README.uniquekey document, since
some things may have changed during the discussion or in later code.

Here is a brief introduction to each commit.

====
1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch

This commit adds notnullattrs to RelOptInfo, which gathers the information
from both the catalog and the user's query.
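
To illustrate the idea, here is a minimal standalone sketch. It is not code
from the patch: ToyRel, AttrSet and the toy_* functions are hypothetical
names, and a plain 32-bit mask stands in for the planner's real bitmap type.
It also assumes that the "user's query" part means attributes proven
non-null by query clauses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a bitmap of attribute numbers: bit i set means
 * attribute i is known to be NOT NULL. (Hypothetical, not the patch's type.) */
typedef uint32_t AttrSet;

/* Hypothetical, heavily simplified relation holding only the new field. */
typedef struct ToyRel
{
    AttrSet notnullattrs;
} ToyRel;

/* Build a one-bit set for the given attribute number (0..31 in this toy). */
AttrSet
attr_bit(int attno)
{
    assert(attno >= 0 && attno < 32);
    return (AttrSet) 1u << attno;
}

/* Merge the two sources named above: NOT NULL constraints found in the
 * catalog, and attributes the query itself proves non-null (assumption). */
void
toy_set_notnullattrs(ToyRel *rel, AttrSet from_catalog, AttrSet from_query)
{
    rel->notnullattrs = from_catalog | from_query;
}

/* Test whether an attribute is known to be NOT NULL for this relation. */
bool
toy_attr_is_notnull(const ToyRel *rel, int attno)
{
    return (rel->notnullattrs & attr_bit(attno)) != 0;
}
```

For example, calling toy_set_notnullattrs(&rel, attr_bit(1), attr_bit(3))
marks attributes 1 and 3 as not null and leaves attribute 2 unknown.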


2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch

This commit just adds uniquekeys to RelOptInfo and maintains it at every
stage. However, the upper-level code is not changed by this commit.

Some changes to this part in v7:
1). Removed the UniqueKey.positions attribute. It used to be used in
    convert_subquery_uniquekeys, but we don't actually need it (and I
    maintained it incorrectly before). Now I build the relationship between the
    outer Var and the subquery's TargetList with outrel.subquery.processed_tlist.
2). A one-row UniqueKey (exprs = NIL) needs to be converted to a normal
    UniqueKey (exprs != NIL) if the relation is no longer one-row. This can
    happen with some outer joins.


3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch

Refactor existing functions such as innerrel_is_unique/rel_is_distinct_for to
use UniqueKey, and postpone the calls to remove_useless_joins and
reduce_unique_semijoins so that they use the new implementation.

4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch

Remove the Distinct node if the result is already distinct. Remove the Agg node
if the GROUP BY clause is already unique AND there is no aggregate function in
the query.
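
The check behind this can be sketched roughly as follows. This is a toy
model, not the patch's code: ToyUniqueKey, ColSet and toy_is_unique_for are
hypothetical names, columns are modelled as bits in a mask, and I assume the
test is "some unique key is fully contained in the DISTINCT / GROUP BY
columns". An empty key models the one-row UniqueKey (exprs = NIL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy column set: bit i set means column i is a member. */
typedef uint32_t ColSet;

/* Hypothetical simplified unique key. An empty set models the one-row
 * key (exprs = NIL): the relation has at most one row. */
typedef struct ToyUniqueKey
{
    ColSet cols;
} ToyUniqueKey;

/* Return true if the rows are already unique on "target", i.e. some
 * unique key has every one of its columns inside "target". In that case
 * a DISTINCT or aggregate-free GROUP BY on "target" is redundant. */
bool
toy_is_unique_for(const ToyUniqueKey *keys, size_t nkeys, ColSet target)
{
    for (size_t i = 0; i < nkeys; i++)
    {
        /* No key column falls outside the target set. */
        if ((keys[i].cols & ~target) == 0)
            return true;
    }
    return false;
}
```

With a unique key on columns {0, 1}, a DISTINCT over columns {0, 1, 2} is
redundant in this model, while a DISTINCT over column {0} alone is not. A
one-row key makes any DISTINCT redundant.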

5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch

If the GROUP BY clause is unique and the query has an aggregate function, we
use the AGG_SORTED strategy but without actually sorting, since each group has
only one row.


6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch

This commit runs join removal in build_join_rel. At that point it can make full
use of the unique keys, so it can handle some more cases; I added some new test
cases to join.sql. However, it cannot be a replacement for the current one:
there are some cases that the new strategy cannot handle but the current one
can. For example:

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id = b.id);

While joining a and b, the join can't be removed since b.id is still needed
later. Later on we learn that b.id can be removed as well, but by then it is
too late to remove the earlier join.

On the implementation side, the main idea is that if the join can be removed
(join_canbe_removed), we copy the pathlist from outerrel to joinrel. There are
several items that need to be handled:

1. To keep join_search_one_level working overall, we have to keep the joinrel
   even if the innerrel is removed (rather than discarding the joinrel).
2. If the innerrel can be removed, we don't need to build a pathlist for the
   joinrel; we just reuse the pathlist from outerrel. However, there are many
   places that assert rel->pathlist[*]->parent == rel, so I copied the pathlist
   and changed the parent to joinrel.
3. When creating a plan for a path on an RTE_RELATION, the code needs to know
   the relation Oid via path->parent->relid, so we have to overwrite
   joinrel->relid (which was 0 before) with outerrel->relid.
4. On almost the same code paths as item 3, the code usually asserts
   best_path->parent->rtekind == RTE_RELATION; now the path may appear in a
   joinrel, so I overwrite joinrel->rtekind with outerrel->rtekind.
5. I guess there are some dependencies between path->pathtarget and
   rel->reltarget. Since we reuse the pathlist of outerrel, I used
   outerrel->reltarget as well. If the join can be removed, I guess
   list_length(outerrel->reltarget->exprs) >=
   list_length(joinrel->reltarget->exprs), so we can rely on the ProjectionPath
   to reduce the tlist.
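
Items 2 to 5 can be sketched as the toy code below. It is only an
illustration under simplifying assumptions, not the patch itself: ToyPath,
ToyPlannerRel and toy_reuse_outer_pathlist are hypothetical names, the
pathlist is a plain array, and reltarget is reduced to a column count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum ToyRteKind { TOY_RTE_NONE = 0, TOY_RTE_RELATION } ToyRteKind;

struct ToyPlannerRel;

/* Hypothetical simplified path: only the fields discussed above. */
typedef struct ToyPath
{
    struct ToyPlannerRel *parent;   /* relation this path belongs to */
    double      total_cost;
} ToyPath;

/* Hypothetical simplified relation. */
typedef struct ToyPlannerRel
{
    unsigned    relid;      /* 0 for a join relation */
    ToyRteKind  rtekind;
    int         ntargets;   /* stand-in for list_length(reltarget->exprs) */
    ToyPath    *paths;      /* stand-in for pathlist */
    size_t      npaths;
} ToyPlannerRel;

/* When the inner side is removable, let joinrel reuse outerrel's paths.
 * Returns 0 on success, -1 on allocation failure. The caller owns
 * joinrel->paths afterwards and must free() it. */
int
toy_reuse_outer_pathlist(ToyPlannerRel *joinrel, const ToyPlannerRel *outerrel)
{
    ToyPath    *copy = NULL;

    if (outerrel->npaths > 0)
    {
        copy = malloc(outerrel->npaths * sizeof *copy);
        if (copy == NULL)
            return -1;
    }

    for (size_t i = 0; i < outerrel->npaths; i++)
    {
        copy[i] = outerrel->paths[i];   /* copy, so outerrel stays intact */
        copy[i].parent = joinrel;       /* item 2: re-parent the copy */
    }

    joinrel->paths = copy;
    joinrel->npaths = outerrel->npaths;
    joinrel->relid = outerrel->relid;       /* item 3 */
    joinrel->rtekind = outerrel->rtekind;   /* item 4 */
    joinrel->ntargets = outerrel->ntargets; /* item 5 */

    /* The invariant that item 2 says is asserted in many places. */
    for (size_t i = 0; i < joinrel->npaths; i++)
        assert(joinrel->paths[i].parent == joinrel);

    return 0;
}
```

Copying rather than sharing the array is what lets both relations satisfy
the parent assertion at the same time: the outerrel's paths still point at
the outerrel, and the copies point at the joinrel.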

My patches are based on the current latest commit, fb544735f1.

Best Regards
Andy Fan