Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Andy Fan
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date:
Msg-id: CAKU4AWqOORqW900O-+L4L2+0xknsEqpfcs9FF7SeiO9TmpeZOg@mail.gmail.com
In reply to: Re: [PATCH] Erase the distinctClause if the result is unique by definition  (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: [PATCH] Erase the distinctClause if the result is unique by definition  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
Hi David:

Thanks for your time. 

On Wed, Mar 18, 2020 at 9:56 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 16 Mar 2020 at 06:01, Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
> Hi All:
>
> I have re-implemented the patch based on David's suggestion/code, and it
> looks like it works well.  The updated patch mainly includes:
>
> 1. Maintain the not_null_colno in RelOptInfo, which includes the not-null
>     info from the catalog and the not-null info derived from vars.

What about non-nullability that we can derive from means other than
NOT NULL constraints? Where will you track that now that you've
removed the UniqueKeySet type?

I track it in deconstruct_recurse, just before the distribute_qual_to_rels call:

+    ListCell   *lc;
+
+    foreach(lc, find_nonnullable_vars(qual))
+    {
+        Var        *var = lfirst_node(Var, lc);
+        RelOptInfo *rel = root->simple_rel_array[var->varno];
+
+        if (var->varattno > InvalidAttrNumber)
+            rel->not_null_cols = bms_add_member(rel->not_null_cols,
+                                                var->varattno);
+    }


Traditionally we use attno or attnum rather than colno for variable
names containing attribute numbers.

Currently I use a List of Vars for a UniqueKey; I guess that is OK?
 

> 3. Postpone the propagate_unique_keys_to_joinrel call to populate_joinrel_with_paths,
>   since we know the jointype at that time, so we can handle semi/anti joins specially.

ok, but the join type was known already where I was calling the
function from. It just wasn't passed to the function.

> 4. Add the rule I suggested above: if both of the two relations yield a unique result,
>   the join result will be unique as well. The UK can be ( (rel1_uk1, rel1_uk2).. )

I see. So basically you're saying that the joinrel's uniquekeys should
be the cartesian product of the unique rels from either side of the
join.  I wonder if that's a special case we need to worry about too
much. Surely it only applies to clauseless joins.

There are other cases where we may need this as well :), like
select m1.pk, m2.pk from m1, m2 where m1.b = m2.b;

The Cartesian product of the unique keys would make the unique-key lists too
long, so I maintain a UniqueKeyContext to keep them short. The idea is: if (UK1)
is already unique, there is no need to add another UK such as (UK1, UK2), which
is just a superset of it.
 
 
> 5. If the unique key cannot be referenced by others, we can safely ignore it
>     in order to keep the (join)rel->uniquekeys short.

You could probably have an equivalent of has_useful_pathkeys() and
pathkeys_useful_for_ordering().

 
Thanks for the suggestion; I will do so in the v5-xx.patch.
 
> 6. I only consider the not-null check/opfamily check for the uniquekey which comes
>    from a unique index.  I think that should be correct.
> 7. I defined each uniquekey as a List of Expr, so I didn't introduce a new node type.

Where will you store the collation Oid? I left comments to mention
that needed to be checked but just didn't wire it up.

This is embarrassing; I am not sure whether it is safe to ignore it.  I removed it
for the following reasons (sorry that I didn't explain them carefully in my last email).

1.  When we decide whether a UK is usable, we have a chance to compare the collation
info of the restrictinfo (where uk = 1) or the target list (select uk from t) with the
index info's collation. The target-list case is more troublesome, since we need to
figure out the default collation for it. However, relation_has_unique_index_for is in
the same situation as us, and it ignores collation as well; see the comment
/* XXX at some point we may need to check collations here too. */.  So I think there
may be some reason we can ignore it too.

2. What we expect from a UK is:
a). Where m1.uniquekey = m2.b:  m2's rows will not be duplicated by this join clause.
Here, if m1.uniquekey has a different collation, it will raise a run-time error.
b). Where m1.uniquekey collate 'xxxx' = m2.b:  we can't depend on the run-time error
this time. But if we are sure that *when uk is unique at the default collation,
(uk collate 'other-collation') is unique as well*, then we may safely ignore it here too.
c). select uniquekey from t / select uniquekey collate 'xxxx' from t:  this has the same
requirement as item b).

3). It looks like fully maintaining the collation for our case is a big effort, and
users rarely use it.  If my expectation in 2.b is not true, I'd prefer to detect such a
case (the user using a different collation) and just ignore the UK for it.

But after all, I think this should remain an open question for now.

---
Finally, I am grateful for your suggestions/feedback, which were really enlightening
and constructive.  Thanks also to Tom for the quick review and for suggesting adding
new fields to RelOptInfo; without that I don't think I would have dared to add a field
to such an important struct.  And thanks to Bapat, who explained things in more detail.
I'm now writing the code for the partitioned-index stuff, which is a bit tedious, since
every partition may have a different unique index.  I expect to finish it in the next
two days, and I hope you can do another round of review then.

Thanks for your feedback!

Best Regards
Andy Fan
