Re: Multi-Column List Partitioning

Поиск
Список
Период
Сортировка
От Amul Sul
Тема Re: Multi-Column List Partitioning
Дата
Msg-id CAAJ_b97sgbtEeP18X6E+_v7Gt1T1MCPJP+p=unYZ9PO3e-ihnw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Multi-Column List Partitioning  (Amul Sul <sulamul@gmail.com>)
Список pgsql-hackers
On Thu, Dec 9, 2021 at 12:43 PM Amul Sul <sulamul@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 12:03 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > On Thu, Dec 9, 2021 at 3:12 PM Amul Sul <sulamul@gmail.com> wrote:
> > > On Thu, Dec 9, 2021 at 11:24 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > >
> > > [....]
> > > > On Mon, Dec 6, 2021 at 10:57 PM Nitin Jadhav
> > > > <nitinjadhavpostgres@gmail.com> wrote:
> > > > > > Looks difficult to understand at first glance, how about the following:
> > > > > >
> > > > > > if (b1->isnulls != b2->isnulls)
> > > > > >    return false;
> > > >
> > > > I don't think having this block is correct, because this says that two
> > > > PartitionBoundInfos can't be "logically" equal unless their isnulls
> > > > pointers are the same, which is not the case unless they are
> > > > physically the same PartitionBoundInfo.  What this means for its only
> > > > caller compute_partition_bounds() is that it now always needs to
> > > > perform partition_bounds_merge() for a pair of list-partitioned
> > > > relations, even if they have exactly the same bounds.
> > > >
> > > > So, I'd suggest removing the block.
> > > >
> > >
> > > Agreed, I too realized the same; the check is incorrect and have noted
> > > it for the next post. But note that, we need a kind of check here otherwise,
> > > how could two bounds be equal if one has nulls and the other doesn't.
> >
> > We check partition strategy at the top and that ensures that isnulls
> > fields should either be both NULL or not, same as the block above that
> > checks 'kind'.  Maybe adding an Assert inside the block makes sense,
> > like this:
> >
> >                 /*
> >                  * If the bound datums can be NULL, check that the datums on
> >                  * both sides are either both NULL or not NULL.
> >                  */
> >                 if (b1->isnulls != NULL)
> >                 {
> >                     /*
> >                      * Both bound collections have the same partition strategy,
> >                      * so the other side must allow NULL datums as well.
> >                      */
> >                     Assert(b2->isnulls != NULL);
> >
>
> Make sense, thanks!
>

In addition to Amit's suggestions, here are a few more:

+   char          **colname = (char **) palloc0(partnatts * sizeof(char *));
+   Oid            *coltype = palloc0(partnatts * sizeof(Oid));
+   int32          *coltypmod = palloc0(partnatts * sizeof(int));
+   Oid            *partcollation = palloc0(partnatts * sizeof(Oid));
+

None of them really needed to be palloc0; also, as described
previously you can avoid the last three by using get_partition_col_*
directly.
---

+           i = 0;
+           foreach(cell2, rowexpr->args)
+           {

It's up to you, rather than using a separate index variable and
incrementing that at the end, I think we can use
foreach_current_index(cell2) which would look much nicer.
---

+           all_values[j].values = (Datum *) palloc0(key->partnatts *
sizeof(Datum));
+           all_values[j].isnulls = (bool *) palloc0(key->partnatts *
sizeof(bool));
+           all_values[j].index = i;

palloc0 is unnecessary for the "values".
---

        dest->datums[i] = &boundDatums[i * natts];
+       if (src->isnulls)
+           dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);

I think you can allocate memory for isnulls the same way you do
allocate boundDatums and just do the memcpy.
---

+       for (i = 0; i < partnatts; i++)
+       {
+           if (outer_isnull && outer_isnull[i])
+           {
+               outer_has_null = true;
+               if (outer_map.merged_indexes[outer_index] == -1)
+                   consider_outer_null = true;
+           }

I am wondering why you are not breaking the loop once you set
consider_outer_null?
Note that if you do that then you need a separate loop for the
inner_isnull part.
---

@@ -1351,14 +1431,30 @@ merge_list_bounds(FmgrInfo *partsupfunc, Oid
*partcollation,
            /* A list value missing from the inner side. */
            Assert(outer_pos < outer_bi->ndatums);

-           /*
-            * If the inner side has the default partition, or this is an
-            * outer join, try to assign a merged partition to the outer
-            * partition (see process_outer_partition()).  Otherwise, the
-            * outer partition will not contribute to the result.
-            */
-           if (inner_has_default || IS_OUTER_JOIN(jointype))
+           if (outer_has_null || inner_has_null)
            {
+               if (consider_outer_null || consider_inner_null)
+               {
+                   /* Merge the NULL partitions. */
+                   merged_index = merge_null_partitions(&outer_map, &inner_map,
+                                                        consider_outer_null,
+                                                        consider_inner_null,
+                                                        outer_index,
inner_index,
+                                                        jointype, &next_index);
+

I have doubts about the condition that allows reaching
merge_null_partitions() but I am not sure I am correct. I think if the
list values missing from the __inner side__ then we might need to
check only "inner_has_null" & "consider_inner_null" and merge the
same, but why is this code also checking "outer_has_null" &
"consider_outer_null". Correct me if I am missing something.
---

+           if (isnulls && isnulls[i])
+               cmpval = 0;     /* NULL "=" NULL */
+           else
+               cmpval = 1;     /* NULL ">" not-NULL */
+       }
+       else if (isnulls && isnulls[i])
+           cmpval = -1;        /* not-NULL "<" NULL */

I really doubt this assumption is correct; aren't those strict operators?
---

+get_list_partbound_value_string(List *bound_value)
+{
+   StringInfo      buf = makeStringInfo();
+   StringInfo      boundconstraint = makeStringInfo();

boundconstraint should be declared inside "if (ncols > 1)" block.
---

+   foreach(cell, bound_value)
+   {
+       Const      *val = castNode(Const, lfirst(cell));
+
+       appendStringInfoString(buf, sep);
+       get_const_expr(val, &context, -1);
+       sep = ", ";
+       ncols++;
+   }

I think no need to increment ncols every time, you have a list and you
can get that. Also, I think since you have ncols already, you can
prepend and append parenthesis before and after so that you can avoid
extra StringInfo.
---

 typedef struct PartitionBoundInfoData
 {
    char        strategy;       /* hash, list or range? */
+   int         partnatts;      /* number of partition key columns */
    int         ndatums;        /* Length of the datums[] array */
    Datum     **datums;
+   bool      **isnulls;

Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
added that for partition_bound_accepts_nulls(), but we can easily get
that value from the partitioning key & pass an additional argument.
Also, no information about the length of the "isnulls" array.
---

I think it would be helpful if you could split the patch: one for
multi-value list partitioning and another for the partition wise join, thanks.

Regards,
Amul



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Documenting when to retry on serialization failure
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: A test for replay of regression tests