Обсуждение: Duplicate unique key values in inheritance tables

Поиск
Список
Период
Сортировка

Duplicate unique key values in inheritance tables

От
Richard Guo
Дата:
I came across a query that returned incorrect results and I traced it
down to being caused by duplicate unique key values in an inheritance
table.  As a simple example, consider

create table p (a int primary key, b int);
create table c () inherits (p);

insert into p select 1, 1;
insert into c select 1, 2;

select a, b from p;
 a | b
---+---
 1 | 1
 1 | 2
(2 rows)

explain (verbose, costs off)
select a, b from p group by a;
              QUERY PLAN
--------------------------------------
 HashAggregate
   Output: p.a, p.b
   Group Key: p.a
   ->  Append
         ->  Seq Scan on public.p p_1
               Output: p_1.a, p_1.b
         ->  Seq Scan on public.c p_2
               Output: p_2.a, p_2.b
(8 rows)

The parser considers 'p.b' functionally dependent on the group by
column 'p.a' because 'p.a' is identified as the primary key for table
'p'.  However, this causes confusion for the executor when determining
which 'p.b' value should be returned for each group.  In my case, I
observed that sorted and hashed aggregation produce different results
for the same query.

Reading the doc, it seems that this is a documented limitation of the
inheritance feature that we would have duplicate unique key values in
inheritance tables.  Even adding a unique constraint to the children
does not prevent duplication compared to the parent.

As a workaround for this issue, I'm considering whether we can skip
checking functional dependency on primary keys for inheritance
parents, given that we cannot guarantee uniqueness on the keys in this
case.  Maybe something like below.

@@ -1421,7 +1427,9 @@ check_ungrouped_columns_walker(Node *node,
        Assert(var->varno > 0 &&
               (int) var->varno <= list_length(context->pstate->p_rtable));
        rte = rt_fetch(var->varno, context->pstate->p_rtable);
-       if (rte->rtekind == RTE_RELATION)
+       if (rte->rtekind == RTE_RELATION &&
+           !(rte->relkind == RELKIND_RELATION &&
+             rte->inh && has_subclass(rte->relid)))
        {
            if (check_functional_grouping(rte->relid,

Any thoughts?

Thanks
Richard



Re: Duplicate unique key values in inheritance tables

От
David Rowley
Дата:
On Tue, 16 Jul 2024 at 12:45, Richard Guo <guofenglinux@gmail.com> wrote:
> As a workaround for this issue, I'm considering whether we can skip
> checking functional dependency on primary keys for inheritance
> parents, given that we cannot guarantee uniqueness on the keys in this
> case.  Maybe something like below.
>
> @@ -1421,7 +1427,9 @@ check_ungrouped_columns_walker(Node *node,
>         Assert(var->varno > 0 &&
>                (int) var->varno <= list_length(context->pstate->p_rtable));
>         rte = rt_fetch(var->varno, context->pstate->p_rtable);
> -       if (rte->rtekind == RTE_RELATION)
> +       if (rte->rtekind == RTE_RELATION &&
> +           !(rte->relkind == RELKIND_RELATION &&
> +             rte->inh && has_subclass(rte->relid)))
>         {
>             if (check_functional_grouping(rte->relid,
>
> Any thoughts?

The problem with doing that is that it might mean queries that used to
work no longer work.  CREATE VIEW could also fail where it used to
work which could render pg_dumps unrestorable.

Because it's a parser issue, I don't think we can fix it the same way
as a5be4062f was fixed.

I don't have any ideas on what we can do about this right now, but
thought it was worth sharing the above.

David



Re: Duplicate unique key values in inheritance tables

От
"David G. Johnston"
Дата:
On Monday, July 15, 2024, David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 16 Jul 2024 at 12:45, Richard Guo <guofenglinux@gmail.com> wrote:
> As a workaround for this issue, I'm considering whether we can skip
> checking functional dependency on primary keys for inheritance
> parents, given that we cannot guarantee uniqueness on the keys in this
> case.

Because it's a parser issue, I don't think we can fix it the same way
as a5be4062f was fixed.

I don't have any ideas on what we can do about this right now, but
thought it was worth sharing the above.

Add another note to caveats in the docs and call it a feature.  We produce a valid answer for the data model encountered.  The non-determinism isn’t wrong, it’s just a poorly written query/model with non-deterministic results. Since v15 we have an any_value aggregate - we basically are applying this to the dependent columns implicitly.  A bit of revisionist history but I’d rather do that than break said queries.  Especially at parse time; I’d be a bit more open to execution-time enforcement if functional dependency on the id turns out to have actually been violated.  But people want, and in other products have, any_value implicit aggregation in this situation so it’s hard to say it is wrong even if we otherwise take the position that we will not accept it.

David J.

 

Re: Duplicate unique key values in inheritance tables

От
David Rowley
Дата:
On Tue, 16 Jul 2024 at 13:28, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Add another note to caveats in the docs and call it a feature.  We produce a valid answer for the data model
encountered. The non-determinism isn’t wrong, it’s just a poorly written query/model with non-deterministic results.
Sincev15 we have an any_value aggregate - we basically are applying this to the dependent columns implicitly.  A bit of
revisionisthistory but I’d rather do that than break said queries.  Especially at parse time; I’d be a bit more open to
execution-timeenforcement if functional dependency on the id turns out to have actually been violated.  But people
want,and in other products have, any_value implicit aggregation in this situation so it’s hard to say it is wrong even
ifwe otherwise take the position that we will not accept it. 

I think it might be best just to ignore it and do nothing. Maybe it
would be worth putting something into the docs about it if people from
userland come complaining about a bug as the doc mention might stop
them wasting their time reporting something we already know about.
Otherwise, I feel the docs would just draw attention to something that
I'd personally rather people didn't do. As you say, using any_value()
would be the way we'd encourage people to do it if they don't care
which value of the ungrouped column they want, so documenting
something else doesn't seem quite right to me.

David