Discussion: Duplicate unique key values in inheritance tables
I came across a query that returned incorrect results and I traced it down to being caused by duplicate unique key values in an inheritance table. As a simple example, consider

create table p (a int primary key, b int);
create table c () inherits (p);

insert into p select 1, 1;
insert into c select 1, 2;

select a, b from p;
 a | b
---+---
 1 | 1
 1 | 2
(2 rows)

explain (verbose, costs off)
select a, b from p group by a;
              QUERY PLAN
--------------------------------------
 HashAggregate
   Output: p.a, p.b
   Group Key: p.a
   ->  Append
         ->  Seq Scan on public.p p_1
               Output: p_1.a, p_1.b
         ->  Seq Scan on public.c p_2
               Output: p_2.a, p_2.b
(8 rows)

The parser considers 'p.b' functionally dependent on the group by column 'p.a' because 'p.a' is identified as the primary key for table 'p'. However, this causes confusion for the executor when determining which 'p.b' value should be returned for each group. In my case, I observed that sorted and hashed aggregation produce different results for the same query.

Reading the doc, it seems that this is a documented limitation of the inheritance feature that we would have duplicate unique key values in inheritance tables. Even adding a unique constraint to the children does not prevent duplication compared to the parent.

As a workaround for this issue, I'm considering whether we can skip checking functional dependency on primary keys for inheritance parents, given that we cannot guarantee uniqueness on the keys in this case. Maybe something like below.

@@ -1421,7 +1427,9 @@ check_ungrouped_columns_walker(Node *node,
         Assert(var->varno > 0 &&
                (int) var->varno <= list_length(context->pstate->p_rtable));
         rte = rt_fetch(var->varno, context->pstate->p_rtable);
-        if (rte->rtekind == RTE_RELATION)
+        if (rte->rtekind == RTE_RELATION &&
+            !(rte->relkind == RELKIND_RELATION &&
+              rte->inh && has_subclass(rte->relid)))
         {
             if (check_functional_grouping(rte->relid,

Any thoughts?

Thanks
Richard
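The difference between sorted and hashed aggregation reported in the first message can be probed with a sketch like the one below. It assumes the tables p and c from that message already exist, and it uses the enable_hashagg planner setting to steer the planner away from hashed grouping. Which b value comes back for a = 1 under either plan is not specified, so no expected output is shown.

```sql
-- Assumes tables p and c from the first message already exist.

-- Default settings: the plan shown in the first message uses HashAggregate.
select a, b from p group by a;

-- Discourage hashed aggregation so the planner prefers sort-based grouping,
-- then inspect the plan and rerun the same query.
set enable_hashagg = off;
explain (verbose, costs off)
select a, b from p group by a;
select a, b from p group by a;

-- Restore the default for the rest of the session.
reset enable_hashagg;
```

If the two runs return different b values for the same a, that matches the behavior described in the report.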
On Tue, 16 Jul 2024 at 12:45, Richard Guo <guofenglinux@gmail.com> wrote:
> As a workaround for this issue, I'm considering whether we can skip
> checking functional dependency on primary keys for inheritance
> parents, given that we cannot guarantee uniqueness on the keys in this
> case. Maybe something like below.
>
> @@ -1421,7 +1427,9 @@ check_ungrouped_columns_walker(Node *node,
>          Assert(var->varno > 0 &&
>                 (int) var->varno <= list_length(context->pstate->p_rtable));
>          rte = rt_fetch(var->varno, context->pstate->p_rtable);
> -        if (rte->rtekind == RTE_RELATION)
> +        if (rte->rtekind == RTE_RELATION &&
> +            !(rte->relkind == RELKIND_RELATION &&
> +              rte->inh && has_subclass(rte->relid)))
>          {
>              if (check_functional_grouping(rte->relid,
>
> Any thoughts?

The problem with doing that is that it might mean queries that used to work no longer work. CREATE VIEW could also fail where it used to work which could render pg_dumps unrestorable.

Because it's a parser issue, I don't think we can fix it the same way as a5be4062f was fixed.

I don't have any ideas on what we can do about this right now, but thought it was worth sharing the above.

David
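The CREATE VIEW concern can be illustrated with a small sketch. It assumes the tables p and c from the first message exist; the view name v is hypothetical and chosen only for illustration.

```sql
-- Assumes tables p and c from the first message already exist.
-- The view name "v" is hypothetical.

-- Accepted today: p.b is treated as functionally dependent on the
-- primary key column p.a, so it need not appear in GROUP BY.
create view v as
select a, b from p group by a;
```

A dump would replay an equivalent CREATE VIEW statement on restore. If the parser stopped recognizing the functional dependency for inheritance parents, that statement would presumably be rejected as having an ungrouped column, which is the restore failure described above.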
On Monday, July 15, 2024, David Rowley <dgrowleyml@gmail.com> wrote:
> On Tue, 16 Jul 2024 at 12:45, Richard Guo <guofenglinux@gmail.com> wrote:
> > As a workaround for this issue, I'm considering whether we can skip
> > checking functional dependency on primary keys for inheritance
> > parents, given that we cannot guarantee uniqueness on the keys in this
> > case.
>
> Because it's a parser issue, I don't think we can fix it the same way
> as a5be4062f was fixed.
>
> I don't have any ideas on what we can do about this right now, but
> thought it was worth sharing the above.

Add another note to caveats in the docs and call it a feature. We produce a valid answer for the data model encountered. The non-determinism isn’t wrong, it’s just a poorly written query/model with non-deterministic results. Since v16 we have an any_value aggregate - we are basically applying this to the dependent columns implicitly.

A bit of revisionist history, but I’d rather do that than break said queries, especially at parse time; I’d be a bit more open to execution-time enforcement if functional dependency on the id turns out to have actually been violated. But people want, and in other products have, implicit any_value aggregation in this situation, so it’s hard to say it is wrong even if we otherwise take the position that we will not accept it.

David J.
On Tue, 16 Jul 2024 at 13:28, David G. Johnston <david.g.johnston@gmail.com> wrote:
> Add another note to caveats in the docs and call it a feature. We
> produce a valid answer for the data model encountered. The
> non-determinism isn’t wrong, it’s just a poorly written query/model
> with non-deterministic results. Since v16 we have an any_value
> aggregate - we basically are applying this to the dependent columns
> implicitly. A bit of revisionist history but I’d rather do that than
> break said queries. Especially at parse time; I’d be a bit more open
> to execution-time enforcement if functional dependency on the id turns
> out to have actually been violated. But people want, and in other
> products have, any_value implicit aggregation in this situation so
> it’s hard to say it is wrong even if we otherwise take the position
> that we will not accept it.

I think it might be best just to ignore it and do nothing. Maybe it would be worth putting something into the docs about it if people from userland come complaining about a bug, as the doc mention might stop them wasting their time reporting something we already know about. Otherwise, I feel the docs would just draw attention to something that I'd personally rather people didn't do.

As you say, using any_value() would be the way we'd encourage people to do it if they don't care which value of the ungrouped column they want, so documenting something else doesn't seem quite right to me.

David
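The explicit form mentioned in the thread could look like the sketch below. It assumes the tables p and c from the first message and a server version that provides any_value(). Which of the two b values is returned for a = 1 is not specified.

```sql
-- Assumes tables p and c from the first message already exist and that
-- the server provides the any_value() aggregate.

-- Grouping by a and picking an arbitrary b states the intent explicitly,
-- instead of relying on b being functionally dependent on the primary key.
select a, any_value(b) as b
from p
group by a;
```

Written this way, the query does not depend on the parser's functional-dependency check at all, so it would be unaffected by a change like the one proposed at the start of the thread.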