Re: BUG #15847: Running out of memory when planning full outer joins involving many partitions
От | Tom Lane |
---|---|
Тема | Re: BUG #15847: Running out of memory when planning full outer joins involving many partitions |
Дата | |
Msg-id | 8168.1560446056@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | BUG #15847: Running out of memory when planning full outer joins involving many partitions (PG Bug reporting form <noreply@postgresql.org>) |
Ответы |
Re: BUG #15847: Running out of memory when planning full outer joinsinvolving many partitions
(Feike Steenbergen <feikesteenbergen@gmail.com>)
|
Список | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes: > We've had a few reports recently that had a backend consume a lot of > memory causing either an OOM-kill or kubernetes rescheduling their > PostgreSQL pod. > For this specific bug report there were two things that clearly stood out: > - a FULL OUTER JOIN is done > - many partitions (thousands) are involved I poked into this and found the cause. For the sample query, we have an EquivalenceClass containing the expression COALESCE(COALESCE(Var_1_1, Var_2_1), Var_3_1) where each of the Vars belongs to an appendrel parent. add_child_rel_equivalences() needs to add expressions representing the transform of that to each child relation. That is, if the children of table 1 are A1 and A2, of table 2 are B1 and B2, and of table 3 are C1 and C2, what we'd like to add are the expressions COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_3_1) COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_3_1) COALESCE(COALESCE(Var_1_1, Var_B1_1), Var_3_1) COALESCE(COALESCE(Var_1_1, Var_B2_1), Var_3_1) COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C1_1) COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C2_1) However, what it's actually producing is additional combinations for each appendrel after the first, because each call also mutates the previously-added child expressions. So in this example we also get COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_3_1) COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_3_1) COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_C1_1) COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_C2_1) COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_C1_1) COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_C2_1) With two appendrels involved, that's O(N^2) expressions; with three appendrels, more like O(N^3). This is by no means specific to FULL JOINs; you could get the same behavior with join clauses like "WHERE t1.a + t2.b + t3.c = t4.d". These extra expressions don't have any use, since we're not going to join the children directly to each other. So we need to fix add_child_rel_equivalences() to not do that. The simplest way seems to be to make it ignore em_is_child EC members, which requires that we use the adjust_appendrel_attrs_multilevel machinery if we're trying to convert one of the original expressions for a grandchild relation. As attached. This patch fixes the described performance problem and still passes check-world. While I don't have any hesitation about pushing this patch into HEAD, I do feel a bit nervous about back-patching it, particularly right before a set of minor releases. I don't think that we consider large numbers of partitions to be a well-supported case in v11 or before, so for the released branches I'd rather just say "if it hurts, don't do that". As an aside, adjust_appendrel_attrs_multilevel() makes me positively ill (and it's not the head cold I have today...). It's unbelievably brute-force, which might be okay if it were something we'd execute only once per query, but examples like this can require it to be executed thousands of times. Still, right now is probably not a good time to blow it up and rewrite it. regards, tom lane diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index e9bd5ea..b50e9cc 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -2135,11 +2135,10 @@ add_child_rel_equivalences(PlannerInfo *root, continue; /* - * No point in searching if parent rel not mentioned in eclass; but we - * can't tell that for sure if parent rel is itself a child. + * No point in searching if child's topmost parent rel is not + * mentioned in eclass. */ - if (parent_rel->reloptkind == RELOPT_BASEREL && - !bms_is_subset(parent_rel->relids, cur_ec->ec_relids)) + if (!bms_is_subset(child_rel->top_parent_relids, cur_ec->ec_relids)) continue; foreach(lc2, cur_ec->ec_members) @@ -2149,18 +2148,41 @@ add_child_rel_equivalences(PlannerInfo *root, if (cur_em->em_is_const) continue; /* ignore consts here */ - /* Does it reference parent_rel? */ - if (bms_overlap(cur_em->em_relids, parent_rel->relids)) + /* + * We consider only original EC members here, not + * already-transformed child members. Otherwise, if some original + * member expression references more than one appendrel, we'd get + * an O(N^2) explosion of useless derived expressions for + * combinations of children. + */ + if (cur_em->em_is_child) + continue; /* ignore children here */ + + /* Does this member reference child's topmost parent rel? */ + if (bms_overlap(cur_em->em_relids, child_rel->top_parent_relids)) { /* Yes, generate transformed child version */ Expr *child_expr; Relids new_relids; Relids new_nullable_relids; - child_expr = (Expr *) - adjust_appendrel_attrs(root, - (Node *) cur_em->em_expr, - 1, &appinfo); + if (parent_rel->reloptkind == RELOPT_BASEREL) + { + /* Simple single-level transformation */ + child_expr = (Expr *) + adjust_appendrel_attrs(root, + (Node *) cur_em->em_expr, + 1, &appinfo); + } + else + { + /* Must do multi-level transformation */ + child_expr = (Expr *) + adjust_appendrel_attrs_multilevel(root, + (Node *) cur_em->em_expr, + child_rel->relids, + child_rel->top_parent_relids); + } /* * Transform em_relids to match. Note we do *not* do @@ -2169,7 +2191,7 @@ add_child_rel_equivalences(PlannerInfo *root, * don't want the child member to be marked as constant. */ new_relids = bms_difference(cur_em->em_relids, - parent_rel->relids); + child_rel->top_parent_relids); new_relids = bms_add_members(new_relids, child_rel->relids); /* @@ -2177,10 +2199,11 @@ add_child_rel_equivalences(PlannerInfo *root, * parent and child relids are singletons. */ new_nullable_relids = cur_em->em_nullable_relids; - if (bms_overlap(new_nullable_relids, parent_rel->relids)) + if (bms_overlap(new_nullable_relids, + child_rel->top_parent_relids)) { new_nullable_relids = bms_difference(new_nullable_relids, - parent_rel->relids); + child_rel->top_parent_relids); new_nullable_relids = bms_add_members(new_nullable_relids, child_rel->relids); }
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: BUG #15844: MIPS: remove .set mips2 in s_lock.h to fix r6 build
Следующее
От: PG Bug reporting formДата:
Сообщение: BUG #15851: Concurrent Refresh of Materialized views not preserving the order of the underlying query