Re: support for MERGE

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: support for MERGE
Дата
Msg-id 20220128221933.qgu4n7tphdiryjt3@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: support for MERGE  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: support for MERGE  (Zhihong Yu <zyu@yugabyte.com>)
Re: support for MERGE  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
Hi,

On 2022-01-28 17:27:37 -0300, Alvaro Herrera wrote:
> MERGE, v10.  I am much more comfortable with this version; I have
> removed a bunch of temporary hacks and cleaned up the interactions with
> table AM and executor, which is something that had been bothering me for
> a while.  The complete set of changes can be seen in github,
> https://github.com/alvherre/postgres/commits/merge-15

Any chance you could split this into something more reviewable? The overall
diff stats are: 102 files changed, 8589 insertions(+), 234 deletions(-) thats
pretty hard to really review. Incremental commits don't realy help with that.


> I am not aware of anything of significance in terms of remaining work
> for this project.

One thing from skimming: There's not enough documentation about the approach
imo - it's a complicated enough feature to deserve more than the one paragraph
in src/backend/executor/README.


I'm still [1],[2] uncomfortable with execMerge.c kinda-but-not-really being an
executor node.


> The one thing I'm a bit bothered about is the fact that we expose a lot of
> executor functions previously static.  I am now wondering if it would be
> better to move the MERGE executor support functions into nodeModifyTable.c,
> which I think would mean we would not have to expose those function
> prototypes.

I'm worried about the complexity of nodeModifyTuple.c as is. Moving more code
in there does not seem like the right call to me - we should do the opposite
if anything.


A few inline comments below. No real review, just stuff noticed while skimming
to see where this is at.


Greetings,

Andres


[1] https://www.postgresql.org/message-id/20180405200003.gar3j26gsk32gqpe@alap3.anarazel.de
[2] https://www.postgresql.org/message-id/20180403021800.b5nsgiclzanobiup@alap3.anarazel.de


> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 1a9c1ac290..280ac40e63 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c

This stuff seems like one candidate for splitting out.


> +    /*
> +     * We maintain separate transition tables for UPDATE/INSERT/DELETE since
> +     * MERGE can run all three actions in a single statement. Note that UPDATE
> +     * needs both old and new transition tables whereas INSERT needs only new,
> +     * and DELETE needs only old.
> +     */
> +
> +    /* "old" transition table for UPDATE, if any */
> +    Tuplestorestate *old_upd_tuplestore;
> +    /* "new" transition table for UPDATE, if any */
> +    Tuplestorestate *new_upd_tuplestore;
> +    /* "old" transition table for DELETE, if any */
> +    Tuplestorestate *old_del_tuplestore;
> +    /* "new" transition table INSERT, if any */
> +    Tuplestorestate *new_ins_tuplestore;
> +
>      TupleTableSlot *storeslot;    /* for converting to tuplestore's format */
>  };

Do we need to do something slightly clever about the memory limits for the
tuplestore? Each of them having a separate work_mem makes nodeModifyTable.c
one memory hungry node (the worst of all maybe?).




> +
> +/*
> + * Perform MERGE.
> + */
> +TupleTableSlot *
> +ExecMerge(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
> +          EState *estate, TupleTableSlot *slot)
> +{
> +    ExprContext *econtext = mtstate->ps.ps_ExprContext;
> +    ItemPointer tupleid;
> +    ItemPointerData tuple_ctid;
> +    bool        matched = false;
> +    Datum        datum;
> +    bool        isNull;
> +
> +    /*
> +     * Reset per-tuple memory context to free any expression evaluation
> +     * storage allocated in the previous cycle.
> +     */
> +    ResetExprContext(econtext);

Hasn't this already happened in ExecModifyTable()?


> +    /*
> +     * We run a JOIN between the target relation and the source relation to
> +     * find a set of candidate source rows that have a matching row in the
> +     * target table and a set of candidate source rows that do not have a
> +     * matching row in the target table. If the join returns a tuple with the
> +     * target relation's row-ID set, that implies that the join found a
> +     * matching row for the given source tuple. This case triggers the WHEN
> +     * MATCHED clause of the MERGE. Whereas a NULL in the target relation's
> +     * row-ID column indicates a NOT MATCHED case.
> +     */
> +    datum = ExecGetJunkAttribute(slot,
> +                                 resultRelInfo->ri_RowIdAttNo,
> +                                 &isNull);

Could this be put somewhere common with the equivalent call in
ExecModifyTable()?


> +     * A concurrent update can:
> +     *
> +     * 1. modify the target tuple so that it no longer satisfies the
> +     *    additional quals attached to the current WHEN MATCHED action
> +     *
> +     *    In this case, we are still dealing with a WHEN MATCHED case.
> +     *    In this case, we recheck the list of WHEN MATCHED actions from
> +     *    the start and choose the first one that satisfies the new target
> +     *    tuple.

Duplicated "in this case"


> +                case TM_Invisible:
> +
> +                    /*
> +                     * This state should never be reached since the underlying
> +                     * JOIN runs with a MVCC snapshot and EvalPlanQual runs
> +                     * with a dirty snapshot. So such a row should have never
> +                     * been returned for MERGE.
> +                     */
> +                    elog(ERROR, "unexpected invisible tuple");
> +                    break;

Seems like it was half duplicated at some point?


> +                case TM_Updated:
> +                case TM_Deleted:
> +
> +                    /*
> +                     * The target tuple was concurrently updated/deleted by
> +                     * some other transaction.
> +                     *
> +                     * If the current tuple is the last tuple in the update
> +                     * chain, then we know that the tuple was concurrently
> +                     * deleted. Just return and let the caller try NOT MATCHED
> +                     * actions.

Is it really correct to behave that way when using repeatable read /
serializable?


> +                /*
> +                 * Project the tuple.  In case of a partitioned table, the
> +                 * projection was already built to use the root's descriptor,
> +                 * so we don't need to map the tuple here.
> +                 */
> +                actionInfo.actionState = action;
> +                insert_slot = ExecProject(action->mas_proj);
> +
> +                (void) ExecInsert(mtstate, rootRelInfo,
> +                                  insert_slot, slot,
> +                                  estate, &actionInfo,
> +                                  mtstate->canSetTag);
> +                InstrCountFiltered1(&mtstate->ps, 1);

What happens if somebody concurrently inserts a conflicting row?

> --- a/src/include/executor/instrument.h
> +++ b/src/include/executor/instrument.h
> @@ -82,8 +82,11 @@ typedef struct Instrumentation
>      double        ntuples;        /* total tuples produced */
>      double        ntuples2;        /* secondary node-specific tuple counter */
>      double        nloops;            /* # of run cycles for this node */
> -    double        nfiltered1;        /* # of tuples removed by scanqual or joinqual */
> -    double        nfiltered2;        /* # of tuples removed by "other" quals */
> +    double        nfiltered1;        /* # tuples removed by scanqual or joinqual OR
> +                                 * # tuples inserted by MERGE */
> +    double        nfiltered2;        /* # tuples removed by "other" quals OR #
> +                                 * tuples updated by MERGE */
> +    double        nfiltered3;        /* # tuples deleted by MERGE */

It was a bad idea to have nfiltered1/2. Arriving at nfiltered3, it seems we
really should rename them...



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Support tab completion for upper character inputs in psql
Следующее
От: Zhihong Yu
Дата:
Сообщение: Re: support for MERGE