Обсуждение: ask for review of MERGE

Поиск

Список

Период

Сортировка

ask for review of MERGE

От

Boxuan Zhai

Дата:

23 сентября 2010 г., 07:31:33

Dear All,

I have just generate a new patch of MERGE command.

One main change in this edition is the removal of RASIE ERROR action from MEREG, because its semantics is not well defined yet.

I also rewrote the regress test file merge.sql, to make it consistent with the examples I used in my wiki page.

Some little (error and warning) bugs are fixed.

In this patch, all the main features of MERGE (sublinks, explain, rule and trigger, inheritance) are not changed. And so is the DO NOTHING action.

I do wish the MERGE command can be added into psql 9.1. And I wonder what improvement should be made on current edition.

Could you please have a review on this patch, if you have time and interest?

Your feedback will be highly appreciated.

Thanks

Yours Boxuan

Вложения

merge_v202.patch

Re: ask for review of MERGE

От

Thom Brown

Дата:

23 сентября 2010 г., 08:00:10

On 23 September 2010 11:31, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
> Dear All,
> I have just generate a new patch of MERGE command.
> One main change in this edition is the removal of RASIE ERROR action from
> MEREG, because its semantics is not well defined yet.
> I also rewrote the regress test file merge.sql, to make it consistent with
> the examples I used in my wiki page.
> Some little (error and warning) bugs are fixed.
> In this patch, all the main features of MERGE (sublinks, explain, rule and
> trigger, inheritance) are not changed. And so is the DO NOTHING action.
> I do wish the MERGE command can be added into psql 9.1. And I wonder what
> improvement should be made on current edition.
> Could you please have a review on this patch, if you have time and interest?
> Your feedback will be highly appreciated.
> Thanks
>
> Yours Boxuan

A few corrections:

in src/backend/executor/nodeModifyTable.c:   s/excute/execute/   s/orignial/original/

in src/backend/optimizer/plan/planner.c   s/expreesions/expressions/   s/So,we/So, we/   s/comand/command/
s/fileds/fields/

in src/backend/optimizer/prep/preptlist.c:   s/aggresive/aggressive/

in src/backend/optimizer/util/var.c   s/targe/target/ -- this appears twice   s/sourse/source/

in src/backend/parser/analyze.c   s/takend/taken/   s/relaion/relation/   s/targe/target/ -- this appears twice
s/consitency/consistency/  s/commond/command/   s/seperate/separate/
 

in src/backend/rewrite/rewriteHandler.c   s/acton/action/

in src/include/nodes/execnodes.h   s/meger/merge/

in src/include/nodes/parsenodes.h   s/proecess/process/   s/allwo/allow/   s/elments/elements/

in src/test/regress/expected/merge.out   s/qulifications/qualifications/ -- this appears twice   s/suceeds/succeeds/ --
thisappears twice
 

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: ask for review of MERGE

От

Marko Tiikkaja

Дата:

23 сентября 2010 г., 08:55:37

On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
> I have just generate a new patch of MERGE command.

I haven't followed the discussion very closely, but this part in the 
regression tests caught my attention:

+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it suceeds in PostgreSQL implementation by simply ignoring the 
second

It doesn't seem like a very good idea to go against the standard here. 
The "second" row is not well defined in this case so the results are 
unpredictable.


The patch is also missing a (trivial) change to explain.c.


Regards,
Marko Tiikkaja

Re: ask for review of MERGE

От

Boxuan Zhai

Дата:

23 сентября 2010 г., 10:49:07

On Thu, Sep 23, 2010 at 7:55 PM, Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> wrote:

On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
I have just generate a new patch of MERGE command.

I haven't followed the discussion very closely, but this part in the regression tests caught my attention:

+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it suceeds in PostgreSQL implementation by simply ignoring the second

It doesn't seem like a very good idea to go against the standard here. The "second" row is not well defined in this case so the results are unpredictable.

Yes, the result is uncertain. It depends on which row is scanned first, which is almost out of the control of users.

But, in postgres, this is what the system do for UPDATE.

For example, consider a simple update query like the following:

CREATE TABLE target (id int, val int);

INSERT INTO target VALUES (1, 10);

CREATE TABLE source (id int, add int);

INSERT INTO source VALUES (1, 100);

INSERT INTO source VALUES (1, 100000);

-- DO the update query with source table, which has multiple matched rows

UPDATE target SET val = val + add FROM source

WHERE source.id = target.id;

t=# SELECT * FROM target;

id | val

----+-----

1 | 110

(1 row)

The target tuple has two matched source tuples, but it is only updated once. And, yet, this query is not forbidden by postgres. The result is also uncertain.

The patch is also missing a (trivial) change to explain.c.

Sorry, I massed up the files. Here comes the new patch file, with EXPLAIN in it.

Regards,
Marko Tiikkaja

Вложения

merge_v203.patch

Re: ask for review of MERGE

От

Greg Smith

Дата:

24 сентября 2010 г., 05:13:38

Finding time for a review as large as this one is a bit tough, but I've 
managed to set aside a couple of days for it over the next week.  I'm 
delivering a large project tonight and intend to start in on the review 
work tomorrow onced that's cleared up.  If you're ever not sure who is 
working on your patch and what state they feel it's in, check 
https://commitfest.postgresql.org/action/commitfest_view?id=7 for an 
update; that's where we keep track of all that information.

Did you ever end up keeping a current version of this patch in an 
alternate repository location, such as github?  I thought I saw a 
suggestion from you about that, but after looking through the history 
here all I see are the diff patches you've been sending to the list.  
That's fine, just trying to confirm where everything is at.

-- 
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance"    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book

Re: ask for review of MERGE

От

Josh Kupershmidt

Дата:

29 сентября 2010 г., 10:13:59

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:

> The rest of the compiler warnings I saw didn't look related to his code,
> maybe stuff my picky Ubuntu compiler is noticing that was done recently to
> HEAD. I haven't checked HEAD without this patch yet to confirm, and am done
> for the night now. Here's the list if anyone is interested:
>
> Warning in src/backend/parser/scan.c:
>
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
> -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c -MMD -MP -MF
> .deps/index.Po
> In file included from gram.y:12172:
> scan.c: In function ‘yy_try_NUL_trans’:
> scan.c:16256: warning: unused variable ‘yyg’

Known problem: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00657.php

I'm pretty sure I've seen the warn_unused_result warnings on HEAD as well.

Josh

Re: ask for review of MERGE

От

Robert Haas

Дата:

29 сентября 2010 г., 10:16:52

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> One compiler warning I noticed that needs to get resolved:
>
> src/backend/commands/explain.c:
>
> explain.c: In function ‘ExplainMergeActions’:
> explain.c:1528: warning: comparison of distinct pointer types lacks a cast
>
> That is complaining about this section:
>
> if (mt_planstate->operation != CMD_MERGE ||
> mt_planstate->mt_mergeActPstates == NIL)
> return;
>
> So presumably that comparison with NIL needs a cast. Once I get more
> familiar with the code I'll fix that myself if Boxuan doesn't offer a
> suggestion first.

Possibly NULL was meant instead of NIL.  NIL is specifically for a List.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: ask for review of MERGE

От

Boxuan Zhai

Дата:

29 сентября 2010 г., 10:29:29

On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> One compiler warning I noticed that needs to get resolved:
>
> src/backend/commands/explain.c:
>
> explain.c: In function ‘ExplainMergeActions’:
> explain.c:1528: warning: comparison of distinct pointer types lacks a cast
>
> That is complaining about this section:
>
> if (mt_planstate->operation != CMD_MERGE ||
> mt_planstate->mt_mergeActPstates == NIL)
> return;
>
> So presumably that comparison with NIL needs a cast. Once I get more
> familiar with the code I'll fix that myself if Boxuan doesn't offer a
> suggestion first.

Possibly NULL was meant instead of NIL. NIL is specifically for a List.

Yes, it should be NULL instead of NIL.

At first, I designed this filed as a List. But I changed it into an array in the last editions. This is why I have an unmatched assignment here. Sorry for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: ask for review of MERGE

От

Robert Haas

Дата:

14 октября 2010 г., 07:25:16

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> Starting looking at the latest MERGE patch from Boxuan here tonight. The
> regression tests pass for me here, good starting sign. I expect to move onto
> trying to break it more creatively next, then onto performance tests.
> Nothing much more exciting than that to report so far.

Greg, are you still working on a review of this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Greg Smith

Дата:

14 октября 2010 г., 11:29:58

Robert Haas wrote:
> Greg, are you still working on a review of this patch?
>   

Yes, just had more distractions while coming to speed up on this area 
than I'd hoped.  I'll get a second round of looking at this done by the 
weekend.

-- 
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

Re: ask for review of MERGE

От

Greg Smith

Дата:

17 октября 2010 г., 21:20:36

Boxuan Zhai wrote:
> Yes, it should be NULL instead of NIL.

OK, I applied that patch to my local copy and pushed it out to github:

http://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0

That represents what I tested against.  However, when I tried to merge
against HEAD once I was finished, I discovered this patch has bit-rotted
significantly.  If you have a local copy that works for you, I would not
recommend pulling in the PostgreSQL repo updates done in the last couple
of weeks yet.  Friday's "Allow WITH clauses to be attached to INSERT,
UPDATE, DELETE statements" commit in particular conflicts quite a bit
with your changes.  Attached is a rebased patch that applies to HEAD now
after a round of fixes to resolve those.  But it doesn't work yet,
because of recent change to the ExecUpdate and ExecDelete functions
you're calling from src/backend/executor/nodeModifyTable.c inside
ExecMerge.  If you can continue working on the patch without merging
recent repo work, I can hack away at fixing that once I figure out what
got changed there recently.  It's taken some painful git work to sort
out what I've done so far, there's more left to do, and I know that's
not an area you specialize in.

Back to the feature review...I dove into how I expected this to work,
relative to what it actually does at the moment.  That didn't really go
too well so far, but I don't know that this represents any fundamental
issue with the patch.  Just situations the existing code didn't really
anticipate we have to flush out.  As a general community FYI here, while
it's taken me a while to get up to speed on this whole feature, I expect
to keep chugging away on this regardless of the CommitFest boundaries.
This feature is both too big and too important to just stop working on
it because a date has passed.

Onto the test cases.  The examples that Boxuan has been working with,
and that the regression tests included with the patch exercise, all
involve two tables being joined together using MERGE.  The use case I
decided to test instead was when you're trying to simulate an UPSERT
where only a single table is involved.  I couldn't get to this to work
correctly.  Maybe I'm just using MERGE wrong here, but I tried so many
different variations without success (including one that's basically
copied from Simon's original regression test set suggestions) that I
suspect there may be a subtle problem with the implementation instead.

To replicate the most straightforward variations of what I ran into, you
can start with the same data that's populated by the regression test set:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock;

 item_id | balance
---------+---------
      10 |    2200
      20 |    1900

If you now execute the following:

MERGE INTO Stock t
  USING (SELECT * FROM Stock WHERE item_id=10) AS s
  ON s.item_id=t.item_id
  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
  ;

This works fine, and updates the matching row:

 item_id | balance
---------+---------
      20 |    1900
      10 |    2201


But if I give it a key that doesn't exist instead:

MERGE INTO Stock t
  USING (SELECT * FROM Stock WHERE item_id=30) AS s
  ON s.item_id=t.item_id
  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
  ;

This doesn't execute the NOT MATCHED case and INSERT the way I expected
it to.  It just gives back "MERGE 0".

Since I wasn't sure if the whole "subquery in the USING clause" case was
really implemented fully, I then tried to do this with something more
like the working regression test examples.  I expected this to do the
same thing as the first example:

MERGE INTO Stock t
  USING Stock s
  ON s.item_id=10 AND s.item_id=t.item_id
  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
  ;

But it gives back this:

ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
DETAIL:  Key (item_id)=(10) already exists.

Can't tell from that whether it's hitting the MATCHED or NOT MATCHED
side of things to generate that.  But it doesn't work any better if you
give it an example that doesn't exist:

MERGE INTO Stock t
  USING Stock s
  ON s.item_id=30 AND s.item_id=t.item_id
  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
  ;

ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
DETAIL:  Key (item_id)=(30) already exists.

Now that one is really weird, because that key value certainly doesn't
exist yet in the table.  There seem to be a couple of issues in the area
of joining a table with itself here.  Feel free to tell me I just don't
know how to use MERGE if that's the case in any of these.

The other thing I noticed that may take some work to sort out is that I
haven't had any luck getting MERGE to execute from within a plpgsql
function.  I was hoping I could use this to update the pgbench tables:

CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
RETURNS VOID AS
$$
BEGIN
MERGE INTO pgbench_accounts t USING (SELECT * FROM pgbench_accounts
WHERE aid = key) AS s ON t.aid=s.aid WHEN MATCHED THEN UPDATE SET
abalance = s.abalance + delta WHEN NOT MATCHED THEN INSERT VALUES
(key,1+(key / 100000)::integer,delta,'');
END;
$$
LANGUAGE plpgsql;

But I just get this instead:

ERROR:  "pgbench_accounts" is not a known variable
LINE 4: MERGE INTO pgbench_accounts t USING (SELECT * FROM p...

The other way I wrote the MERGE statement above (not using a subquery)
does the same thing.  I know that error messages is coming from the
changes introduced in
http://archives.postgresql.org/pgsql-committers/2010-01/msg00161.php but
I'm not familiar enough with the whole grammar implementation to know
what that means yet.

That's what I found so far in my second pass over this.  Once the first
problem here is sorted out, I've already worked out how to test the
performance of the code using pgbench.  Have all the scripts ready to go
once the correct MERGE statement is plugged into them, just ran into
this same class of problems when I tried them.  So far I was only able
to see how fast the UPDATE path worked though, which isn't very helpful
yet.  My hope here is to test the MERGE implementation vs. the classic
pl/pgsql implementation of UPSERT, calling both within a function so
it's a fair comparison, and see how that goes.  This may flush out
concurrency bugs that are in the MERGE code as well.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us


diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index f5d67a2..29652ac 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -119,6 +119,7 @@ Complete list of usable sgml source files in this directory.
 <!entity listen             system "listen.sgml">
 <!entity load               system "load.sgml">
 <!entity lock               system "lock.sgml">
+<!entity merge              system "merge.sgml">
 <!entity move               system "move.sgml">
 <!entity notify             system "notify.sgml">
 <!entity prepare            system "prepare.sgml">
diff --git a/doc/src/sgml/ref/merge.sgml b/doc/src/sgml/ref/merge.sgml
new file mode 100644
index 0000000..7c73623
--- /dev/null
+++ b/doc/src/sgml/ref/merge.sgml
@@ -0,0 +1,409 @@
+<!--
+$PostgreSQL$
+-->
+
+<refentry id="SQL-MERGE">
+ <refmeta>
+  <refentrytitle id="SQL-MERGE-TITLE">MERGE</refentrytitle>
+  <refmiscinfo>SQL - Language Statements</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+  <refname>MERGE</refname>
+  <refpurpose>update, insert or delete rows of a table based upon source data</refpurpose>
+ </refnamediv>
+
+ <indexterm zone="sql-merge">
+  <primary>MERGE</primary>
+ </indexterm>
+
+ <refsynopsisdiv>
+<synopsis>
+MERGE INTO <replaceable class="PARAMETER">table</replaceable> [ [ AS ] <replaceable
class="parameter">alias</replaceable>] 
+USING <replaceable class="PARAMETER">source-query</replaceable>
+ON <replaceable class="PARAMETER">join_condition</replaceable>
+[<replaceable class="PARAMETER">when_clause</replaceable> [...]]
+
+where <replaceable class="PARAMETER">when_clause</replaceable> is
+
+{ WHEN MATCHED [ AND <replaceable class="PARAMETER">condition</replaceable> ] THEN { <replaceable
class="PARAMETER">merge_update</replaceable>| DELETE | DO NOTHING | RAISE ERROR} 
+  WHEN NOT MATCHED [ AND <replaceable class="PARAMETER">condition</replaceable> ] THEN { <replaceable
class="PARAMETER">merge_insert</replaceable>| DO NOTHING | RAISE ERROR} } 
+
+where <replaceable class="PARAMETER">merge_update</replaceable> is
+
+UPDATE SET { <replaceable class="PARAMETER">column</replaceable> = { <replaceable
class="PARAMETER">expression</replaceable>| DEFAULT } | 
+           ( <replaceable class="PARAMETER">column</replaceable> [, ...] ) = ( { <replaceable
class="PARAMETER">expression</replaceable>| DEFAULT } [, ...] ) } [, ...] 
+
+and <replaceable class="PARAMETER">merge_insert</replaceable> is
+
+INSERT [( <replaceable class="PARAMETER">column</replaceable> [, ...] )] { VALUES ( { <replaceable
class="PARAMETER">expression</replaceable>| DEFAULT } [, ...] ) | DEFAULT VALUES } 
+</synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+  <title>Description</title>
+
+  <para>
+   <command>MERGE</command> performs at most one action on each row from
+   the target table, driven by the rows from the source query. This
+   provides a way to specify a single SQL statement that can conditionally
+   <command>UPDATE</command> or <command>INSERT</command> rows, a task
+   that would otherwise require multiple procedural language statements.
+  </para>
+
+  <para>
+   First, the <command>MERGE</command> command performs a left outer join
+   from source query to target table, producing zero or more merged rows. For
+   each merged row, <literal>WHEN</> clauses are evaluated in the
+   specified order until one of them is activated. The corresponding action
+   is then applied and processing continues for the next row.
+  </para>
+
+  <para>
+   <command>MERGE</command> actions have the same effect as
+   regular <command>UPDATE</command>, <command>INSERT</command>, or
+   <command>DELETE</command> commands of the same names, though the syntax
+   is slightly different.
+  </para>
+
+  <para>
+   If no <literal>WHEN</> clause activates then an implicit action of
+   <literal>RAISE ERROR</> is performed for that row. If that
+   implicit action is not desirable an explicit action of
+   <literal>DO NOTHING</> may be specified instead.
+  </para>
+
+  <para>
+   <command>MERGE</command> will only affect rows only in the specified table.
+  </para>
+
+  <para>
+   There is no <literal>RETURNING</> clause with <command>MERGE</command>.
+  </para>
+
+  <para>
+   There is no MERGE privilege.
+   You must have the <literal>UPDATE</literal> privilege on the table
+   if you specify an update action, the <literal>INSERT</literal> privilege if
+   you specify an insert action and/or the <literal>DELETE</literal> privilege
+   if you wish to delete. You will also require the
+   <literal>SELECT</literal> privilege to any table whose values are read
+   in the <replaceable class="parameter">expressions</replaceable> or
+   <replaceable class="parameter">condition</replaceable>.
+  </para>
+ </refsect1>
+
+ <refsect1>
+  <title>Parameters</title>
+
+  <variablelist>
+   <varlistentry>
+    <term><replaceable class="PARAMETER">table</replaceable></term>
+    <listitem>
+     <para>
+      The name (optionally schema-qualified) of the table to merge into.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="parameter">alias</replaceable></term>
+    <listitem>
+     <para>
+      A substitute name for the target table. When an alias is
+      provided, it completely hides the actual name of the table.  For
+      example, given <literal>MERGE foo AS f</>, the remainder of the
+      <command>MERGE</command> statement must refer to this table as
+      <literal>f</> not <literal>foo</>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">source-query</replaceable></term>
+    <listitem>
+     <para>
+      A query (<command>SELECT</command> statement or <command>VALUES</command>
+      statement) that supplies the rows to be merged into the target table.
+      Refer to the <xref linkend="sql-select">
+      statement or <xref linkend="sql-values">
+      statement for a description of the syntax.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">join_condition</replaceable></term>
+    <listitem>
+     <para>
+      <replaceable class="parameter">join_condition</replaceable> is
+      an expression resulting in a value of type
+      <type>boolean</type> (similar to a <literal>WHERE</literal>
+      clause) that specifies which rows in the join are considered to
+      match.  You should ensure that the join produces at most one output
+      row for each row to be modified.  An attempt to modify any row of the
+      target table more than once will result in an error.  This behaviour
+      requires the user to take greater care in using <command>MERGE</command>,
+      though is required explicitly by the SQL Standard.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">condition</replaceable></term>
+    <listitem>
+     <para>
+      An expression that returns a value of type <type>boolean</type>.
+      If this expression returns <literal>true</> then the <literal>WHEN</>
+      clause will be activated and the corresponding action will occur for
+      that row.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">merge_update</replaceable></term>
+    <listitem>
+     <para>
+      The specification of an <literal>UPDATE</> action.  Do not include
+      the table name, as you would normally do with an
+      <xref linkend="sql-update"> command.
+      For example, <literal>UPDATE tab SET col = 1</> is invalid. Also,
+      do not include a <literal>WHERE</> clause, since only the current
+      can be updated. For example,
+      <literal>UPDATE SET col = 1 WHERE key = 57</> is invalid.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">merge_insert</replaceable></term>
+    <listitem>
+     <para>
+      The specification of an <literal>INSERT</> action.  Do not include
+      the table name, as you would normally do with an
+      <xref linkend="sql-insert"> command.
+      For example, <literal>INSERT INTO tab VALUES (1, 50)</> is invalid.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">column</replaceable></term>
+    <listitem>
+     <para>
+      The name of a column in <replaceable
+      class="PARAMETER">table</replaceable>.
+      The column name can be qualified with a subfield name or array
+      subscript, if needed.  Do not include the table's name in the
+      specification of a target column — for example,
+      <literal>UPDATE SET tab.col = 1</> is invalid.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><replaceable class="PARAMETER">expression</replaceable></term>
+    <listitem>
+     <para>
+      An expression to assign to the column.  The expression can use the
+      old values of this and other columns in the table.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term><literal>DEFAULT</literal></term>
+    <listitem>
+     <para>
+      Set the column to its default value (which will be NULL if no
+      specific default expression has been assigned to it).
+     </para>
+    </listitem>
+   </varlistentry>
+
+  </variablelist>
+ </refsect1>
+
+ <refsect1>
+  <title>Outputs</title>
+
+  <para>
+   On successful completion, a <command>MERGE</> command returns a command
+   tag of the form
+<screen>
+MERGE <replaceable class="parameter">total-count</replaceable>
+</screen>
+   The <replaceable class="parameter">total-count</replaceable> is the number
+   of rows changed (either updated, inserted or deleted).
+   If <replaceable class="parameter">total-count</replaceable> is 0, no rows
+   were changed (this is not considered an error).
+  </para>
+
+  <para>
+   The number of rows updated, inserted or deleted is not available as part
+   of the command tag. An optional NOTIFY message can be generated to
+   present this information, if desired.
+<screen>
+NOTIFY:  34 rows processed: 11 updated, 5 deleted, 15 inserted, 3 default inserts, 0 no action
+</screen>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Notes</title>
+
+  <para>
+   What essentially happens is that the target table is left outer-joined to
+   the tables mentioned in the <replaceable>source-query</replaceable>, and
+   each output row of the join may then activate at most one when-clause.
+   The row will be matched only once per statement, so the status of
+   <literal>MATCHED</> or <literal>NOT MATCHED</> cannot change once testing
+   of <literal>WHEN</> clauses has begun. <command>MERGE</command> will not
+   invoke Rules.
+  </para>
+
+  <para>
+   The following steps take place during the execution of
+   <command>MERGE</command>.
+    <orderedlist>
+     <listitem>
+      <para>
+       Perform any BEFORE STATEMENT triggers for actions specified, whether or
+       not they actually occur.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Perform left outer join from source to target table. Then for each row:
+       <orderedlist>
+        <listitem>
+         <para>
+          Evaluate whether each row is MATCHED or NOT MATCHED.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Test each WHEN condition in the order specified until one activates.
+          Identify the action and its event type.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Perform any BEFORE ROW triggers that fire for the action's event type.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Apply the action specified.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Perform any AFTER ROW triggers that fire for the action's event type.
+         </para>
+        </listitem>
+       </orderedlist>
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Perform any AFTER STATEMENT triggers for actions specified, whether or
+       not they actually occur.
+      </para>
+     </listitem>
+    </orderedlist>
+   In summary, statement triggers for an event type (say, INSERT) will
+   be fired whenever we <emphasis>specify</> an action of that kind. Row-level
+   triggers will fire only for event type <emphasis>activated</>.
+   So a <command>MERGE</command> might fire statement triggers for both
+   <literal>UPDATE</> and <literal>INSERT</>, even though only
+   <literal>UPDATE</> row triggers were fired.
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Examples</title>
+
+  <para>
+   Attempt to insert a new stock item along with the quantity of stock. If
+   the item already exists, instead update the stock count of the existing
+   item.
+<programlisting>
+MERGE INTO wines w
+USING (VALUES('Chateau Lafite 2003', '24')) v
+ON v.column1 = w.winename
+WHEN NOT MATCHED THEN
+  INSERT VALUES(v.column1, v.column2)
+WHEN MATCHED THEN
+  UPDATE SET stock = stock + v.column2;
+</programlisting>
+  </para>
+
+  <para>
+   Perform maintenance on CustomerAccounts based upon new Transactions.
+   The following statement will fail if any accounts have had more than
+   one transaction
+
+<programlisting>
+MERGE CustomerAccount CA
+
+USING (SELECT CustomerId, TransactionValue,
+       FROM Transactions
+       WHERE TransactionId > 35345678) AS T
+
+ON T.CustomerId = CA.CustomerId
+
+WHEN MATCHED THEN
+  UPDATE SET Balance = Balance - TransactionValue
+
+WHEN NOT MATCHED THEN
+  INSERT (CustomerId, Balance)
+  VALUES (T.CustomerId, T.TransactionValue)
+;
+</programlisting>
+
+   so the right way to do this is to pre-aggregate the data
+
+<programlisting>
+MERGE CustomerAccount CA
+
+USING (SELECT CustomerId, Sum(TransactionValue) As TransactionSum
+       FROM Transactions
+       WHERE TransactionId > 35345678
+       GROUP BY CustomerId) AS T
+
+ON T.CustomerId = CA.CustomerId
+
+WHEN MATCHED THEN
+  UPDATE SET Balance = Balance - TransactionSum
+
+WHEN NOT MATCHED THEN
+  INSERT (CustomerId, Balance)
+  VALUES (T.CustomerId, T.TransactionSum)
+;
+</programlisting>
+  </para>
+
+ </refsect1>
+
+ <refsect1>
+  <title>Compatibility</title>
+
+  <para>
+   This command conforms to the <acronym>SQL</acronym> standard, except
+   that the <literal>DELETE</literal> and <literal>DO NOTHING</> actions
+   are <productname>PostgreSQL</productname> extensions.
+  </para>
+
+  <para>
+   According to the standard, the column-list syntax for an <literal>UPDATE</>
+   action should allow a list of columns to be assigned from a single
+   row-valued expression.
+   This is not currently implemented — the source must be a list
+   of independent expressions.
+  </para>
+ </refsect1>
+</refentry>
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 5968db1..4b681bb 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -340,6 +340,9 @@ UPDATE wines SET stock = stock + 24 WHERE winename = 'Chateau Lafite 2003';
 -- continue with other operations, and eventually
 COMMIT;
 </programlisting>
+
+    This operation can be executed in a single statement using
+    <xref linkend="sql-merge" endterm="sql-merge-title">.
   </para>

   <para>
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index 463746c..cf9d678 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -147,6 +147,7 @@
    &listen;
    &load;
    &lock;
+   &merge;
    &move;
    ¬ify;
    &prepare;
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f494ec9..a5814a7 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -81,6 +81,8 @@ static void show_sort_keys_common(PlanState *planstate,
 static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static const char *explain_get_index_name(Oid indexId);
+static void ExplainMergeActions(ModifyTableState *mt_planstate,
+                    List *ancestors, ExplainState *es);
 static void ExplainScanTarget(Scan *plan, ExplainState *es);
 static void ExplainMemberNodes(List *plans, PlanState **planstates,
                    List *ancestors, ExplainState *es);
@@ -644,6 +646,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
                 case CMD_DELETE:
                     pname = operation = "Delete";
                     break;
+                case CMD_MERGE:
+                    pname = operation = "Merge";
+                    break;
                 default:
                     pname = "???";
                     break;
@@ -1207,6 +1212,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
     if (innerPlanState(planstate))
         ExplainNode(innerPlanState(planstate), ancestors,
                     "Inner", NULL, es);
+
+    if (IsA(plan, ModifyTable) &&
+                ((ModifyTable *)plan)->operation == CMD_MERGE)
+            ExplainMergeActions((ModifyTableState *) planstate,
+                                        ancestors, es);

     /* special child plans */
     switch (nodeTag(plan))
@@ -1547,6 +1557,89 @@ explain_get_index_name(Oid indexId)
     return result;
 }

+static void
+ExplainMergeActions(ModifyTableState *mt_planstate, List *ancestors,
+                    ExplainState *es)
+{
+    ListCell *l;
+    int actno = 1;
+    StringInfo buf = makeStringInfo();
+    StringInfo acttitle = makeStringInfo();
+    MergeActionSet *actset;
+
+    if (mt_planstate->operation != CMD_MERGE ||
+        mt_planstate->mt_mergeActPstates == NULL)
+        return;
+
+    actset = mt_planstate->mt_mergeActPstates[0];
+
+    foreach(l, actset->actions)
+    {
+        ModifyTableState *mt_state = (ModifyTableState *) lfirst(l);
+        MergeActionState *act_pstate = (MergeActionState *) mt_state->mt_plans[0];
+        MergeAction *act_plan = (MergeAction *) act_pstate->ps.plan;
+
+        /*prepare the title*/
+        resetStringInfo(acttitle);
+        appendStringInfo(acttitle, "Action %d", actno);
+
+        /*prepare the string for printing*/
+        resetStringInfo(buf);
+        switch(act_pstate->operation)
+        {
+            case CMD_INSERT:
+                appendStringInfoString(buf, "Insert When ");
+                break;
+            case CMD_UPDATE:
+                appendStringInfoString(buf, "Update When ");
+                break;
+            case CMD_DELETE:
+                appendStringInfoString(buf, "Delete When ");
+                break;
+            case CMD_DONOTHING:
+                appendStringInfoString(buf, "Do Nothing When ");
+                break;
+            default:
+                elog(ERROR, "unknown merge action");
+        }
+
+        if (act_plan->matched)
+            appendStringInfoString(buf, "Matched ");
+        else
+            appendStringInfoString(buf, "Not Mactched ");
+
+        if (act_plan->flattenedqual)
+            appendStringInfoString(buf, "And ");
+
+        /*print the action type*/
+        ExplainPropertyText(acttitle->data, buf->data, es);
+
+        /*print the action qual*/
+        show_qual(act_plan->flattenedqual, "Qual",
+                  &act_pstate->ps, ancestors, true, es);
+
+        /*print the target list of action*/
+        if (es->verbose &&
+            (act_plan->operation == CMD_INSERT ||
+            act_plan->operation == CMD_UPDATE))
+        {
+            List *orignialtlist;
+
+            orignialtlist = act_plan->plan.targetlist;
+            act_plan->plan.targetlist = act_plan->flattenedtlist;
+            show_plan_tlist((PlanState *) act_pstate, ancestors, es);
+            act_plan->plan.targetlist = orignialtlist;
+        }
+
+        if (act_pstate->ps.subPlan)
+            ExplainSubPlans(act_pstate->ps.subPlan, ancestors, "SubPlan", es);
+
+        actno++;
+    }
+
+    ExplainPropertyText("MainPlan", "", es);
+}
+
 /*
  * Show the target of a Scan node
  */
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index d69fdcf..0bd0b68 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2443,6 +2443,87 @@ ExecASTruncateTriggers(EState *estate, ResultRelInfo *relinfo)
                               false, NULL, NULL, NIL, NULL);
 }

+void
+ExecBSMergeTriggers(ModifyTableState *mt_state)
+{
+    ListCell *l;
+    MergeActionSet *actset;
+    bool doUpdateTriggers = false;
+    bool doInsertTriggers = false;
+    bool doDeleteTriggers = false;
+
+    /* Scan the actions to see what kind of statements there is */
+    actset = mt_state->mt_mergeActPstates[0];
+    foreach(l, actset->actions)
+    {
+        ModifyTableState *actmtstate;
+        MergeActionState *actPstate;
+        MergeAction *actplan;
+
+        actmtstate = (ModifyTableState *) lfirst(l);
+        actPstate = (MergeActionState *) actmtstate->mt_plans[0];
+        actplan = (MergeAction *) actPstate->ps.plan;
+
+        if (actplan->operation == CMD_UPDATE)
+            doUpdateTriggers = true;
+        else if (actplan->operation == CMD_INSERT)
+            doInsertTriggers = true;
+        else if (actplan->operation == CMD_DELETE)
+            doDeleteTriggers = true;
+    }
+
+    /* And fire the triggers */
+    if (doUpdateTriggers)
+        ExecBSUpdateTriggers(mt_state->ps.state,
+                             mt_state->ps.state->es_result_relations);
+    if (doInsertTriggers)
+        ExecBSInsertTriggers(mt_state->ps.state,
+                             mt_state->ps.state->es_result_relations);
+    if (doDeleteTriggers)
+        ExecBSDeleteTriggers(mt_state->ps.state,
+                             mt_state->ps.state->es_result_relations);
+}
+
+void
+ExecASMergeTriggers(ModifyTableState *mt_state)
+{
+    ListCell *l;
+    MergeActionSet *actset;
+    bool doUpdateTriggers = false;
+    bool doInsertTriggers = false;
+    bool doDeleteTriggers = false;
+
+    /* Scan the actions to see what kind of statements there is */
+    actset = mt_state->mt_mergeActPstates[0];
+    foreach(l, actset->actions)
+    {
+        ModifyTableState *actmtstate;
+        MergeActionState *actPstate;
+        MergeAction *actplan;
+
+        actmtstate = (ModifyTableState *)lfirst(l);
+        actPstate = (MergeActionState *)actmtstate->mt_plans[0];
+        actplan = (MergeAction *)actPstate->ps.plan;
+
+        if(actplan->operation == CMD_UPDATE)
+            doUpdateTriggers = true;
+        else if(actplan->operation == CMD_INSERT)
+            doInsertTriggers = true;
+        else if(actplan->operation == CMD_DELETE)
+            doDeleteTriggers = true;
+    }
+
+    /* And fire the triggers */
+    if (doUpdateTriggers)
+        ExecASUpdateTriggers(mt_state->ps.state,
+                             mt_state->ps.state->es_result_relations);
+    if (doInsertTriggers)
+        ExecASInsertTriggers(mt_state->ps.state,
+                             mt_state->ps.state->es_result_relations);
+    if (doDeleteTriggers)
+        ExecASDeleteTriggers(mt_state->ps.state,
+                             mt_state->ps.state->es_result_relations);
+}

 static HeapTuple
 GetTupleForTrigger(EState *estate,
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 69f3a28..8637fa8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -170,6 +170,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
         case CMD_INSERT:
         case CMD_DELETE:
         case CMD_UPDATE:
+        case CMD_MERGE:
             estate->es_output_cid = GetCurrentCommandId(true);
             break;

diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index edd175e..2be0491 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -154,6 +154,11 @@ ExecInitNode(Plan *node, EState *estate, int eflags)
                                                        estate, eflags);
             break;

+        case T_MergeAction:
+            result = (PlanState *) ExecInitMergeAction((MergeAction *) node,
+                                                       estate, eflags);
+            break;
+
         case T_Append:
             result = (PlanState *) ExecInitAppend((Append *) node,
                                                   estate, eflags);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 541adaf..893794f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -704,6 +704,103 @@ lreplace:;
     return NULL;
 }

+static TupleTableSlot *
+ExecMerge(ItemPointer tupleid,
+           TupleTableSlot *slot,
+           TupleTableSlot *planSlot,
+           MergeActionSet *actset,
+           EState *estate)
+{
+
+    TupleTableSlot *actslot = NULL;
+    ListCell *each;
+
+    /*
+     * Try the merge actions one by one until we have a match.
+     */
+    foreach(each, actset->actions)
+    {
+        ModifyTableState *mt_pstate;
+        MergeActionState *action_pstate;
+        ExprContext *econtext;
+        bool matched;
+
+        mt_pstate = (ModifyTableState *) lfirst(each);
+        Assert(IsA(mt_pstate, ModifyTableState));
+
+        /*
+         * mt_pstate is supposed to have only ONE mt_plans,
+         * which is a MergeActionState
+         */
+        action_pstate = (MergeActionState *) mt_pstate->mt_plans[0];
+        matched = ((MergeAction *)action_pstate->ps.plan)->matched;
+
+        /*
+         * If tupleid == NULL, it is a NOT MATCHED case,
+         * else, it is a MATCHED case,
+         */
+        if ((tupleid == NULL && matched) ||
+            (tupleid != NULL && !matched))
+            continue;
+
+        /* Setup the expression context. */
+        econtext = action_pstate->ps.ps_ExprContext;
+
+        /*
+         * Check that additional quals match, if any.
+         */
+        if (action_pstate->ps.qual)
+        {
+            ResetExprContext(econtext);
+
+            econtext->ecxt_scantuple = slot;
+            econtext->ecxt_outertuple = planSlot;
+
+            if (!ExecQual(action_pstate->ps.qual, econtext, false))
+                continue;
+        }
+
+        /* Ok, we have a match. Perform the action */
+
+        /* First project any RETURNING result tuple slot, if needed */
+        if (action_pstate->operation == CMD_INSERT ||
+            action_pstate->operation == CMD_UPDATE)
+            actslot = ExecProcessReturning(action_pstate->ps.ps_ProjInfo,
+                                           slot, planSlot);
+
+        switch (action_pstate->operation)
+        {
+            case CMD_INSERT:
+                return ExecInsert(actslot, planSlot, estate);
+
+            case CMD_UPDATE:
+                return ExecUpdate(tupleid,
+                                  actslot,
+                                  planSlot,
+                                  &mt_pstate->mt_epqstate,
+                                  estate);
+
+            case CMD_DELETE:
+                return ExecDelete(tupleid,
+                                  planSlot,
+                                  &mt_pstate->mt_epqstate,
+                                  estate);
+
+            case CMD_DONOTHING:
+                return NULL;
+
+            default:
+                elog(ERROR, "unknown merge action type for execute");
+                break;
+        }
+    }
+
+    /*
+     * No matching action found. Perform the default action, which is
+     * DO NOTHING.
+     */
+    return NULL;
+}

 /*
  * Process BEFORE EACH STATEMENT triggers
@@ -725,6 +822,9 @@ fireBSTriggers(ModifyTableState *node)
             ExecBSDeleteTriggers(node->ps.state,
                                  node->ps.state->es_result_relations);
             break;
+        case CMD_MERGE:
+            ExecBSMergeTriggers(node);
+            break;
         default:
             elog(ERROR, "unknown operation");
             break;
@@ -751,6 +851,9 @@ fireASTriggers(ModifyTableState *node)
             ExecASDeleteTriggers(node->ps.state,
                                  node->ps.state->es_result_relations);
             break;
+        case CMD_MERGE:
+            ExecASMergeTriggers(node);
+            break;
         default:
             elog(ERROR, "unknown operation");
             break;
@@ -777,6 +880,7 @@ ExecModifyTable(ModifyTableState *node)
     ItemPointer tupleid = NULL;
     ItemPointerData tuple_ctid;
     HeapTupleHeader    oldtuple = NULL;
+    MergeActionSet *mergeActSet = NULL;

     /*
      * On first call, fire BEFORE STATEMENT triggers before proceeding.
@@ -798,6 +902,8 @@ ExecModifyTable(ModifyTableState *node)
     /* Preload local variables */
     subplanstate = node->mt_plans[node->mt_whichplan];
     junkfilter = estate->es_result_relation_info->ri_junkFilter;
+    if(node->mt_mergeActPstates)
+        mergeActSet = node->mt_mergeActPstates[node->mt_whichplan];

     /*
      * Fetch rows from subplan(s), and execute the required table modification
@@ -824,6 +930,8 @@ ExecModifyTable(ModifyTableState *node)
                 estate->es_result_relation_info++;
                 subplanstate = node->mt_plans[node->mt_whichplan];
                 junkfilter = estate->es_result_relation_info->ri_junkFilter;
+                if(node->mt_mergeActPstates)
+                    mergeActSet = node->mt_mergeActPstates[node->mt_whichplan];
                 EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan);
                 continue;
             }
@@ -839,7 +947,7 @@ ExecModifyTable(ModifyTableState *node)
             /*
              * extract the 'ctid' or 'wholerow' junk attribute.
              */
-            if (operation == CMD_UPDATE || operation == CMD_DELETE)
+            if (operation == CMD_UPDATE || operation == CMD_DELETE || operation == CMD_MERGE)
             {
                 Datum        datum;
                 bool        isNull;
@@ -849,13 +957,24 @@ ExecModifyTable(ModifyTableState *node)
                     datum = ExecGetJunkAttribute(slot,
                                                  junkfilter->jf_junkAttNo,
                                                  &isNull);
-                    /* shouldn't ever get a null result... */
                     if (isNull)
-                        elog(ERROR, "ctid is NULL");
+                    {
+                        /*
+                         * Shouldn't ever get a null result for UPDATE or DELETE.
+                         * MERGE gets a null ctid in "NOT MATCHED" case
+                         */
+                        if (operation != CMD_MERGE)
+                            elog(ERROR, "ctid is NULL");
+                        else
+                            tupleid = NULL;
+                    }
+                    else
+                    {

-                    tupleid = (ItemPointer) DatumGetPointer(datum);
-                    tuple_ctid = *tupleid;    /* be sure we don't free ctid!! */
-                    tupleid = &tuple_ctid;
+                        tupleid = (ItemPointer) DatumGetPointer(datum);
+                        tuple_ctid = *tupleid;    /* be sure we don't free the ctid!! */
+                        tupleid = &tuple_ctid;
+                    }
                 }
                 else
                 {
@@ -890,6 +1009,10 @@ ExecModifyTable(ModifyTableState *node)
                 slot = ExecDelete(tupleid, oldtuple, planSlot,
                                   &node->mt_epqstate, estate);
                 break;
+            case CMD_MERGE:
+                slot = ExecMerge(tupleid, slot, planSlot,
+                                 mergeActSet, estate);
+                break;
             default:
                 elog(ERROR, "unknown operation");
                 break;
@@ -917,6 +1040,69 @@ ExecModifyTable(ModifyTableState *node)
     return NULL;
 }

+/*
+ *    When init a merge plan, we also need init its action plans.
+ *    These action plans are "MergeAction" plans.
+ *
+ *    This function mainly handles the tlist and qual in the plan.
+ *    The returning result is a "MergeActionState".
+ */
+MergeActionState *
+ExecInitMergeAction(MergeAction *node, EState *estate, int eflags)
+{
+    MergeActionState *result;
+
+    /*
+     * do nothing when we get to the end of a leaf on tree.
+     */
+    if (node == NULL)
+        return NULL;
+
+    /*
+     * create state structure
+     */
+    result = makeNode(MergeActionState);
+    result->operation = node->operation;
+    result->ps.plan = (Plan *)node;
+    result->ps.state = estate;
+
+    /*
+     * tuple table initialization
+     */
+    ExecInitResultTupleSlot(estate, &result->ps);
+
+    /*
+     * initialize tuple type
+     */
+    ExecAssignResultTypeFromTL(&result->ps);
+
+    /*
+     * create expression context for node
+     */
+    ExecAssignExprContext(estate, &result->ps);
+
+    /*
+     * initialize child expressions
+     */
+    result->ps.targetlist = (List *)
+        ExecInitExpr((Expr *) node->plan.targetlist,  &result->ps);
+
+    result->ps.qual = (List *)
+        ExecInitExpr((Expr *) node->plan.qual, &result->ps);
+
+    /*
+     * init the projection information
+     */
+    ExecAssignProjectionInfo(&result->ps, NULL);
+
+    /*
+     * XXX: do we need a check for the plan output here ?
+     * (by calling the ExecCheckPlanOutput() function
+     */
+
+    return result;
+}
+
 /* ----------------------------------------------------------------
  *        ExecInitModifyTable
  * ----------------------------------------------------------------
@@ -932,6 +1118,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     Plan       *subplan;
     ListCell   *l;
     int            i;
+    bool         isMergeAction = false;

     /* check for unsupported flags */
     Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -972,6 +1159,16 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     foreach(l, node->plans)
     {
         subplan = (Plan *) lfirst(l);
+
+        /*
+         *     test if this subplan node is a MergeAction.
+         *     We need this information for setting the junkfilter.
+         *    junkfilter is necessary for an ordinary UPDATE/DELETE plan,
+         *    but not for an UPDATE/DELETE merge action
+         */
+        if (IsA(subplan, MergeAction))
+            isMergeAction = true;
+
         mtstate->mt_plans[i] = ExecInitNode(subplan, estate, eflags);
         estate->es_result_relation_info++;
         i++;
@@ -1101,7 +1298,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                 break;
             case CMD_UPDATE:
             case CMD_DELETE:
-                junk_filter_needed = true;
+            case CMD_MERGE:
+                if(!isMergeAction)
+                    junk_filter_needed = true;
+                break;
+            case CMD_DONOTHING:
                 break;
             default:
                 elog(ERROR, "unknown operation");
@@ -1124,9 +1325,10 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
                             resultRelInfo->ri_RelationDesc->rd_att->tdhasoid,
                                        ExecInitExtraTupleSlot(estate));

-                if (operation == CMD_UPDATE || operation == CMD_DELETE)
+                if (operation == CMD_UPDATE || operation == CMD_DELETE ||
+                    operation == CMD_MERGE)
                 {
-                    /* For UPDATE/DELETE, find the appropriate junk attr now */
+                    /* For UPDATE/DELETE/MERGE, find the appropriate junk attr now */
                     if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_RELATION)
                     {
                         j->jf_junkAttNo = ExecFindJunkAttribute(j, "ctid");
@@ -1161,6 +1363,36 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     if (estate->es_trig_tuple_slot == NULL)
         estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

+    /*
+     * for the merge actions, we need to do similar things as above.
+     *Each action in each action set should be init by the ExecInitNode.
+     *The returned planstates will take the place of the original plan nodes.
+     *These new action set will be put in the mt_mergeActPstates array.
+     */
+    if(node->operation == CMD_MERGE)
+    {
+        /*we have one action set for each result relation (main plan)*/
+        mtstate->mt_mergeActPstates =
+            (MergeActionSet **) palloc0(sizeof(MergeActionSet*) * nplans);
+
+        estate->es_result_relation_info = estate->es_result_relations;
+        i = 0;
+        foreach(l, node->mergeActPlan)
+        {
+            ListCell *e;
+            MergeActionSet *actset = (MergeActionSet *) lfirst(l);
+
+            foreach(e, actset->actions)
+            {
+                lfirst(e) = ExecInitNode((Plan *)lfirst(e),  estate, 0);
+            }
+            mtstate->mt_mergeActPstates[i] = actset;
+            estate->es_result_relation_info++;
+            i++;
+        }
+        estate->es_result_relation_info = NULL;
+    }
+
     return mtstate;
 }

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 508d7c7..4a57aa7 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -177,6 +177,42 @@ _copyModifyTable(ModifyTable *from)
     COPY_NODE_FIELD(returningLists);
     COPY_NODE_FIELD(rowMarks);
     COPY_SCALAR_FIELD(epqParam);
+    COPY_NODE_FIELD(mergeActPlan);
+
+    return newnode;
+}
+
+/*
+ * _copyMergeAction
+ */
+static MergeAction *
+_copyMergeAction(MergeAction *from)
+{
+    MergeAction *newnode = makeNode(MergeAction);
+
+    /*
+     * copy node superclass fields
+     */
+    CopyPlanFields((Plan *) from, (Plan *) newnode);
+
+    COPY_SCALAR_FIELD(operation);
+    COPY_SCALAR_FIELD(matched);
+    COPY_NODE_FIELD(flattenedqual);
+    COPY_NODE_FIELD(flattenedtlist);
+
+    return newnode;
+}
+
+/*
+ * _copyMergeActionSet
+ */
+static MergeActionSet *
+_copyMergeActionSet(MergeActionSet *from)
+{
+    MergeActionSet *newnode = makeNode(MergeActionSet);
+
+    COPY_SCALAR_FIELD(result_relation);
+    COPY_NODE_FIELD(actions);

     return newnode;
 }
@@ -2300,6 +2336,10 @@ _copyQuery(Query *from)
     COPY_NODE_FIELD(rowMarks);
     COPY_NODE_FIELD(setOperations);
     COPY_NODE_FIELD(constraintDeps);
+    COPY_SCALAR_FIELD(isMergeAction);
+    COPY_SCALAR_FIELD(matched);
+    COPY_SCALAR_FIELD(sourceAttrNo);
+    COPY_NODE_FIELD(mergeActQry);

     return newnode;
 }
@@ -2309,6 +2349,7 @@ _copyInsertStmt(InsertStmt *from)
 {
     InsertStmt *newnode = makeNode(InsertStmt);

+    COPY_SCALAR_FIELD(isMergeAction);
     COPY_NODE_FIELD(relation);
     COPY_NODE_FIELD(cols);
     COPY_NODE_FIELD(selectStmt);
@@ -2323,6 +2364,7 @@ _copyDeleteStmt(DeleteStmt *from)
 {
     DeleteStmt *newnode = makeNode(DeleteStmt);

+    COPY_SCALAR_FIELD(isMergeAction);
     COPY_NODE_FIELD(relation);
     COPY_NODE_FIELD(usingClause);
     COPY_NODE_FIELD(whereClause);
@@ -2337,6 +2379,7 @@ _copyUpdateStmt(UpdateStmt *from)
 {
     UpdateStmt *newnode = makeNode(UpdateStmt);

+    COPY_SCALAR_FIELD(isMergeAction);
     COPY_NODE_FIELD(relation);
     COPY_NODE_FIELD(targetList);
     COPY_NODE_FIELD(whereClause);
@@ -2374,6 +2417,47 @@ _copySelectStmt(SelectStmt *from)
     return newnode;
 }

+static MergeStmt *
+_copyMergeStmt(MergeStmt *from)
+{
+    MergeStmt *newnode = makeNode(MergeStmt);
+
+    COPY_NODE_FIELD(relation);
+    COPY_NODE_FIELD(source);
+    COPY_NODE_FIELD(matchCondition);
+    COPY_NODE_FIELD(actions);
+
+    return newnode;
+}
+
+static MergeConditionAction *
+_copyMergeConditionAction(MergeConditionAction *from)
+{
+    MergeConditionAction *newnode = makeNode(MergeConditionAction);
+
+    COPY_SCALAR_FIELD(match);
+    COPY_NODE_FIELD(condition);
+    COPY_NODE_FIELD(action);
+
+    return newnode;
+}
+
+static MergeDoNothing *
+_copyMergeDoNothing(MergeDoNothing *from)
+{
+    MergeDoNothing *newnode = makeNode(MergeDoNothing);
+
+    return newnode;
+}
+
+static MergeError*
+_copyMergeError(MergeError *from)
+{
+    MergeError *newnode = makeNode(MergeError);
+
+    return newnode;
+}
+
 static SetOperationStmt *
 _copySetOperationStmt(SetOperationStmt *from)
 {
@@ -3650,6 +3734,12 @@ copyObject(void *from)
         case T_ModifyTable:
             retval = _copyModifyTable(from);
             break;
+        case T_MergeAction:
+            retval = _copyMergeAction(from);
+            break;
+        case T_MergeActionSet:
+            retval = _copyMergeActionSet(from);
+            break;
         case T_Append:
             retval = _copyAppend(from);
             break;
@@ -3953,6 +4043,18 @@ copyObject(void *from)
         case T_SelectStmt:
             retval = _copySelectStmt(from);
             break;
+        case T_MergeStmt:
+            retval = _copyMergeStmt(from);
+            break;
+        case T_MergeConditionAction:
+            retval = _copyMergeConditionAction(from);
+            break;
+        case T_MergeDoNothing:
+            retval = _copyMergeDoNothing(from);
+            break;
+        case T_MergeError:
+            retval = _copyMergeError(from);
+            break;
         case T_SetOperationStmt:
             retval = _copySetOperationStmt(from);
             break;
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 19262aa..a95057b 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -879,6 +879,10 @@ _equalQuery(Query *a, Query *b)
     COMPARE_NODE_FIELD(rowMarks);
     COMPARE_NODE_FIELD(setOperations);
     COMPARE_NODE_FIELD(constraintDeps);
+    COMPARE_SCALAR_FIELD(isMergeAction);
+    COMPARE_SCALAR_FIELD(matched);
+    COMPARE_SCALAR_FIELD(sourceAttrNo);
+    COMPARE_NODE_FIELD(mergeActQry);

     return true;
 }
@@ -886,6 +890,7 @@ _equalQuery(Query *a, Query *b)
 static bool
 _equalInsertStmt(InsertStmt *a, InsertStmt *b)
 {
+    COMPARE_SCALAR_FIELD(isMergeAction);
     COMPARE_NODE_FIELD(relation);
     COMPARE_NODE_FIELD(cols);
     COMPARE_NODE_FIELD(selectStmt);
@@ -898,6 +903,7 @@ _equalInsertStmt(InsertStmt *a, InsertStmt *b)
 static bool
 _equalDeleteStmt(DeleteStmt *a, DeleteStmt *b)
 {
+    COMPARE_SCALAR_FIELD(isMergeAction);
     COMPARE_NODE_FIELD(relation);
     COMPARE_NODE_FIELD(usingClause);
     COMPARE_NODE_FIELD(whereClause);
@@ -910,6 +916,7 @@ _equalDeleteStmt(DeleteStmt *a, DeleteStmt *b)
 static bool
 _equalUpdateStmt(UpdateStmt *a, UpdateStmt *b)
 {
+    COMPARE_SCALAR_FIELD(isMergeAction);
     COMPARE_NODE_FIELD(relation);
     COMPARE_NODE_FIELD(targetList);
     COMPARE_NODE_FIELD(whereClause);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ee2aeb0..3106040 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -332,6 +332,7 @@ _outModifyTable(StringInfo str, ModifyTable *node)
     WRITE_NODE_FIELD(returningLists);
     WRITE_NODE_FIELD(rowMarks);
     WRITE_INT_FIELD(epqParam);
+    WRITE_NODE_FIELD(mergeActPlan);
 }

 static void
@@ -2062,6 +2063,53 @@ _outQuery(StringInfo str, Query *node)
     WRITE_NODE_FIELD(rowMarks);
     WRITE_NODE_FIELD(setOperations);
     WRITE_NODE_FIELD(constraintDeps);
+    WRITE_BOOL_FIELD(isMergeAction);
+    WRITE_BOOL_FIELD(matched);
+    WRITE_INT_FIELD(sourceAttrNo);
+    WRITE_NODE_FIELD(mergeActQry);
+}
+
+static void
+_outMergeConditionAction(StringInfo str, MergeConditionAction *node)
+{
+    WRITE_NODE_TYPE("MERGECONDITIONACTION");
+
+    WRITE_BOOL_FIELD(match);
+    WRITE_NODE_FIELD(condition);
+    WRITE_NODE_FIELD(action);
+}
+
+static void
+_outMergeStmt(StringInfo str, MergeStmt *node)
+{
+    WRITE_NODE_TYPE("MERGESTMT");
+
+    WRITE_NODE_FIELD(relation);
+    WRITE_NODE_FIELD(source);
+    WRITE_NODE_FIELD(matchCondition);
+    WRITE_NODE_FIELD(actions);
+}
+
+static void
+_outMergeAction(StringInfo str, MergeAction *node)
+{
+    WRITE_NODE_TYPE("MERGEACTION");
+
+    _outPlanInfo(str, (Plan *)node);
+
+    WRITE_ENUM_FIELD(operation, CmdType);
+    WRITE_BOOL_FIELD(matched);
+    WRITE_NODE_FIELD(flattenedqual);
+    WRITE_NODE_FIELD(flattenedtlist);
+}
+
+static void
+_outMergeActionSet(StringInfo str, MergeActionSet *node)
+{
+    WRITE_NODE_TYPE("MERGEACTIONSET");
+
+    WRITE_INT_FIELD(result_relation);
+    WRITE_NODE_FIELD(actions);
 }

 static void
@@ -2953,6 +3001,18 @@ _outNode(StringInfo str, void *obj)
             case T_XmlSerialize:
                 _outXmlSerialize(str, obj);
                 break;
+            case T_MergeAction:
+                _outMergeAction(str, obj);
+                break;
+            case T_MergeActionSet:
+                _outMergeActionSet(str, obj);
+                break;
+            case T_MergeStmt:
+                _outMergeStmt(str, obj);
+                break;
+            case T_MergeConditionAction:
+                _outMergeConditionAction(str,obj);
+                break;

             default:

diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 9a4e03e..e06c5b8 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -219,6 +219,10 @@ _readQuery(void)
     READ_NODE_FIELD(rowMarks);
     READ_NODE_FIELD(setOperations);
     READ_NODE_FIELD(constraintDeps);
+    READ_BOOL_FIELD(isMergeAction);
+    READ_BOOL_FIELD(matched);
+    READ_INT_FIELD(sourceAttrNo);
+    READ_NODE_FIELD(mergeActQry);

     READ_DONE();
 }
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 52dd27b..ec6dd22 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -4090,7 +4090,7 @@ make_result(PlannerInfo *root,
 ModifyTable *
 make_modifytable(CmdType operation, List *resultRelations,
                  List *subplans, List *returningLists,
-                 List *rowMarks, int epqParam)
+                 List *rowMarks, List *mergeActPlans, int epqParam)
 {
     ModifyTable *node = makeNode(ModifyTable);
     Plan       *plan = &node->plan;
@@ -4143,6 +4143,8 @@ make_modifytable(CmdType operation, List *resultRelations,
     node->returningLists = returningLists;
     node->rowMarks = rowMarks;
     node->epqParam = epqParam;
+    if (operation == CMD_MERGE)
+        node->mergeActPlan = mergeActPlans;

     return node;
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 93daedc..55ca01f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -103,7 +103,8 @@ static void get_column_info_for_window(PlannerInfo *root, WindowClause *wc,
                            int *ordNumCols,
                            AttrNumber **ordColIdx,
                            Oid **ordOperators);
-
+static ModifyTable *merge_action_planner(PlannerInfo *root, Plan *mainPlan);
+static MergeActionSet *merge_action_set_planner(PlannerInfo *root, Plan *mainPlan);

 /*****************************************************************************
  *
@@ -453,6 +454,27 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
     }

     /*
+    * for MERGE command, we need to preprocess
+    * the expressions for merge actions too.
+    */
+    if(parse->commandType == CMD_MERGE)
+    {
+        ListCell *e;
+
+        foreach(e, parse->mergeActQry)
+        {
+            Query *actqry = (Query *)lfirst(e);
+
+            actqry->targetList = (List *) preprocess_expression(root,
+                                        (Node *) actqry->targetList,
+                                        EXPRKIND_TARGET);
+            actqry->jointree->quals = preprocess_expression(root,
+                                        (Node *) actqry->jointree->quals,
+                                        EXPRKIND_QUAL);
+        }
+    }
+
+    /*
      * In some cases we may want to transfer a HAVING clause into WHERE. We
      * cannot do so if the HAVING clause contains aggregates (obviously) or
      * volatile functions (since a HAVING clause is supposed to be executed
@@ -529,6 +551,7 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
         {
             List       *returningLists;
             List       *rowMarks;
+            List         *mergeAction;

             /*
              * Deal with the RETURNING clause if any.  It's convenient to pass
@@ -560,12 +583,19 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
             else
                 rowMarks = root->rowMarks;

+            /*if here is a MERGE command, we need to plan the actions too*/
+            if(parse->commandType == CMD_MERGE)
+                mergeAction = list_make1(merge_action_set_planner(root, plan));
+            else
+                mergeAction = NIL;
+
             plan = (Plan *) make_modifytable(parse->commandType,
                                            copyObject(root->resultRelations),
                                              list_make1(plan),
                                              returningLists,
                                              rowMarks,
-                                             SS_assign_special_param(root));
+                                             mergeAction,
+                                             SS_assign_special_param(root));
         }
     }

@@ -586,6 +616,130 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 }

 /*
+to generate the merge action set for the main plan.
+Call this function after the grouping_planner()
+
+Only works for MERGE command
+*/
+static MergeActionSet *
+merge_action_set_planner(PlannerInfo *root, Plan *mainPlan)
+{
+    MergeActionSet *result;
+    Query *mainQry = root->parse;
+    ListCell *l;
+    PlannerInfo subroot;
+
+    /*for non-merge command, no need to plan the merge actions*/
+    if(mainQry->commandType != CMD_MERGE ||
+        mainQry->mergeActQry == NIL)
+        return NULL;
+
+    /*do a copy of the root info*/
+    memcpy(&subroot, root, sizeof(PlannerInfo));
+
+    /*create the result node*/
+    result = makeNode(MergeActionSet);
+    result->result_relation = mainQry->resultRelation;
+    result->actions = NIL;
+
+    /*plan the actions one by one*/
+    foreach(l, mainQry->mergeActQry)
+    {
+        ModifyTable *actplan;
+
+        /*put the action query into the subroot*/
+        subroot.parse = (Query *) lfirst(l);
+        actplan = merge_action_planner(&subroot, mainPlan);
+        result->actions = lappend(result->actions, actplan);
+    }
+    return result;
+}
+
+/* create plan for a single merge action */
+static ModifyTable *
+merge_action_planner(PlannerInfo *root, Plan *mainPlan)
+{
+    Query *parse = root->parse;
+    MergeAction    *actplan;
+    ModifyTable *result;
+
+    List           *returningLists;
+    List         *rowMarks;
+
+    /*
+     * no having clause in a merge action
+     */
+    Assert(parse->havingQual == NULL);
+
+    /*
+     * Create the action plan node
+     */
+    actplan = makeNode(MergeAction);
+    actplan->operation = parse->commandType;
+    actplan->matched = parse->matched;
+
+    /* copy the cost from the top_plan */
+    actplan->plan.startup_cost = mainPlan->startup_cost;
+    actplan->plan.total_cost = mainPlan->total_cost;
+    actplan->plan.plan_rows = mainPlan->plan_rows;
+    actplan->plan.plan_width = mainPlan->plan_width;
+
+    /*
+    * Here, the quals expressions are flattened, which is accepted
+    * by deparse functions in EXPLAIN.
+    * But, these expressions will be processed by push_up_vars
+    * latterly and become not flat again.
+    * So, we need to keep a copy of current quals for explaining.
+    */
+    actplan->flattenedqual = (List *) copyObject(parse->jointree->quals);
+    actplan->plan.qual = (List *)parse->jointree->quals;
+
+    /* prepare the target list */
+    if (parse->targetList)
+    {
+        actplan->plan.targetlist = preprocess_targetlist(root,
+                                        parse->targetList);
+        /*the target list should also be copied for EXPLAIN*/
+        actplan->flattenedtlist = (List *) copyObject(actplan->plan.targetlist);
+    }
+
+    /*
+    *In general situation, all the vars in target list and quals are flattened.
+    *But, we want them to point to the attributes of the top join plan, not to
+    *the subplans. So push them up again here.
+    */
+    push_up_merge_action_vars(actplan, parse);
+
+    if (parse->returningList)
+    {
+        List       *rlist;
+
+        Assert(parse->resultRelation);
+        rlist = set_returning_clause_references(root->glob,
+                                parse->returningList,
+                                &actplan->plan,
+                                parse->resultRelation);
+        returningLists = list_make1(rlist);
+    }
+    else
+        returningLists = NIL;
+
+    if (parse->rowMarks)
+        rowMarks = NIL;
+    else
+        rowMarks = root->rowMarks;
+
+    result = make_modifytable(parse->commandType,
+                              list_make1_int(parse->resultRelation),
+                              list_make1(actplan),
+                              returningLists,
+                              rowMarks,
+                              NIL,
+                              SS_assign_special_param(root));
+    return result;
+}
+
+/*
  * preprocess_expression
  *        Do subquery_planner's preprocessing work for an expression,
  *        which can be a targetlist, a WHERE clause (including JOIN/ON
@@ -730,6 +884,7 @@ inheritance_planner(PlannerInfo *root)
     List       *returningLists = NIL;
     List       *rtable = NIL;
     List       *rowMarks;
+    List       *mergeActSets = NIL;
     List       *tlist;
     PlannerInfo subroot;
     ListCell   *l;
@@ -791,6 +946,66 @@ inheritance_planner(PlannerInfo *root)
                                                     appinfo->child_relid);
             returningLists = lappend(returningLists, rlist);
         }
+
+        /*
+        *For a merge command, we need to generate a set of action plans corresponding
+        *to current result relations.
+        *The adjust_appendrel_attrs() will not process the merge action list of
+        *the main query. And, We need to do this here for a MERGE command.
+        *In fact, no need to do a full query tree mutator on the action queries.
+        *only adjust the target list and quals.
+        */
+        if (parse->commandType == CMD_MERGE)
+        {
+            ListCell *e;
+            MergeActionSet *maset;
+
+            /*
+            *the parse in subroot now is a copy of the main query of current result relation
+            *Here we need to generate a copy of the action queries and shift their target table
+            *to current result relation
+            */
+            subroot.parse->mergeActQry = NIL;
+
+            foreach(e, parse->mergeActQry)
+            {
+                Query *actqry = (Query *)lfirst(e);
+                Query *newactqry = makeNode(Query);
+
+                /*copy most of the common fields from original query*/
+                *newactqry = *actqry;
+
+                /*reset the result relation to current child table*/
+                newactqry->resultRelation = subroot.parse->resultRelation;
+
+                /*make the range table to be consistent with current main query*/
+                newactqry->rtable = subroot.parse->rtable;
+
+                /*adjust the target list*/
+                newactqry->targetList = (List *) adjust_appendrel_attrs(
+                                                            (Node *) actqry->targetList,
+                                                            appinfo);
+                newactqry->targetList = adjust_inherited_tlist(newactqry->targetList,
+                                                            appinfo);
+                /*and qual*/
+                newactqry->jointree = makeNode(FromExpr);
+                newactqry->jointree->fromlist = subroot.parse->jointree->fromlist;
+                newactqry->jointree->quals = adjust_appendrel_attrs(
+                                                            actqry->jointree->quals,
+                                                            appinfo);
+
+                /*put this new action query in to the action list of current main query*/
+                subroot.parse->mergeActQry = lappend(subroot.parse->mergeActQry, newactqry);
+            }
+
+            /*
+            * now we have a complete query (main query + action queries) that has been
+            *shifted to  current result relation. Plan these actions here.
+            */
+            maset = merge_action_set_planner(&subroot, subplan);
+            Assert(maset != NULL);
+            mergeActSets = lappend(mergeActSets, maset);
+        }
     }

     root->resultRelations = resultRelations;
@@ -843,6 +1058,7 @@ inheritance_planner(PlannerInfo *root)
                                      subplans,
                                      returningLists,
                                      rowMarks,
+                                     mergeActSets,
                                      SS_assign_special_param(root));
 }

diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 2c97c71..73e832d 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -78,6 +78,76 @@ preprocess_targetlist(PlannerInfo *root, List *tlist)
                                   result_relation, range_table);

     /*
+    for MERGE command, we also need to expend the target list of the main query.
+    Note that, the target list of main query is the combination of attrs
+    from Source table and target table. We only want to expend the part
+    of target table.
+
+    We do this in an aggressive way:
+    1. Truncate the old target list, keep only the entries for source table
+    2. expend the target list of result relation from an NIL list.
+    3. Append this new list at the end of old target list.
+    */
+    if (command_type == CMD_MERGE)
+    {
+        ListCell *l;
+        List *TLforResultRelatoin = NIL;
+        int new_resno;
+
+        Assert(parse->sourceAttrNo > 0);
+
+        tlist = list_truncate(tlist, parse->sourceAttrNo);
+
+        TLforResultRelatoin =  expand_targetlist(TLforResultRelatoin, command_type,
+                                                    result_relation, range_table);
+        new_resno = parse->sourceAttrNo + 1;
+        foreach(l, TLforResultRelatoin)
+        {
+            TargetEntry *te = (TargetEntry *)lfirst(l);
+            te->resno = new_resno++;
+        }
+
+        tlist = list_concat(tlist, TLforResultRelatoin);
+    }
+
+    /*
+     * for "update" , "delete"  and "merge" queries, add ctid of the result relation into
+     * the target list so that the ctid will propagate through execution and
+     * ExecutePlan() will be able to identify the right tuple to replace or
+     * delete.    This extra field is marked "junk" so that it is not stored
+     * back into the tuple.
+     *
+     * BUT, if the query node is a merge action,
+     * we don't need to expand the ctid attribute in tlist.
+     * The tlist of the merge top level plan already contains
+     * a "ctid" junk attr of the target relation.
+     */
+    if (!parse->isMergeAction  &&
+            (command_type == CMD_UPDATE ||
+            command_type == CMD_DELETE ||
+            command_type == CMD_MERGE))
+    {
+        TargetEntry *tle;
+        Var           *var;
+
+        var = makeVar(result_relation, SelfItemPointerAttributeNumber,
+                      TIDOID, -1, 0);
+        tle = makeTargetEntry((Expr *) var,
+                              list_length(tlist) + 1,
+                              pstrdup("ctid"),
+                              true);
+        /*
+         * For an UPDATE, expand_targetlist already created a fresh tlist. For
+         * DELETE, better do a listCopy so that we don't destructively modify
+         * the original tlist (is this really necessary?).
+         */
+        if (command_type == CMD_DELETE)
+            tlist = list_copy(tlist);
+
+        tlist = lappend(tlist, tle);
+    }
+
+    /*
      * Add necessary junk columns for rowmarked rels.  These values are needed
      * for locking of rels selected FOR UPDATE/SHARE, and to do EvalPlanQual
      * rechecking.    While we are at it, store these junk attnos in the
@@ -305,6 +375,7 @@ expand_targetlist(List *tlist, int command_type,
                     }
                     break;
                 case CMD_UPDATE:
+                case CMD_MERGE:
                     if (!att_tup->attisdropped)
                     {
                         new_expr = (Node *) makeVar(result_relation,
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 0d3a739..2fda504 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -101,9 +101,6 @@ static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
 static Node *adjust_appendrel_attrs_mutator(Node *node,
                                AppendRelInfo *context);
 static Relids adjust_relid_set(Relids relids, Index oldrelid, Index newrelid);
-static List *adjust_inherited_tlist(List *tlist,
-                       AppendRelInfo *context);
-

 /*
  * plan_set_operations
@@ -1741,8 +1738,9 @@ adjust_relid_set(Relids relids, Index oldrelid, Index newrelid)
  * scribble on.
  *
  * Note that this is not needed for INSERT because INSERT isn't inheritable.
+ * But the INSERT actions in MERGE need this function
  */
-static List *
+List *
 adjust_inherited_tlist(List *tlist, AppendRelInfo *context)
 {
     bool        changed_it = false;
diff --git a/src/backend/optimizer/util/var.c b/src/backend/optimizer/util/var.c
index 399f3a9..862dc5c 100644
--- a/src/backend/optimizer/util/var.c
+++ b/src/backend/optimizer/util/var.c
@@ -67,6 +67,16 @@ typedef struct
     bool        inserted_sublink;        /* have we inserted a SubLink? */
 } flatten_join_alias_vars_context;

+typedef struct
+{
+    int varno_source;
+    int varno_target;
+    int varno_join;
+
+    int offset_source;
+    int offset_target;
+} push_up_merge_action_vars_context;
+
 static bool pull_varnos_walker(Node *node,
                    pull_varnos_context *context);
 static bool pull_varattnos_walker(Node *node, Bitmapset **varattnos);
@@ -83,6 +93,8 @@ static bool pull_var_clause_walker(Node *node,
 static Node *flatten_join_alias_vars_mutator(Node *node,
                                 flatten_join_alias_vars_context *context);
 static Relids alias_relid_set(PlannerInfo *root, Relids relids);
+static bool push_up_merge_action_vars_walker(Node *node,
+                                 push_up_merge_action_vars_context *context);


 /*
@@ -677,6 +689,79 @@ pull_var_clause_walker(Node *node, pull_var_clause_context *context)
                                   (void *) context);
 }

+/*
+ *    When prepare for the MERGE command, we have made a
+ *    left join between the Source table and target table as the
+ *    main plan.
+ *
+ *    In this case, the range table contains ONLY THREE range table entries:
+ *    1. the source table, which may be a subquery or a plain table
+ *    2. the entry of the target table, which is a plain table
+ *    3. join expression with the source table and target table as its parameters.
+ *
+ *    Each merge action of the command has its own query and
+ *    plan nodes as well. And, the vars in its target list and qual
+ *    expressions may refers to the attribute in any one of the above 3
+ *     range table entries.
+ *
+ *    However, since the result tuple slots of merge actions are
+ *    projected from the returned tuple of the join, we need to
+ *    mapping the vars of source table and target table to their
+ *    corresponding attributes in the third range table entry.
+ *
+ *    This function does the opposite of the flatten_join_alias_vars()
+ *    function. It walks through the target list and qual of a
+ *    MergeAction plan, changes the vars' varno and varattno to the
+ *    corresponding position in the upper level join RTE.
+ */
+void
+push_up_merge_action_vars(MergeAction *actplan, Query *actqry)
+{
+    push_up_merge_action_vars_context context;
+
+    context.varno_source = Merge_SourceTableRTindex;
+    context.varno_target = actqry->resultRelation;
+    context.varno_join = Merge_TopJoinTableRTindex;
+    context.offset_source = 0;
+    context.offset_target = actqry->sourceAttrNo;
+
+    push_up_merge_action_vars_walker((Node *)actplan->plan.targetlist,
+                                            &context);
+    push_up_merge_action_vars_walker((Node *)actplan->plan.qual,
+                                            &context);
+}
+
+static bool
+push_up_merge_action_vars_walker(Node *node,
+                                    push_up_merge_action_vars_context *context)
+{
+    if (node == NULL)
+        return false;
+    if (IsA(node, Var))
+    {
+        Var *var = (Var *)node;
+
+        if(var->varno == context->varno_source)
+        {
+            var->varno = context->varno_join;
+            var->varattno += context->offset_source;
+            return false;
+        }
+        else if(var->varno == context->varno_target)
+        {
+            var->varno = context->varno_join;
+            var->varattno += context->offset_target;
+            return false;
+        }
+        else if(var->varno == context->varno_join)
+            return false;
+        else
+            elog(ERROR, "the vars in merge action tlist of qual should only belongs to the source table or target
table");
+    }
+
+    return expression_tree_walker(node, push_up_merge_action_vars_walker,
+                                  (void *) context);
+}

 /*
  * flatten_join_alias_vars
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index bb2bf04..73a4708 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -48,6 +48,7 @@ static Query *transformInsertStmt(ParseState *pstate, InsertStmt *stmt);
 static List *transformInsertRow(ParseState *pstate, List *exprlist,
                    List *stmtcols, List *icolumns, List *attrnos);
 static int    count_rowexpr_columns(ParseState *pstate, Node *expr);
+static Query *transformMergeStmt(ParseState *pstate, MergeStmt *stmt);
 static Query *transformSelectStmt(ParseState *pstate, SelectStmt *stmt);
 static Query *transformValuesClause(ParseState *pstate, SelectStmt *stmt);
 static Query *transformSetOperationStmt(ParseState *pstate, SelectStmt *stmt);
@@ -176,6 +177,10 @@ transformStmt(ParseState *pstate, Node *parseTree)
             result = transformUpdateStmt(pstate, (UpdateStmt *) parseTree);
             break;

+        case T_MergeStmt:
+            result = transformMergeStmt(pstate, (MergeStmt *)parseTree);
+            break;
+
         case T_SelectStmt:
             {
                 SelectStmt *n = (SelectStmt *) parseTree;
@@ -246,6 +251,7 @@ analyze_requires_snapshot(Node *parseTree)
         case T_DeleteStmt:
         case T_UpdateStmt:
         case T_SelectStmt:
+        case T_MergeStmt:
             result = true;
             break;

@@ -299,12 +305,24 @@ transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt)
     qry->distinctClause = NIL;

     /*
-     * The USING clause is non-standard SQL syntax, and is equivalent in
-     * functionality to the FROM list that can be specified for UPDATE. The
-     * USING keyword is used rather than FROM because FROM is already a
-     * keyword in the DELETE syntax.
+     * The input stmt could be a MergeDelete node.
+     * In this case, we don't need the process on range table.
      */
-    transformFromClause(pstate, stmt->usingClause);
+    if (!stmt->isMergeAction)
+    {
+        /* set up range table with just the result rel */
+        qry->resultRelation = setTargetTable(pstate, stmt->relation,
+                                      interpretInhOption(stmt->relation->inhOpt),
+                                             true,
+                                             ACL_DELETE);
+        /*
+         * The USING clause is non-standard SQL syntax, and is equivalent in
+         * functionality to the FROM list that can be specified for UPDATE. The
+         * USING keyword is used rather than FROM because FROM is already a
+         * keyword in the DELETE syntax.
+         */
+        transformFromClause(pstate, stmt->usingClause);
+    }

     qual = transformWhereClause(pstate, stmt->whereClause, "WHERE");

@@ -369,6 +387,8 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
      * VALUES clause.  If we have any of those, treat it as a general SELECT;
      * so it will work, but you can't use DEFAULT items together with those.
      */
+
+    /* a MergeInsert statement is always a VALUES clause*/
     isGeneralSelect = (selectStmt && (selectStmt->valuesLists == NIL ||
                                       selectStmt->sortClause != NIL ||
                                       selectStmt->limitOffset != NULL ||
@@ -407,7 +427,8 @@ transformInsertStmt(ParseState *pstate, InsertStmt *stmt)
      * mentioned in the SELECT part.  Note that the target table is not added
      * to the joinlist or namespace.
      */
-    qry->resultRelation = setTargetTable(pstate, stmt->relation,
+    if (!stmt->isMergeAction) /* for MergeInsert, no need to do this */
+        qry->resultRelation = setTargetTable(pstate, stmt->relation,
                                          false, false, ACL_INSERT);

     /* Validate stmt->cols list, or build default list if no list given */
@@ -1802,16 +1823,18 @@ transformUpdateStmt(ParseState *pstate, UpdateStmt *stmt)
         qry->cteList = transformWithClause(pstate, stmt->withClause);
     }

-    qry->resultRelation = setTargetTable(pstate, stmt->relation,
-                                  interpretInhOption(stmt->relation->inhOpt),
-                                         true,
-                                         ACL_UPDATE);
-
-    /*
-     * the FROM clause is non-standard SQL syntax. We used to be able to do
-     * this with REPLACE in POSTQUEL so we keep the feature.
-     */
-    transformFromClause(pstate, stmt->fromClause);
+    if (!stmt->isMergeAction) /* for MergeUpdate, no need to do this */
+    {
+        qry->resultRelation = setTargetTable(pstate, stmt->relation,
+                                      interpretInhOption(stmt->relation->inhOpt),
+                                             true,
+                                             ACL_UPDATE);
+        /*
+         * the FROM clause is non-standard SQL syntax. We used to be able to do
+         * this with REPLACE in POSTQUEL so we keep the feature.
+         */
+        transformFromClause(pstate, stmt->fromClause);
+    }

     qry->targetList = transformTargetList(pstate, stmt->targetList);

@@ -2313,3 +2336,329 @@ applyLockingClause(Query *qry, Index rtindex,
     rc->pushedDown = pushedDown;
     qry->rowMarks = lappend(qry->rowMarks, rc);
 }
+
+/*
+ * transform an action of merge command into a query.
+ * No change of the pstate range table is allowed in this function.
+ */
+static Query *
+transformMergeActions(ParseState *pstate, MergeStmt *stmt,
+                      MergeConditionAction *condact)
+{
+    Query *actqry;
+
+    /*
+     * First, we need to make sure that DELETE and UPDATE
+     * actions are only taken in MATCHED condition,
+     * and INSERTs are only taken when not MATCHED
+     */
+    switch(condact->action->type)
+    {
+        case T_DeleteStmt:/*a delete action*/
+            {
+                DeleteStmt *deleteact = (DeleteStmt *) condact->action;
+                Assert(deleteact->isMergeAction);
+
+                if (!condact->match)
+                    ereport(ERROR,
+                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                        errmsg("The DELETE action in MERGE command is not allowed when NOT MATCHED")));
+
+                /*put new right code to the result relation.
+                This line chages the RTE in range table directly*/
+                pstate->p_target_rangetblentry->requiredPerms |= ACL_DELETE;
+
+                deleteact->relation = stmt->relation;
+                deleteact->usingClause = list_make1(stmt->source);
+                deleteact->whereClause = condact->condition;
+
+                /*parse the action query*/
+                actqry = transformStmt(pstate, (Node *) deleteact);
+
+                if (!IsA(actqry, Query) ||
+                    actqry->commandType != CMD_DELETE ||
+                    actqry->utilityStmt != NULL)
+                    elog(ERROR, "improper DELETE action in merge stmt");
+
+                actqry->isMergeAction = true;
+                actqry->matched = condact->match;
+
+                return actqry;
+            }
+            break;
+        case T_UpdateStmt:/*an update action*/
+            {
+                UpdateStmt *updateact = (UpdateStmt *) condact->action;
+                Assert(updateact->isMergeAction);
+
+                if (!condact->match)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                            errmsg("The UPDATE action in MERGE command is not allowed when NOT MATCHED")));
+
+                pstate->p_target_rangetblentry->requiredPerms |= ACL_UPDATE;
+
+                /*the "targetlist" of the updateact is filled in the parser */
+                updateact->relation = stmt->relation;
+                updateact->fromClause = list_make1(stmt->source);
+                updateact->whereClause = condact->condition;
+
+                /*parse the action query*/
+                actqry = transformStmt(pstate, (Node *)updateact);
+
+                if (!IsA(actqry, Query) ||
+                    actqry->commandType != CMD_UPDATE||
+                    actqry->utilityStmt != NULL)
+                    elog(ERROR, "improper UPDATE action in merge stmt");
+
+                actqry->isMergeAction = true;
+                actqry->matched = condact->match;
+
+                return actqry;
+            }
+            break;
+        case T_InsertStmt:/*an insert action*/
+            {
+                InsertStmt *insertact = (InsertStmt *) condact->action;
+                Assert(insertact->isMergeAction);
+
+                if(condact->match)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                            errmsg("The INSERT action in MERGE command is not allowed when MATCHED")));
+
+                pstate->p_target_rangetblentry->requiredPerms |= ACL_INSERT;
+
+                /*the "cols" and "selectStmt" of the insertact is filled in the parser */
+                insertact->relation = stmt->relation;
+
+                /*
+                the merge insert action has a strange feature.
+                In an ordinary INSERT, the VALUES list can only
+                contains constants and DEFAULT. (am I right??)
+                But in the INSERT action of MERGE command,
+                the VALUES list can have expressions with
+                variables(attributes of the target and source tables).
+                Besides, in the ordinary INSERT, a VALUES list can
+                never be followed by a WHERE clause.
+                But in MERGE INSERT action, there are matching conditions.
+
+                Thus, the output qry of this function is an INSERT
+                query in the style of "INSERT...VALUES...",
+                except that we have other range tables and a WHERE clause.
+                Note that it is also different from the "INSERT ... SELECT..."
+                query, in which the whole SELECT is a subquery.
+                (We don't have subquery here).
+
+                We construct this novel query structure in order
+                to keep consistency with other merge action types
+                (DELETE, UPDATE). In this way, all the merge action
+                queries are in fact share the very same Range Table,
+                They only differs in their target lists and join trees
+                */
+
+                /*parse the action query, this will call
+                transformInsertStmt() which analyzes the VALUES list.*/
+                actqry = transformStmt(pstate, (Node *)insertact);
+
+                /*do the WHERE clause here, Since the
+                transformInsertStmt() function only analyzes
+                the VALUES list but not the WHERE clause*/
+                actqry->jointree = makeFromExpr(pstate->p_joinlist,
+                                            transformWhereClause(pstate,
+                                                            condact->condition,
+                                                            "WHERE"));
+                if(!IsA(actqry, Query) ||
+                    actqry->commandType != CMD_INSERT||
+                    actqry->utilityStmt != NULL)
+                    elog(ERROR, "improper INSERT action in merge stmt");
+
+                actqry->isMergeAction = true;
+                actqry->matched = condact->match;
+
+                return actqry;
+            }
+            break;
+        case T_MergeDoNothing:
+            {
+                MergeDoNothing *nothingact;
+
+                nothingact = (MergeDoNothing *)(condact->action);
+
+                Assert(IsA(nothingact,MergeDoNothing));
+
+                actqry = makeNode(Query);
+
+                actqry->jointree = makeFromExpr(pstate->p_joinlist,
+                                            transformWhereClause(pstate,
+                                                            condact->condition,
+                                                            "WHERE"));
+                actqry->rtable = pstate->p_rtable;
+
+                actqry->commandType = CMD_DONOTHING;
+                actqry->isMergeAction = true;
+                actqry->matched = condact->match;
+
+                return actqry;
+            }
+            break;
+        default:
+            elog(ERROR, "unknown MERGE action type %d", condact->action->type);
+            break;
+    }
+
+    /*never comes here*/
+    return NULL;
+}
+
+static Query *
+transformMergeStmt(ParseState *pstate, MergeStmt *stmt)
+{
+    Query       *qry;
+
+    ColumnRef *starRef;
+    ResTarget *starResTarget;
+    ListCell *act;
+    ListCell *l;
+    JoinExpr *joinexp;
+    RangeTblEntry *resultRTE = NULL;
+    RangeTblEntry *topjoinRTE = NULL;
+    RangeTblEntry *sourceRTE = NULL;
+
+    /*firstly, create the output node structure*/
+    qry = makeNode(Query);
+    qry->commandType = CMD_MERGE;
+
+    /*
+     * What we are doing here is to create a query like
+     * "SELECT * FROM <source_rel> LEFT JOIN <target_rel> ON <match_condition>;"
+     *
+     * Note:
+     * 1. we set the "match condition" as the join qualification.
+     * The left join will scan both the matched and non-matched tuples.
+     *
+     * 2. a normal SELECT query has no "target relation".
+     * But here we need to set the target relation in query,
+     * like the UPDATE/DELETE/INSERT queries.
+     * So this is a left join SELECT with a "target table" in its range table.
+     *
+     * 3. We don't have a specific ACL level for Merge, here we just use
+     * ACL_SELECT.
+     * But we will add other ACL levels when handle each merge actions.
+     */
+
+    /*
+     * Before transforming the FROM clause, acquire write lock on the
+     * target relation. We don't want to add it to the range table yet,
+     * so we use setTargetTableLock() instead of setTargetTable().
+     */
+    setTargetTableLock(pstate, stmt->relation);
+
+    /*
+    * Make the join expression on source table and target table,
+    * as the only element in FROM list
+    */
+    joinexp = makeNode(JoinExpr);
+    joinexp->jointype = JOIN_LEFT;
+    joinexp->isNatural = FALSE;
+    joinexp->larg = stmt->source;
+    joinexp->rarg = (Node *)stmt->relation;
+    joinexp->quals = stmt->matchCondition;
+
+    /*
+     * transform the FROM clause. The target relation and
+     * source relation will be added to the range table here.
+     */
+    transformFromClause(pstate, list_make1(joinexp));
+
+    /* the targetList of the main query is "*" */
+    starRef = makeNode(ColumnRef);
+    starRef->fields = list_make1(makeNode(A_Star));
+    starRef->location = 1;
+
+    starResTarget = makeNode(ResTarget);
+    starResTarget->name = NULL;
+    starResTarget->indirection = NIL;
+    starResTarget->val = (Node *)starRef;
+    starResTarget->location = 1;
+
+    qry->targetList = transformTargetList(pstate, list_make1(starResTarget));
+
+    /* we don't need a WHERE clause here. Set it null. */
+    qry->jointree = makeFromExpr(pstate->p_joinlist, NULL);
+
+    /*
+    *The range table of a MERGE command query has only 3 RTE.
+    *RTE 1 is the source table, RTE 2 is the target table, RTE 3 is the left join
+    *between source and target table.
+    */
+    qry->rtable = pstate->p_rtable;
+
+    /*Set RTE 2 as the result relation of query*/
+    qry->resultRelation = 2;
+    resultRTE = rt_fetch(qry->resultRelation, pstate->p_rtable);
+    if(resultRTE->relid != pstate->p_target_relation->rd_id)
+        elog(ERROR, "The target relation entry should be the second one in range table");
+
+    resultRTE->requiredPerms = ACL_SELECT;
+    resultRTE->inh = interpretInhOption(stmt->relation->inhOpt);
+    pstate->p_target_rangetblentry = resultRTE;
+
+    /*
+      *we also need to find out how many attributes are there in the source table.
+      *There are many ways to do this. Here, I choose to scan the join var list of the
+      *top join table entry. And count how many vars belong to source table.
+      */
+    topjoinRTE = rt_fetch(Merge_TopJoinTableRTindex, pstate->p_rtable);
+    Assert(topjoinRTE->jointype == JOIN_LEFT);
+
+    qry->sourceAttrNo = 0;
+    foreach(l, topjoinRTE->joinaliasvars)
+    {
+        Var *jv = (Var *)lfirst(l);
+
+        if(jv->varno == Merge_SourceTableRTindex)
+            qry->sourceAttrNo++;
+    }
+    /*lets do a simple check here*/
+    sourceRTE = rt_fetch(Merge_SourceTableRTindex, pstate->p_rtable);
+    Assert(qry->sourceAttrNo == list_length(sourceRTE->eref->colnames));
+
+    /*
+     * For each action, transform it to a separate query.
+     * The action queries share the range table with the main query.
+     *
+     * In other words, in the extra conditions of the sub actions,
+     * we don't allow involvement of new tables
+     */
+    qry->mergeActQry = NIL;
+
+    foreach(act,stmt->actions)
+    {
+        MergeConditionAction *mca = (MergeConditionAction *) lfirst(act);
+        Query *actqry;
+
+        /* transform the act (and its condition) as a single query. */
+        actqry = transformMergeActions(pstate, stmt, mca);
+
+        /*
+         * since we don't invoke setTargetTable() in transformMergeActions(),
+         * we need to set actqry->resultRelation here
+         */
+        actqry->resultRelation = qry->resultRelation;
+
+        /*record the source attr number in each action query, for latter use*/
+        actqry->sourceAttrNo = qry->sourceAttrNo;
+
+        /* put it into the list */
+        qry->mergeActQry = lappend(qry->mergeActQry, actqry);
+    }
+
+    /*
+    * set the sublink mark at the last.
+    * Thus, the sublink in actions will be counted in.
+    */
+    qry->hasSubLinks = pstate->p_hasSubLinks;
+
+    return qry;
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 609c472..f045abd 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -212,6 +212,11 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
         DeallocateStmt PrepareStmt ExecuteStmt
         DropOwnedStmt ReassignOwnedStmt
         AlterTSConfigurationStmt AlterTSDictionaryStmt
+        MergeStmt
+
+%type <node>    opt_and_condition merge_condition_action merge_action
+%type <boolean> opt_not
+%type <list>     merge_condition_action_list

 %type <node>    select_no_parens select_with_parens select_clause
                 simple_select values_clause
@@ -483,7 +488,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
     DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DESC
     DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP

-    EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ESCAPE EXCEPT
+    EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ENUM_P ERROR_P ESCAPE EXCEPT
     EXCLUDE EXCLUDING EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT

     FALSE_P FAMILY FETCH FIRST_P FLOAT_P FOLLOWING FOR FORCE FOREIGN FORWARD
@@ -506,7 +511,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_
     LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL LOCALTIME LOCALTIMESTAMP
     LOCATION LOCK_P LOGIN_P

-    MAPPING MATCH MAXVALUE MINUTE_P MINVALUE MODE MONTH_P MOVE
+    MAPPING MATCH MATCHED MAXVALUE MERGE MINUTE_P MINVALUE MODE MONTH_P MOVE

     NAME_P NAMES NATIONAL NATURAL NCHAR NEXT NO NOCREATEDB
     NOCREATEROLE NOCREATEUSER NOINHERIT NOLOGIN_P NONE NOSUPERUSER
@@ -521,7 +526,7 @@ static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_

     QUOTE

-    RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REINDEX
+    RAISE RANGE READ REAL REASSIGN RECHECK RECURSIVE REF REFERENCES REINDEX
     RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA RESET RESTART
     RESTRICT RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROW ROWS RULE

@@ -730,6 +735,7 @@ stmt :
             | ListenStmt
             | LoadStmt
             | LockStmt
+            | MergeStmt
             | NotifyStmt
             | PrepareStmt
             | ReassignOwnedStmt
@@ -7128,6 +7134,7 @@ ExplainableStmt:
             | InsertStmt
             | UpdateStmt
             | DeleteStmt
+            | MergeStmt
             | DeclareCursorStmt
             | CreateAsStmt
             | ExecuteStmt                    /* by default all are $$=$1 */
@@ -7476,6 +7483,112 @@ set_target_list:
 /*****************************************************************************
  *
  *        QUERY:
+ *                MERGE STATEMENT
+ *
+ *****************************************************************************/
+
+
+MergeStmt:
+            MERGE INTO relation_expr_opt_alias
+            USING  table_ref
+            ON a_expr
+            merge_condition_action_list
+                {
+                    MergeStmt *m = makeNode(MergeStmt);
+
+                    m->relation = $3;
+                    m->source = $5;
+                    m->matchCondition = $7;
+                    m->actions = $8;
+
+                    $$ = (Node *)m;
+                }
+                ;
+
+merge_condition_action_list:
+            merge_condition_action
+                { $$ = list_make1($1); }
+            | merge_condition_action_list merge_condition_action
+                { $$ = lappend($1,$2); }
+                ;
+
+merge_condition_action:
+            WHEN opt_not MATCHED opt_and_condition THEN merge_action
+                {
+                    MergeConditionAction *m = makeNode(MergeConditionAction);
+
+                    m->match = $2;
+                    m->condition = $4;
+                    m->action = $6;
+
+                    $$ = (Node *)m;
+                }
+                ;
+
+opt_and_condition:
+            AND a_expr         { $$ = $2; }
+            | /*EMPTY*/     { $$ = NULL; }
+                ;
+
+opt_not:
+            NOT            { $$ = false; }
+            | /*EMPTY*/    { $$ = true; }
+                ;
+
+merge_action:
+            DELETE_P
+                {
+                    DeleteStmt *n = makeNode(DeleteStmt);
+                    n->isMergeAction = true;
+                    $$ = (Node *) n;
+                }
+            | UPDATE SET set_clause_list
+                {
+                    UpdateStmt *n = makeNode(UpdateStmt);
+                    n->targetList = $3;
+                    n->isMergeAction = true;
+
+                    $$ = (Node *) n;
+                }
+            | INSERT values_clause
+                {
+                    InsertStmt *n = makeNode(InsertStmt);
+                    n->cols = NIL;
+                    n->selectStmt = $2;
+                    n->isMergeAction = true;
+
+                    $$ = (Node *) n;
+                }
+
+            | INSERT '(' insert_column_list ')' values_clause
+                {
+                    InsertStmt *n = makeNode(InsertStmt);
+                    n->cols = $3;
+                    n->selectStmt = $5;
+                    n->isMergeAction = true;
+
+                    $$ = (Node *) n;
+                }
+            | INSERT DEFAULT VALUES
+                {
+                    InsertStmt *n = makeNode(InsertStmt);
+                    n->cols = NIL;
+                    n->selectStmt = NULL;
+                    n->isMergeAction = true;
+
+                    $$ = (Node *) n;
+                }
+            | DO NOTHING
+                {
+                    $$ = (Node *) makeNode(MergeDoNothing);
+                }
+            ;
+
+
+
+/*****************************************************************************
+ *
+ *        QUERY:
  *                CURSOR STATEMENTS
  *
  *****************************************************************************/
@@ -11108,6 +11221,7 @@ unreserved_keyword:
             | ENCODING
             | ENCRYPTED
             | ENUM_P
+            | ERROR_P
             | ESCAPE
             | EXCLUDE
             | EXCLUDING
@@ -11162,7 +11276,9 @@ unreserved_keyword:
             | LOGIN_P
             | MAPPING
             | MATCH
+            | MATCHED
             | MAXVALUE
+            | MERGE
             | MINUTE_P
             | MINVALUE
             | MODE
@@ -11205,6 +11321,7 @@ unreserved_keyword:
             | PROCEDURAL
             | PROCEDURE
             | QUOTE
+            | RAISE
             | RANGE
             | READ
             | REASSIGN
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 5e13fa4..d149f5e 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -214,6 +214,25 @@ setTargetTable(ParseState *pstate, RangeVar *relation,
     return rtindex;
 }

+void
+setTargetTableLock(ParseState *pstate, RangeVar *relation)
+{
+
+    /* Close old target; this could only happen for multi-action rules */
+    if (pstate->p_target_relation != NULL)
+        heap_close(pstate->p_target_relation, NoLock);
+
+    /*
+     * Open target rel and grab suitable lock (which we will hold till end of
+     * transaction).
+     *
+     * free_parsestate() will eventually do the corresponding heap_close(),
+     * but *not* release the lock.
+     */
+    pstate->p_target_relation = parserOpenTable(pstate, relation,
+                                                RowExclusiveLock);
+}
+
 /*
  * Simplify InhOption (yes/no/default) into boolean yes/no.
  *
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 31b2595..c6a34cc 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -2000,6 +2000,41 @@ RewriteQuery(Query *parsetree, List *rewrite_events)
     return rewritten;
 }

+/*if the merge action type has already been processed by rewriter*/
+#define insert_rewrite (1<<0)
+#define delete_rewrite (1<<1)
+#define update_rewrite (1<<2)
+
+/*if the merge action type is fully replace by rules.*/
+#define    insert_instead (1<<3)
+#define delete_instead (1<<4)
+#define update_instead (1<<5)
+
+#define merge_action_already_rewrite(acttype, flag) \
+    ((acttype == CMD_INSERT && (flag & insert_rewrite)) || \
+        (acttype == CMD_UPDATE && (flag & update_rewrite)) || \
+        (acttype == CMD_DELETE && (flag & delete_rewrite)))
+
+#define set_action_rewrite(acttype, flag)    \
+                if(acttype == CMD_INSERT)  \
+                    {flag |= insert_rewrite;}\
+                else if(acttype == CMD_UPDATE)  \
+                    {flag |= update_rewrite;}\
+                else if(acttype == CMD_DELETE)  \
+                    {flag |= delete_rewrite;}
+
+#define merge_action_instead(acttype, flag)        \
+            ((acttype == CMD_INSERT && (flag & insert_instead)) || \
+                (acttype == CMD_UPDATE && (flag & update_instead)) || \
+                (acttype == CMD_DELETE && (flag & delete_instead)))
+
+#define set_action_instead(acttype, flag)\
+                if(acttype == CMD_INSERT)  \
+                    {flag |= insert_instead;}\
+                else if(acttype == CMD_UPDATE)  \
+                    {flag |= update_instead;}\
+                else if(acttype == CMD_DELETE)  \
+                    {flag |= delete_instead;}

 /*
  * QueryRewrite -
@@ -2025,7 +2060,148 @@ QueryRewrite(Query *parsetree)
      *
      * Apply all non-SELECT rules possibly getting 0 or many queries
      */
-    querylist = RewriteQuery(parsetree, NIL);
+    if (parsetree->commandType == CMD_MERGE)
+    {
+        /*
+         * for MERGE, we have a set of action queries (not subquery).
+         * each of these action queries should rewritten with RewriteQuery().
+         */
+        ListCell   *l;
+        int     flag = 0;
+        List     *pre_qry = NIL;
+        List     *post_qry = NIL;
+
+        querylist = NIL;
+
+        /*rewrite the merge action queries one by one.*/
+        foreach(l, parsetree->mergeActQry)
+        {
+            List *queryList4action = NIL;
+            Query *actionqry;
+            Query *q;
+
+            actionqry = lfirst(l);
+
+            /* no rewriting for DO NOTHING action*/
+            if (actionqry->commandType == CMD_DONOTHING)
+                continue;
+
+            /*
+             * if this kind of actions are fully replaced by rules,
+             * we change it into a DO NOTHING action
+             */
+            if (merge_action_instead(actionqry->commandType, flag))
+            {
+                /*
+                 * Still need to call RewriteQuery(),
+                 * since we need the process on target list and so on.
+                 * BUT, the returned list is discarded
+                 */
+                RewriteQuery(actionqry, NIL);
+                actionqry->commandType = CMD_DONOTHING;
+                actionqry->targetList = NIL;
+                continue;
+            }
+
+            /* if this kind of actions are already processed by rewriter, skip it.*/
+            if (merge_action_already_rewrite(actionqry->commandType, flag))
+            {
+                RewriteQuery(actionqry, NIL);
+                continue;
+            }
+
+            /* ok this action has not been processed before, let's do it now. */
+            queryList4action = RewriteQuery(actionqry, NIL);
+
+            /* this kind of actions has been processed, set the flag */
+            set_action_rewrite(actionqry->commandType, flag);
+
+            /*
+             * if the returning list is empty, this merge action
+             * is replaced by a do-nothing rule
+             */
+            if (queryList4action == NIL)
+            {
+                /* set the flag for other merge actions of the same type */
+                set_action_instead(actionqry->commandType, flag);
+                actionqry->commandType = CMD_DONOTHING;
+                actionqry->targetList = NIL;
+                continue;
+            }
+
+            /*
+             * if the rewriter return a non-NIL list, the merge action query
+             * could be one element in it.
+             * if so, it must be the head (for INSERT action)
+             * or tail (for UPDATE/DELETE action).
+             */
+
+            /* test the list head */
+            q = (Query *) linitial(queryList4action);
+            if (q->querySource == QSRC_ORIGINAL)
+            {
+                /*
+                 * the merge action is the head, the remaining part
+                 * of the list are the queries generated by rules
+                 * we put them in the post_qry list.
+                 */
+                if (querylist == NIL)
+                    querylist = list_make1(parsetree);
+
+                queryList4action = list_delete_first(queryList4action);
+                post_qry = list_concat(post_qry,queryList4action);
+
+                continue;
+            }
+
+            /*test the list tail*/
+            q = (Query *) llast(queryList4action);
+            if (q->querySource == QSRC_ORIGINAL)
+            {
+                /*
+                 * the merge action is the tail.
+                 * Put the rule queries in pre_qry list
+                 */
+                if (querylist == NIL)
+                    querylist = list_make1(parsetree);
+
+                queryList4action = list_truncate(queryList4action,
+                                                 list_length(queryList4action) - 1);
+
+                pre_qry = list_concat(pre_qry,queryList4action);
+                continue;
+            }
+
+            /*
+             * here, the merge action query is not in the rewritten query list,
+             * which means the action is replaced by INSTEAD rule(s).
+             * We need to change it into do noting action.
+             *
+             * For a INSERT action, we put the rule queries in the post list
+             * otherwise, in the pre list
+             */
+            if(actionqry->commandType == CMD_INSERT)
+                post_qry = list_concat(post_qry,queryList4action);
+            else
+                pre_qry = list_concat(pre_qry,queryList4action);
+
+            set_action_instead(actionqry->commandType, flag);
+            actionqry->commandType = CMD_DONOTHING;
+            actionqry->targetList = NIL;
+        }
+
+        /*
+         * finally, put the 3 lists into one.
+         * If all the merge actions are replaced by rules,
+         * the original merge query
+         * will not be involved in the querylist.
+         */
+        querylist = list_concat(pre_qry,querylist);
+        querylist = list_concat(querylist, post_qry);
+
+    }
+    else
+        querylist = RewriteQuery(parsetree, NIL);

     /*
      * Step 2
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cba90a9..375923e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -979,7 +979,7 @@ exec_simple_query(const char *query_string)
                                                 NULL, 0);

         plantree_list = pg_plan_queries(querytree_list, 0, NULL);
-
+//printf("the plann is \n%s\n", nodeToString(plantree_list));
         /* Done with the snapshot used for parsing/planning */
         if (snapshot_set)
             PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 8eb02da..52ed285 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -225,6 +225,10 @@ ProcessQuery(PlannedStmt *plan,
                 snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
                          "DELETE %u", queryDesc->estate->es_processed);
                 break;
+            case CMD_MERGE:
+                snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
+                         "MERGE %u", queryDesc->estate->es_processed);
+                break;
             default:
                 strcpy(completionTag, "???");
                 break;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 75cb354..e583348 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -126,6 +126,7 @@ CommandIsReadOnly(Node *parsetree)
             case CMD_UPDATE:
             case CMD_INSERT:
             case CMD_DELETE:
+            case CMD_MERGE:
                 return false;
             default:
                 elog(WARNING, "unrecognized commandType: %d",
@@ -1412,6 +1413,10 @@ CreateCommandTag(Node *parsetree)
             tag = "SELECT";
             break;

+        case T_MergeStmt:
+            tag = "MERGE";
+            break;
+
             /* utility statements --- same whether raw or cooked */
         case T_TransactionStmt:
             {
@@ -2257,6 +2262,7 @@ GetCommandLogLevel(Node *parsetree)
         case T_InsertStmt:
         case T_DeleteStmt:
         case T_UpdateStmt:
+        case T_MergeStmt:
             lev = LOGSTMT_MOD;
             break;

diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 30dc314..a41a169 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -178,6 +178,8 @@ extern void ExecBSTruncateTriggers(EState *estate,
                        ResultRelInfo *relinfo);
 extern void ExecASTruncateTriggers(EState *estate,
                        ResultRelInfo *relinfo);
+extern void ExecBSMergeTriggers(ModifyTableState *mt_state);
+extern void ExecASMergeTriggers(ModifyTableState *mt_state);

 extern void AfterTriggerBeginXact(void);
 extern void AfterTriggerBeginQuery(void);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index a8a28a4..96b46f0 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -16,6 +16,7 @@
 #include "nodes/execnodes.h"

 extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
+extern MergeActionState *ExecInitMergeAction(MergeAction *node, EState *estate, int eflags);
 extern TupleTableSlot *ExecModifyTable(ModifyTableState *node);
 extern void ExecEndModifyTable(ModifyTableState *node);
 extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 78188a3..19c0d43 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1032,9 +1032,23 @@ typedef struct ModifyTableState
     int            mt_whichplan;    /* which one is being executed (0..n-1) */
     EPQState    mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
     bool        fireBSTriggers; /* do we need to fire stmt triggers? */
+    MergeActionSet **mt_mergeActPstates; /*the list of the planstate of merge command actions.
+                                            NULL if this is not a merge command.
+                                            The elements if it are still MergeActionSet nodes.
+                                            But the action list in these nodes are ModifyTableState */
 } ModifyTableState;

 /* ----------------
+ *     MergeActionState information
+ * ----------------
+ */
+typedef struct MergeActionState
+{
+    PlanState    ps;                /* its first field is NodeTag */
+    CmdType        operation;
+} MergeActionState;
+
+/* ----------------
  *     AppendState information
  *
  *        nplans            how many plans are in the array
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 15dabe3..293800c 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -44,6 +44,8 @@ typedef enum NodeTag
     T_Plan = 100,
     T_Result,
     T_ModifyTable,
+    T_MergeAction,
+    T_MergeActionSet,
     T_Append,
     T_MergeAppend,
     T_RecursiveUnion,
@@ -87,6 +89,7 @@ typedef enum NodeTag
     T_PlanState = 200,
     T_ResultState,
     T_ModifyTableState,
+    T_MergeActionState,
     T_AppendState,
     T_MergeAppendState,
     T_RecursiveUnionState,
@@ -350,6 +353,10 @@ typedef enum NodeTag
     T_AlterUserMappingStmt,
     T_DropUserMappingStmt,
     T_AlterTableSpaceOptionsStmt,
+    T_MergeStmt,
+    T_MergeConditionAction,
+    T_MergeDoNothing,
+    T_MergeError,
     T_SecLabelStmt,

     /*
@@ -515,6 +522,8 @@ typedef enum CmdType
     CMD_UPDATE,                    /* update stmt */
     CMD_INSERT,                    /* insert stmt */
     CMD_DELETE,
+    CMD_MERGE,                    /* merge stmt */
+    CMD_DONOTHING,                /*DO NOTHING action in MERGE command*/
     CMD_UTILITY,                /* cmds like create, destroy, copy, vacuum,
                                  * etc. */
     CMD_NOTHING                    /* dummy command for instead nothing rules
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index e0bdebd..3293b54 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -149,8 +149,24 @@ typedef struct Query

     List       *constraintDeps;    /* a list of pg_constraint OIDs that the query
                                  * depends on to be semantically valid */
+
+    /* fields for MERGE command */
+    bool    isMergeAction;        /* if this query is a merge action. */
+    bool    matched;            /* this is a MATCHED action or NOT */
+    int        sourceAttrNo; /*the number of attributes in source table*/
+    List   *mergeActQry;        /* the list of all the merge actions.
+                                 * used only for merge query statement */
 } Query;

+/*
+* In MERGE command, the initial MERGE query has only three range table entry
+*The first one is the source table; The second one is the target table; and The
+*third one is the left join entry of them.
+*During the whole process of a MEREGE command, the rt_index for the source table
+*entry and the join table entry will never change. We set them as constant here.
+*/
+#define Merge_SourceTableRTindex    1
+#define Merge_TopJoinTableRTindex     3

 /****************************************************************************
  *    Supporting data structures for Parse Trees
@@ -892,6 +908,7 @@ typedef struct CommonTableExpr
 typedef struct InsertStmt
 {
     NodeTag        type;
+    bool        isMergeAction;    /*if this is a merge insert action*/
     RangeVar   *relation;        /* relation to insert into */
     List       *cols;            /* optional: names of the target columns */
     Node       *selectStmt;        /* the source SELECT/VALUES, or NULL */
@@ -906,6 +923,7 @@ typedef struct InsertStmt
 typedef struct DeleteStmt
 {
     NodeTag        type;
+    bool        isMergeAction;    /*if this is a merge delete action*/
     RangeVar   *relation;        /* relation to delete from */
     List       *usingClause;    /* optional using clause for more tables */
     Node       *whereClause;    /* qualifications */
@@ -920,6 +938,7 @@ typedef struct DeleteStmt
 typedef struct UpdateStmt
 {
     NodeTag        type;
+    bool        isMergeAction;    /*if this is a merge delete action*/
     RangeVar   *relation;        /* relation to update */
     List       *targetList;        /* the target list (of ResTarget) */
     Node       *whereClause;    /* qualifications */
@@ -996,6 +1015,54 @@ typedef struct SelectStmt
     /* Eventually add fields for CORRESPONDING spec here */
 } SelectStmt;

+/* ----------------------
+ *        Merge Statement
+ * ----------------------
+ */
+typedef struct MergeStmt
+{
+    NodeTag        type;
+    RangeVar   *relation;        /* target relation for merge */
+
+    /*
+     * source relations for the merge.
+     * Currently, we only allow single-source merge,
+     * so the length of this list should always be 1.
+     */
+    Node        *source;
+    Node           *matchCondition;    /* qualifications of the merge */
+
+    /* list  of MergeConditionAction structure.
+     * It stores all the matched / not-matched
+     * conditions and the corresponding actions
+     * The elements of this list are MergeConditionAction
+     * nodes.
+     */
+    List           *actions;
+} MergeStmt;
+
+/*
+ * the structure for the actions of MERGE command.
+ * Holds info of the clauses like
+ * "WHEN MATCHED AND ... THEN UPDATE/DELETE/INSERT"
+ */
+typedef struct MergeConditionAction
+{
+    NodeTag        type;
+    bool         match;            /* WHEN MATCHED or WHEN NOT MATCHED? */
+    Node       *condition;        /* the AND condition for this action */
+    Node        *action;            /* the actions: delete, insert or update */
+} MergeConditionAction;
+
+typedef struct MergeDoNothing
+{
+    NodeTag     type;
+} MergeDoNothing;
+
+typedef struct MergeError
+{
+    NodeTag     type;
+} MergeError;

 /* ----------------------
  *        Set Operation node for post-analysis query trees
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 81038f6..6791b34 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -169,9 +169,38 @@ typedef struct ModifyTable
     List       *returningLists; /* per-target-table RETURNING tlists */
     List       *rowMarks;        /* PlanRowMarks (non-locking only) */
     int            epqParam;        /* ID of Param for EvalPlanQual re-eval */
+    List         *mergeActPlan;    /* the plans for merge actions,
+                                 * which are MergeActionSet nodes.
+                                 * one set for one target relation */
 } ModifyTable;

 /* ----------------
+ *     MergeActionSet node -
+ *        The node contains all actions of MERGE command for one specific target relation
+ * ----------------
+ */
+typedef struct MergeActionSet
+{
+    NodeTag type;
+    int result_relation;
+    List *actions;
+}MergeActionSet;
+
+/* ----------------
+ *     MergeAction node -
+ *        The plan node for the actions of MERGE command
+ * ----------------
+ */
+typedef struct MergeAction
+{
+    Plan        plan;
+    CmdType        operation;        /* INSERT, UPDATE, DELETE, DO_NOTHING or RAISE_ERR */
+    bool         matched;        /* is this a MATCHED or NOT MATCHED rule? */
+    List           *flattenedqual;    /* the flattened qual expression of action */
+    List        *flattenedtlist;/*the fattened target list*/
+} MergeAction;
+
+/* ----------------
  *     Append node -
  *        Generate the concatenation of the results of sub-plans.
  * ----------------
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 5bb0e09..3e8bd0f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -80,7 +80,7 @@ extern Result *make_result(PlannerInfo *root, List *tlist,
             Node *resconstantqual, Plan *subplan);
 extern ModifyTable *make_modifytable(CmdType operation, List *resultRelations,
                  List *subplans, List *returningLists,
-                 List *rowMarks, int epqParam);
+                 List *rowMarks, List *mergeActPlans, int epqParam);
 extern bool is_projection_capable_plan(Plan *plan);

 /*
diff --git a/src/include/optimizer/prep.h b/src/include/optimizer/prep.h
index f8dd542..9ed2a60 100644
--- a/src/include/optimizer/prep.h
+++ b/src/include/optimizer/prep.h
@@ -53,4 +53,6 @@ extern void expand_inherited_tables(PlannerInfo *root);

 extern Node *adjust_appendrel_attrs(Node *node, AppendRelInfo *appinfo);

+extern List *adjust_inherited_tlist(List * tlist, AppendRelInfo * context);
+
 #endif   /* PREP_H */
diff --git a/src/include/optimizer/var.h b/src/include/optimizer/var.h
index 8fd9977..e9f4e17 100644
--- a/src/include/optimizer/var.h
+++ b/src/include/optimizer/var.h
@@ -15,6 +15,7 @@
 #define VAR_H

 #include "nodes/relation.h"
+#include "nodes/plannodes.h"

 typedef enum
 {
@@ -32,5 +33,6 @@ extern int    locate_var_of_relation(Node *node, int relid, int levelsup);
 extern int    find_minimum_var_level(Node *node);
 extern List *pull_var_clause(Node *node, PVCPlaceHolderBehavior behavior);
 extern Node *flatten_join_alias_vars(PlannerInfo *root, Node *node);
+extern void push_up_merge_action_vars(MergeAction *actplan, Query *actqry);

 #endif   /* VAR_H */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index d3ea04b..0863234 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -142,6 +142,7 @@ PG_KEYWORD("encoding", ENCODING, UNRESERVED_KEYWORD)
 PG_KEYWORD("encrypted", ENCRYPTED, UNRESERVED_KEYWORD)
 PG_KEYWORD("end", END_P, RESERVED_KEYWORD)
 PG_KEYWORD("enum", ENUM_P, UNRESERVED_KEYWORD)
+PG_KEYWORD("error", ERROR_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("escape", ESCAPE, UNRESERVED_KEYWORD)
 PG_KEYWORD("except", EXCEPT, RESERVED_KEYWORD)
 PG_KEYWORD("exclude", EXCLUDE, UNRESERVED_KEYWORD)
@@ -231,7 +232,9 @@ PG_KEYWORD("lock", LOCK_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("login", LOGIN_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("mapping", MAPPING, UNRESERVED_KEYWORD)
 PG_KEYWORD("match", MATCH, UNRESERVED_KEYWORD)
+PG_KEYWORD("matched", MATCHED, UNRESERVED_KEYWORD)
 PG_KEYWORD("maxvalue", MAXVALUE, UNRESERVED_KEYWORD)
+PG_KEYWORD("merge", MERGE, UNRESERVED_KEYWORD)
 PG_KEYWORD("minute", MINUTE_P, UNRESERVED_KEYWORD)
 PG_KEYWORD("minvalue", MINVALUE, UNRESERVED_KEYWORD)
 PG_KEYWORD("mode", MODE, UNRESERVED_KEYWORD)
@@ -298,6 +301,7 @@ PG_KEYWORD("privileges", PRIVILEGES, UNRESERVED_KEYWORD)
 PG_KEYWORD("procedural", PROCEDURAL, UNRESERVED_KEYWORD)
 PG_KEYWORD("procedure", PROCEDURE, UNRESERVED_KEYWORD)
 PG_KEYWORD("quote", QUOTE, UNRESERVED_KEYWORD)
+PG_KEYWORD("raise", RAISE, UNRESERVED_KEYWORD)
 PG_KEYWORD("range", RANGE, UNRESERVED_KEYWORD)
 PG_KEYWORD("read", READ, UNRESERVED_KEYWORD)
 PG_KEYWORD("real", REAL, COL_NAME_KEYWORD)
diff --git a/src/include/parser/parse_clause.h b/src/include/parser/parse_clause.h
index 0e8a80e..831421b 100644
--- a/src/include/parser/parse_clause.h
+++ b/src/include/parser/parse_clause.h
@@ -19,6 +19,7 @@
 extern void transformFromClause(ParseState *pstate, List *frmList);
 extern int setTargetTable(ParseState *pstate, RangeVar *relation,
                bool inh, bool alsoSource, AclMode requiredPerms);
+extern void setTargetTableLock(ParseState *pstate, RangeVar *relation);
 extern bool interpretInhOption(InhOption inhOpt);
 extern bool interpretOidsOption(List *defList);

diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
new file mode 100644
index 0000000..7ee155a
--- /dev/null
+++ b/src/test/regress/expected/merge.out
@@ -0,0 +1,202 @@
+-- MERGE --
+--
+-- create basic tables for test, and insert some source rows to work from
+--
+--The Stock table, which records the amount of each item on hand.
+CREATE TABLE Stock(item_id int UNIQUE, balance int);
+NOTICE:  CREATE TABLE / UNIQUE will create implicit index "stock_item_id_key" for table "stock"
+INSERT INTO Stock VALUES (10, 2200);
+INSERT INTO Stock VALUES (20, 1900);
+SELECT * FROM Stock;
+ item_id | balance
+---------+---------
+      10 |    2200
+      20 |    1900
+(2 rows)
+
+--The Buy table, which records the amount we bought today for each item.
+CREATE TABLE Buy(item_id int, volume int);
+INSERT INTO Buy values(10, 1000);
+INSERT INTO Buy values(30, 300);
+SELECT * FROM Buy;
+ item_id | volume
+---------+--------
+      10 |   1000
+      30 |    300
+(2 rows)
+
+--The Sale table, which records the amount we sold today for each item.
+CREATE TABLE Sale(item_id int, volume int);
+INSERT INTO Sale VALUES (10, 2200);
+INSERT INTO Sale VALUES (20, 1000);
+SELECT * FROM Sale;
+ item_id | volume
+---------+--------
+      10 |   2200
+      20 |   1000
+(2 rows)
+
+--
+-- initial queries
+--
+-- do a simple equivalent of an UPDATE join
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    3200
+      20 |    1900
+(2 rows)
+
+ROLLBACK;
+-- do a simple equivalent of an INSERT SELECT
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume)
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    2200
+      20 |    1900
+      30 |     300
+(3 rows)
+
+ROLLBACK;
+-- now the classic UPSERT
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
+ WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume)
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    3200
+      20 |    1900
+      30 |     300
+(3 rows)
+
+ROLLBACK;
+--
+-- Non-standard functionality
+--
+-- a MERGE with DELETE action, which is not allowed in Standard.
+-- Extra qualifications are allowed in each WHEN clause.
+BEGIN;
+MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id
+   WHEN MATCHED AND balance - volume > 0 THEN UPDATE SET balance = balance - volume
+   WHEN MATCHED THEN DELETE;
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      20 |     900
+(1 row)
+
+ROLLBACK;
+-- The DO NOTHING action is another extension for MERGE.
+-- rows specified by DO NOTHING are ignored
+BEGIN;
+MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id
+  WHEN MATCHED AND balance - volume > 0 THEN DO NOTHING
+  WHEN MATCHED THEN DELETE
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      20 |    1900
+(1 row)
+
+ROLLBACK;
+-- DO NOTHING is the default action for non-matching rows that do not
+-- activate any WHEN clause. They are just ignored
+BEGIN;
+MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id
+   WHEN MATCHED AND balance - volume > 100000 THEN UPDATE SET balance = balance - volume
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    2200
+      20 |    1900
+(2 rows)
+
+ROLLBACK;
+-- Prepare the test data to generate multiple matching rows for a single target
+INSERT INTO Buy values(10, 400);
+SELECT * FROM Buy ORDER BY item_id;
+ item_id | volume
+---------+--------
+      10 |   1000
+      10 |    400
+      30 |    300
+(3 rows)
+
+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it succeeds in PostgreSQL implementation by simply ignoring the second
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    3200
+      20 |    1900
+(2 rows)
+
+ROLLBACK;
+--
+-- More complicated query
+--
+-- The source table of MERGE could be a SELECT clause
+BEGIN;
+MERGE INTO Stock USING
+        (SELECT Buy.item_id, (Buy.volume - Sale.volume) as v
+        FROM Buy, Sale
+        WHERE Buy.item_id = Sale.item_id)
+            AS BS
+    ON Stock.item_id = BS.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + BS.v
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    1000
+      20 |    1900
+(2 rows)
+
+ROLLBACK;
+-- Subplan/sublinks can be used in MERGE actions
+-- Create a table for sublink.
+CREATE TABLE Extra (item_id int, volume int);
+INSERT INTO Extra VALUES (10, 20);
+INSERT INTO Extra VALUES (10, -7);
+INSERT INTO Extra VALUES (20, 16);
+INSERT INTO Extra VALUES (20, 5);
+INSERT INTO Extra VALUES (30, 9);
+-- The following query sum-up the volumes in Extra table and upinsert the Stock.
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET
+      balance = balance + Buy.volume +
+          (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id)
+ WHEN NOT MATCHED THEN INSERT VALUES
+     (Buy.item_id, Buy.volume +
+         (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id))
+;
+SELECT * FROM Stock ORDER BY item_id;
+ item_id | balance
+---------+---------
+      10 |    3213
+      20 |    1900
+      30 |     309
+(3 rows)
+
+ROLLBACK;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 3b99e86..9c2d298 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -91,7 +91,7 @@ test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combo
 # NB: temp.sql does a reconnect which transiently uses 2 connections,
 # so keep this parallel group to at most 19 tests
 # ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table
sequencepolymorphism rowtypes returning largeobject with xml 
+test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table
sequencepolymorphism rowtypes returning largeobject with xml merge 

 # run stats by itself because its delay may be insufficient under heavy load
 test: stats
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index b348f0e..fde54d3 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -124,4 +124,5 @@ test: returning
 test: largeobject
 test: with
 test: xml
+test: merge
 test: stats
diff --git a/src/test/regress/sql/merge.sql b/src/test/regress/sql/merge.sql
new file mode 100644
index 0000000..5cdb5db
--- /dev/null
+++ b/src/test/regress/sql/merge.sql
@@ -0,0 +1,135 @@
+-- MERGE --
+
+--
+-- create basic tables for test, and insert some source rows to work from
+--
+--The Stock table, which records the amount of each item on hand.
+CREATE TABLE Stock(item_id int UNIQUE, balance int);
+INSERT INTO Stock VALUES (10, 2200);
+INSERT INTO Stock VALUES (20, 1900);
+SELECT * FROM Stock;
+
+--The Buy table, which records the amount we bought today for each item.
+CREATE TABLE Buy(item_id int, volume int);
+INSERT INTO Buy values(10, 1000);
+INSERT INTO Buy values(30, 300);
+SELECT * FROM Buy;
+
+--The Sale table, which records the amount we sold today for each item.
+CREATE TABLE Sale(item_id int, volume int);
+INSERT INTO Sale VALUES (10, 2200);
+INSERT INTO Sale VALUES (20, 1000);
+SELECT * FROM Sale;
+
+--
+-- initial queries
+--
+-- do a simple equivalent of an UPDATE join
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+-- do a simple equivalent of an INSERT SELECT
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume)
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+-- now the classic UPSERT
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
+ WHEN NOT MATCHED THEN INSERT VALUES (Buy.item_id, Buy.volume)
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+--
+-- Non-standard functionality
+--
+-- a MERGE with DELETE action, which is not allowed in Standard.
+-- Extra qualifications are allowed in each WHEN clause.
+BEGIN;
+MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id
+   WHEN MATCHED AND balance - volume > 0 THEN UPDATE SET balance = balance - volume
+   WHEN MATCHED THEN DELETE;
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+-- The DO NOTHING action is another extension for MERGE.
+-- rows specified by DO NOTHING are ignored
+BEGIN;
+MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id
+  WHEN MATCHED AND balance - volume > 0 THEN DO NOTHING
+  WHEN MATCHED THEN DELETE
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+-- DO NOTHING is the default action for non-matching rows that do not
+-- activate any WHEN clause. They are just ignored
+BEGIN;
+MERGE INTO Stock USING Sale ON Stock.item_id = Sale.item_id
+   WHEN MATCHED AND balance - volume > 100000 THEN UPDATE SET balance = balance - volume
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+-- Prepare the test data to generate multiple matching rows for a single target
+INSERT INTO Buy values(10, 400);
+SELECT * FROM Buy ORDER BY item_id;
+
+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it succeeds in PostgreSQL implementation by simply ignoring the second
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+--
+-- More complicated query
+--
+-- The source table of MERGE could be a SELECT clause
+BEGIN;
+MERGE INTO Stock USING
+        (SELECT Buy.item_id, (Buy.volume - Sale.volume) as v
+        FROM Buy, Sale
+        WHERE Buy.item_id = Sale.item_id)
+            AS BS
+    ON Stock.item_id = BS.item_id
+ WHEN MATCHED THEN UPDATE SET balance = balance + BS.v
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;
+
+-- Subplan/sublinks can be used in MERGE actions
+-- Create a table for sublink.
+CREATE TABLE Extra (item_id int, volume int);
+INSERT INTO Extra VALUES (10, 20);
+INSERT INTO Extra VALUES (10, -7);
+INSERT INTO Extra VALUES (20, 16);
+INSERT INTO Extra VALUES (20, 5);
+INSERT INTO Extra VALUES (30, 9);
+
+-- The following query sum-up the volumes in Extra table and upinsert the Stock.
+BEGIN;
+MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
+ WHEN MATCHED THEN UPDATE SET
+      balance = balance + Buy.volume +
+          (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id)
+ WHEN NOT MATCHED THEN INSERT VALUES
+     (Buy.item_id, Buy.volume +
+         (SELECT sum(volume) FROM Extra WHERE Extra.item_id = Buy.item_id))
+;
+SELECT * FROM Stock ORDER BY item_id;
+ROLLBACK;

Re: ask for review of MERGE

От

Robert Haas

Дата:

18 октября 2010 г., 10:54:53

I think that MERGE is supposed to trigger one rule for each row in the
source data.  So:

On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> MERGE INTO Stock t
>  USING (SELECT * FROM Stock WHERE item_id=10) AS s
>  ON s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
>  ;
>
> This works fine, and updates the matching row:
>
> item_id | balance
> ---------+---------
>     20 |    1900
>     10 |    2201

Here you have one row of source data, and you got one action (the WHEN
MATCHED case).

> But if I give it a key that doesn't exist instead:
>
> MERGE INTO Stock t
>  USING (SELECT * FROM Stock WHERE item_id=30) AS s
>  ON s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
>  ;
>
> This doesn't execute the NOT MATCHED case and INSERT the way I expected it
> to.  It just gives back "MERGE 0".

Here you have no rows of source data (the USING (SELECT ...) doesn't
return anything, since no rows exist) so nothing happens.

> Since I wasn't sure if the whole "subquery in the USING clause" case was
> really implemented fully, I then tried to do this with something more like
> the working regression test examples.  I expected this to do the same thing
> as the first example:
>
> MERGE INTO Stock t
>  USING Stock s
>  ON s.item_id=10 AND s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
>  ;
>
> But it gives back this:
>
> ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL:  Key (item_id)=(10) already exists.

Here you have two rows of source data.  The ON clause represents the
join condition.  The item_id=10 row matches - so you get an update,
presumably, though we can't see that as things turn out - and the
item_id=20 row doesn't match - so you try to insert (10, 1), which is
a duplicate key, thus the error.

> Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
> things to generate that.  But it doesn't work any better if you give it an
> example that doesn't exist:
>
> MERGE INTO Stock t
>  USING Stock s
>  ON s.item_id=30 AND s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
>  ;
>
> ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL:  Key (item_id)=(30) already exists.

In this case neither row of the source data matches the join condition
(s.item_id=30 might as well be constant FALSE as far as the test data
is concerned) so you attempt to execute the NOT MATCHED side twice.
So this one also looks correct to me.

> The other thing I noticed that may take some work to sort out is that I
> haven't had any luck getting MERGE to execute from within a plpgsql
> function.  I was hoping I could use this to update the pgbench tables:

Good catch.  Considering the size of this patch, I have no problem
leaving this to the eventual committer to fix, or to a subsequent
commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Boxuan Zhai

Дата:

18 октября 2010 г., 11:09:41

On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think that MERGE is supposed to trigger one rule for each row in the
source data. So:

On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> MERGE INTO Stock t
> USING (SELECT * FROM Stock WHERE item_id=10) AS s
> ON s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (10,1)
> ;
>
> This works fine, and updates the matching row:
>
> item_id | balance
> ---------+---------
> 20 | 1900
> 10 | 2201

Here you have one row of source data, and you got one action (the WHEN
MATCHED case).

> But if I give it a key that doesn't exist instead:
>
> MERGE INTO Stock t
> USING (SELECT * FROM Stock WHERE item_id=30) AS s
> ON s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (30,1)
> ;
>
> This doesn't execute the NOT MATCHED case and INSERT the way I expected it
> to. It just gives back "MERGE 0".

Here you have no rows of source data (the USING (SELECT ...) doesn't
return anything, since no rows exist) so nothing happens.

Yes.

The MERGE process is based on a left join between the source table and target table.

Since here the source table is empty, no join is carried, and thus no MERGE action is taken.

But, is it correct logically? I mean, should we insert some rows in the above example rather than do nothing?

> Since I wasn't sure if the whole "subquery in the USING clause" case was
> really implemented fully, I then tried to do this with something more like
> the working regression test examples. I expected this to do the same thing
> as the first example:
>
> MERGE INTO Stock t
> USING Stock s
> ON s.item_id=10 AND s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (10,1)
> ;
>
> But it gives back this:
>
> ERROR: duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL: Key (item_id)=(10) already exists.

Here you have two rows of source data. The ON clause represents the
join condition. The item_id=10 row matches - so you get an update,
presumably, though we can't see that as things turn out - and the
item_id=20 row doesn't match - so you try to insert (10, 1), which is
a duplicate key, thus the error.

> Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
> things to generate that. But it doesn't work any better if you give it an
> example that doesn't exist:
>
> MERGE INTO Stock t
> USING Stock s
> ON s.item_id=30 AND s.item_id=t.item_id
> WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> WHEN NOT MATCHED THEN INSERT VALUES (30,1)
> ;
>
> ERROR: duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL: Key (item_id)=(30) already exists.

In this case neither row of the source data matches the join condition
(s.item_id=30 might as well be constant FALSE as far as the test data
is concerned) so you attempt to execute the NOT MATCHED side twice.
So this one also looks correct to me.

Yes, that is what happened in the above two examples.

> The other thing I noticed that may take some work to sort out is that I
> haven't had any luck getting MERGE to execute from within a plpgsql
> function. I was hoping I could use this to update the pgbench tables:

Good catch. Considering the size of this patch, I have no problem
leaving this to the eventual committer to fix, or to a subsequent
commit.

--

Robert Haas
EnterpriseDB: http://www.enterprisedb.com

The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Robert Haas

Дата:

18 октября 2010 г., 11:17:46

On Mon, Oct 18, 2010 at 10:09 AM, Boxuan Zhai <bxzhai2010@gmail.com> wrote:
>
>
> On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> I think that MERGE is supposed to trigger one rule for each row in the
>> source data.  So:
>>
>> On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> > MERGE INTO Stock t
>> >  USING (SELECT * FROM Stock WHERE item_id=10) AS s
>> >  ON s.item_id=t.item_id
>> >  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>> >  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
>> >  ;
>> >
>> > This works fine, and updates the matching row:
>> >
>> > item_id | balance
>> > ---------+---------
>> >     20 |    1900
>> >     10 |    2201
>>
>> Here you have one row of source data, and you got one action (the WHEN
>> MATCHED case).
>>
>> > But if I give it a key that doesn't exist instead:
>> >
>> > MERGE INTO Stock t
>> >  USING (SELECT * FROM Stock WHERE item_id=30) AS s
>> >  ON s.item_id=t.item_id
>> >  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>> >  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
>> >  ;
>> >
>> > This doesn't execute the NOT MATCHED case and INSERT the way I expected
>> > it
>> > to.  It just gives back "MERGE 0".
>>
>> Here you have no rows of source data (the USING (SELECT ...) doesn't
>> return anything, since no rows exist) so nothing happens.
>>
>
> Yes.
> The MERGE process is based on a left join between the source table and
> target table.
> Since here the source table is empty, no join is carried, and thus no MERGE
> action is taken.
> But, is it correct logically? I mean, should we insert some rows in the
> above example rather  than do nothing?

I don't think so.  I think the right way to write UPSERT is something
along the lines of:

MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...

(untested)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Boxuan Zhai

Дата:

19 октября 2010 г., 22:20:22

Hi,

I considered the empty source table situation again. I think it is correct to upsert nothing in this case.

Back to the original logic of MERGE command, it is main purpose is to add the supplementary data from the source table into the target table. Thus, an empty source table means no input data is available, so no upsert is needed in target table.

Regards,

Boxuan

Re: ask for review of MERGE

От

Greg Smith

Дата:

21 октября 2010 г., 15:36:31

Robert Haas wrote:
> I think the right way to write UPSERT is something
> along the lines of:
>
> MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
> s.item_id = t.item_id ...
>   

That led in the right direction, after a bit more fiddling I was finally 
able to get something that does what I wanted:  a single table UPSERT 
implemented with this MERGE implementation.  Here's a log of a test 
session, suitable for eventual inclusion in the regression tests:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------     10 |    2200     20 |    1900

MERGE INTO Stock tUSING (VALUES(10,100)) AS s(item_id,balance)ON s.item_id=t.item_idWHEN MATCHED THEN UPDATE SET
balance=t.balance+ s.balanceWHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);
 

MERGE 1

SELECT * FROM Stock ORDER BY item_id;item_id | balance
---------+---------     10 |    2300     20 |    1900

MERGE INTO Stock tUSING (VALUES(30,2000)) AS s(item_id,balance)ON s.item_id=t.item_idWHEN MATCHED THEN UPDATE SET
balance=t.balance+ s.balanceWHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance);
 

MERGE 1
SELECT * FROM Stock ORDER BY item_id;item_id | balance
---------+---------     10 |    2300     20 |    1900     30 |    2000

I'm still a little uncertain as to whether any of my other examples 
should have worked under the spec but just didn't work here, but I'll 
worry about that later.

Here's what the query plan looks like on a MATCH:
Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 
rows=0 loops=1)  Action 1: Update When Matched  Action 2: Insert When Not Mactched  MainPlan:  ->  Nested Loop Left
Join (cost=0.00..8.29 rows=1 width=22) (actual 
 
time=0.050..0.061 rows=1 loops=1)        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) 
(actual time=0.009..0.010 rows=1 loops=1)        ->  Index Scan using stock_item_id_key on stock t  
(cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)              Index Cond:
("*VALUES*".column1= item_id)Total runtime: 0.370 ms
 


And here's a miss:
Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145 
rows=0 loops=1)  Action 1: Update When Matched  Action 2: Insert When Not Mactched  MainPlan:  ->  Nested Loop Left
Join (cost=0.00..8.29 rows=1 width=22) (actual 
 
time=0.028..0.033 rows=1 loops=1)        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) 
(actual time=0.004..0.005 rows=1 loops=1)        ->  Index Scan using stock_item_id_key on stock t  
(cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)              Index Cond:
("*VALUES*".column1= item_id)Total runtime: 0.255 ms
 

Next steps here:
1) Performance/concurrency tests against trigger-based UPSERT approach.
2) Finish bit rot cleanup against HEAD.
3) Work out more complicated test cases to try and fine more unexpected 
behavior edge cases and general bugs.

-- 
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

Re: ask for review of MERGE

От

Stefan Kaltenbrunner

Дата:

22 октября 2010 г., 12:35:55

On 10/21/2010 08:36 PM, Greg Smith wrote:
> Robert Haas wrote:
>> I think the right way to write UPSERT is something
>> along the lines of:
>>
>> MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
>> s.item_id = t.item_id ...

[...]

> Here's what the query plan looks like on a MATCH:
>
> Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0
> loops=1)
> Action 1: Update When Matched
> Action 2: Insert When Not Mactched

"Mactched"? - is this a c&p error or the actual output of EXPLAIN? :)


lg


Stefan

Re: ask for review of MERGE

От

Greg Smith

Дата:

23 октября 2010 г., 02:35:14

There are now two branches of MERGE code in review progress available.
http://github.com/greg2ndQuadrant/postgres/tree/merge-unstable has the
bit-rotted version that doesn't quite work against HEAD yet, while
http://github.com/greg2ndQuadrant/postgres/tree/merge is what I'm still
testing against until I get that sorted out.

Attached is a tar file containing an initial concurrent MERGE test
case.  What it does is create is a pgbench database with a scale of 1.
Then it runs a custom test that does an UPSERT using MERGE against
pgbench_accounts, telling pgbench the database scale is actually 2.
This means that approximately half of the statements executed will hit
the MATCHED side of the merge and update existing rows, while half will
hit the NOT MATCHED one and create new account records.  Did some small
tests using pgbench's debug mode where you see all the statements it
executes, and all the output looked correct.  Successive tests runs are
not deterministically perfect performance comparisons, but with enough
transactions I hope that averages out.

For comparison sake, there's an almost identical test case that does the
same thing using the pl/pgsql example function from the PostgreSQL
manual for the UPSERT instead also in there.  You just run test-merge.sh
or test-function.sh and it runs the whole test, presuming you built and
installed pgbench and don't mind that it will blow away any database
named pgbench you already have.

Since the current MERGE implementation is known not to be optimized for
concurrency, my main goal here wasn't to see how fast it was.  That
number is interesting though.  With the sole postgresql.conf change of:

checkpoint_settings=32

And a debug/assertion build using 8 concurrent clients, I got 1607 TPS
of UPSERT out of the trigger approach @ 6MB/s of writes to disk, while
the MERGE one delivered 988 TPS @ 4.5MB/s of writes.  Will explore this
more later.

This did seem to find a bug in the implementation after running for a while:

TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File:
"execMain.c", Line: 1732)

Line number there is relative to what you can see at
http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
and that's a null pointer check at the beginning of
EvalPlanQualFetchRowMarks.  Haven't explored why this happened or how
repeatable this is, but since Boxuan was looking for some bugs to chase
I wanted to deliver him one to chew on.

While the performance doesn't need to be great in V1, there needs to be
at least some minimal protection against concurrency issues before this
is commit quality.  Will continue to shake this code out looking for
them now that I have some basic testing that works for doing so.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us

Вложения

merge-test-v1.tar

Re: ask for review of MERGE

От

Marko Tiikkaja

Дата:

23 октября 2010 г., 09:50:18

On 2010-10-23 8:34 AM +0300, Greg Smith wrote:
> While the performance doesn't need to be great in V1, there needs to be
> at least some minimal protection against concurrency issues before this
> is commit quality.

What's been bothering me is that so far there has not been an agreement 
on whether we need to protect against concurrency issues or not.  In 
fact, there has been almost no discussion about the concurrency issues 
which AIUI have been the biggest single reason we don't have MERGE 
already.  Right now, this patch fails in even the simplest scenario:

=# create table foo(a int unique);
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_a_key" 
for table "foo"
CREATE TABLE

T1=# begin;
BEGIN
T1=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched 
then update set a=a+1 when not matched then insert values (s.i);
MERGE 1

T2=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched 
then update set a=a+1 when not matched then insert values (s.i);
-- blocks

T1=# commit;
COMMIT

.. and T2 gets a unique constraint violation.

As far as I'm concerned, the reason people want MERGE is to avoid these 
problems; the nicer syntax is just a bonus.  Having to LOCK the target 
table makes this feature a lot less useful, even though there are a few 
use cases where locking the table would be OK.

Regards,
Marko Tiikkaja

Re: ask for review of MERGE

От

Greg Smith

Дата:

23 октября 2010 г., 11:31:17

Marko Tiikkaja wrote:
> What's been bothering me is that so far there has not been an 
> agreement on whether we need to protect against concurrency issues or 
> not.  In fact, there has been almost no discussion about the 
> concurrency issues which AIUI have been the biggest single reason we 
> don't have MERGE already.

Sure there were, they just happened a long time ago.  I tried to 
summarize the previous discussion at 
http://wiki.postgresql.org/wiki/SQL_MERGE with the following:

"PostgreSQL doesn't have a good way to lock access to a key value that 
doesn't exist yet--what other databases call key range 
locking...Improvements to the index implementation are needed to allow 
this feature."

You can sift through the other archive links in there, the discussion 
leading up to that conclusion is in there somewhere.  And we had a 
discussion of this at the last developer's meeting where this was 
identified as a key issue to sort out before this was done; see the 
table at the end of 
http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting and note 
the warning associated with this feature.

The deadlock on developing this feature has been that there's little 
benefit to building key range locking on its own, or even a good way to 
prove that it works, without a visible feature that uses it.  It's much 
easier to justify development time on if the rest of MERGE is known to 
work, and just needs that plugged in to improve concurrency.  And since 
Boxuan appeared at the right time to do the syntax and implementation 
part first as well, that's the order it's ended up happening in.  We're 
only making the concurrency part a second priority right now in order to 
break the development into managable chunks.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us

Re: ask for review of MERGE

От

Tom Lane

Дата:

23 октября 2010 г., 13:14:15

Greg Smith <greg@2ndquadrant.com> writes:
> Marko Tiikkaja wrote:
>> What's been bothering me is that so far there has not been an 
>> agreement on whether we need to protect against concurrency issues or 
>> not.  In fact, there has been almost no discussion about the 
>> concurrency issues which AIUI have been the biggest single reason we 
>> don't have MERGE already.

> Sure there were, they just happened a long time ago.  I tried to 
> summarize the previous discussion at 
> http://wiki.postgresql.org/wiki/SQL_MERGE with the following:

> "PostgreSQL doesn't have a good way to lock access to a key value that 
> doesn't exist yet--what other databases call key range 
> locking...Improvements to the index implementation are needed to allow 
> this feature."

This seems to be presuming there's a consensus that that's the way to
implement it, which is news to me.  Why wouldn't we try the INSERT
first, and if we get a unique-key failure, do UPDATE instead?  I am not
at all in agreement that we ought to build key range locking, nor that
it'd be particularly helpful for this problem even if we had it.
        regards, tom lane

Re: ask for review of MERGE

От

Boxuan Zhai

Дата:

23 октября 2010 г., 17:30:19

This did seem to find a bug in the implementation after running for a while:

TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File: "execMain.c", Line: 1732)

Line number there is relative to what you can see at http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
and that's a null pointer check at the beginning of EvalPlanQualFetchRowMarks. Haven't explored why this happened or how repeatable this is, but since Boxuan was looking for some bugs to chase I wanted to deliver him one to chew on.

I am not sure how this bug comes. I will try to find out the reason for it.

Re: ask for review of MERGE

От

Greg Stark

Дата:

23 октября 2010 г., 19:59:44

On Sat, Oct 23, 2010 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "PostgreSQL doesn't have a good way to lock access to a key value that
>> doesn't exist yet--what other databases call key range
>> locking...Improvements to the index implementation are needed to allow
>> this feature."
>
> This seems to be presuming there's a consensus that that's the way to
> implement it, which is news to me.  Why wouldn't we try the INSERT
> first, and if we get a unique-key failure, do UPDATE instead?  I am not
> at all in agreement that we ought to build key range locking, nor that
> it'd be particularly helpful for this problem even if we had it.

Except we *do* have range locking for the special case of inserting
equal values into a unique btree index. In that case the btree insert
locks the leaf page containing the conflicting key value --
effectively locking out anyone else from inserting the same key value.That lock is held until the index entry pointing
tothe newly 
inserted value is finished. That's what prevents two inserters from
inserting the same key value.

So the trick for MERGE on equality would be to refactor the btree api
so that you could find the matching leaf page and *keep* that page
locked. Then do an update against the conflicting row found there if
any without ever releasing that page lock. Someone else can come along
and delete the row (or have already deleted it) before the update
locks it but if they do you can insert your row without worrying about
anyone else inserting a conflicting entry since you're still holding a
lock on the page they'll need to insert the btree entry on and have
been holding it continuously for the whole operation.

I'm not sure how the new exclusion constraints stuff affects this. I
think they would work the same way except instead of holding a page
lock on the leaf page they would store the value being edited in the
in-memory array -- effectively another form of key range locking.

I think the blocker with MERGE has always been that the standard is
*so* general that it could apply to conditions that *aren't* covered
by a unique constraint or exclusion constriant. Personally, I'm fine
saying that those cases will perform poorly, locking the table, or
aren't allowed and print a warning explaining exactly what exclusion
constraint or btree unique index would be necessary to allow it. I
think virtually every case where people will want to use it will be on
a primary key anyways.

--
greg

Re: ask for review of MERGE

От

Josh Berkus

Дата:

23 октября 2010 г., 20:03:57

> So the trick for MERGE on equality would be to refactor the btree api
> so that you could find the matching leaf page and *keep* that page
> locked. Then do an update against the conflicting row found there if
> any without ever releasing that page lock. Someone else can come along
> and delete the row (or have already deleted it) before the update
> locks it but if they do you can insert your row without worrying about
> anyone else inserting a conflicting entry since you're still holding a
> lock on the page they'll need to insert the btree entry on and have
> been holding it continuously for the whole operation.

I think that such a lock would also be useful for improving the FK 
deadlock issues we have.

--                                   -- Josh Berkus                                     PostgreSQL Experts Inc.
                           http://www.pgexperts.com

Re: ask for review of MERGE

От

Greg Stark

Дата:

24 октября 2010 г., 04:32:42

On Sat, Oct 23, 2010 at 4:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
> I think that such a lock would also be useful for improving the FK deadlock
> issues we have.

I don't see how. I think the problem you're referring to occurs when
different plans update rows in different orders and the resulting
locks on foreign key targets are taken in different orders. In which
case the problem isn't that we're unable to lock the resources --
they're locked using regular row locks -- but rather that there's
nothing controlling the ordering of the locks.

I don't think it would be acceptable to hold low level btree page
locks across multiple independent row operations on different rows and
I don't see how they would be any better than the row locks we have
now. Worse, the resulting deadlock would no longer be detectable (one
of the reasons it wouldn't be acceptable to hold the lock for so
long).

That does point out a problem with the logic I sketched. If you go to
do an update and find there's an uncommitted update pending you have
to wait on it. You can't do that while holding the index page lock. I
assume then you release it and wait on the uncommitted transaction and
when it's finished you start over doing the btree lookup and
reacquiring the lock. I haven't thought it through in detail though.

-- 
greg

Re: ask for review of MERGE

От

Martijn van Oosterhout

Дата:

24 октября 2010 г., 06:50:40

On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote:
> I think the blocker with MERGE has always been that the standard is
> *so* general that it could apply to conditions that *aren't* covered
> by a unique constraint or exclusion constriant. Personally, I'm fine
> saying that those cases will perform poorly, locking the table, or
> aren't allowed and print a warning explaining exactly what exclusion
> constraint or btree unique index would be necessary to allow it. I
> think virtually every case where people will want to use it will be on
> a primary key anyways.

Can we please not get MERGE hung up on trying to make it atomic. The
standard requires no such thing and there are plenty of uses of MERGE
that don't involve high concurrency updates of the same row by
different processes. If we want an atomic UPSERT, make that a seperate
command, MERGE without atomicity is useful enough already.

Simply, if the row is visible in the current snapshot you do an update
otherwise an insert. If you get an error, raise it. Icing can be added
later.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Re: ask for review of MERGE

От

Robert Haas

Дата:

24 октября 2010 г., 09:05:46

On Sun, Oct 24, 2010 at 5:50 AM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote:
>> I think the blocker with MERGE has always been that the standard is
>> *so* general that it could apply to conditions that *aren't* covered
>> by a unique constraint or exclusion constriant. Personally, I'm fine
>> saying that those cases will perform poorly, locking the table, or
>> aren't allowed and print a warning explaining exactly what exclusion
>> constraint or btree unique index would be necessary to allow it. I
>> think virtually every case where people will want to use it will be on
>> a primary key anyways.
>
> Can we please not get MERGE hung up on trying to make it atomic. The
> standard requires no such thing and there are plenty of uses of MERGE
> that don't involve high concurrency updates of the same row by
> different processes. If we want an atomic UPSERT, make that a seperate
> command, MERGE without atomicity is useful enough already.
>
> Simply, if the row is visible in the current snapshot you do an update
> otherwise an insert. If you get an error, raise it. Icing can be added
> later.

Long term, I'm wondering if the permanent fix for this problem might
involve something similar to what EXCLUDE does - use a dirty snapshot
for the scan of the target table, and block on in=d

Yeah, I couldn't have said it better.  It's an annoying implementation
restriction but only that.  I think it's a perfectly acceptable
restriction for an initial commit.  It's no secret that our process
has trouble absorbing large patches.

I am also wondering exactly what the semantics are supposed to be
under concurrency.  We can't assume that it's OK to treat WHEN NOT
MATCHED as WHEN MATCHED if a unique-key violation is encountered.  The
join condition could be something else entirely.  I guess we could
scan the target table with a dirty snapshot and block on any in-doubt
tuples, similar to what EXCLUDE constraints do internally, but
throwing MVCC out the window doesn't seem right either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Greg Smith

Дата:

24 октября 2010 г., 13:17:19

Robert Haas wrote:
> I am also wondering exactly what the semantics are supposed to be
> under concurrency.  We can't assume that it's OK to treat WHEN NOT
> MATCHED as WHEN MATCHED if a unique-key violation is encountered.  The
> join condition could be something else entirely.  I guess we could
> scan the target table with a dirty snapshot and block on any in-doubt
> tuples, similar to what EXCLUDE constraints do internally, but
> throwing MVCC out the window doesn't seem right either.
>   

You've answered Tom's question of why not to just INSERT and trap the 
error here.  Presuming a unique key violation is what will kick back in 
this situation and to just follow the other side doesn't cover all the 
ways this can get used.

I'm thinking of this problem now as being like the one that happens when 
a concurrent UPDATE happens.  To quote from the docs here:

"a target row might have already been updated (or deleted or locked) by 
another concurrent transaction by the time it is found. In this case, 
the would-be updater will wait for the first updating transaction to 
commit or roll back (if it is still in progress). If the first updater 
rolls back, then its effects are negated and the second updater can 
proceed with updating the originally found row. If the first updater 
commits, the second updater will ignore the row if the first updater 
deleted it, otherwise it will attempt to apply its operation to the 
updated version of the row. The search condition of the command (the 
WHERE clause) is re-evaluated to see if the updated version of the row 
still matches the search condition. If so, the second updater proceeds 
with its operation using the updated version of the row."

What I think we can do with adding a key reservation is apply the same 
sort of logic to INSERTs too, given a way to lock an index entry before 
the row itself is complete.  If MERGE hits a WHERE condition that finds 
a key lock entry in the index, it has to sit and wait for that other 
command to finish.  There isn't any other sensible behavior I can see in 
that situation, just like there isn't one for UPDATE right now.

If the transaction that was locking the index entry commits, it has to 
start over with re-evaluating the WHEN MATCHED part (the equivilent of 
the WHERE in the UPDATE case).  But it shouldn't need to do a new JOIN 
again, because that condition was proven to be met already by the lock 
squatting on that index key, without even having the rest of the row 
there yet to check its data.  If the other entry commits, it would then 
follow the WHEN MATCHED path and in the UPSERT case do an UPDATE.  If 
the transaction who had the key reservervation rolls back, the 
re-evaluation of the MERGE WHEN MATCHED will fail, at which point it 
follows the WHERE FOUND INSERT path.  As it will now be the locker of 
the key reservation it had been waiting on earlier at that point, it 
will be free to do the INSERT with no concerns about race conditions or 
retries.  In the SERIALIZABLE case, again just try to follow the same 
slightly different rules that exist for concurrent UPDATE commands.

There may be a tricky point here that we will need to warn people that 
UPSERT implementations need to make sure they only reference the columns 
of the primary key in the MERGE conditions for this to work as expected, 
because otherwise they might get back a unique key violation error 
anyway.  I don't know how other databases deal with that particular 
problem.  I have considered following the somewhat distasteful path of 
installing SQL Server or Oracle here just to see how they handle some of 
the pathological test cases I'm now thinking about for this command.

This particular weak spot in MVCC is already there for UPDATE, and this 
approach seems to me to be the most logical way to extend the same basic 
implementation to also eliminate some concurrent MERGE race conditions, 
including the most common one the UPSERT case is stuck with.  But I have 
no intention of halting work on the basic MERGE to wait for this problem 
to be solved even if I'm completely wrong about what I've outlined 
above.  That style of thinking--don't even start until every a perfect 
solution for every possible problem is known--is something I consider a 
problem in this community's development model, one that has contributed 
to why no work has been done on this feature until now.  This one splits 
nicely into "get the implemenation working" and "improve the 
concurrency" parts, and I haven't heard a good reason yet why those need 
to coupled together.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us

Re: ask for review of MERGE

От

Robert Haas

Дата:

24 октября 2010 г., 14:43:46

On Sun, Oct 24, 2010 at 12:15 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> I am also wondering exactly what the semantics are supposed to be
>> under concurrency.  We can't assume that it's OK to treat WHEN NOT
>> MATCHED as WHEN MATCHED if a unique-key violation is encountered.  The
>> join condition could be something else entirely.  I guess we could
>> scan the target table with a dirty snapshot and block on any in-doubt
>> tuples, similar to what EXCLUDE constraints do internally, but
>> throwing MVCC out the window doesn't seem right either.
>
> [discussion of EPQ behavior for UPDATE statements]
>
> What I think we can do with adding a key reservation is apply the same sort
> of logic to INSERTs too, given a way to lock an index entry before the row
> itself is complete.  If MERGE hits a WHERE condition that finds a key lock
> entry in the index, it has to sit and wait for that other command to finish.
>  There isn't any other sensible behavior I can see in that situation, just
> like there isn't one for UPDATE right now.

Well, there's no guarantee that any index at all exists on the target
table, so any solution based on index locks can't be fully general.

But let's back up and talk about MVCC for a minute.  Suppose we have
three source tuples, (1), (2), and (3); and the target table contains
tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
suppose also an equijoin.  Clearly, source tuple (1) should fire the
MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
but what in the world should source tuple (2) do?  AFAICS, the only
sensible behavior is to throw a serialization error, because no matter
what you do the results won't be equivalent to a serial execution of
the transaction that committed target tuple (2) and the transaction
that contains the MERGE.

Even with predicate locks, it's not obvious how to me how solve this
problem.  Target tuple (2) may already be there, and its transaction
already committed, by the time the MERGE statement gets around to
looking at the source data.  In fact, even taking an
AccessExclusiveLock on the target table doesn't fix this, because the
snapshot would be taken before the lock.

> been done on this feature until now.  This one splits nicely into "get the
> implemenation working" and "improve the concurrency" parts, and I haven't
> heard a good reason yet why those need to coupled together.

Sounds like we're all on the same page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Greg Smith

Дата:

24 октября 2010 г., 18:40:38

Robert Haas wrote:
> Well, there's no guarantee that any index at all exists on the target
> table, so any solution based on index locks can't be fully general.
>   

Sure, but in the most common use case I think we're targeting at first, 
no indexes means there's also no unique constraints either.  I don't 
think people can reasonable expect to MERGE or anything else to 
guarantee they won't accidentally create two rows that conflict via the 
terms of some non-database enforced key.

I am too brain-dead this particular Sunday to think of exactly how to 
deal with the 3-tuple case you described, except to say that I think 
it's OK for complicated situations to give up and throw a serialization 
error.  I'm already collecting a list of pathological tests and will try 
to add something based on your problem case, then see what we can do 
with it later.

-- 
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us

Re: ask for review of MERGE

От

Greg Stark

Дата:

24 октября 2010 г., 20:12:26

On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> Can we please not get MERGE hung up on trying to make it atomic. The
> standard requires no such thing and there are plenty of uses of MERGE
> that don't involve high concurrency updates of the same row by
> different processes. If we want an atomic UPSERT, make that a seperate
> command, MERGE without atomicity is useful enough already.

Really? I don't really see any point in a non-atomic MERGE. Nor in a
non-atomic UPDATE or INSERT or any other operation. The A in ACID is
as important as any other letter.

For that matter if you don't care about atomicity then this is a
problem already solvable easily solvable in the application. Why
bother providing a special command for it. The whole reason to want a
special command is precisely because the database can implement it
atomically more easily and more efficiently than any application
implementation.

Moreover having a case which is non-atomic and allows inconsistent
results or errors in the face of concurrent updates is a foot-gun.
Someone will come along and try to use it and it will appear to work
in their application but introduce nasty hidden race conditions.

-- 
greg

Re: ask for review of MERGE

От

Greg Stark

Дата:

24 октября 2010 г., 20:17:26

On Sun, Oct 24, 2010 at 2:39 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Sure, but in the most common use case I think we're targeting at first, no
> indexes means there's also no unique constraints either.  I don't think
> people can reasonable expect to MERGE or anything else to guarantee they
> won't accidentally create two rows that conflict via the terms of some
> non-database enforced key.

I'm fine with either a) having the merge constraint be required to
match exactly either a exclusion constraint or unique index or throw
an error or b) lock the table if you perform a merge against a table
on a non-indexed condition like foreign keys do if you're missing the
relevant index.

Either way I expect this case to be a rare use case where users are
either doing data loads and locking the table against concurrent
updates is fine or they will immediately realize the error of their
ways and create the corresponding unique or exclusion constraint
index.

Other than bulk data loads I don't see any useful use case that
doesn't have the corresponding exclusion constraint or unique index as
a hard requirement anyways. It'll be necessary for both correctness
and performance.

--
greg

Re: ask for review of MERGE

От

Robert Haas

Дата:

25 октября 2010 г., 00:19:41

On Sun, Oct 24, 2010 at 7:11 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout
> <kleptog@svana.org> wrote:
>> Can we please not get MERGE hung up on trying to make it atomic. The
>> standard requires no such thing and there are plenty of uses of MERGE
>> that don't involve high concurrency updates of the same row by
>> different processes. If we want an atomic UPSERT, make that a seperate
>> command, MERGE without atomicity is useful enough already.
>
> Really? I don't really see any point in a non-atomic MERGE. Nor in a
> non-atomic UPDATE or INSERT or any other operation. The A in ACID is
> as important as any other letter.

I think we're really discussing the "I" in ACID, not the "A".  There's
no question that the MERGE transaction will either commit or fail.
What we're talking about is what happens when there are concurrent
table modifications in progress; and the answer is that you might get
serialization anomalies.  But we have serialization anomalies without
MERGE, too - see the discussions around Kevin Grittner's SSI patch
which, come to think of it, might be useful for this case, too.

I posted an example upthread which I think demonstrates very clearly
that MERGE will result in unavoidable serialization failures - so  if
the standard is that we mustn't have any serialization failures then
the standard can never be met.  The best we can hope for is that we'll
detect them and roll back if they occur, rather than going on to
commit but perhaps with some anomaly.  And I'm pretty sure that's what
KG's SSI patch is going to give us.  So I'm not sure there's really
anything to get worked up about here in terms of concurrency issues.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

"Kevin Grittner"

Дата:

25 октября 2010 г., 09:12:46

Robert Haas  wrote:
> What we're talking about is what happens when there are concurrent
> table modifications in progress; and the answer is that you might
> get serialization anomalies. But we have serialization anomalies
> without MERGE, too - see the discussions around Kevin Grittner's
> SSI patch which, come to think of it, might be useful for this
> case, too.
I've been thinking about that as I read the discussion.  If both
transactions are serializable, there would tend to be a default
behavior of preventing anomalies.  ("Tend to be" because it might
actually require the addition of a few calls to predicate locking
functions from new MERGE code to be consistent with other code under
the patch.)
On the other hand, where there is a more targeted way of protecting
integrity, I've tried to keep SSI out of the way and let it function
as it has.  For example, foreign key enforcement already manages
this, so the SSI implementation intentionally ignores those reads and
writes.  From the discussion on MERGE I've been thinking that where
there is an appropriate unique index the SSI code might want to stay
out of the way, similar to foreign keys; but it might be used to
handle the cases where there is no appropriate index.  Or possibly
the predicate locking portion of it could be used in a novel way by
MERGE code to implement the MERGE logic.  The API for SSI basically
involves three types of functions:- Acquire a predicate lock on an object.- Check whether a given object is predicate
locked.-Check for rw-conflict.
 
To be useful for MERGE, that second category would probably need to
be expanded, and we might need to upgrade btree index locking to
support range locking rather than stopping at index page locks.  Dan
is planning to try this once he has sufficient benchmarks as a base
to confirm the performance impact.
> I posted an example upthread which I think demonstrates very
> clearly that MERGE will result in unavoidable serialization
> failures - so if the standard is that we mustn't have any
> serialization failures then the standard can never be met. The best
> we can hope for is that we'll detect them and roll back if they
> occur, rather than going on to commit but perhaps with some
> anomaly. And I'm pretty sure that's what KG's SSI patch is going to
> give us. So I'm not sure there's really anything to get worked up
> about here in terms of concurrency issues.
Given the standard's emphasis on data integrity and the known
concurrency issues with relational theory, I find it hard to believe
that the standard requires "no serialization failures duing merge" --
but I haven't had time to dig into the standard's specifications yet.
-Kevin

Re: ask for review of MERGE

От

Greg Stark

Дата:

25 октября 2010 г., 14:43:08

On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> But let's back up and talk about MVCC for a minute.  Suppose we have
> three source tuples, (1), (2), and (3); and the target table contains
> tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
> suppose also an equijoin.  Clearly, source tuple (1) should fire the
> MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
> but what in the world should source tuple (2) do?  AFAICS, the only
> sensible behavior is to throw a serialization error, because no matter
> what you do the results won't be equivalent to a serial execution of
> the transaction that committed target tuple (2) and the transaction
> that contains the MERGE.

So the behaviour we get with UPDATE in this situation is that we
update (2) so I would expect this to execute the MATCHED rule. The key
distinction is that since we're not returning the data to the user the
user sees we want to update the most recent version and it's "almost"
as if we ran "after" all the other transactions. It's not really
serializable and I think in serializable mode we throw a serialization
failure instead but in most usage patterns it's precisely what the
user wants.

Here "bbb" contained two records when we began with values "1" and "2"
but the "2" was inserted in a transaction which hadn't committed yet.
It commited after the update.

postgres=> begin;
BEGIN
postgres=> select * from bbb;i
---1
(1 row)

postgres=> update bbb set i = i+1;
UPDATE 2
postgres=> commit;
COMMIT
postgres=> select * from bbb;i
---23
(2 rows)

--
greg

Re: ask for review of MERGE

От

"Kevin Grittner"

Дата:

25 октября 2010 г., 15:08:11

Greg Stark <gsstark@mit.edu> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> But let's back up and talk about MVCC for a minute.  Suppose we
>> have three source tuples, (1), (2), and (3); and the target table
>> contains tuples (1) and (2), of which only (1) is visible to our
>> MVCC snapshot; suppose also an equijoin.  Clearly, source tuple
>> (1) should fire the MATCHED rule and source tuple (3) should fire
>> the NOT MATCHED rule, but what in the world should source tuple
>> (2) do?  AFAICS, the only sensible behavior is to throw a
>> serialization error, because no matter what you do the results
>> won't be equivalent to a serial execution of the transaction that
>> committed target tuple (2) and the transaction that contains the
>> MERGE.
> 
> So the behaviour we get with UPDATE in this situation is that we
> update (2) so I would expect this to execute the MATCHED rule.
Certainly that would be consistent with the behavior of READ
COMMITTED -- wait for commit or rollback of the concurrent
transaction, and then proceed with whatever data is there after
completion of the other transaction.  With REPEATABLE READ or
SERIALIZABLE you would block until commit of the other transaction
and terminate with a write conflict -- a form of serialization
failure.  If the other transaction rolls back you INSERT.
At least, that would be the least surprising behavior to me.
-Kevin

Re: ask for review of MERGE

От

Robert Haas

Дата:

25 октября 2010 г., 15:10:04

On Mon, Oct 25, 2010 at 1:42 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But let's back up and talk about MVCC for a minute.  Suppose we have
>> three source tuples, (1), (2), and (3); and the target table contains
>> tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
>> suppose also an equijoin.  Clearly, source tuple (1) should fire the
>> MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
>> but what in the world should source tuple (2) do?  AFAICS, the only
>> sensible behavior is to throw a serialization error, because no matter
>> what you do the results won't be equivalent to a serial execution of
>> the transaction that committed target tuple (2) and the transaction
>> that contains the MERGE.
>
> So the behaviour we get with UPDATE in this situation is that we
> update (2) so I would expect this to execute the MATCHED rule.

Not exactly.  Consider this example:

rhaas=# create table concurrent (x integer primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"concurrent_pkey" for table "concurrent"
CREATE TABLE
rhaas=# insert into x values (1);
rhaas=# begin;
BEGIN
rhaas=# insert into concurrent values (2);
INSERT 0 1

<switch to a different window>

rhaas=# update concurrent set x=x where x=2;
UPDATE 0

> The key
> distinction is that since we're not returning the data to the user the
> user sees we want to update the most recent version and it's "almost"
> as if we ran "after" all the other transactions. It's not really
> serializable and I think in serializable mode we throw a serialization
> failure instead but in most usage patterns it's precisely what the
> user wants.

I think it would be perfectly reasonable to have a transaction
isolation level that does not use a snapshot at all and instead runs
everything relative to SnapshotNow, and people could use it with MERGE
if they were so inclined.  I think this would correspond more or less
to the READ COMMITTED isolation level specified in the standard; what
we now call READ COMMITTED is actually better than READ COMMITTED but
not quite as good as REPEATABLE READ.  That, combined with an
exclusive lock on the table (or, perhaps, some kind of predicate
locking mechanism) would be sufficient to prevent serialization
anomalies.

However, I don't think that implementing those semantics for just this
one command (or part of it) makes a whole lot of sense.  The EPQ
behavior of our current default isolation level is really pretty
strange, and adding a random wart that the target table (but not the
source table) in a MERGE query gets scanned using SnapshotNow would be
one more piece of strangeness atop the strangeness we already have.
And, as we just saw with the enum stuff, SnapshotNow can lead to some
awfully strange behavior - you could end up processing half of the
data from a concurrent transaction and missing the other half.

> Here "bbb" contained two records when we began with values "1" and "2"
> but the "2" was inserted in a transaction which hadn't committed yet.
> It commited after the update.
>
> postgres=> begin;
> BEGIN
> postgres=> select * from bbb;
>  i
> ---
>  1
> (1 row)
>
> postgres=> update bbb set i = i+1;
> UPDATE 2
> postgres=> commit;
> COMMIT
> postgres=> select * from bbb;
>  i
> ---
>  2
>  3
> (2 rows)

Well, at least on my system, if the transaction inserting (2) hasn't
committed yet, that UPDATE statement will block until it does, because
trying to change i from 1 to 2 causes the update of the unique index
to block, since there's an in-doubt tuple with (2) already.  Then it
will continue on as you've shown here, due to EPQ.  But if you do the
same statement with i = i + 10 instead of + 1, then it doesn't block,
and only updates the one row that's visible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

"Kevin Grittner"

Дата:

25 октября 2010 г., 16:15:31

Robert Haas <robertmhaas@gmail.com> wrote:
> rhaas=# create table concurrent (x integer primary key);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "concurrent_pkey" for table "concurrent"
> CREATE TABLE
> rhaas=# insert into x values (1);
> rhaas=# begin;
> BEGIN
> rhaas=# insert into concurrent values (2);
> INSERT 0 1
> 
> <switch to a different window>
> 
> rhaas=# update concurrent set x=x where x=2;
> UPDATE 0
That surprised me.  I would have thought that the INSERT would have
created an "in doubt" tuple which would block the UPDATE.  What is
the reason for not doing so?
FWIW I did a quick test and REPEATABLE READ also lets this pass but
with the SSI patch SERIALIZABLE seems to cover this correctly,
generating a serialization failure where such access is involved in
write skew:
test=# create table concurrent (x integer primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"concurrent_pkey" for table "concurrent"
CREATE TABLE
test=# insert into concurrent select generate_series(1, 20000);
INSERT 0 20000
test=# begin isolation level serializable;
BEGIN
test=# insert into concurrent values (0);
INSERT 0 1
test=# update concurrent set x = 30001 where x = 30000;
UPDATE 0
<different session>
test=# begin isolation level serializable;
BEGIN
test=# insert into concurrent values (30000);
INSERT 0 1
test=# update concurrent set x = -1 where x = 0;
UPDATE 0
test=# commit;
ERROR:  could not serialize access due to read/write dependencies
among transactions
HINT:  The transaction might succeed if retried.
I'll need to add a test to cover this, because it might have broken
with one of the optimizations on my list, had you not point out this
behavior.
On the other hand:
<session 1>
test=# drop table concurrent ;
DROP TABLE
test=# create table concurrent (x integer primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"concurrent_pkey" for table "concurrent"
CREATE TABLE
test=# insert into concurrent select generate_series(1, 20000);
INSERT 0 20000
test=# begin isolation level serializable;
BEGIN
test=# insert into concurrent values (0);
INSERT 0 1
<session 2>
test=# begin isolation level serializable;
BEGIN
test=# select * from concurrent where x = 0;x
---
(0 rows)

test=# insert into concurrent values (0);
<blocks>
<session 1>
test=# commit;
COMMIT
<session 2>
ERROR:  duplicate key value violates unique constraint
"concurrent_pkey"
DETAIL:  Key (x)=(0) already exists.
Anyway, I thought this might be of interest in terms of the MERGE
patch concurrency issues, since the SSI patch has been mentioned.
-Kevin

Re: ask for review of MERGE

От

Robert Haas

Дата:

25 октября 2010 г., 16:40:15

On Mon, Oct 25, 2010 at 3:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>
>> rhaas=# create table concurrent (x integer primary key);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "concurrent_pkey" for table "concurrent"
>> CREATE TABLE
>> rhaas=# insert into x values (1);
>> rhaas=# begin;
>> BEGIN
>> rhaas=# insert into concurrent values (2);
>> INSERT 0 1
>>
>> <switch to a different window>
>>
>> rhaas=# update concurrent set x=x where x=2;
>> UPDATE 0
>
> That surprised me.  I would have thought that the INSERT would have
> created an "in doubt" tuple which would block the UPDATE.  What is
> the reason for not doing so?

This is just standard MVCC - readers don't block writers, nor writers
readers.  You might also think about what would happen if the UPDATE
were run before the INSERT of (2).  There's no serialization anomaly
here, because either concurrent case is equivalent to the serial
schedule where the update precedes the insert.

In the case of a MERGE that matches a just-inserted invisible tuple
but no visible tuple, things are a bit stickier.  Let's suppose we're
trying to use MERGE to get UPSERT semantics.  If the MERGE command has
the obvious behavior of ignoring the invisible tuple just as UPDATE or
DELETE would do, then clearly any equivalent serial schedule must run
the MERGE before the INSERT (because if it were run after the INSERT,
it would fire the MATCHED action rather than the NOT MATCHED action).
But if the merge were run before the INSERT, then the INSERT would
have failed with a unique key violation; instead, the merge fails with
a unique key violation.  On the other hand, if the MERGE sees the
invisible tuple, essentially using SnapshotNow semantics, as Greg
Stark proposed, you get a different (and probably worse) class of
serialization anomalies.  For example, suppose the table has rows
1-100 and you do an update adding 1000 to each value concurrently with
merging in the values 51-100.  You might get something like this:

- MERGE scans rows 1-75, firing MATCHED for rows 51-75.
- UPDATE commits.
- MERGE scans rows 76-100, firing NOT MATCHED for each.

Now, as Greg says, that might be what some people want, but it's
certainly monumentally unserializable.  In a serial execution
schedule, the MERGE will either run before the UPDATE, in which case
MATCHED will fire for rows 51-100, or else the UPDATE will run before
the MERGE, in which case NOT MATCHED will fire for rows 51-100.  No
serial schedule is going to fire MATCHED for some rows and NOT MATCHED
for others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Greg Stark

Дата:

25 октября 2010 г., 17:13:34

On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Now, as Greg says, that might be what some people want, but it's
> certainly monumentally unserializable.

To be clear when I said it's what people want what I meant was that in
the common cases it's doing exactly what people want. As opposed to
getting closer to what people want in general but not quite hitting
the mark in the common cases.

Just as an example I think it's important that in the simplest case,
upsert of a single record, it be 100% guaranteed to do the naive
upsert. If two users are doing the merge of a single key at the same
time one of them had better insert and one of them had better update
or else users are going to be monumentally surprised.

I guess I hadn't considered all the cases and I agree it's important
that our behaviour make some kind of sense and be consistent with how
we handle updates and of existing in-doubt tuples. I wasn't trying to
introduce a whole new mode of operation, just work from analogy from
the way update works. It's clear that even with our existing semantics
there are strange corner cases once you get to multiple updates
happening in a single transaction. But we get the simple cases right
and even in the more complex cases, while it's not truly serializable
we should be able to come up with some basic smell tests that we pass.

My understanding is that currently we generally treat DML in one of
two ways depending on whether it's returning data to the user or
updating data in the table (include select for share). If it's
returning data to the user we use a snapshot to give the user a
consistent view of the database. If it's altering data in the database
we use the snapshot to get a consistent set of records and then apply
the updates to the most recent version.

The anomaly you showed with update and the problem with MERGE are both
because the operation was simultaneously doing a "read" -- the WHERE
clause and the uniqueness check in the MERGE -- and a write. This is
already the kind of case where we do weird things -- what kind of
behaviour would be consistent with our existing, somewhat weird,
behaviour?

-- 
greg

Re: ask for review of MERGE

От

"Kevin Grittner"

Дата:

25 октября 2010 г., 17:23:57

Robert Haas <robertmhaas@gmail.com> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
>> I would have thought that the INSERT would have
>> created an "in doubt" tuple which would block the UPDATE.
> This is just standard MVCC - readers don't block writers, nor
> writers readers.
Sure, but I tend to think of both INSERT and UPDATE as writes.  ;-)
> You might also think about what would happen if the UPDATE
> were run before the INSERT
> either concurrent case is equivalent to the serial
> schedule where the update precedes the insert.
I guess that's persuasive enough.  It feels funny, but the argument
looks sound, so I guess it's just a case of my intuition being
faulty.
> In the case of a MERGE that matches a just-inserted invisible
> tuple but no visible tuple, things are a bit stickier.
Well, more generally it can lead to anomalies in a more complex
combination of actions, since it creates, as you imply above, a
rw-dependency from the transaction doing the UPDATE to the
transaction doing the INSERT, so the combination can form part of a
cycle in apparent order of execution which can produce an anomaly. 
The more I look at it, the more clear it is that current behavior is
correct and what the implications are.  I've just missed that detail
until now, wrongly assuming that it would be a write conflict.
> [example of MERGE which can not serialize with a concurrent
> transaction, and possible outcomes if there is no serialization
> failure]
> Now, as Greg says, that might be what some people want, but it's
> certainly monumentally unserializable.
Yeah.  MERGE should probably be sensitive to the transaction
isolation level, at least to the extent that MERGE in a SERIALIZABLE
transaction plays nice with other SERIALIZABLE transactions.  That
would be necessary to allow business rules enforced through triggers
to be able to guarantee data integrity.  It would mean that a MERGE
involving tables which were the target of modifications from
concurrent SERIALIZABLE transactions would be likely to fail and/or
to cause other transactions to fail.
-Kevin

Re: ask for review of MERGE

От

Robert Haas

Дата:

25 октября 2010 г., 17:58:24

On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Now, as Greg says, that might be what some people want, but it's
>> certainly monumentally unserializable.
>
> To be clear when I said it's what people want what I meant was that in
> the common cases it's doing exactly what people want. As opposed to
> getting closer to what people want in general but not quite hitting
> the mark in the common cases.
>
> Just as an example I think it's important that in the simplest case,
> upsert of a single record, it be 100% guaranteed to do the naive
> upsert. If two users are doing the merge of a single key at the same
> time one of them had better insert and one of them had better update
> or else users are going to be monumentally surprised.

Hmm, so let's think about that case.

The first merge comes along and finds no match so it fires the NOT
MATCHED rule, which inserts a tuple.  The second merge comes along and
finds no match, so it also fires the NOT MATCHED rule and tries to
insert a tuple.  But upon consulting the PRIMARY KEY index it finds
that an in-doubt tuple exists so it goes to sleep waiting for the
first transaction to commit or abort.  If the first transaction
commits it then decides that the jig is up and fails.  We could
(maybe) fix this by doing something similar to what EPQ does for
updates: when the first transaction commits, instead of redoing the
insert, we back up and recheck whether the new tuple would have
matched the join clause and, if so, we instead fire the MATCHED action
on the updated tuple.  If not, we fire NOT MATCHED anyway.  I'm not
sure how hard that would be, or whether it would introduce any other
nasty anomalies in more complex cases.

Alternatively, we could introduce an UPSERT or REPLACE statement
intended to handle exactly this case and leave MERGE for more complex
situations.  It's pretty easy to imagine what the coding of that
should look like: if we encounter an in-doubt tuple in we wait on its
xmin.  If the transaction aborts, we insert.  If it commits, and we're
in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
or SERIALIZABLE mode, we abort with a serialization error.  That's a
lot simpler to understand and reason about than MERGE in its full
generality.

I think it's pretty much hopeless to think that MERGE is going to work
in complex concurrent scenarios without creating serialization
anomalies, or at least rollbacks.  I think that's baked into the
nature of what the statement does.  To simulate MERGE, you need to
read from the target table and then do writes that depend on what you
read.  If you do that with the commands that are available today,
you're going to get serialization anomalies and/or rollbacks under
concurrency.  The mere fact of that logic being inside the database
rather than outside isn't going to make that go away.  Now sometimes,
as with exclusion constraints, you can play games with dirty snapshots
to get the semantics you want, but whether that's possible in a
particular case depends on the details of the operation being
performed, and here I think it can't be done.  Some operations are
*fundamentally* unserializable.

A very simple example of this is a sequence that is guaranteed not to
have gaps (a feature we've occasionally been requested to provide).
If N processes request a sequence number simultaneously, you have to
hand out a value to the first guy and wait and see whether he commits
or aborts before deciding what number to give the second guy.  That
sucks, so usually we just design our applications not to require that
sequences be gap-free.  Similarly here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Simon Riggs

Дата:

26 октября 2010 г., 17:01:33

On Mon, 2010-10-25 at 16:58 -0400, Robert Haas wrote:
> On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark@mit.edu> wrote:
> > On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> Now, as Greg says, that might be what some people want, but it's
> >> certainly monumentally unserializable.
> >
> > To be clear when I said it's what people want what I meant was that in
> > the common cases it's doing exactly what people want. As opposed to
> > getting closer to what people want in general but not quite hitting
> > the mark in the common cases.
> >
> > Just as an example I think it's important that in the simplest case,
> > upsert of a single record, it be 100% guaranteed to do the naive
> > upsert. If two users are doing the merge of a single key at the same
> > time one of them had better insert and one of them had better update
> > or else users are going to be monumentally surprised.
> 
> Hmm, so let's think about that case.
> 
> The first merge comes along and finds no match so it fires the NOT
> MATCHED rule, which inserts a tuple.  The second merge comes along and
> finds no match, so it also fires the NOT MATCHED rule and tries to
> insert a tuple.  But upon consulting the PRIMARY KEY index it finds
> that an in-doubt tuple exists so it goes to sleep waiting for the
> first transaction to commit or abort.  If the first transaction
> commits it then decides that the jig is up and fails.  We could
> (maybe) fix this by doing something similar to what EPQ does for
> updates: when the first transaction commits, instead of redoing the
> insert, we back up and recheck whether the new tuple would have
> matched the join clause and, if so, we instead fire the MATCHED action
> on the updated tuple.  If not, we fire NOT MATCHED anyway.  I'm not
> sure how hard that would be, or whether it would introduce any other
> nasty anomalies in more complex cases.
> 
> Alternatively, we could introduce an UPSERT or REPLACE statement
> intended to handle exactly this case and leave MERGE for more complex
> situations.  It's pretty easy to imagine what the coding of that
> should look like: if we encounter an in-doubt tuple in we wait on its
> xmin.  If the transaction aborts, we insert.  If it commits, and we're
> in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
> or SERIALIZABLE mode, we abort with a serialization error.  That's a
> lot simpler to understand and reason about than MERGE in its full
> generality.
> 
> I think it's pretty much hopeless to think that MERGE is going to work
> in complex concurrent scenarios without creating serialization
> anomalies, or at least rollbacks.  I think that's baked into the
> nature of what the statement does.  To simulate MERGE, you need to
> read from the target table and then do writes that depend on what you
> read.  If you do that with the commands that are available today,
> you're going to get serialization anomalies and/or rollbacks under
> concurrency.  The mere fact of that logic being inside the database
> rather than outside isn't going to make that go away.  Now sometimes,
> as with exclusion constraints, you can play games with dirty snapshots
> to get the semantics you want, but whether that's possible in a
> particular case depends on the details of the operation being
> performed, and here I think it can't be done.  Some operations are
> *fundamentally* unserializable.

I agree with your analysis. Let me review...

There is a case that locking alone won't resolve, however that locking
works. The transaction history for that is:

T1: takes snapshot
T2: INSERT row1
T2: COMMIT;
T1: attempts to determine if MATCHED or NOT MATCHED. 

The answer is neither of those two answers. If we say it is NOT MATCHED
then we will just fail on any INSERT, or if we say it is MATCHED then
technically we can't see the row so the UPDATE should fail. The COMMIT
of T2 releases any locks that would have helped resolve this, and note
that even if T1 attempts to lock that row as early as possible, only a
table level lock would prevent T2 from doing that action.

Two options for resolution are

1) Throw SERIALIZABLE error

2) Implement something similar to EvalPlanQual
As you say, we already resolve this situation for concurrent updates by
following the update chain from a row that is visible to the latest row.
For MERGE the semantics need to be subtely different here: we need to
follow the update chain to the latest row, but from a row that we
*can't* see.

MERGE is still very useful without the need for 2), though fails in some
cases for concurrent use. The failure rate would increase as the number
of concurrent MERGErs and/or number of rows in source table. Those
errors are no more serious than are possible now.

So IMHO we should implement 1) now and come back later to implement 2).
Given right now there may be other issues with 2) it seems unsafe to
rely on the assumption that we'll fix them by end of release.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: ask for review of MERGE

От

Robert Haas

Дата:

26 октября 2010 г., 17:08:41

On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I agree with your analysis. Let me review...
> [review]

Sounds like we're on the same page.

> Two options for resolution are
>
> 1) Throw SERIALIZABLE error
>
> 2) Implement something similar to EvalPlanQual
> As you say, we already resolve this situation for concurrent updates by
> following the update chain from a row that is visible to the latest row.
> For MERGE the semantics need to be subtely different here: we need to
> follow the update chain to the latest row, but from a row that we
> *can't* see.
>
> MERGE is still very useful without the need for 2), though fails in some
> cases for concurrent use. The failure rate would increase as the number
> of concurrent MERGErs and/or number of rows in source table. Those
> errors are no more serious than are possible now.
>
> So IMHO we should implement 1) now and come back later to implement 2).
> Given right now there may be other issues with 2) it seems unsafe to
> rely on the assumption that we'll fix them by end of release.

Yeah.  In fact, I'm not sure we're ever going to want to implement #2
- I think that needs more study to determine whether there's even
something there that makes sense to implement at all.  But certainly I
wouldn't count on it happening for 9.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: ask for review of MERGE

От

Simon Riggs

Дата:

26 октября 2010 г., 18:01:08

On Tue, 2010-10-26 at 16:08 -0400, Robert Haas wrote:
> On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > I agree with your analysis. Let me review...
> > [review]
> 
> Sounds like we're on the same page.
> 
> > Two options for resolution are
> >
> > 1) Throw SERIALIZABLE error
> >
> > 2) Implement something similar to EvalPlanQual
> > As you say, we already resolve this situation for concurrent updates by
> > following the update chain from a row that is visible to the latest row.
> > For MERGE the semantics need to be subtely different here: we need to
> > follow the update chain to the latest row, but from a row that we
> > *can't* see.
> >
> > MERGE is still very useful without the need for 2), though fails in some
> > cases for concurrent use. The failure rate would increase as the number
> > of concurrent MERGErs and/or number of rows in source table. Those
> > errors are no more serious than are possible now.
> >
> > So IMHO we should implement 1) now and come back later to implement 2).
> > Given right now there may be other issues with 2) it seems unsafe to
> > rely on the assumption that we'll fix them by end of release.
> 
> Yeah.  In fact, I'm not sure we're ever going to want to implement #2
> - I think that needs more study to determine whether there's even
> something there that makes sense to implement at all.  But certainly I
> wouldn't count on it happening for 9.1.

2) sounds weird, until you realise it is exactly how you would need to
code a PL/pgSQL procedure to do the equivalent of MERGE. Not doing it
just makes no sense in the longer term. I agree it will take a while to
think it through in sufficient detail.

In the meantime it's a good argument for the ELSE construct at the end
of the WHEN clauses, so we can do something other than skip a row or
throw an ERROR.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: ask for review of MERGE

От

Bruce Momjian

Дата:

12 ноября 2010 г., 14:05:34

Kevin Grittner wrote:
> Robert Haas <robertmhaas@gmail.com> wrote:
>  
> > rhaas=# create table concurrent (x integer primary key);
> > NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> > "concurrent_pkey" for table "concurrent"
> > CREATE TABLE
> > rhaas=# insert into x values (1);
> > rhaas=# begin;
> > BEGIN
> > rhaas=# insert into concurrent values (2);
> > INSERT 0 1
> > 
> > <switch to a different window>
> > 
> > rhaas=# update concurrent set x=x where x=2;
> > UPDATE 0
>  
> That surprised me.  I would have thought that the INSERT would have
> created an "in doubt" tuple which would block the UPDATE.  What is
> the reason for not doing so?

When Kevin gets surprised, I get worried.  LOL

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: ask for review of MERGE

Вложения

Вложения

Вложения

Вложения