Re: MERGE Specification

Поиск
Список
Период
Сортировка
От	Heikki Linnakangas
Тема	Re: MERGE Specification
Дата	15 августа 2010 г. 04:57:10
Msg-id	4C679DC2.2020508@enterprisedb.com обсуждение исходный текст
Ответ на	Re: MERGE Specification (Boxuan Zhai <bxzhai2010@gmail.com>)
Список	pgsql-hackers
Дерево обсуждения
On 15/08/10 10:31, Boxuan Zhai wrote:
> On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>> Thanks. I went through this, fixing all the spurious end-of-line whitespace
>> first with "git apply --whitespace=fix", and then manually fixing comment
>> and whitespace style, typos, and doing some other small comment editing.
>> Resulting patch attached. It is also available at my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch
>> 'mergestmt'. Please base any further patch versions on this patch, so that
>> we don't need to redo this cleanup.
>>
> Thanks for the cleanup. The codes looks better now. My problem is that I
> have done some more modifications after merge_v102. Is there any way to
> apply the cleanup patch without erasing my new codes?

Yeah, there's multiple ways. You can create a patch of what you have 
now, and run interdiff against your previous patch that I worked 
against. That gives you a diff of changes since the last patch you sent 
to the list. You can then apply that patch to the codetree from my git 
repository.

Or you can just generate a new patch of what you have as usual, and I'll 
incorporate the changes to the cleaned-up patch.

>> I'll continue reviewing this sometime next week, but here's few
>> miscellaneous issues for starters;
>>
>> * Explain output of actions needs some work:
>>
>>>   Merge  (cost=246.37..272.96 rows=1770 width=38)
>>>     ACTION: UPDATE WHEN NOT MATCHED
>>>     ACTION: INSERT WHEN NOT MATCHED
>>>     ->   Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)
>>
>> Should print out more details of the action, like for normal
>> updates/inserts/deletes.
>
>
> Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more
> details than what we have here except the costs, rows and width, which is
> print out at the MERGE command line.
>
> For example:
>
> Explain
> update buy set volume = volume + 1;
>                            QUERY PLAN
> --------------------------------------------------------------
>   Update  (cost=0.00..36.75 rows=2140 width=14)
>     ->   Seq Scan on buy  (cost=0.00..36.75 rows=2140 width=14)
> (2 rows)
>
> For the explanation of an UPDATE command, we only have a Update title
> followed by the costs, then, after the arrow ->  there comes the plan tree.
> In a MERGE command, no action has its real private plan. They all share the
> main plan.  Thus, the costs for a merge command is that of the main plan.
> What is useful for a merge action is only the target list and quals. So my
> design is to show the merge command costs in first line. Then print out the
> actions and their qualifications in a list, followed by the main plan tree.

It's more more interesting with more complex statement:

postgres=# explain UPDATE target SET id = (SELECT COUNT(*) FROM 
generate_series(1,10));                                      QUERY PLAN 

-------------------------------------------------------------------------------------- Update  (cost=12.51..52.52
rows=2400width=6)   InitPlan 1 (returns $0)     ->  Aggregate  (cost=12.50..12.51 rows=1 width=0)           ->
FunctionScan on generate_series  (cost=0.00..10.00 
 
rows=1000 width=0)   ->  Seq Scan on target  (cost=0.00..40.00 rows=2400 width=6)
(5 rows)

> Is there any other thing you suggest to print out for each action?

It should match the output of a normal Update/Insert as closely as possible.

>> * Is the behavior now SQL-compliant? We had long discussions about the
>> default action being insert or do nothing or raise error, but I never got a
>> clear picture of what the SQL spec says and whether we're compliant.
>>
>> * What does the "one tuple is error" notice mean? We'll have to do
>> something about that.. I don't think we've properly thought through the
>> meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
>> only a few dozen lines to put back when we know what to do about it.
>>
> I find that we have not reached an agreement on MERGE's syntax yet.
> Personally, I support Simon's idea, that is the default action should be
> RAISE ERROR. However, I am no sure what RAISE ERROR should do when we
> get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is
> error"); for this situation.
>
> I leave this for further extension when a more specific design for RAISE
> ERROR is available.
>
> Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do
> you want me to chage the default action back to DO NOTHING? Or any other
> suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE"
> clause as the end of the action list which forces the user to specify the
> default action for merge).

Whatever the spec says is what we should do.

>> * Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it
>> be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?
>
> I need one flag in these statement to differentiate them from normal
> InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the
> process of these two kinds of statements in the transfromStmt() function. I
> define a set of alias nodes which have different nodetags. This can make
> the code simpler.

Ok. You might also consider just adding a "isMergeAction" boolean to 
InsertStmt/UpdateStmt/DeleteStmt instead, or if the difference between 
the regular statements and merge actions grow big, create a completely 
separate node struct for them.

>> * Do you need the out-function for DeleteStmt? Why not for UpdateStmt and
>> InsertStmt?
>
> Ah, I add this function because I want to have a look of the content of
> DeleteStmt. I did this long ago, just after I started the project. If you
> don't want this function, I will remove it.

Ok.

>> * Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers
>> every time, should set flags at plan time to mark what kind of actions there
>> is in the statement.
>
> First of all, I need to say that the functions of ExecBSMergeTriggers() and
> ExecASMergeTriggers() are for per-statement triggers, and are invoked only
> once in the whole process of a MERGE command. Setting flags at plan time can
> save the scanning. I may add it to next patch.  But don't expect it save
> much execution time.

Yeah, I'm not so much concerned about performance, it just seems like it 
might make the code slightly simpler.

>> Or should we defer firing the statement triggers until we hit the first
>> matching row and execute the action for the first time?
>
> No, we cannot do this. These are Per-statement triggers. They should be
> fired before/after the main plan is executed. The statement triggers are
> fired even no tuple is really matched with the actions.

Ok.

>> * This crashes:
>>
>> postgres=# CREATE TABLE target AS (SELECT 1 AS id);
>> SELECT 1
>> postgres=# MERGE into target t
>> USING (select 1 AS id) AS s
>>
>> ON t.id = s.id
>> WHEN MATCHED THEN
>>         UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
>> ;
>> server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>
> Oh, a really serious bug. I have not considered this case (user series
> generator in actions) before. I will see what I can do for it.

It's not specific to generate_series() but subqueries in general that 
are the problem.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления:
Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: MERGE Specification