Re: refresh materialized view concurrently
От | Kevin Grittner |
---|---|
Тема | Re: refresh materialized view concurrently |
Дата | |
Msg-id | 1372235913.24059.YahooMailNeo@web162901.mail.bf1.yahoo.com обсуждение исходный текст |
Ответ на | Re: refresh materialized view concurrently (Hitoshi Harada <umi.tanuki@gmail.com>) |
Ответы |
Re: refresh materialized view concurrently
|
Список | pgsql-hackers |
Hitoshi Harada <umi.tanuki@gmail.com> wrote: > I spent a few hours to review the patch. Thanks! > As far as I can tell, the overall approach is as follows. > > - create a new temp heap as non-concurrent does, but with > ExclusiveLock on the matview, so that reader wouldn't be blocked Non-concurrent creates the heap in the matview's tablespace and namespace, so the "temp" part is different in concurrent generation. This difference is why concurrent can be faster when few rows change. Also, before the next step there is an ANALYZE of the temp table, so the planner can make good choices in the next step. > - with this temp table and the matview, query FULL JOIN and > extract difference between original matview and temp heap (via SPI) Right; into another temp table. > - this operation requires unique index for performance reason (or > correctness reason too) It is primarily for correctness in the face of duplicate rows which have no nulls. Do you think the reasons need to be better documented with comments? > - run ANALYZE on this diff table > > - run UPDATE, INSERT and DELETE via SPI, to do the merge > > - close these temp heap Right, then drop them. > Assuming I'm asking something wrong and going for the current > approach, here's what I've found in the patch. > - I'm not sure if unique key index requirement is reasonable or > not, because users only add CONCURRENTLY option and not merging > or incremental update. The patch would need to be about an order of magnitude more complex without that requirement due to the problems handling duplicate rows. This patch uses set logic, and set logic falls down badly in the face of duplicate rows. Consider how you would try to make this technique work if the "old" data has 3 versions of a row and the "new" data in the temp table has 4 versions of that same row. You can play around with that by modifying the examples of the logic using regular tables I included with the first version of the patch. A UNIQUE index is the only way to prevent duplicate rows. The fact that there can be duplicates with NULL in one or more of the columns which are part of a duplicate index is not a fatal problem; it just means that those cases much be handled through DELETE and re-INSERT of the rows containing NULL in any column which is part of a duplicate index key. > - As others pointed out, quoteOneName can be replaced with > quote_identifier OK, I did it that way. The quote_identifier and quote_qualified_identifier functions seem rather clumsy for the case where you already know you have an identifier (because you are dealing with an open Relation). I think we should have different functions for that case, but that should probably not be material for this patch, so changed to use the existing tools. > - This looks harmless, but I wonder if you need to change relkind. > > *** 665,672 **** make_new_heap(Oid OIDOldHeap, Oid NewTableSpace) > OldHeap->rd_rel->relowner, > OldHeapDesc, > NIL, > ! OldHeap->rd_rel->relkind, > ! OldHeap->rd_rel->relpersistence, > false, > RelationIsMapped(OldHeap), > true, > --- 680,687 ---- > OldHeap->rd_rel->relowner, > OldHeapDesc, > NIL, > ! RELKIND_RELATION, > ! relpersistence, > false, > RelationIsMapped(OldHeap), > true, > > > Since OldHeap->rd_rel->relkind has been working with 'm', too, > not only 'r'? No, the relation created by this is not going to be around when we're done; we're either going to move its heap onto the existing matview or drop the temp table. Either way, it's OK for it to be a table until we do. I don't see the benefit of complicating the code to do otherwise. > - I found two additional parameters on make_new_heap ugly, but > couldn't come up with better solution. Maybe we can pass > Relation of old heap to the function instead of Oid.. This was the cleanest way I could see. Opening the relation elsewhere and passing it in would not only be uglier IMO, it would do nothing to tell the function that it should create a temporary table. I also modified the confusing error message to something close to the suggestion from Robert. What to do about the Assert that the matview is not a system relation seems like material for a separate patch. After review, I'm inclined to remove the test altogether, so that extensions can create matviews in pg_catalog. New version attached. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Szymon GuzДата:
Сообщение: Re: [PATCH] Fix conversion for Decimal arguments in plpython functions
Следующее
От: Pavel StehuleДата:
Сообщение: Re: FILTER for aggregates [was Re: Department of Redundancy Department: makeNode(FuncCall) division]