Re: INSERT ... ON CONFLICT UPDATE and logical decoding
От | Peter Geoghegan |
---|---|
Тема | Re: INSERT ... ON CONFLICT UPDATE and logical decoding |
Дата | |
Msg-id | CAM3SWZT7sSR26+XFLHhaso6KqRTOUWthOBVJxA517X1ioY+WgA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: INSERT ... ON CONFLICT UPDATE and logical decoding (Andres Freund <andres@2ndquadrant.com>) |
Список | pgsql-hackers |
On Tue, Mar 3, 2015 at 3:13 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> toast_save_datum() is called with a heap_insert() call before heap >> insertion for the tuple proper. We're relying on the assumption that >> if there is no immediate super deletion record, things are fine. We >> cannot speculatively insert into catalogs, BTW. > > I fail to see what prohibits e.g. another catalog modifying transaction > to commit inbetween and distribute a new snapshot. SnapBuildDistributeNewCatalogSnapshot()/REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT does look like a problematic case. It's problematic specifically because it involves some xact queuing a change to *some other* xact - we cannot assume that this won't happen between WAL-logging within heap_insert() and heap_delete(). Can you think of any other such cases? I think that REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID should be fine. REORDER_BUFFER_CHANGE_INTERNAL_TUPLECID is not an issue either. In both cases I believe my assumption does not break because there won't be writes to system catalogs between the relevant heap_insert() and heap_delete() calls, which are tightly coupled, and also because speculative insertion into catalogs is unsupported. That just leaves the non-"*CHANGE_INTERAL_* "(regular DML) cases, which should also be fine. As for SnapBuildDistributeNewCatalogSnapshot(), I'm open to suggestions. Do you see any opportunity around making assumptions about heavyweight locking making the interleaving of some REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change between a REORDER_BUFFER_CHANGE_INTERNAL_INSERT change and a REORDER_BUFFER_CHANGE_INTERNAL_DELETE change? AFAICT, that's all this approach really needs to worry about (or the interleaving of something else not under the control of speculative insertion - doesn't have to be a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT, that's just the most obvious problematic case). >> Why should we see a REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case >> next, in the case that we need to care about (when the tuple was super >> deleted immediately afterwards)? > > It's irrelevant whether you care about it or not. Your > ReorderBufferIterTXNNext() consumes a change that needs to actually be > processed. It could very well be something important like a new > snapshot. But it is actually processed, in the next iteration (we avoid calling ReorderBufferIterTXNNext() at the top of the loop if it has already been called for that iteration as part of peeking ahead). AFAICT all that is at issue is the safety of one particular assumption I've made: that it is safe to assume that there will never be a super deletion in the event of not seeing a super deletion change immediately following a speculative insertion change within some xact when decoding. It's still going to be processed if it's something else. The implementation will, however, fail to consolidate the speculative insertion change and super deletion change if my assumption is ever faulty. This whole approach doesn't seem to be all that different to how there is currently coordination within ReorderBufferCommit() between TOAST-related changes, and changes to the relation proper. In fact, I've now duplicated the way the IsToastRelation() case performs "dlist_delete(&change->node)" in order to avoid chunk reuse in the event of spilling to disk. Stress testing by decreasing "max_changes_in_memory" to 1 does not exhibit broken behavior, I believe (although that does break the test_decoding regression tests on the master branch, FWIW). Also, obviously I have not considered the SnapBuildDistributeNewCatalogSnapshot() case too closely. I attach an updated patch that I believe at least handles the serialization aspects correctly, FYI. Note that I assert that REORDER_BUFFER_CHANGE_INTERNAL_DELETE follows a REORDER_BUFFER_CHANGE_INTERNAL_INSERT, so if you can find a way to break the assumption it should cause an assertion failure, which is something to look out for during testing. >> While what I have here *is* very ugly, I see no better alternative. By >> doing what you suggest, we'd be finding a special case way of safely >> violating the general "WAL log-before-data" rule. > > No, it won't. We don't WAL log modifications that don't need to persist > in a bunch of places. It'd be perfectly fine to start upsert with a > 'acquiring a insertion lock' record that pretty much only contains the > item pointer (and a FPI if necessary to prevent torn pages). And then > only log the full new record once it's sure that the whole thing needs > to survive. Redo than can simply clean out the size touched by the > insertion lock. That seems like a lot of work in the general case to not do something that will only very rarely need to occur anyway. The optimistic approach of value locking scheme #2 has a small race that is detected (which necessitates super deletion), but that will only very rarely be required. Also, there is value in making super deletion (and speculative insertion) as close as possible to regular deletion and insertion. -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Kouhei KaigaiДата:
Сообщение: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)