Обсуждение: Scanning for insert
Greets, [Ugh, sent one with uncompressed patch. Seems to be swallowed(No failure msg?). Ignore it if it surfaces.] The attached patch extends pg_am by adding two methods, "scan for insert" and "insert from scan". These methods are meant to provide an index user with the ability to do conditional insertions based on the results of a conclusive locking scan--a scan that not only finds a match, but resolves its actual existence(what _bt_check_unique does), and makes any necessary locks to warrant later insertion by a call to insertfromscan. (It should also be noted that insertions are aborted by giving the insertfromscan method an InvalidItemPointer instead of adding another method, ie "abortinsertscan".) I'm sending this to hackers instead of patches, because it's not complete(I know it's broken in the some of the new places), and, most importantly, I want to know if this is actually the appropriate general direction to travel in. Not to mention that I have probably missed something and I hope that someone will give me my due whackings with The Clue Stick. :P These new index interfaces are meant to provide the necessary functionality for doing conditional index insertions that will likely aid in any implementation of error logging, and other features(merge?). I have failed to find much discussion on this, so if there is a good thread about, please point me at it. The most recent discussion of something along these lines was Martijn's thread about providing non-btree indexes with uniqueness: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00541.php (However, I doubt that this patch would actually help with deferred constraints or giving GiST/hash uniqueness.) This patch is mostly incremental, and doesn't make many alterations to existing code for the time being; the most significant alteration was restructuring a bit of _bt_check_unique(_bt_evaluate in this patch) to work for both scanforinsert and doinsert. (This is probably a good thing as this code is all pretty new to me.) [Sorry about some of the whitespace changes in the patch, remnants from an overzealous version.] Any thoughts and directions would be appreciated. -- Regards, James William Pye
Вложения
James William Pye <pgsql@jwp.name> writes: > The attached patch extends pg_am by adding two methods, "scan for insert" and > "insert from scan". These methods are meant to provide an index user with the > ability to do conditional insertions based on the results of a conclusive > locking scan--a scan that not only finds a match, but resolves its actual > existence(what _bt_check_unique does), and makes any necessary locks to warrant > later insertion by a call to insertfromscan. Is this really a good idea? The fundamental problem I see with it is that it takes away the index AM's ability to make any guarantees about its locking behavior, ie, how long locks will be held or what other operations might intervene while holding them. It'll also require the AM to save potentially large amounts of state between the two calls (eg, an entire search path might be needed for GiST). Currently any such state can live on the stack in local variables, but that won't work if it has to be remembered until a later AM call. Lastly, what happens if the caller loses control (due to elog) and never makes the followup AM call? > These new index interfaces are meant to provide the necessary functionality > for doing conditional index insertions that will likely aid in any > implementation of error logging, and other features(merge?). If that's what you want, maybe a better answer is to simply allow aminsert to return a uniqueness-conflict indication, rather than raising the ereport for itself. regards, tom lane
On Mon, Feb 27, 2006 at 05:44:20PM -0500, Tom Lane wrote: > Is this really a good idea? The fundamental problem I see with it is > that it takes away the index AM's ability to make any guarantees about > its locking behavior, ie, how long locks will be held or what other > operations might intervene while holding them. Yeah, I saw that as well. :( My only thoughts on that issue so far have been that the user must tread carefully while holding these scans; it becomes the user's responsibility. Weak at best, perhaps, so I can understand if that does not move you or anyone else into thinking positively of these proposed interfaces. ;) > It'll also require the > AM to save potentially large amounts of state between the two calls > (eg, an entire search path might be needed for GiST). Currently any > such state can live on the stack in local variables, but that won't > work if it has to be remembered until a later AM call. Hrm, certainly, implementing these methods for AMs that use such state keeping techniques may be extra difficult. Massive changes may be necessary. However, I don't imagine that making such a change would be impossible, nor would it necessarily be required at all. Similar to uniqueness, the index need not be forced to implement these new interfaces--surely not upon this patch's introduction into the source(if/when). If a feature were to demand the use of scan insertions on a specific index that does not provide the interfaces, it could easily raise an informative exception about the shortcoming. Hehe, perhaps a Truly Foolish Notion, but could we use siglongjmp for such cases(state living on the stack)? Seems questionable and evil, tho; I know. ;) > Lastly, what > happens if the caller loses control (due to elog) and never makes the > followup AM call? I imagine/hope it would get cleaned up similar to how nbtree gets cleaned up at the end of the transaction that was rolled back due to a unique constraint violation. [I seem to recall having to do some special _bt_wrtbuf to get my experimental insert unless patch working, so I guess that at eox some cleanup is done w.r.t. to those locks and whatnot. (yeah, technical term, "whatnot" ;)] Additionally, if the caller is very concerned with potential exceptions, perhaps a PG_TRY() block should be exercised in those areas of worry. I guess in most cases it simply comes back to becoming the scan's user's responsibility to be sure to keep things kosher. :\ > If that's what you want, maybe a better answer is to simply allow > aminsert to return a uniqueness-conflict indication, rather than > raising the ereport for itself. Maybe so. However, I guess I was thinking along lines that many times multiple insert scans may need to be done before the final decision to actually do the insertion is made(yeah, uh, the insert unless thing is what I've had in mind ;). Also, I think part of this point is to be able to avoid the actual insertion into heap, so as to avoid a superfluous heap_insert & heap_delete, and other "unwinding code" if a uniqueness-conflict indication were made to a user that needs to respond to such signals for the feature being implemented. (The more unique constraint violations that occur, the more I/O that gets saved with insert scans. I imagine this could be a very good thing for certain applications.) -- Regards, James William Pye