Обсуждение: Scanning for insert

Поиск
Список
Период
Сортировка

Scanning for insert

От
James William Pye
Дата:
Greets,
[Ugh, sent one with uncompressed patch. Seems to be swallowed(No failure msg?).
 Ignore it if it surfaces.]

The attached patch extends pg_am by adding two methods, "scan for insert" and
"insert from scan". These methods are meant to provide an index user with the
ability to do conditional insertions based on the results of a conclusive
locking scan--a scan that not only finds a match, but resolves its actual
existence(what _bt_check_unique does), and makes any necessary locks to warrant
later insertion by a call to insertfromscan. (It should also be noted that
insertions are aborted by giving the insertfromscan method an InvalidItemPointer
instead of adding another method, ie "abortinsertscan".)

I'm sending this to hackers instead of patches, because it's not complete(I know
it's broken in the some of the new places), and, most importantly, I want to
know if this is actually the appropriate general direction to travel in. Not to
mention that I have probably missed something and I hope that someone will give
me my due whackings with The Clue Stick. :P

These new index interfaces are meant to provide the necessary functionality
for doing conditional index insertions that will likely aid in any
implementation of error logging, and other features(merge?). I have failed to
find much discussion on this, so if there is a good thread about, please point
me at it.

The most recent discussion of something along these lines was Martijn's thread
about providing non-btree indexes with uniqueness:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00541.php
(However, I doubt that this patch would actually help with deferred constraints
or giving GiST/hash uniqueness.)

This patch is mostly incremental, and doesn't make many alterations to existing
code for the time being; the most significant alteration was restructuring a bit
of _bt_check_unique(_bt_evaluate in this patch) to work for both scanforinsert
and doinsert.
(This is probably a good thing as this code is all pretty new to me.)

[Sorry about some of the whitespace changes in the patch, remnants from an
overzealous version.]

Any thoughts and directions would be appreciated.
--
Regards, James William Pye

Вложения

Re: Scanning for insert

От
Tom Lane
Дата:
James William Pye <pgsql@jwp.name> writes:
> The attached patch extends pg_am by adding two methods, "scan for insert" and
> "insert from scan". These methods are meant to provide an index user with the
> ability to do conditional insertions based on the results of a conclusive
> locking scan--a scan that not only finds a match, but resolves its actual
> existence(what _bt_check_unique does), and makes any necessary locks to warrant
> later insertion by a call to insertfromscan.

Is this really a good idea?  The fundamental problem I see with it is
that it takes away the index AM's ability to make any guarantees about
its locking behavior, ie, how long locks will be held or what other
operations might intervene while holding them.  It'll also require the
AM to save potentially large amounts of state between the two calls
(eg, an entire search path might be needed for GiST).  Currently any
such state can live on the stack in local variables, but that won't
work if it has to be remembered until a later AM call.  Lastly, what
happens if the caller loses control (due to elog) and never makes the
followup AM call?

> These new index interfaces are meant to provide the necessary functionality
> for doing conditional index insertions that will likely aid in any
> implementation of error logging, and other features(merge?).

If that's what you want, maybe a better answer is to simply allow
aminsert to return a uniqueness-conflict indication, rather than
raising the ereport for itself.
        regards, tom lane


Re: Scanning for insert

От
James William Pye
Дата:
On Mon, Feb 27, 2006 at 05:44:20PM -0500, Tom Lane wrote:
> Is this really a good idea?  The fundamental problem I see with it is
> that it takes away the index AM's ability to make any guarantees about
> its locking behavior, ie, how long locks will be held or what other
> operations might intervene while holding them.

Yeah, I saw that as well. :(

My only thoughts on that issue so far have been that the user must tread
carefully while holding these scans; it becomes the user's responsibility.
Weak at best, perhaps, so I can understand if that does not move you or
anyone else into thinking positively of these proposed interfaces. ;)

> It'll also require the
> AM to save potentially large amounts of state between the two calls
> (eg, an entire search path might be needed for GiST).  Currently any
> such state can live on the stack in local variables, but that won't
> work if it has to be remembered until a later AM call.

Hrm, certainly, implementing these methods for AMs that use such state
keeping techniques may be extra difficult. Massive changes may be necessary.
However, I don't imagine that making such a change would be impossible, nor
would it necessarily be required at all. Similar to uniqueness, the index need
not be forced to implement these new interfaces--surely not upon this patch's
introduction into the source(if/when). If a feature were to demand the use of scan
insertions on a specific index that does not provide the interfaces, it could
easily raise an informative exception about the shortcoming.

Hehe, perhaps a Truly Foolish Notion, but could we use siglongjmp for such
cases(state living on the stack)? Seems questionable and evil, tho; I know. ;)

> Lastly, what
> happens if the caller loses control (due to elog) and never makes the
> followup AM call?

I imagine/hope it would get cleaned up similar to how nbtree gets cleaned up at
the end of the transaction that was rolled back due to a unique constraint
violation. [I seem to recall having to do some special _bt_wrtbuf to get my
experimental insert unless patch working, so I guess that at eox some cleanup is
done w.r.t. to those locks and whatnot. (yeah, technical term, "whatnot" ;)]

Additionally, if the caller is very concerned with potential exceptions,
perhaps a PG_TRY() block should be exercised in those areas of worry.

I guess in most cases it simply comes back to becoming the scan's user's
responsibility to be sure to keep things kosher. :\


> If that's what you want, maybe a better answer is to simply allow
> aminsert to return a uniqueness-conflict indication, rather than
> raising the ereport for itself.

Maybe so. However, I guess I was thinking along lines that many times multiple
insert scans may need to be done before the final decision to actually do the
insertion is made(yeah, uh, the insert unless thing is what I've had in mind ;).

Also, I think part of this point is to be able to avoid the actual insertion
into heap, so as to avoid a superfluous heap_insert & heap_delete, and other
"unwinding code" if a uniqueness-conflict indication were made to a user that
needs to respond to such signals for the feature being implemented.
(The more unique constraint violations that occur, the more I/O that gets saved
with insert scans. I imagine this could be a very good thing for certain
applications.)
-- 
Regards, James William Pye