Re: idea for concurrent seqscans
| From | Jeff Davis |
|---|---|
| Subject | Re: idea for concurrent seqscans |
| Date | |
| Msg-id | 1109356015.4089.164.camel@jeff |
| In reply to | Re: idea for concurrent seqscans (Tom Lane <tgl@sss.pgh.pa.us>) |
| Replies | Re: idea for concurrent seqscans |
| List | pgsql-hackers |
On Fri, 2005-02-25 at 12:54 -0500, Tom Lane wrote:
> Jeff Davis <jdavis-pgsql@empires.org> writes:
> > (1) Do we care about reverse scans being done with synchronized
> > scanning? If so, is there a good way to know in advance whether it is
> > going to be a forward or reverse scan (i.e. before heap_getnext())?
>
> There are no reverse heapscans --- the only case where you'll see
> direction = backwards is while backing up a cursor with FETCH BACKWARD.
> I don't think you need to optimize that case.

Ok, I was wondering about that.

> What I'm more concerned about is your use of shared memory. I didn't
> have time to look at the patch, but how are you determining an upper
> bound on the amount of memory you need? What sort of locking and
> contention issues will there be?

Right now a scanning backend writes the page it's scanning into shared memory each time it moves to a new page (so not once per tuple). I haven't determined whether this will be a major point of lock contention.

However, one possible implementation seems to solve both problems at once. Let's say we just had a static hash table of 100 entries, each holding a relation's oid and the page number currently being scanned, i.e. 100*(sizeof(Oid)+sizeof(BlockNumber)) bytes. The relid alone would determine its slot in the table. If there's a collision, just overwrite the entry. I don't think much is lost in that case, unless, for example, two tables in an important join have oids that hash to the same slot. Then the benefit of synchronized scanning would be lost, but the result would be no worse than the current behavior.

Let's say we didn't use any locks at all. Are there any real dangers there? If there's a race and one backend gets some garbage data, it can just say "this is out of bounds, start the scan at 0". Since it's a static hash table, we don't have to worry about following a bad pointer, etc. If that looks like it will be a problem, I can also test with locking to see what kind of contention there is.
The current patch I sent was very much a proof of concept, but all it did was have a shared mem segment of size 8 bytes (only holds info for one relid at a time). That would probably be somewhat effective in many cases, but of course we want it larger than that (800? 8KB?).

In short, I tried to overcome these problems with simplicity. Where simplicity doesn't work I default to starting the scan at 0. Hopefully those non-simple cases (like hash collisions and shared memory races) are rare enough that we don't lose all that we gain.

> Another point is that this will render the results from heapscans
> unstable, since different executions of the same query might start
> at different points. This would for example probably break many
> of the regression tests. We can deal with that if we have to, but
> it raises the bar of how much benefit I'd want to see from the patch.

I didn't consider that. Is there a reason the regression tests assume the results will be returned in a certain order (or a consistent order)?

> One detail that might or might not be significant: different scans are
> very likely to have slightly different ideas about where the end of the
> table is, since they determine this with an lseek(SEEK_END) at the
> instant they start the scan. I don't think this invalidates your idea
> but you need to watch out for corner-case bugs in the coding.

I only see that as an issue in initscan(), where it sets the start page. A simple bounds check would cure that, no? If it was out of bounds, set the start page to zero, and we didn't lose much. I need a bounds check there anyway, since the data we get from shared memory needs to be validated. That bounds check would be comparing against the current backend's scan->rs_nblocks, which should be the correct number for that backend.

Regards,
Jeff Davis