On Tue, 2006-12-05 at 11:45 -0500, Tom Lane wrote:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> > Florian G. Pflug wrote:
> >> I don't see why a single process wouldn't be reading sequentially - As far
> >> as I understood the original proposal, the current blocknumber from the
> >> hashtable is only used as a starting point for sequential scans. After
> >> that,
> >> each backend reads sequentially until the end of the table, I believe, no?
>
> > When the read is satisfied from the shared memory cache, it won't make
> > it to the kernel.
>
> Right, and the *whole point* of this proposal is that only one of the N
> processes doing a synchronized scan actually does a read of any
> particular block. The problem is that they're not synchronized well
> enough to ensure that it's always the same one.
If readahead is per-process (rather than system-wide), my implementation
would probably fall short. I would like to find out for sure whether that
is the case, or whether it's system-dependent.
> It strikes me that there's still another thing we'd have to deal with
> to make this work nicely. If you have N processes doing a synchronized
> scan, then each block that reaches shared memory is going to be hit N
> times in fairly short succession --- which is going to be enough to
> convince the bufmgr to keep it in memory for awhile. Thus a
> synchronized seqscan is likely to end up flushing buffer cache in a way
> that independent seqscans could not.
Interesting. That may be an important consideration. If a bunch of
backends have read the block, though, I don't see it as a major loss if it
hangs around in case one more backend needs it.
> This could probably be dealt with by changing the ReadBuffer API to
> allow the caller to say "don't increment the refcount on this page",
> or some such. But it's some more work to be done if we intend to
> take this idea seriously.
>
Thank you for the input.
Regards,
	Jeff Davis