> I've been having a look at this and I'm wondering about a certain scenario:
>
> In tbm_add_tuples, if tbm_page_is_lossy() returns true for a given block, and on
> the next iteration of the loop we have the same block again, have you
> benchmarked any caching code to store if tbm_page_is_lossy() returned true for
> that block on the previous iteration of the loop? This would save from having to
> call tbm_page_is_lossy() again for the same block. Or are you just expecting
> that tbm_page_is_lossy() returns true so rarely that you'll end up caching the
> page most of the time, and gain on skipping both hash lookups on the next loop,
> since page will be set in this case?
I believe that once we fall into lossy pages, tidbitmap will not have a
significant impact on performance, because postgres will spend a lot of time
rechecking tuples on the page. If work_mem is too small to keep an exact
tidbitmap, postgres will slow down significantly anyway. I implemented it
(v2.1 in the attachments), but I don't think it is an improvement, at least
not a significant one.
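
To make the idea under discussion concrete, here is a minimal standalone
sketch of caching the lossy check for the previous block. It is not the code
from the v2.1 patch: ToyBitmap, toy_page_is_lossy() and toy_add_tuples() are
hypothetical stand-ins for the real structures, BlockNumber and
InvalidBlockNumber are redefined locally, and a linear scan takes the place
of the hash lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local redefinitions so the sketch compiles outside the PostgreSQL tree. */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Hypothetical toy bitmap: only tracks which blocks are lossy. */
typedef struct ToyBitmap
{
    const BlockNumber *lossy_blocks;    /* blocks already made lossy */
    size_t      nlossy;                 /* number of entries above */
    unsigned long lookups;              /* how often the check was run */
} ToyBitmap;

/* Stand-in for the expensive lookup; a linear scan replaces the hash probe. */
static bool
toy_page_is_lossy(ToyBitmap *tbm, BlockNumber blk)
{
    tbm->lookups++;
    for (size_t i = 0; i < tbm->nlossy; i++)
    {
        if (tbm->lossy_blocks[i] == blk)
            return true;
    }
    return false;
}

/*
 * Walk the block numbers of the incoming tuples and remember the lossy
 * answer for the most recent block, so that a run of tuples on the same
 * block costs one lookup.  Returns how many tuples fell on non-lossy
 * (exact) pages.
 */
static size_t
toy_add_tuples(ToyBitmap *tbm, const BlockNumber *blocks, size_t ntids)
{
    BlockNumber cached_blk = InvalidBlockNumber;
    bool        cached_lossy = false;
    size_t      exact = 0;

    for (size_t i = 0; i < ntids; i++)
    {
        BlockNumber blk = blocks[i];

        /* The sentinel must never appear as a real block number. */
        assert(blk != InvalidBlockNumber);

        if (blk != cached_blk)
        {
            /* New block: do the lookup once and remember the answer. */
            cached_lossy = toy_page_is_lossy(tbm, blk);
            cached_blk = blk;
        }

        if (cached_lossy)
            continue;           /* page is lossy, nothing to set */

        exact++;                /* the real code would set the tuple bit here */
    }
    return exact;
}
```

With the input blocks 7, 7, 7, 3, 3, 7 and block 7 lossy, this sketch runs
the check three times instead of six. Whether that saving matters in
practice is exactly the open question above, since tuple rechecking is
expected to dominate once pages are lossy.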
>
> It would be nice to see a comment to explain why it might be a good idea to
> cache the page lookup. Perhaps something like:
>
Added, see the attachment (v2).
>
> I also wondered if there might be a small slowdown in the case where the index
> only finds 1 matching tuple. So I tried the following:
> avg. 2372.4456 2381.909 99.6%
> med. 2371.224 2359.494 100.5%
>
> It appears that if it does, then it's not very much.
I believe that's unmeasurable, because the standard deviation of your tests is
about 2%, which is greater than the difference between the patched and master
versions.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/