Re: TABLESAMPLE patch

From Petr Jelinek
Subject Re: TABLESAMPLE patch
Date
Msg-id 54FEC1EB.9070503@2ndquadrant.com
In response to Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
List pgsql-hackers
On 10/03/15 10:54, Amit Kapila wrote:
> On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>  >
>  > Ok now I think I finally understand what you are suggesting - you are
>  > saying let's go over whole page while tsmnexttuple returns something,
>  > and do the visibility check and other stuff in that code block under the
>  > buffer lock and cache the resulting valid tuples in some array and then
>  > return those tuples one by one from that cache?
>  >
>
> Yes, this is what I am suggesting.
>
>  >>  > And if the caller will try to do it in one step and cache the
>  >>  > visibility info then we'll end up with pretty much same structure as
>  >>  > rs_vistuples - there isn't saner way to cache this info other than
>  >>  > ordered vector of tuple offsets, unless we assume that most pages have
>  >>  > close to MaxOffsetNumber of tuples which they don't, so why not just use
>  >>  > the heapgetpage directly and do the binary search over rs_vistuples.
>  >>  >
>  >>
>  >> The downside of doing it via heapgetpage is that it will do
>  >> visibility test for tuples which we might not even need (I think
>  >> we should do visibility test for tuples returned by tsmnexttuple).
>  >>
>  >
>  > Well, heapgetpage can either read visibility data for whole page or
>  > not, depending on if we want pagemode reading or not. So we can use the
>  > pagemode for sampling methods where it's feasible (like system) and not
>  > use pagemode where it's not (like bernoulli) and then either use the
>  > rs_vistuples or call HeapTupleSatisfiesVisibility individually again
>  > depending if the method is using pagemode or not.
>  >
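
For illustration only, the two paths described in the quoted paragraph could be sketched roughly as below. This is a standalone toy, not code from the patch: SamplePageState, offset_in_vistuples, sample_tuple_visible and demo_even_visible are hypothetical names, the vistuples array only imitates the role of rs_vistuples, and the callback merely stands in for an individual HeapTupleSatisfiesVisibility call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's OffsetNumber (a 16-bit line pointer index). */
typedef uint16_t OffsetNumber;

/* Hypothetical, simplified per-page state; vistuples plays the role that
 * rs_vistuples plays in the discussion above. */
typedef struct SamplePageState
{
    bool         pagemode;      /* visibility precomputed for the whole page? */
    int          nvisible;      /* number of valid entries in vistuples */
    OffsetNumber vistuples[64]; /* visible offsets, in ascending order */
} SamplePageState;

/* Assumed callback: checks one tuple's visibility on demand. */
typedef bool (*tuple_visible_fn) (OffsetNumber off, void *arg);

/* Binary search over the ordered vector of visible offsets. */
static bool
offset_in_vistuples(const SamplePageState *st, OffsetNumber off)
{
    int lo = 0;
    int hi = st->nvisible - 1;

    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (st->vistuples[mid] == off)
            return true;
        if (st->vistuples[mid] < off)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}

/* Pagemode: consult the precomputed vector. Otherwise: check the single
 * tuple the sampling method asked for. */
static bool
sample_tuple_visible(const SamplePageState *st, OffsetNumber off,
                     tuple_visible_fn check_one, void *arg)
{
    if (st->pagemode)
        return offset_in_vistuples(st, off);

    assert(check_one != NULL);
    return check_one(off, arg);
}

/* Toy visibility rule for demonstration: even offsets are "visible". */
static bool
demo_even_visible(OffsetNumber off, void *arg)
{
    (void) arg;
    return (off % 2) == 0;
}
```

In this sketch a page-oriented method (like system) would set pagemode and pay for the whole-page visibility pass once, while a tuple-oriented method (like bernoulli) would leave it unset and only check the offsets it actually picks.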
>
> Yeah, but as mentioned above, this has some downside, but go
> for it only if you feel that above suggestion is making code complex,
> which I think should not be the case as we are doing something similar
> in acquire_sample_rows().
>

I think your suggestion is actually simpler code-wise; I am just
somewhat worried by the fact that no other scan node uses that kind of
caching, and there is probably a reason for that.
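
To make the shape of that caching concrete, here is a minimal standalone sketch, assuming a sampling callback that returns InvalidOffsetNumber once the page is exhausted. It is not the patch's code: PageTupleCache, cache_fill_page, cache_next, ToyState and the toy_* callbacks are all hypothetical, and the real loop would run while the buffer lock is held and would call the actual visibility routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL definitions of the same names. */
typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)

#define MAX_CACHED 64           /* arbitrary bound for this sketch */

/* Assumed callbacks: the first imitates tsmnexttuple, the second a
 * per-tuple visibility check. */
typedef OffsetNumber (*next_tuple_fn) (void *state);
typedef bool (*tuple_visible_fn) (OffsetNumber off, void *state);

/* Hypothetical cache of the sampled, visible offsets of one page. */
typedef struct PageTupleCache
{
    int          ncached;
    int          next;
    OffsetNumber offsets[MAX_CACHED];
} PageTupleCache;

/* One pass over the page: ask the sampler for offsets, keep the visible
 * ones. In a real scan this whole loop would sit under the buffer lock. */
static void
cache_fill_page(PageTupleCache *cache, next_tuple_fn next_tuple,
                tuple_visible_fn visible, void *state)
{
    OffsetNumber off;

    cache->ncached = 0;
    cache->next = 0;

    while ((off = next_tuple(state)) != InvalidOffsetNumber)
    {
        if (cache->ncached < MAX_CACHED && visible(off, state))
            cache->offsets[cache->ncached++] = off;
    }
}

/* Hand the cached tuples back one by one, no lock needed any more. */
static OffsetNumber
cache_next(PageTupleCache *cache)
{
    if (cache->next >= cache->ncached)
        return InvalidOffsetNumber;
    return cache->offsets[cache->next++];
}

/* Toy sampler for demonstration: picks every third offset up to max. */
typedef struct ToyState
{
    OffsetNumber cur;
    OffsetNumber max;
} ToyState;

static OffsetNumber
toy_next_tuple(void *state)
{
    ToyState *ts = (ToyState *) state;

    if (ts->cur + 3 > ts->max)
        return InvalidOffsetNumber;
    ts->cur += 3;
    return ts->cur;
}

/* Toy visibility rule: even offsets are "visible". */
static bool
toy_visible(void *unused_state_placeholder, OffsetNumber off);

static bool
toy_visible_cb(OffsetNumber off, void *state)
{
    (void) state;
    return (off % 2) == 0;
}
```

With a ToyState of {0, 12}, the sampler proposes offsets 3, 6, 9 and 12, and only 6 and 12 pass the toy visibility rule, so those are the two offsets cache_next hands back before it reports InvalidOffsetNumber.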


-- 
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


