Re: TABLESAMPLE patch

From: Petr Jelinek
Subject: Re: TABLESAMPLE patch
Date:
Msg-id: 54C8B7B6.9020201@2ndquadrant.com
In reply to: Re: TABLESAMPLE patch  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses: Re: TABLESAMPLE patch  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:
>
> As an issue related to size estimation, I got an EXPLAIN result like the
> following:
>
> =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
>                                 QUERY PLAN
> --------------------------------------------------------------------------------
>   Sample Scan on t1  (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
>     Filter: (a < 50000)
>     Rows Removed by Filter: 5876
>     Buffers: shared hit=45
>
> The estimated row count is about twice the actual. tsm_system_cost
> estimates the number of result rows using baserel->tuples
> rather than baserel->rows, so it doesn't reflect the scan quals. Did
> the code come from some other intention?
>

No, that's a bug.

>>>          4.
>>>          SampleNext()
>>>          a.
>>>          Isn't it better to code SampleNext() similar to SeqNext() and
>>>          IndexNext(), which encapsulate the logic to get the tuple in
>>>          another function(tbs_next() or something like that)?
>>>
>>>
>>>      Possibly, my thinking was that unlike the index_getnext() and
>>>      heap_getnext(), this function would not be called from any other
>>>      place so adding one more layer of abstraction didn't seem useful.
>>>      And it would mean creating new ScanDesc struct, etc.
>>>
>>>
>>> I think adding an additional abstraction would simplify the function
>>> and reduce the chance of discrepancy; I think we would otherwise need
>>> to almost create a duplicate version of the heapgetpage() code.
>>> Yeah, I agree that we would need to build a structure like ScanDesc,
>>> but it would still be worthwhile to keep the code consistent.
>>
>> Well, similar, but not the same, as we are not always fetching the whole
>> page or doing visibility checks on all tuples in the page, etc. But I
>> don't see why it can't live inside nodeSamplescan. If you look at bitmap
>> heap scan, that one is also essentially a somewhat modified sequential
>> scan and does everything within nodeBitmapHeapscan, because the
>> algorithm is not used anywhere else, just like the sample scan.
>
> As a general opinion, I'll vote for Amit's comment, since three
> or four similar instances seem to be reason enough to abstract
> it out. On the other hand, since the scan processing is dispatched
> in ExecProcNode by the nodeTag of the scan nodes, reunifying
> control in an abstraction layer after that point could be said to
> introduce useless complication. In short, abstraction at the level
> of *Next() would require additional changes around the
> execution node dispatch.
>

Yes, that's my view too. I would generally be for that change as well, and 
it would be worth it if the code were used in more than one place, but as 
it is, it seems like it would just add code/complexity for no real 
benefit. It would make sense if we used the sequential scan node 
instead of the new node, as Amit also suggested, but I remain unconvinced 
that mixing sampling and sequential scan into a single scan node would be 
a good idea.

>>>
>>> How will it get smoothed out for cases when, let us say,
>>> more than 50% of tuples are not visible? I think it's
>>> due to Postgres's non-overwriting storage architecture
>>> that we maintain multiple versions of rows in the same heap;
>>> for a different kind of architecture (like mysql/oracle),
>>> where multiple row versions are not maintained in the same
>>> heap, the same function (with the same percentage) could return
>>> an entirely different number of rows.
>>>
>>
>> I don't know how else to explain it: if we loop over every tuple in the
>> relation and return it with equal probability, then visibility checks
>> don't matter, as the percentage of visible tuples will be the same in
>> the result as in the relation.
>
> Surely it might yield effectively the same result. Even so,
> this code needs an explanatory comment about this property, or
> should be written so that MVCC-minded readers won't get confused, I
> suppose.
>

Yes it does, but as I am failing to explain even here, it's not clear to 
me what to write there. From my point of view it's just an effect of the 
essential property of Bernoulli sampling, which is that every element has 
an equal probability of being included in the sample. It comes from the 
fact that we do a Bernoulli trial (in the code it's the while 
(sampler_random_fract(sampler->randstate) > probability) loop) on every 
individual tuple.

This means that the ratio of visible to invisible tuples should be the 
same in the sample as it is in the relation. We then just skip the 
invisible tuples and get the correct percentage of the visible ones (this 
has the performance benefit of not having to do a visibility check on all 
tuples).

If that weren't true, then Bernoulli sampling would simply not work as 
intended; the same property is the reason why it's used in statistics - 
trends should look the same in the sample as they do in the source 
population.

Obviously there is some variation in practice, as we don't have a perfect 
random generator, but that's independent of the algorithm.


-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


