Re: TABLESAMPLE patch
From        | Petr Jelinek
Subject     | Re: TABLESAMPLE patch
Date        |
Msg-id      | 54C8B7B6.9020201@2ndquadrant.com
In reply to | Re: TABLESAMPLE patch (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses   | Re: TABLESAMPLE patch (Robert Haas <robertmhaas@gmail.com>)
List        | pgsql-hackers
On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:
>
> As an issue related to size estimation, I got an explain result as
> follows:
>
> =# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a < 50000;
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Sample Scan on t1  (cost=0.00..301.00 rows=10000 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
>    Filter: (a < 50000)
>    Rows Removed by Filter: 5876
>    Buffers: shared hit=45
>
> The actual row count is twice as large as the estimate. tsm_system_cost
> estimates the number of result rows using baserel->tuples,
> not baserel->rows, so it doesn't reflect the scan quals. Did
> the code come from some other intention?
>

No, that's a bug.

>>> 4.
>>> SampleNext()
>>> a.
>>> Isn't it better to code SampleNext() similar to SeqNext() and
>>> IndexNext(), which encapsulate the logic to get the tuple in
>>> another function (tbs_next() or something like that)?
>>>
>>> Possibly; my thinking was that unlike index_getnext() and
>>> heap_getnext(), this function would not be called from any other
>>> place, so adding one more layer of abstraction didn't seem useful.
>>> And it would mean creating a new ScanDesc struct, etc.
>>>
>>> I think adding the additional abstraction would simplify the function
>>> and reduce the chance of discrepancy; we would otherwise need to
>>> create an almost duplicate version of the heapgetpage() code.
>>> Yeah, I agree that we would need to build a structure like ScanDesc,
>>> but it would still be worth it to keep the code consistent.
>>
>> Well, similar, not the same, as we are not always fetching the whole
>> page or doing visibility checks on all tuples in the page, etc. But I
>> don't see why it can't be inside nodeSamplescan.
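[Editorial note: the size-estimation bug acknowledged above comes down to which planner field is scaled by the sampling percentage. A minimal sketch, assuming t1 holds roughly 100,000 tuples (consistent with the EXPLAIN output) of which the `a < 50000` qual keeps about half; the function name is hypothetical, only the `baserel->tuples` / `baserel->rows` distinction is from the thread:]

```python
def sample_scan_rows(baserel_tuples, baserel_rows, sample_percent):
    """Illustrative only: contrast the buggy and corrected estimates."""
    # Buggy: scales the raw tuple count (baserel->tuples),
    # so the scan quals are ignored in the estimate.
    buggy = baserel_tuples * sample_percent / 100.0
    # Fixed: scales the qual-adjusted estimate (baserel->rows),
    # which already reflects the filter's selectivity.
    fixed = baserel_rows * sample_percent / 100.0
    return buggy, fixed

# ~100,000 tuples; "a < 50000" keeps ~half, so baserel->rows ~ 50,000.
print(sample_scan_rows(100_000, 50_000, 10))  # (10000.0, 5000.0)
```

The buggy estimate (10000) matches the EXPLAIN above, while the corrected one (5000) is close to the 4294 rows actually returned.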
>> If you look at bitmap heap scan, that
>> one is also essentially a somewhat modified sequential scan and does
>> everything within the node nodeBitmapHeapscan, because the algorithm is
>> not used anywhere else, just like sample scan.
>
> As a general opinion, I'll vote for Amit's comment, since three
> or four similar instances seem to be enough reason to abstract
> it. On the other hand, since the scan processes are dispatched
> in ExecProcNode by the nodeTag of the scan nodes, re-unifying
> control in an abstraction layer after that could be said to
> introduce useless annoyance. In short, abstraction at the level
> of *Next() would bring the necessity of additional changes around
> the execution node dispatch.
>

Yes, that's my view too. I would generally be for that change as well, and
it would be worth it if the code were used in more than one place, but as
it is it seems like it will just add code/complexity for no real benefit.
It would make sense if we used the sequential scan node instead of the new
node, as Amit also suggested, but I remain unconvinced that mixing sampling
and sequential scan into a single scan node would be a good idea.

>>> How will it get smoothed out for cases when, let us say,
>>> more than 50% of tuples are not visible? I think it's
>>> due to Postgres' non-overwriting storage architecture
>>> that we maintain multiple versions of rows in the same heap;
>>> otherwise, for a different kind of architecture (like mysql/oracle)
>>> where multiple row versions are not maintained in the same
>>> heap, the same function (with the same percentage) could return
>>> an entirely different number of rows.
>>>
>>
>> I don't know how else to explain it: if we loop over every tuple in the
>> relation and return it with equal probability, then visibility checks
>> don't matter, as the percentage of visible tuples will be the same in
>> the result as in the relation.
>
> Surely it might yield effectively the same result.
> Even so,
> this code needs an explanatory comment about this property of the
> code, or the code should be written so that MVCC-minded people won't
> get confused, I suppose.
>

Yes it does, but as I am failing to explain it even here, it's not clear to
me what to write there. From my point of view it's just an effect of the
essential property of bernoulli sampling, which is that every element has
equal probability of being included in the sample. It comes from the fact
that we do a bernoulli trial (in the code it's the
while (sampler_random_fract(sampler->randstate) > probability) loop) on
every individual tuple. This means that the ratio of visible and invisible
tuples should be the same in the sample as it is in the relation. We then
just skip the invisible tuples and get the correct percentage of the
visible ones (this has the performance benefit of not having to do a
visibility check on all tuples). If that weren't true, then bernoulli
sampling would simply not work as intended, as the same property is the
reason why it's used in statistics - the trends should look the same in
the sample as they look in the source population. Obviously there is some
variation in practice, as we don't have a perfect random generator, but
that's independent of the algorithm.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
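[Editorial note: the per-tuple Bernoulli-trial argument above can be checked with a short simulation. This is an illustrative sketch in Python, not the backend's C code; the 60% visibility ratio and the seed are arbitrary assumptions:]

```python
import random

random.seed(42)

# Model a heap where, due to MVCC, only ~60% of tuple versions are
# visible (True = visible, False = dead version).
heap = [random.random() < 0.6 for _ in range(100_000)]

probability = 0.10  # e.g. TABLESAMPLE BERNOULLI (10)

# One Bernoulli trial per tuple, mirroring the
# while (sampler_random_fract(...) > probability) loop: every tuple is
# included with equal probability, with no regard to visibility.
sample = [visible for visible in heap if random.random() <= probability]

# The visible/invisible ratio in the sample tracks the heap's ratio,
# so discarding invisible tuples afterwards still leaves ~10% of the
# *visible* tuples.
heap_ratio = sum(heap) / len(heap)
sample_ratio = sum(sample) / len(sample)
print(round(heap_ratio, 2), round(sample_ratio, 2))
```

Running this shows the two ratios agree to within sampling noise, which is the property the comment in the code would need to state.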