Re: TABLESAMPLE patch

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: TABLESAMPLE patch
Дата
Msg-id 54BAB517.40308@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On 17/01/15 13:46, Amit Kapila wrote:
> On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>  >
>  >
>  > In second patch which implements the TABLESAMPLE itself I changed the
> implementation of random generator because when I looked at the code
> again I realized the old one would produce wrong results if there were
> multiple TABLESAMPLE statements in same query or multiple cursors in
> same transaction.
>  >
>
> I have looked into this patch and would like to share my
> findings with you.

That's a lot for this.

>
> 1.
> +static void
> +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> RangeTblEntry *rte)
> {
> ..
> +/*
> +* There is only one plan to consider but we still need to set
> +* parameters for RelOptInfo.
> +*/
> +set_cheapest(rel);
> }
>
> It seems we already call set_cheapest(rel) in set_rel_pathlist()
> which is a caller of set_tablesample_rel_pathlist(), so why do
> we need it inside set_tablesample_rel_pathlist()?

Ah, this changed after I started working on this patch and I didn't 
notice - previously all the set_<something>_rel_pathlist called 
set_cheapest() individually. I will change the code.

>
> 2.
> +static void
> +set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> RangeTblEntry *rte)
> {
> ..
> +/* We only do sample scan if it was requested */
> +add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
> }
>
> Do we need to add_path, if there is only one path?

Good point, we can probably just set the pathlist directly in this case.

>
> 3.
> @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
> /* Foreign table */
> set_foreign_pathlist(root, rel, rte);
> }
> +else if (rte->tablesample != NULL)
> +{
> +/* Build sample scan on relation */
> +set_tablesample_rel_pathlist(root, rel, rte);
> +}
>
> Have you consider to have a different RTE for this?
>

I assume you mean different RTEKind, yes I did, but the fact is that 
it's still a relation, just with different scan type and I didn't want 
to modify every piece of code which deals with RTE_RELATION to also deal 
with the new RTE for sample scan as it seems unnecessary. I am not 
strongly opinionated about this one though.

> 4.
> SampleNext()
> a.
> Isn't it better to code SampleNext() similar to SeqNext() and
> IndexNext(), which encapsulate the logic to get the tuple in
> another function(tbs_next() or something like that)?

Possibly, my thinking was that unlike the index_getnext() and 
heap_getnext(), this function would not be called from any other place 
so adding one more layer of abstraction didn't seem useful. And it would 
mean creating new ScanDesc struct, etc.

>
> b.
> Also don't we want to handle pruning of page while
> scanning (heap_page_prune_opt()) and I observed
> in heap scan API's after visibility check we do check
> for serializable conflict (CheckForSerializableConflictOut()).

Both good points, will add.

> Another thing is don't we want to do anything for sync scans
> for these method's as they are doing heap scan?

I don't follow this one tbh.

>
> c.
> for bernoulli method, it will first get the tupoffset with
> the help of function and then apply visibility check, it seems
> that could reduce the sample set way lower than expected
> in case lot of tuples are not visible, shouldn't it be done in reverse
> way(first visibility check, then call function to see if we want to
> include it in the sample)?
> I think something similar is done in acquire_sample_rows which
> seems right to me.

I don't think so, the way bernoulli works is that it returns every tuple 
with equal probability, so the visible tuples have same chance of being 
returned as the invisible ones so the issue should be smoothed away 
automatically (IMHO).

The acquire_sample_rows has limit on number of rows it returns so that's 
why it has to do the visibility check before as the problem you 
described applies there.

The reason for using the probability instead of tuple limit is the fact 
that there is no way to accurately guess number of tuples in the table 
at the beginning of scan. This method should actually be better at 
returning the correct number of tuples without dependence on how many of 
them are visible or not and how much space in blocks is used.

>
> 5.
>
> CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
> INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
> generate_series(0, 9) s(i) ORDER BY i;
>
> postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
>   id
> ----
>    1
>    3
>    4
>    7
>    8
>    9
> (6 rows)
>
>
> postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
>   id
> ----
>    0
>    1
>    2
>    3
>    4
>    5
>    6
>    7
>    8
>    9
> (10 rows)
>
> So above test yield 60% rows first time and 100% rows next time,
> when the test has requested 80%.
> I understand that sample percentage will result an approximate
> number of rows, however I just wanted that we should check if the
> implementation has any problem or not?

I think this is caused by random generator not producing smooth enough 
random distribution on such a small sample (or possibly in general?).

>
> In this regard, I have one question related to below code:
>
> +Datum
> +tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
> +{
> ..
> +/* Every tuple has percent chance of being returned */
> +while (sampler_random_fract(sampler->randstate) > samplesize)
> +{
> +tupoffset++;
> +
> +if (tupoffset > maxoffset)
> +break;
> +}
>
> Why are we not considering tuple in above code
> if tupoffset is less than maxoffset?
>

We consider it, I will rename the samplesize to probability or something 
to make it more clear what it actually is and maybe expand the comment 
above the loop.

What the loop does is that it basically considers each offset on a page 
and does "coin flip" - generates random number using 
sampler_random_fract and if the random number falls within the 
probability (= is smaller or equal to samplesize) it will be returned, the
if (tupoffset > maxoffset)    break;
is there just because we need to give control back to scan so it can 
move to another block.

> 6.
> One larger question about the approach used in patch, why do you
> think it is better to have a new node (SampleScan/SamplePath) like
> you have used in patch instead of doing it as part of Sequence Scan.
> I agree that we need some changes in different parts of Sequence Scan
> execution, but still sounds like a viable way.  One simple thing that
> occurs to me is that in ExecSeqScan(), we can use something like
> SampleSeqNext instead of SeqNext to scan heap in a slightly different
> way, probably doing it as part of sequence scan will have advantage that
> we can use most of the existing infrastructure (sequence node path)
> and have less discrepancies as mentioned in point-4.
>

I originally started from SeqScan but well, it requires completely 
different State structure, it needs to call sampling methods in various 
places (not just for next tuple), it needs different handling in explain 
and in optimizer (costing). If we add all this to sequential scan it 
will not be that much different from new scan node (we'd save the couple 
of new copy functions and one struct, but I'd rather have clearly 
distinct scan).

It would also not help with the discrepancies you mentioned because 
those are in heapam and SampleScan would not use that even if it was 
merged with SeqScan - I don't think we want to implement the sampling on 
heapam level, it's too much of a hotspot to be good idea to add 
additional cycles there.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: searching in array function - array_position
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: searching in array function - array_position