Re: TABLESAMPLE patch

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: TABLESAMPLE patch
Дата
Msg-id 54CD2823.6080803@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
Список pgsql-hackers
On 31/01/15 14:27, Amit Kapila wrote:
> On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>> wrote:
>
>     On 19/01/15 07:08, Amit Kapila wrote:
>
>         On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek
>         <petr@2ndquadrant.com <mailto:petr@2ndquadrant.com>
>         <mailto:petr@2ndquadrant.com <mailto:petr@2ndquadrant.com>>> wrote:
>
>
>     I think that's actually good to have, because we still do costing
>     and the partial index might help produce better estimate of number
>     of returned rows for tablesample as well.
>
>
> I don't understand how will it help, because for tablesample scan
> it doesn't consider partial index at all as per patch.
>

Oh I think we were talking abut different things then, I thought you 
were talking about the index checks that planner/optimizer sometimes 
does to get more accurate statistics. I'll take another look then.

>
>     Well similar, not same as we are not always fetching whole page or
>     doing visibility checks on all tuples in the page, etc. But I don't
>     see why it can't be inside nodeSamplescan. If you look at bitmap
>     heap scan, that one is also essentially somewhat modified sequential
>     scan and does everything within the node nodeBitmapHeapscan because
>     the algorithm is not used anywhere else, just like sample scan.
>
>
> I don't mind doing everything in nodeSamplescan, however if
> you can split the function, it will be easier to read and understand,
> if you see in nodeBitmapHeapscan, that also has function like
> bitgetpage().
> This is just a suggestion and if you think that it can be splitted,
> then it's okay, otherwise leave it as it is.

Yeah I can split it to separate function within the nodeSamplescan, that 
sounds reasonable.

>
>
>         Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
>         Basically during sequiantial scan on same table by different
>         backends, we attempt to keep them synchronized to reduce the I/O.
>
>
>     Ah this, yes, it makes sense for bernoulli, not for system though. I
>     guess it should be used for sampling methods that use SAS_SEQUENTIAL
>     strategy.
>
>
> Have you taken care of this in your latest patch?

Not yet. I think I will need to make the strategy property of the 
sampling method instead of returning it by costing function so that the 
info can be used by the scan.

>
>     Oh and BTW when I delete 50k of tuples (make them invisible) the
>     results of 20 runs are between 30880 and 40063 rows.
>
>
> This is between 60% to 80%, lower than what is expected,
> but I guess we can't do much for this except for trying with
> reverse order for visibility test and sample tuple call,
> you can decide if you want to try that once just to see if that
> is better.
>

No, that's because I can't write properly, the lower number was supposed 
to be 39880 which is well within the tolerance, sorry for the confusion  (9 and 0 are just too close).

>     Anyway, attached is new version with some updates that you mentioned
>     (all_visible, etc).
>     I also added optional interface for the sampling method to access
>     the tuple contents as I can imagine sampling methods that will want
>     to do that.
>
> +/*
> +* Let the sampling method examine the actual tuple and decide if we
> +* should return it.
> +*
> +* Note that we let it examine even invisible tuples.
> +*/
> +if (OidIsValid(node->tsmreturntuple.fn_oid))
> +{
> +found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
> +  PointerGetDatum
> (node),
> +  UInt32GetDatum
> (blockno),
> +  PointerGetDatum
> (tuple),
> +  BoolGetDatum
> (visible)));
> +/* XXX: better error */
> +if (found && !visible)
> +elog(ERROR, "Sampling method wanted to return invisible tuple");
> +}
>
> You have mentioned in comment that let it examine invisible tuple,
> but it is not clear why you want to allow examining invisible tuple
> and then later return error, why can't it skips invisible tuple.
>

Well I think returning invisible tuples to user is bad idea and that's 
why the check, but I also think it might make sense to examine the 
invisible tuple for the sampling function in case it wants to create 
some kind of stats about the scan and wants to use those for making the 
decision about returning other tuples. The interface should be probably 
called tsmexaminetuple instead to make it more clear what the intention is.

> 1.
> How about statistics (pgstat_count_heap_getnext()) during
> SampleNext as we do in heap_getnext?

Right, will add.

>
> 2.
> +static TupleTableSlot *
> +SampleNext(SampleScanState *node)
> +{
> ..
> +/*
> +* Lock the buffer so we can safely assess tuple
> +* visibility later.
> +*/
> +LockBuffer(buffer, BUFFER_LOCK_SHARE);
> ..
> }
>
> When is this content lock released, shouldn't we release it after
> checking visibility of tuple?

Here,
+        if (!OffsetNumberIsValid(tupoffset))
+        {
+            UnlockReleaseBuffer(buffer);

but yes you are correct, it should be just released there and we can 
unlock already after visibility check.

>
> 3.
> +static TupleTableSlot *
> +SampleNext(SampleScanState *node)
> {
> ..
> }
>
> Currently in this function as soon as it sees one valid tuple,
> it return's the same, however isn't it better to do some caching
> for tuples on same page like we do in heapgetpage()
> (scan->rs_vistuples[ntup++] = lineoff;).  Basically that can avoid
> taking content lock and some other overhead of operating on a
> page.
>

That's somewhat hard question, it would make sense in cases where we 
read most of the page (which is true for system sampling for example) 
but it would probably slow things down in case where we select small 
number of tuples (like say 1) which is true for bernoulli with small 
percentage parameter, it's actually quite easy to imagine that on really 
big tables (which is where TABLESAMPLE makes sense) we might get blocks 
where we don't actually read any tuples. This is where optimizing for 
one sampling method will hurt another so I don't know what's better 
here. Perhaps the sampling method should have option that says if it 
prefers page mode reading or not, because only the author knows this.

Anyway, I am thinking of making the heapgetpage() public and using it 
directly. It will mean that we have to initialize HeapScanDesc which 
might add couple of lines but we anyway already have to keep last buffer 
and last tuple and position info in the scan info so we can instead use 
HeapScanDesc for that. There will couple of properties of HeapScanDesc 
we don't use but I don't think we care.

BTW I don't expect to have time to work on this patch in next ~10 days 
so I will move it to Feb commitfest.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Erik Rijkers"
Дата:
Сообщение: Re: File based Incremental backup v8
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Providing catalog view to pg_hba.conf file - Patch submission