Re: TABLESAMPLE patch

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: TABLESAMPLE patch
Дата
Msg-id CAA4eK1LzRFxnzHff+-O-JB=mELuVDjqeGwkOz-vHUvaV2gSmKg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
Ответы Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 19/01/15 07:08, Amit Kapila wrote:
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek <petr@2ndquadrant.com
<mailto:petr@2ndquadrant.com>> wrote:

I think that's actually good to have, because we still do costing and the partial index might help produce better estimate of number of returned rows for tablesample as well.

I don't understand how will it help, because for tablesample scan
it doesn't consider partial index at all as per patch.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample values(generate_series(10,10000);

create index idx_tblsample on test_tablesample(id) where id>9999;

Analyze test_tablesample;

postgres=# explain SELECT id FROM test_tablesample where id > 9999;
                                        QUERY PLAN

--------------------------------------------------------------------------------
-----------
 Index Only Scan using idx_tblsample on test_tablesample  (cost=0.13..8.14 rows=
1 width=4)
   Index Cond: (id > 9999)
(2 rows)


postgres=# explain SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80) wh
ere id > 9999;
                               QUERY PLAN
------------------------------------------------------------------------
 Sample Scan on test_tablesample  (cost=0.00..658.00 rows=8000 width=4)
   Filter: (id > 9999)
(2 rows)

The above result also clearly indicate that when TABLESAMPLE
clause is used, it won't use partial index.


Well similar, not same as we are not always fetching whole page or doing visibility checks on all tuples in the page, etc. But I don't see why it can't be inside nodeSamplescan. If you look at bitmap heap scan, that one is also essentially somewhat modified sequential scan and does everything within the node nodeBitmapHeapscan because the algorithm is not used anywhere else, just like sample scan.


I don't mind doing everything in nodeSamplescan, however if
you can split the function, it will be easier to read and understand,
if you see in nodeBitmapHeapscan, that also has function like
bitgetpage().
This is just a suggestion and if you think that it can be splitted,
then it's okay, otherwise leave it as it is.


Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically during sequiantial scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.


Ah this, yes, it makes sense for bernoulli, not for system though. I guess it should be used for sampling methods that use SAS_SEQUENTIAL strategy.


Have you taken care of this in your latest patch?
 

I don't know how else to explain, if we loop over every tuple in the relation and return it with equal probability then visibility checks don't matter as the percentage of visible tuples will be same in the result as in the relation.

For example if you have 30% visible tuples and you are interested in 10% of tuples overall it will return 10% of all tuples and since every tuple has 10% chance of being returned, 30% of those should be visible (if we assume smooth distribution of random numbers generated). So in the end you are getting 10% of visible tuples because the scan will obviously skip the invisible ones and that's what you wanted.

As I said problem of analyze is that it uses tuple limit instead of probability.


Okay, got the point.
 

Yes the differences is smaller (in relative numbers) for bigger tables when I test this. On 1k row table the spread of 20 runs was between 770-818 and on 100k row table it's between 79868-80154. I think that's acceptable variance and it's imho indeed the random generator that produces this.


Sure, I think this is acceptable.
 
Oh and BTW when I delete 50k of tuples (make them invisible) the results of 20 runs are between 30880 and 40063 rows.


This is between 60% to 80%, lower than what is expected,
but I guess we can't do much for this except for trying with
reverse order for visibility test and sample tuple call,
you can decide if you want to try that once just to see if that
is better.
 
I am sure it could be done with sequence scan. Not sure if it would be pretty and more importantly, the TABLESAMPLE is *not* sequential scan. The fact that bernoulli method looks similar should not make us assume that every other method does as well, especially when system method is completely different.


Okay, lets keep it as separate.



Anyway, attached is new version with some updates that you mentioned (all_visible, etc).
I also added optional interface for the sampling method to access the tuple contents as I can imagine sampling methods that will want to do that. 
 
+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+ if (OidIsValid(node->tsmreturntuple.fn_oid))
+ {
+ found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+   PointerGetDatum
(node),
+   UInt32GetDatum
(blockno),
+   PointerGetDatum
(tuple),
+   BoolGetDatum
(visible)));
+ /* XXX: better error */
+ if (found && !visible)
+ elog(ERROR, "Sampling method wanted to return invisible tuple");
+ }

You have mentioned in comment that let it examine invisible tuple,
but it is not clear why you want to allow examining invisible tuple
and then later return error, why can't it skips invisible tuple.

1. 
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?

2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+ /*
+ * Lock the buffer so we can safely assess tuple
+ * visibility later.
+ */
+ LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
}

When is this content lock released, shouldn't we release it after
checking visibility of tuple?

3.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
{
..
}

Currently in this function as soon as it sees one valid tuple,
it return's the same, however isn't it better to do some caching
for tuples on same page like we do in heapgetpage()
(scan->rs_vistuples[ntup++] = lineoff;).  Basically that can avoid
taking content lock and some other overhead of operating on a
page.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Streaming replication and WAL archive interactions
Следующее
От: Marco Nenciarini
Дата:
Сообщение: Re: pg_check_dir comments and implementation mismatch