Re: TABLESAMPLE patch

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: TABLESAMPLE patch
Дата
Msg-id 551FFB45.6060707@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On 04/04/15 14:57, Amit Kapila wrote:
>
> 1.
> +tablesample_clause:
> +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause
>
>
> Why do you want to allow func_arg_list?
>
> Basically if user tries to pass multiple arguments in
> TABLESAMPLE method's clause like (10,20), then I think
> that should be caught in grammer level as an error.

It will be reported as error during parse analysis if the TABLESAMPLE 
method does not accept two parameters, same as when the expression used 
wrong type for example.

>
> SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE
> method is same <numeric value expr>
>
> It seems to me that you want to allow it to make it extendable
> to user defined Tablesample methods, but not sure if it is
> right to use func_arg_list for the same because sample method
> arguments can have different definition than function arguments.
> Now even if we want to keep sample method arguments same as
> function arguments that sounds like a separate discussion.
>

Yes I want extensibility here. And I think the tablesample method 
arguments are same thing as function arguments given that in the end 
they are arguments for the init function of tablesampling method.

I would be ok with just expr_list, naming parameters here isn't overly 
important, but ability to have different types and numbers of parameters 
for custom TABLESAMPLE methods *is* important.

> In general, I feel you already have good basic infrastructure for
> supportting other sample methods, but it is better to keep the new
> DDL's for doing the same as a separate patch than this patch, as that
> way we can reduce the scope of this patch, OTOH if you or others
> feel that it is mandatory to have new DDL's support for other
> tablesample methods then we have to deal with this now itself.

Well I did attach it as separate diff mainly for that reason. I agree 
that DDL does not have to be committed immediately with the rest of the 
patch (although it's the simplest part of the patch IMHO).

>
> 2.
> postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
> = 2;
> ERROR:  syntax error at or near "TABLESAMPLE"
> LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...
>
> postgres=# Delete from test_tablesample TABLESAMPLE system(30);
> ERROR:  syntax error at or near "TABLESAMPLE"
> LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
>
> Isn't TABLESAMPLE clause suppose to work with Update/Delete
> statements?
>

It's supported in the FROM part of UPDATE and USING part of DELETE. I 
think that that's sufficient.

Standard is somewhat useless for UPDATE and DELETE as it only defines 
quite limited syntax there. From what I've seen when doing research 
MSSQL also only supports it in their equivalent of FROM/USING list, 
Oracle does not seem to support their SAMPLING clause outside of SELECTs 
at all and if I got the cryptic DB2 manual correctly I think they don't 
support it outside of (sub)SELECTs either.

> 4.
> SampleTupleVisible()
> {
> ..
> else
> {
> /* No pagemode, we have to check the tuple itself. */
> Snapshot
> snapshot = scan->rs_snapshot;
> Bufferbuffer = scan->rs_cbuf;
>
> bool visible =
> HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
> ..
> }
>
> I think it is better to check if PageIsAllVisible() in above
> code before visibility check as that can avoid visibility checks.

It's probably even better to do that one level up in the samplenexttup() 
and only call the SampleTupleVisible if page is not allvisible 
(PageIsAllVisible() is cheap).

>
> 6.
> extern TableSampleClause *
> ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
> List *sampleargs)
>
> It is better to expose function (usage of extern) via header file.
> Is there a need to mention extern here?
>

Eh, stupid copy-paste error when copying function name from header to 
actual source file.

> 7.
> ParseTableSample()
> {
> ..
> arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
> ..
> }
>
> What is the reason for coercing value of REPEATABLE clause to INT4OID
> when numeric value is expected for the clause.  If user gives the
> input value as -2.3, it will generate a seed which doesn't seem to
> be right.
>

Because the REPEATABLE is numeric expression so it can produce whatever 
number but we need int4 internally (well float4 would also work just the 
code would be slightly uglier). And we do this type of coercion even for 
table data (you can insert -2.3 into integer column and it will work) so 
I don't see what's wrong with it here.

>
> 8.
> +DATA(insert OID = 3295 (  tsm_system_initPGNSP PGUID 12 1 0 0 0 f f f f
> t f v 3 0 2278 "2281
> 23 700" _null_ _null_ _null_ _null_tsm_system_init _null_ _null_ _null_ ));
> +DATA(insert OID = 3301 (  tsm_bernoulli_initPGNSP PGUID 12 1 0 0 0 f f
> f f t f v 3 0 2278 "2281
> 23 700" _null_ _null_ _null_ _null_tsm_bernoulli_init _null_ _null_
> _null_ ));
>
> Datatype for second argument is kept as  700 (Float4), shouldn't
> it be 1700 (Numeric)?
>

Why is that? Given the sampling error I think the float4 is enough for 
specifying the percentage and it makes the calculations much easier and 
faster than dealing with Numeric would.

> 9.
> postgres=# explain SELECT t1.id <http://t1.id> FROM test_tablesample as
> t1 TABLESAMPLE SYSTEM (
> 10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
>                                   QUERY PLAN
> ----------------------------------------------------------------------------
>   Nested Loop  (cost=0.00..4.05 rows=100 width=4)
>     ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
>     ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
> (3 rows)
>
> Isn't it better to display sample method name in explain.
> I think it can help in case of join queries.
> We can use below format to display:
> Sample Scan (System) on test_tablesample ...

Good idea.


--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Gierth
Дата:
Сообщение: Re: Abbreviated keys for Numeric
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Compile warnings on OSX 10.10 clang 6.0