Re: TABLESAMPLE patch

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: TABLESAMPLE patch
Дата
Msg-id CAA4eK1J85f+Q3WP-3N_Vp2z9p6seRhQj5F6yhJXXKcnJ-0bGpA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
Ответы Re: TABLESAMPLE patch  (Petr Jelinek <petr@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> so here is version 11.
>

Thanks, patch looks much better, but I think still few more
things needs to discussed/fixed.

1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically if user tries to pass multiple arguments in
TABLESAMPLE method's clause like (10,20), then I think
that should be caught in grammer level as an error.

SQL - 2003 specs says that argument to REPAEATABLE and TABLESAMPLE
method is same <numeric value expr>

It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.

In general, I feel you already have good basic infrastructure for
supportting other sample methods, but it is better to keep the new
DDL's for doing the same as a separate patch than this patch, as that
way we can reduce the scope of this patch, OTOH if you or others
feel that it is mandatory to have new DDL's support for other
tablesample methods then we have to deal with this now itself.

2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id = 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't TABLESAMPLE clause suppose to work with Update/Delete
statements?


3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r))));

Better to align exprLocation() with pstate

4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot
snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;

bool visible = 
HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check if PageIsAllVisible() in above
code before visibility check as that can avoid visibility checks.

5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List 
*sampleargs)
{
..
if (con->val.type == T_Null)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("REPEATABLE 
clause must be NOT NULL numeric value"),
parser_errposition
(pstate, con->location)));
..
}

InitSamplingMethod(SampleScanState *scanstate, TableSampleClause *tablesample)
{
..
if (fcinfo.argnull[1])
ereport(ERROR,
(errcode
(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("REPEATABLE clause cannot be 
NULL")));
..
}

I think it would be better if we can have same error message
and probably the same error code in both of the above cases.

6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause.  If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.


8.
+DATA(insert OID = 3295 (  tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 
23 700" _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f f t f v 3 0 2278 "2281 
23 700" _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_ ));

Datatype for second argument is kept as  700 (Float4), shouldn't
it be 1700 (Numeric)?

9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE SYSTEM (
10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
                                 QUERY PLAN
----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4.05 rows=100 width=4)
   ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
   ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display sample method name in explain.
I think it can help in case of join queries.
We can use below format to display:
Sample Scan (System) on test_tablesample ...

10. Bernoulli.c

+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use ths for syncscan support */

typo.
/ths/this


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Parallel Seq Scan
Следующее
От: Sawada Masahiko
Дата:
Сообщение: Re: Proposal: knowing detail of config files via SQL