Re: TABLESAMPLE patch

From: Simon Riggs
Subject: Re: TABLESAMPLE patch
Msg-id: CA+U5nMKjU=KkKLSKUXQr9LrSfWD0maLuyaa_DZ-9c_7Fdn7gBg@mail.gmail.com
In response to: Re: TABLESAMPLE patch  (Peter Eisentraut <peter_e@gmx.net>)
Responses: Re: TABLESAMPLE patch  (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
On 9 April 2015 at 15:30, Peter Eisentraut <peter_e@gmx.net> wrote:
> On 4/9/15 5:02 AM, Michael Paquier wrote:
>> Just to be clear, the example above being misleading... Doing table
>> sampling using SYSTEM at physical level makes sense. In this case I
>> think that we should properly error out when trying to use this method
>> on something not present at physical level. But I am not sure that
>> this restriction applies to BERNOULLI: you may want to apply it on
>> other things than physical relations, like views or results of WITH
>> clauses. Also, based on the fact that we support custom sampling
>> methods, I think that it should be up to the sampling method to define
>> on what kind of objects it supports sampling, and where it supports
>> sampling fetching, be it page-level fetching or analysis from an
>> existing set of tuples. Looking at the patch, TABLESAMPLE is just
>> allowed on tables and matviews, this limitation is too restrictive
>> IMO.
>
> In the SQL standard, the TABLESAMPLE clause is attached to a table
> expression (<table primary>), which includes table functions,
> subqueries, CTEs, etc.  In the proposed patch, it is attached to a table
> name, allowing only an ONLY clause.  So this is a significant deviation.

There is no deviation from the standard in the current patch.
Currently this is a 100% unimplemented feature; the patch would move us
directly towards a fully implemented feature, perhaps reduce to fully
implemented.

> Obviously, doing block sampling on a physical table is a significant use
> case

Very significant use case, which this patch addresses. Query result
sampling would not be a very interesting use case and was not even
thought of without the SQL Standard.
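
To make the distinction under discussion concrete, here is a minimal sketch in Python of the two sampling styles. It is not taken from the patch: the function names, the list-of-pages model of a table, and the `seed` parameter are all assumptions made for illustration. It only shows why block-level (SYSTEM-style) sampling needs a physical page layout, while row-level (BERNOULLI-style) sampling needs nothing more than a stream of rows.

```python
import random


def paginate(rows, rows_per_page):
    """Hypothetical helper: model a physical table as a list of pages."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def bernoulli_sample(rows, percent, seed=None):
    """Row-level sampling: keep each row independently with probability percent/100.

    Only needs an iterable of rows, so in principle it could run over a
    query result as well as over a table.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    return [row for row in rows if rng.random() < p]


def system_sample(pages, percent, seed=None):
    """Block-level sampling: keep or skip whole pages with probability percent/100.

    Needs a page structure, so it only makes sense for something that
    exists at the physical level.
    """
    rng = random.Random(seed)
    p = percent / 100.0
    sampled = []
    for page in pages:
        if rng.random() < p:
            sampled.extend(page)  # every row on a chosen page is returned
    return sampled
```

Under these assumptions, `system_sample(paginate(rows, 4), 50)` returns rows in whole-page clusters, whereas `bernoulli_sample(rows, 50)` decides row by row and has to look at every row to do so.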

>, but we should be clear about which restrictions and tradeoffs we
> are making now and in the future, especially if we are going to present
> extension interfaces.  The fact that physical tables are interchangeable
> with other relation types, at least in data-reading contexts, is a
> feature worth preserving.

Agreed.

This patch does nothing to change that interchangeability. There is no
restriction or removal of current query capability.

It looks trivial to make it work for query results also, but if it is
not, ISTM it is something that can be added in a later release.

> It may be worth thinking about some examples of other sampling methods,
> in order to get a better feeling for whether the interfaces are appropriate.
>
> Earlier in the thread, someone asked about supporting specifying a
> number of rows instead of percents.  While not essential, that seems
> pretty useful, but I wonder how that could be implemented later on if we
> take the approach that the argument to the sampling method can be an
> arbitrary quantity that is interpreted only by the method.

Not sure I understand that. The method could allow parameters of any unit.

Having a function-based implementation allows stratified sampling or
other approaches suited directly to the user's data.

I don't think it's reasonable to force all methods to offer limits on
both numbers of rows and percentages. They may not be applicable.
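
As an illustration of the point about method-interpreted arguments, the following Python sketch shows one way a function-based design could let each method give its own meaning to the argument. Everything here is hypothetical: the registry `SAMPLING_METHODS`, the `tablesample` dispatcher, and the method names are assumptions for the example and not the interface in the patch. One method reads the argument as a percentage; the other reads it as a row count and uses reservoir sampling (Algorithm R).

```python
import random


def sample_percent(rows, arg, rng):
    """Interpret arg as a percentage: keep each row with probability arg/100."""
    p = arg / 100.0
    return [row for row in rows if rng.random() < p]


def sample_rows(rows, arg, rng):
    """Interpret arg as a row count: reservoir sampling (Algorithm R).

    Returns a uniform random sample of arg rows, or all rows if the
    input has fewer than arg rows.
    """
    k = int(arg)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = row     # replace with probability k/(i+1)
    return reservoir


# Hypothetical registry: each method alone decides what its argument means.
SAMPLING_METHODS = {
    "percent": sample_percent,
    "rows": sample_rows,
}


def tablesample(rows, method, arg, seed=None):
    """Dispatch to a sampling method without interpreting arg here."""
    rng = random.Random(seed)
    return SAMPLING_METHODS[method](rows, arg, rng)
```

In this sketch the dispatcher never inspects the argument, so a method that only makes sense with one unit simply does not offer the other; a stratified method could be registered in the same way with whatever parameters suit the data.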

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services


