Discussion: Wording in TABLESAMPLE documentation

Wording in TABLESAMPLE documentation

From
paddor@gmail.com
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/9.6/static/sql-select.html
Description:

Regarding the TABLESAMPLE documentation on [1], I think in the following
sentence

> If REPEATABLE is not given then a new random sample is selected for each
> query.

the word "sample" should be "seed". Of course it results in a new random
sample as well, but IMHO this sentence is about what happens to the seed in
case REPEATABLE (seed) is omitted.

Best regards,
Patrik Wenger

[1] https://www.postgresql.org/docs/9.6/static/sql-select.html
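For readers unfamiliar with the clause, the sentence in question concerns
queries like the ones sketched below. This is only an illustration: the
table foo, the 10 percent sampling fraction, and the seed value 42 are
assumptions made for the example and do not come from the documentation page.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TEMP TABLE IF NOT EXISTS foo AS
    SELECT generate_series(1, 10000) AS id;

-- REPEATABLE given: the seed 42 is supplied explicitly by the query.
SELECT count(*) FROM foo TABLESAMPLE BERNOULLI (10) REPEATABLE (42);

-- REPEATABLE omitted: this is the case the quoted sentence describes.
-- Each execution is expected to select a new random sample.
SELECT count(*) FROM foo TABLESAMPLE BERNOULLI (10);
SELECT count(*) FROM foo TABLESAMPLE BERNOULLI (10);
```

The counts are not shown here because they depend on which rows each sample
happens to select.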

Re: Wording in TABLESAMPLE documentation

From
Simon Riggs
Date:
On 11 August 2016 at 17:21,  <paddor@gmail.com> wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/9.6/static/sql-select.html
> Description:
>
> Regarding the TABLESAMPLE documentation on [1], I think in the following
> sentence
>
> > If REPEATABLE is not given then a new random sample is selected for each
> > query.
>
> the word "sample" should be "seed". Of course it results in a new random
> sample as well, but IMHO this sentence is about what happens to the seed in
> case REPEATABLE (seed) is omitted.

Corrected, thanks.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Wording in TABLESAMPLE documentation

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 11 August 2016 at 17:21,  <paddor@gmail.com> wrote:
>> > If REPEATABLE is not given then a new random sample is selected for each
>> > query.
>>
>> the word "sample" should be "seed". Of course it results in a new random
>> sample as well, but IMHO this sentence is about what happens to the seed in
>> case REPEATABLE (seed) is omitted.

> Corrected, thanks.

I do not think this is an improvement.  The sentence was specifically about
whether the sample (that is, the set of rows selected) would change.  This
rewording essentially removes that user-visible behavior guarantee, and
for what?  It's certainly not any clearer.

            regards, tom lane


Re: Wording in TABLESAMPLE documentation

From
Simon Riggs
Date:
On 12 August 2016 at 15:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 11 August 2016 at 17:21,  <paddor@gmail.com> wrote:
>>> > If REPEATABLE is not given then a new random sample is selected for each
>>> > query.
>>>
>>> the word "sample" should be "seed". Of course it results in a new random
>>> sample as well, but IMHO this sentence is about what happens to the seed in
>>> case REPEATABLE (seed) is omitted.
>
>> Corrected, thanks.
>
> I do not think this is an improvement.  The sentence was specifically about
> whether the sample (that is, the set of rows selected) would change.  This
> rewording essentially removes that user-visible behavior guarantee, and
> for what?  It's certainly not any clearer.

It was supposed to be a correction, rather than an improvement. I saw
the use of the word "sample" as an error.

But now you mention it, I agree with you. Let's put it back to say
"sample" but also explain where that new sample comes from... my
attempt to explain this better is in square brackets

"If REPEATABLE is not given then a new random sample will be taken for
each query [based upon the global seed value for the current user.]"

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Wording in TABLESAMPLE documentation

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> But now you mention it, I agree with you. Let's put it back to say
> "sample" but also explain where that new sample comes from... my
> attempt to explain this better is in square brackets

> "If REPEATABLE is not given then a new random sample will be taken for
> each query [based upon the global seed value for the current user.]"

I think "global" might have implications we don't want.  How about
adding ", based on a system-generated seed"?

            regards, tom lane


Re: Wording in TABLESAMPLE documentation

From
Simon Riggs
Date:
On 12 August 2016 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> But now you mention it, I agree with you. Let's put it back to say
>> "sample" but also explain where that new sample comes from... my
>> attempt to explain this better is in square brackets
>
>> "If REPEATABLE is not given then a new random sample will be taken for
>> each query [based upon the global seed value for the current user.]"
>
> I think "global" might have implications we don't want.  How about
> adding ", based on a system-generated seed"?

What I was trying to express was that

SELECT setseed(dp);
SELECT * FROM foo TABLESAMPLE ...;
SELECT * FROM foo TABLESAMPLE ...;
SELECT * FROM foo TABLESAMPLE ...;

would yield a repeatable set of samples, similarly repeatable to, but not
the same samples as,

SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
SELECT * FROM foo TABLESAMPLE ... REPEATABLE;

so that people understand there is some predictability even without REPEATABLE.

So I don't understand the "based on a system-generated seed", but
maybe I'm missing information.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Wording in TABLESAMPLE documentation

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 12 August 2016 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think "global" might have implications we don't want.  How about
>> adding ", based on a system-generated seed"?

> What I was trying to express was that

> SELECT setseed(dp);
> SELECT * FROM foo TABLESAMPLE ...;
> SELECT * FROM foo TABLESAMPLE ...;
> SELECT * FROM foo TABLESAMPLE ...;

> would yield a repeatable set of samples, similarly repeatable to, but not
> the same samples as,

> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;

But that's *wrong*.  Not all tablesample methods make any such guarantee.
In fact, neither of our contrib methods do.  Only if you use REPEATABLE
(and the method allows it) is there any promise at all about repeatability.

            regards, tom lane
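The one promise described above, repeatability when REPEATABLE is used and
the method allows it, can be checked with a sketch along these lines. It
assumes the built-in SYSTEM method and a table that is not modified between
the two scans; the table foo and the seed 42 are hypothetical and chosen only
for the example.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TEMP TABLE IF NOT EXISTS foo AS
    SELECT generate_series(1, 10000) AS id;

-- Rows that appear in the first sample but not in the second.
-- With the same method, argument, and seed on an unchanged table,
-- the documented behavior is that both samples are identical,
-- so this query should return no rows.
SELECT id FROM foo TABLESAMPLE SYSTEM (10) REPEATABLE (42)
EXCEPT
SELECT id FROM foo TABLESAMPLE SYSTEM (10) REPEATABLE (42);
```

Dropping REPEATABLE (42) from both branches removes that promise, so the same
comparison may or may not return rows from one run to the next.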


Re: Wording in TABLESAMPLE documentation

From
Simon Riggs
Date:
On 12 August 2016 at 18:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 12 August 2016 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think "global" might have implications we don't want.  How about
>>> adding ", based on a system-generated seed"?
>
>> What I was trying to express was that
>
>> SELECT setseed(dp);
>> SELECT * FROM foo TABLESAMPLE ...;
>> SELECT * FROM foo TABLESAMPLE ...;
>> SELECT * FROM foo TABLESAMPLE ...;
>
>> would yield a repeatable set of samples, similarly repeatable to, but not
>> the same samples as,
>
>> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
>> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
>> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
>
> But that's *wrong*.  Not all tablesample methods make any such guarantee.
> In fact, neither of our contrib methods do.  Only if you use REPEATABLE
> (and the method allows it) is there any promise at all about repeatability.

OK, fair enough. I'll just use your wording then. Thanks.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services