Re: TABLESAMPLE patch

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: TABLESAMPLE patch
Дата
Msg-id 54B18493.5030303@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: TABLESAMPLE patch  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: TABLESAMPLE patch  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On 09/01/15 09:27, Michael Paquier wrote:
>
> Some comments about the 1st patch:
> 1) Nitpicking:
> + *       Block sampling routines shared by ANALYZE and TABLESAMPLE.
> TABLESAMPLE is not added yet. You may as well mention simplify this
> description with something like "Sampling routines for relation
> blocks".
>

Changed.

> 2) file_fdw and postgres_fdw do not compile correctly as they still
> use anl_random_fract. This function is still mentioned in vacuum.h as
> well.

Gah, didn't notice this, fixed. And also since the Vitter's  reservoir
sampling methods are used by other component than just analyze, I moved
those to sampling.c/h.

>
> 3) Not really an issue of this patch, but I'd think that this comment
> should be reworked:
> + * BlockSampler is used for stage one of our new two-stage tuple
> + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
> + * "Large DB").  It selects a random sample of samplesize blocks out of
> + * the nblocks blocks in the table.  If the table has less than
> + * samplesize blocks, all blocks are selected.
>

I changed the wording slightly, it's still not too great though.

> 4) As a refactoring patch, why is the function providing a random
> value changed? Shouldn't sample_random_fract be consistent with
> anl_random_fract?
>

Yeah I needed that for TABLESAMPLE and it should not really affect the
randomness but you are correct it should be part of second patch of the
patch-set.

> 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
> RAND48_SEED_2 instead of being hardcoded?
> Regards,
>

Removed this part from the first patch as it's not needed there anymore.

In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.

In addition to the above changes I added test for cursors and test for
the issue with random generator I mentioned above. Also fixed some typos
in comments and function name. And finally I added note to docs saying
that same REPEATABLE might produce different results in subsequent queries.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Sloppy SSPI error reporting code
Следующее
От: Christoph Berg
Дата:
Сообщение: Re: libpq 9.4 requires /etc/passwd?