On 09/01/15 09:27, Michael Paquier wrote:
>
> Some comments about the 1st patch:
> 1) Nitpicking:
> + * Block sampling routines shared by ANALYZE and TABLESAMPLE.
> TABLESAMPLE is not added yet. You may as well mention simplify this
> description with something like "Sampling routines for relation
> blocks".
>
Changed.
> 2) file_fdw and postgres_fdw do not compile correctly as they still
> use anl_random_fract. This function is still mentioned in vacuum.h as
> well.
Gah, didn't notice this, fixed. And also since the Vitter's reservoir
sampling methods are used by other component than just analyze, I moved
those to sampling.c/h.
>
> 3) Not really an issue of this patch, but I'd think that this comment
> should be reworked:
> + * BlockSampler is used for stage one of our new two-stage tuple
> + * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
> + * "Large DB"). It selects a random sample of samplesize blocks out of
> + * the nblocks blocks in the table. If the table has less than
> + * samplesize blocks, all blocks are selected.
>
I changed the wording slightly, it's still not too great though.
> 4) As a refactoring patch, why is the function providing a random
> value changed? Shouldn't sample_random_fract be consistent with
> anl_random_fract?
>
Yeah I needed that for TABLESAMPLE and it should not really affect the
randomness but you are correct it should be part of second patch of the
patch-set.
> 5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
> RAND48_SEED_2 instead of being hardcoded?
> Regards,
>
Removed this part from the first patch as it's not needed there anymore.
In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.
In addition to the above changes I added test for cursors and test for
the issue with random generator I mentioned above. Also fixed some typos
in comments and function name. And finally I added note to docs saying
that same REPEATABLE might produce different results in subsequent queries.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services