Re: Gsoc2012 idea, tablesample

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: Gsoc2012 idea, tablesample
Дата
Msg-id 4FACD5D10200002500047B3F@gw.wicourts.gov
обсуждение исходный текст
Ответ на Re: Gsoc2012 idea, tablesample  (Florian Pflug <fgp@phlo.org>)
Ответы Re: Gsoc2012 idea, tablesample  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Gsoc2012 idea, tablesample  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
Florian Pflug <fgp@phlo.org> wrote:
> Maybe one can get rid of these sorts of problems by factoring in
> the expected density of the table beforehand and simply accepting
> that the results will be inaccurate if the statistics are
> outdated?
> 
> One could, for example, simply pick
> 
>   N := SamplingPercentage * MaxTuplesPerPage /
>        AvgLiveTuplesPerPage
> 
> where
> 
>  AvgLiveTuplesPerPage := #Tuples / #Pages
> 
> random TIDs, fetch the live ones, and return them.
To clarify, I read this as using reltuples and relpages for the
table, and returning only tuples which are visible according to the
query's snapshot.  (i.e., I think you used "live" to mean two
different things there.)
Unless I'm missing something, I think that works for percentage
selection, which is what the standard talks about, without any need
to iterate through addition samples.  Good idea!  We don't need to
do any second pass to pare down initial results, either.  This
greatly simplifies coding while providing exactly what the standard
requires.
> I'm not totally sure whether this approach is sensible to
> non-uniformity in the tuple to line-pointer assignment, though.
I think we can solve that by going high enough with tuple numbers to
reach the highest tuple ID that might be in use in the table, and
*not* following HOT chains.  (If we follow HOT chains, we could have
several distinct ctid values which returned the same tuple.)  Or am
I thinking about this incorrectly?
> [more complex alternatives]
I really think your first suggestion covers it perfectly; these more
complex techniques don't seem necessary to me.
-Kevin


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Draft release notes complete
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Gsoc2012 idea, tablesample