Re: Gsoc2012 idea, tablesample

Поиск
Список
Период
Сортировка
От Kevin Grittner
Тема Re: Gsoc2012 idea, tablesample
Дата
Msg-id 4FABA84D0200002500047AE5@gw.wicourts.gov
обсуждение исходный текст
Ответ на Re: Gsoc2012 idea, tablesample  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Gsoc2012 idea, tablesample  (Florian Pflug <fgp@phlo.org>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> wrote:
> I wonder if you could do this with something akin to the Bitmap
> Heap Scan machinery.  Populate a TID bitmap with a bunch of
> randomly chosen TIDs, fetch them all in physical order
It would be pretty hard for any other plan to beat that by very
much, so it seems like a good approach which helps keep things
simple.
> and if you don't get as many rows as you need, rinse and repeat
> until you do.
Ay, there's the rub.  If you get too many, it is important that you
read all the way to the end and then randomly omit some of them. 
While a bit of a bother, that's pretty straightforward and should be
pretty fast, assuming you're not, like, an order of magnitude high. 
But falling short is tougher; making up the difference could be an
iterative process, which could always wind up with having you read
all tuples in the table without filling your sample.  Still, this
approach seems like it would perform better than generating random
ctid values and randomly fetching until you've tried them all.
> I'm worried this project is getting so complicated that it will be
> beyond the ability of a new hacker to get anything useful done. 
> Can we simplify the requirements here to something that is
> reasonable for a beginner?
I would be inclined to omit monetary unit sampling from the first
commit.  Do the parts specified in the standard first and get it
committed.  Useful as unit sampling is, it seems like the hardest to
do, and should probably be done "if time permits" or left as a
future enhancement.  It's probably enough to just remember that it's
there and make a "best effort" attempt not to paint ourselves in a
corner which precludes its development.
-Kevin


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: WIP Patch: Selective binary conversion of CSV file foreign tables
Следующее
От: Robert Haas
Дата:
Сообщение: Re: PL/Python result set slicing broken in Python 3