Re: TABLESAMPLE patch

From: Simon Riggs
Subject: Re: TABLESAMPLE patch
Msg-id: CANP8+jJTY8NV5HoOcgp_jFcw6+NtfcnYwDwcZn+4vYm0gSj8zw@mail.gmail.com
In reply to: Re: TABLESAMPLE patch (Petr Jelinek <petr@2ndquadrant.com>)
List: pgsql-hackers
On 17 April 2015 at 14:54, Petr Jelinek <petr@2ndquadrant.com> wrote:
 
I agree that the DDL patch is not that important to get in (and I have made it the last patch in the series now), but that does not mean somebody can't write an extension with a new tablesample method.


In any case, attached is another version.

Changes:
- I addressed the comments from Michael

- I moved the interface between nodeSampleScan and the actual sampling method to its own .c file and added a TableSampleDesc struct for it. This makes the interface cleaner and will make it more straightforward to extend for subqueries in the future (nothing really changes, just some functions were renamed and moved). Amit suggested this at some point and I thought it was not needed at the time, but with the possible future extension to subquery support I changed my mind.

- renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion with sync scan

- reworded some things and more typo fixes

- Added two sample contrib modules demonstrating row-limited and time-limited sampling. I am using linear probing for both of those, as the builtin block sampling is not well suited for row-limited or time-limited sampling. For row-limited sampling I originally thought of using Vitter's reservoir sampling, but that does not fit well with the executor, as it needs to keep the reservoir of all the output tuples in memory, which would have horrible memory requirements if the limit was high. The linear probing seems to work quite well for the use case of "give me 500 random rows from table".
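
To make the trade-off in the last item concrete, here is a rough Python sketch, not code from the patch. It assumes "linear probing" means starting at a random block and stepping through block numbers with a stride coprime to the block count, so each block is visited at most once and rows can be emitted as soon as they are read. The reservoir variant shown is the simple Algorithm R rather than Vitter's optimized Algorithm Z. The names `reservoir_sample` and `linear_probe_sample`, and the list-of-lists table layout, are made up for illustration.

```python
import random
from math import gcd


def reservoir_sample(rows, k, rng):
    """Algorithm R: holds k rows in memory and can only return
    them after the whole input has been consumed."""
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def linear_probe_sample(blocks, limit, rng):
    """Yield up to `limit` rows, streaming them block by block.

    `blocks` is a hypothetical in-memory stand-in for table pages:
    a list where each element is the list of rows on that page.
    """
    nblocks = len(blocks)
    if nblocks == 0 or limit <= 0:
        return
    block = rng.randrange(nblocks)  # random starting block
    # Pick a stride coprime to nblocks so the probe sequence
    # covers every block exactly once before repeating.
    step = 1 if nblocks == 1 else rng.randrange(1, nblocks)
    while gcd(step, nblocks) != 1:
        step = rng.randrange(1, nblocks)
    emitted = 0
    for _ in range(nblocks):
        for row in blocks[block]:
            yield row
            emitted += 1
            if emitted == limit:
                return
        block = (block + step) % nblocks
```

With a limit of 500, the reservoir version keeps 500 rows resident and reads the entire table before returning anything, while the probing version keeps no buffer beyond the current block and stops as soon as 500 rows are out. The cost is that rows from the same block are returned together, so the result is less uniformly random than a true reservoir sample.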

For me, the DDL changes are something we can leave out for now, as a way to minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to be split out. If I commit, I will keep that split, but I am considering all of this as a single patchset. I've already spent a few days reviewing, so I don't expect this will take much longer.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
