Re: WIP: Fast GiST index build

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: WIP: Fast GiST index build
Дата
Msg-id 4E4AC0B9.8070203@enterprisedb.com
обсуждение исходный текст
Ответ на Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: WIP: Fast GiST index build  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Список pgsql-hackers
On 16.08.2011 21:46, Alexander Korotkov wrote:
> On Tue, Aug 16, 2011 at 9:43 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> Why is there ever a buffer on the root node? It seems like a waste of time
>> to load N tuples from the heap into the root buffer, only to empty the
>> buffer after it fills up. You might as well pull tuples directly from the
>> heap.
>>
> Yes, seems reasonable. Buffer on the root node was in the paper. But now I
> don't see the need of it too.

Here's an version of the patch with a bunch of minor changes:

* No more buffer on root node. Aside from the root buffer being 
pointless, this simplifies gistRelocateBuildBuffersOnSplit slightly as 
it doesn't need the special case for root block anymore.

* Moved the code to create new root item from gistplacetopage() to 
gistRelocateBuildBuffersOnSplit(). Seems better to keep the 
buffering-related code away from the normal codepath, for the sake of 
readability.

* Changed the levelStep calculation to use the more accurate upper bound 
on subtree size that we discussed.

* Changed the levelStep calculation so that it doesn't do just 
min(maintenance_work_mem, effective_cache_size) and calculate the 
levelStep from that. Maintenance_work_mem matters determines the max. 
number of page buffers that can be held in memory at a time, while 
effective_cache_size determines the max size of the subtree. Those are 
subtly different things.

* Renamed NodeBuffer to GISTNodeBuffer, to avoid cluttering the namespace

* Plus misc comment, whitespace, formatting and naming changes.

I think this patch is in pretty good shape now. Could you re-run the 
performance tests you have on the wiki page, please, to make sure the 
performance hasn't regressed? It would also be nice to get some testing 
on the levelStep and pagesPerBuffer estimates, and the point where we 
switch to the buffering method. I'm particularly interested to know if 
there's any corner-cases with very skewed data distributions or strange 
GUC settings, where the estimates fails badly.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: WIP: Fast GiST index build
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: WIP: Fast GiST index build