Re: WIP: Fast GiST index build

From: Heikki Linnakangas
Subject: Re: WIP: Fast GiST index build
Date:
Msg-id: 4E44E31B.8080202@enterprisedb.com
In reply to: Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses: Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers

On 11.08.2011 23:30, Alexander Korotkov wrote:
> On Thu, Aug 11, 2011 at 2:28 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> On 10.08.2011 22:44, Alexander Korotkov wrote:
>>
>>> Manual and readme updates.
>>>
>>
>> Thanks, I'm reviewing these now.
>>
>> Do we want to expose the level-step and buffersize parameters to users?
>> They've been useful during testing, but I'm thinking we should be able to
>> guess good enough values for them automatically, and just remove the
>> options. It's pretty much impossible for a user to tune them correctly, it
>> would require deep knowledge of the buffering algorithm.
>>
>> I'm thinking that even when you explicitly turn buffering on, we should
>> still process the first 10000 or so tuples with simple inserts. That way we
>> always have a sample of tuples to calculate the average tuple size from.
>> It's plausible that if the input data is ordered, looking at the first N
>> tuples will give a skewed sample, but I don't think there's much danger of
>> that in practice. Even if the data is ordered, the length of GiST tuples
>> shouldn't vary much.
>>
>> What happens if we get the levelstep and pagesPerBuffer estimates wrong?
>> How sensitive is the algorithm to that? Or will we run out of memory? Would
>> it be feasible to adjust those in the middle of the index build, if we
>> e.g. greatly exceed the estimated memory usage?
>
>
> I see the following risks.
>
> For levelstep:
> Too small: less IO benefit than there could be
> Too large:
>    1) If the subtree doesn't fit in effective_cache_size, much more IO
> than there should be (because of cache misses during buffer emptying)
>    2) If the last pages of buffers don't fit in maintenance_work_mem,
> possible OOM

Hmm, we could avoid running out of memory if we used an LRU cache 
replacement policy on the buffer pages, instead of explicitly unloading 
the buffers. 1) would still apply, though.
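
Something along these lines, just as a sketch; the PageSlot structure, 
the spill file, and all the names here are made up for illustration, not 
taken from the patch. Buffer pages sit on a recency-ordered list, and 
when memory runs short the least recently used page is spilled to a temp 
file instead of a whole buffer being unloaded at once:

#include <stdio.h>

#define BLCKSZ  8192            /* PostgreSQL page size */

typedef struct PageSlot
{
    struct PageSlot *prev;      /* more recently used neighbour */
    struct PageSlot *next;      /* less recently used neighbour */
    long        diskBlock;      /* block number in the spill temp file */
    char        image[BLCKSZ];  /* in-memory copy of the buffer page */
} PageSlot;

static PageSlot *head;          /* most recently used page */
static PageSlot *tail;          /* least recently used: eviction victim */
static FILE     *spillFile;     /* opened with tmpfile() at build start */

/* Move a page to the front of the recency list on every access. */
static void
lru_touch(PageSlot *slot)
{
    if (slot == head)
        return;

    /* unlink from its current position (no-op for a brand-new slot) */
    if (slot->prev)
        slot->prev->next = slot->next;
    if (slot->next)
        slot->next->prev = slot->prev;
    if (slot == tail)
        tail = slot->prev;

    /* relink at the head */
    slot->prev = NULL;
    slot->next = head;
    if (head)
        head->prev = slot;
    head = slot;
    if (tail == NULL)
        tail = slot;
}

/*
 * Write the least recently used page to the temp file and recycle its
 * slot for the page being loaded. Assumes the cache is non-empty.
 */
static PageSlot *
lru_evict(void)
{
    PageSlot *victim = tail;

    fseek(spillFile, victim->diskBlock * (long) BLCKSZ, SEEK_SET);
    fwrite(victim->image, 1, BLCKSZ, spillFile);

    if (victim->prev)
        victim->prev->next = NULL;
    tail = victim->prev;
    if (head == victim)
        head = NULL;

    return victim;
}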

> For buffersize:
> Too small: less IO benefit, because the buffer size is relatively small
> in comparison with the sub-tree size.
> Too large: greater CPU overhead (because of more penalty calls) than is
> needed for the same IO benefit.
>
> Thereby I propose the following.
> 1) A too-large levelstep is the greatest risk. Let's use a pessimistic
> estimate for it. The pessimistic estimate has the following logic:
> largest sub-tree  =>  maximal tuples per page  =>  minimal tuple size
> Thereby, by always using the minimal tuple size in the levelstep
> calculation, we exclude the greatest risks.
> 2) The risks for buffersize are comparable and not too critical. That's
> why I propose to use the size of the first 10000 tuples for the estimate.

Yep, sounds reasonable.
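
For concreteness, here's roughly how I'd picture those two pieces; this 
is only a sketch, the names and the single cache constraint are 
illustrative and not from the patch:

#include <stddef.h>

/*
 * Pessimistic levelstep estimate: assume every page is packed with
 * minimal-size tuples, so the fanout, and therefore the worst-case size
 * of a levelstep-deep subtree, is maximal. Grow levelstep only while
 * that worst-case subtree still fits in effective_cache_size.
 */
static int
estimate_levelstep(size_t pageFreeSpace, size_t minTupleSize,
                   double effectiveCachePages)
{
    double maxFanout = (double) (pageFreeSpace / minTupleSize);
    double subtreePages = 1.0;      /* worst-case pages in one subtree */
    int    levelstep = 0;

    while (subtreePages * maxFanout <= effectiveCachePages)
    {
        subtreePages *= maxFanout;  /* one more level below the buffer */
        levelstep++;
    }
    return levelstep > 0 ? levelstep : 1;
}

/*
 * Average tuple size observed during the first ~10000 simple inserts;
 * this is the sample the buffersize estimate would be based on.
 */
static size_t sampleBytes = 0;
static int    sampleCount = 0;

static void
sample_tuple_size(size_t tupleSize)
{
    sampleBytes += tupleSize;
    sampleCount++;
}

static size_t
sampled_avg_tuple_size(void)
{
    return sampleCount > 0 ? sampleBytes / sampleCount : 0;
}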

I think it would also be fairly simple to decrease levelstep and/or 
adjust buffersize on-the-fly. The trick would be in figuring out the 
heuristics on when to do that.
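
One possible trigger, purely as a sketch: the struct, its fields, and 
the factor-of-two threshold below are all hypothetical, not from the 
patch.

#include <stddef.h>

/*
 * If the memory actually consumed by buffered tuples greatly exceeds
 * what was budgeted, lower levelstep for buffers created from here on,
 * so that future subtrees and their buffers are smaller. Buffers that
 * already exist keep draining at the old levelstep.
 */
typedef struct BuildState
{
    size_t bufferedBytes;       /* memory currently held by buffered tuples */
    size_t budgetedBytes;       /* what the initial estimate allowed for */
    int    levelStep;
} BuildState;

static void
maybe_shrink_levelstep(BuildState *state)
{
    if (state->bufferedBytes > 2 * state->budgetedBytes &&
        state->levelStep > 1)
        state->levelStep--;
}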

Another thing occurred to me while looking at the buffer emptying 
process: At the moment, we stop emptying after we've flushed 1/2 buffer 
size worth of tuples. The point of that is to avoid overfilling a 
lower-level buffer, in the case that the tuples we emptied all landed on 
the same lower-level buffer. Wouldn't it be fairly simple to detect that 
case explicitly, and stop the emptying process only if one of the 
lower-level buffers really fills up? That should be more efficient, as 
you would have to "swap" between different subtrees less often.
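
In pseudo-C, something like this; every type and helper here is a 
hypothetical stand-in for the patch's internals, named only to show the 
alternative stopping rule:

typedef struct GistBufferSketch GistBufferSketch;

extern int  buffer_is_empty(GistBufferSketch *buf);
extern void *buffer_pop(GistBufferSketch *buf);
extern GistBufferSketch *choose_child_buffer(GistBufferSketch *parent,
                                             void *itup); /* penalty calls */
extern void buffer_push(GistBufferSketch *buf, void *itup);
extern int  buffer_is_overfull(GistBufferSketch *buf);
extern void schedule_emptying(GistBufferSketch *buf);

/*
 * Relocate tuples downwards until some receiving lower-level buffer
 * genuinely overflows, rather than stopping after a fixed 1/2-buffer
 * quota. The overflowing buffer is emptied next, so we switch between
 * subtrees only when actually forced to.
 */
static void
empty_buffer(GistBufferSketch *buf)
{
    while (!buffer_is_empty(buf))
    {
        void *itup = buffer_pop(buf);
        GistBufferSketch *child = choose_child_buffer(buf, itup);

        buffer_push(child, itup);

        if (buffer_is_overfull(child))
        {
            schedule_emptying(child);
            break;
        }
    }
}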

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

