Re: WIP: Fast GiST index build
From: Heikki Linnakangas
Subject: Re: WIP: Fast GiST index build
Date:
Msg-id: 4E44E31B.8080202@enterprisedb.com
In reply to: Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
Responses: Re: WIP: Fast GiST index build (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers
On 11.08.2011 23:30, Alexander Korotkov wrote:
> On Thu, Aug 11, 2011 at 2:28 PM, Heikki Linnakangas <
> heikki.linnakangas@enterprisedb.com> wrote:
>
>> On 10.08.2011 22:44, Alexander Korotkov wrote:
>>
>>> Manual and readme updates.
>>
>> Thanks, I'm reviewing these now.
>>
>> Do we want to expose the level-step and buffersize parameters to users?
>> They've been useful during testing, but I'm thinking we should be able to
>> guess good enough values for them automatically, and just remove the
>> options. It's pretty much impossible for a user to tune them correctly; it
>> would require deep knowledge of the buffering algorithm.
>>
>> I'm thinking that even when you explicitly turn buffering on, we should
>> still process the first 10000 or so tuples with simple inserts. That way we
>> always have a sample of tuples to calculate the average tuple size from.
>> It's plausible that if the input data is ordered, looking at the first N
>> tuples will give a skewed sample, but I don't think there's much danger of
>> that in practice. Even if the data is ordered, the length of GiST tuples
>> shouldn't vary much.
>>
>> What happens if we get the levelstep and pagesPerBuffer estimates wrong?
>> How sensitive is the algorithm to that? Will we run out of memory? Would
>> it be feasible to adjust those in the middle of the index build, if we
>> e.g. greatly exceed the estimated memory usage?
>
> I see the following risks.
>
> For levelstep:
> Too small: less I/O benefit than there could be.
> Too large:
> 1) If a subtree doesn't fit in effective_cache, much more I/O than there
> should be (because of cache misses during buffer emptying).
> 2) If the last pages of buffers don't fit in maintenance_work_mem,
> possible OOM.

Hmm, we could avoid running out of memory if we used an LRU cache replacement policy on the buffer pages, instead of explicitly unloading the buffers. 1) would still apply, though.
> For buffersize:
> Too small: less I/O benefit, because the buffer size is relatively small
> in comparison with the subtree size.
> Too large: greater CPU overhead (because of more penalty calls) than
> needed for the same I/O benefit.
>
> Thereby I propose the following.
> 1) Too large a levelstep is the greatest risk. Let's use a pessimistic
> estimate for it. The pessimistic estimate has the following logic:
> largest subtree => maximal tuples per page => minimal tuple size
> Thereby, by always using the minimal tuple size in the levelstep
> calculation, we exclude the greatest risks.
> 2) The risks of buffersize are comparable and not too critical. That's
> why I propose to use the size of the first 10000 tuples for the estimate.

Yep, sounds reasonable. I think it would also be fairly simple to decrease levelstep and/or adjust buffersize on the fly. The trick would be in figuring out the heuristics on when to do that.

Another thing occurred to me while looking at the buffer emptying process: at the moment, we stop emptying after we've flushed 1/2 buffer size worth of tuples. The point of that is to avoid overfilling a lower-level buffer, in the case that the tuples we emptied all landed on the same lower-level buffer. Wouldn't it be fairly simple to detect that case explicitly, and stop the emptying process only if one of the lower-level buffers really fills up? That should be more efficient, as you would "swap" between different subtrees less often.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com