Re: WIP: Fast GiST index build

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: WIP: Fast GiST index build
Дата
Msg-id 4E3FAB1D.1060105@enterprisedb.com
обсуждение исходный текст
Ответ на Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: WIP: Fast GiST index build  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On 07.08.2011 22:28, Alexander Korotkov wrote:
> There is last version of patch. There is the list of most significant
> changes in comparison with your version of patch:
> 1) Reference counting of path items was added. It should helps against
> possible accumulation of path items.

Ok.

> 2) Neighbor relocation was added.

Ok. I think we're going to need some sort of a heuristic on when to 
enable neighbor relocation. If I remember the performance tests 
correctly, it improves the quality of the resulting index, but incurs a 
significant CPU overhead.

Actually, looking at the performance numbers on the wiki page again 
(http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011), it 
looks like neighbor relocation doesn't help very much with the index 
quality - sometimes it even results in a slightly worse index. Based on 
those results, shouldn't we just remove it? Or is there some other data 
set where it helps significantly?

> 3) Subtree prefetching was added.

I'm inclined to leave out the prefetching code for now. Unless you have 
some performance numbers that show that it's critical for the overall 
performance. But I don't think that was the case, it's just an 
additional optimization for servers with big RAID arrays. So, please 
separate that into an add-on patch. It needs to be performance tests and 
reviewed separately.

> 4) Final emptying algorithm was reverted to the original one. My
> experiments shows that typical number of passes in final emptying in
> your version of patch is about 5. It may be significant itself. Also I
> haven't estimate of number of passes and haven't guarantees that it will
> not be high in some corner cases. I.e. I prefer more predictable
> single-pass algorithm in spite of it's a little more complex.

I was trying to get rid of that complexity during index build. Some 
extra code in the final pass would be easier to understand than extra 
work that needs to be done through the index build. It's not a huge 
amount of code, but still.

I'm not worried about the extra CPU overhead of scanning the data 
structures at the final pass. I guess in my patch you had to do extra 
I/O as well, because the buffers were not emptied in strict top-down 
order, so let's avoid that. How about:

Track all buffers in the lists, not only those that are non-empty. Add 
the buffer to the right list at getNodeBuffer(). That way in the final 
stage, you need to scan through all buffers instead of just the 
non-empty ones. But the overhead of that to should be minimal in 
practice, scanning some in-memory data structures is pretty cheap 
compared to building an index. That way you wouldn't need to maintain 
the lists during the index build, except for adding each buffer to 
correct lists in getNodeBuffer().

BTW, please use List for the linked lists. No need to re-implement the 
wheel.

> 5) Unloading even last page of node buffer from main memory to the disk.
> Imagine that that with levelstep = 1 each inner node has buffer. It
> seems to me that keeping one page of each buffer in memory may be memory
> consuming.
>
> Open items I see at this moment:
> 1) I dislike my switching to buffering build method because it's based
> on very unproven assumptions. And I didn't more reliable assumptions in
> scientific papers while. I would like to replace it with something much
> simplier. For example, switching to buffering build when regular build
> actually starts to produce a lot of IO. For this approach implementation
> I need to somehow detect actual IO (not just buffer read but miss of OS
> cache).

Yeah, that's a surprisingly hard problem. I don't much like the method 
used in the patch either.

> 2) I'm worrying about possible size of nodeBuffersTab and path items. If
> we imagine extremely large tree with levelstep = 1 size of this
> datastructures will be singnificant. And it's hard to predict that size
> without knowing of tree size.

I'm not very worried about that in practice. If you have a very large 
index, you presumably have a fair amount of memory too. Otherwise the 
machine is horrendously underpowered to build or do anything useful with 
the index anyway. Nevertheless it would nice to have some figures on 
that. If you have, say, an index of 1 TB in size, how much memory will 
building the index need?

Miscellaneous observations:

* Please run pgindent over the code, there's a lot of spurious 
whitespace in the patch.
* How about renaming GISTLoadedPartItem to something like 
GISTBulkInsertStack, to resemble the GISTInsertStack struct used in the 
normal insertion code. The "loaded part" nomenclature is obsolete, as 
the patch doesn't explicitly load parts of the tree into memory anymore. 
Think about the names of other structs, variables and functions too, 
GISTLoadedPartItem just caught my eye first but there's probably others 
that could have better names.
* Any user-visible options need to be documented in the user manual.
* And of course, make sure comments and the readme are up-to-date.
* Compiler warning:

reloptions.c:259: warning: initializer-string for array of chars is too long
reloptions.c:259: warning: (near initialization for 
‘stringRelOpts[0].default_val’)

I don't think there's a way to add an entry to stringRelOpts in a way 
that works correctly. That's a design flaw in the reloptions.c code that 
has never come up before, as there hasn't been any string-formatted 
relopts before (actually buffering option might be better served by an 
enum reloption too, if we had that). Please start a new thread on that 
on pgsql-hackers.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Anssi Kääriäinen
Дата:
Сообщение: Re: Transient plans versus the SPI API
Следующее
От: Hannu Krosing
Дата:
Сообщение: Re: Transient plans versus the SPI API