Re: WIP: Fast GiST index build

Поиск

Список

Период

Сортировка

От	Alexander Korotkov
Тема	Re: WIP: Fast GiST index build
Дата	8 августа 2011 г. 07:19:04
Msg-id	CAPpHfdsovwbgs30TojJc_zDsc+0ghapm+-AALfP58UaHiYjr2A@mail.gmail.com обсуждение исходный текст
Ответ на	Re: WIP: Fast GiST index build (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Ответы	Re: WIP: Fast GiST index build
Список	pgsql-hackers

Дерево обсуждения

On Mon, Aug 8, 2011 at 1:23 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

2) Neighbor relocation was added.

Ok. I think we're going to need some sort of a heuristic on when to enable neighbor relocation. If I remember the performance tests correctly, it improves the quality of the resulting index, but incurs a significant CPU overhead.

Actually, looking at the performance numbers on the wiki page again (http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011), it looks like neighbor relocation doesn't help very much with the index quality - sometimes it even results in a slightly worse index. Based on those results, shouldn't we just remove it? Or is there some other data set where it helps significantly?

Oh, actually I didn't add some results with neighborrelocation = off. I would like to rerun some tests with current version of patch.

3) Subtree prefetching was added.

I'm inclined to leave out the prefetching code for now. Unless you have some performance numbers that show that it's critical for the overall performance. But I don't think that was the case, it's just an additional optimization for servers with big RAID arrays. So, please separate that into an add-on patch. It needs to be performance tests and reviewed separately.

I though that prefetch helps even on separate hard disks by ordering of IOs.

4) Final emptying algorithm was reverted to the original one. My
experiments shows that typical number of passes in final emptying in
your version of patch is about 5. It may be significant itself. Also I
haven't estimate of number of passes and haven't guarantees that it will
not be high in some corner cases. I.e. I prefer more predictable
single-pass algorithm in spite of it's a little more complex.

I was trying to get rid of that complexity during index build. Some extra code in the final pass would be easier to understand than extra work that needs to be done through the index build. It's not a huge amount of code, but still.

I'm not worried about the extra CPU overhead of scanning the data structures at the final pass. I guess in my patch you had to do extra I/O as well, because the buffers were not emptied in strict top-down order, so let's avoid that. How about:

Track all buffers in the lists, not only those that are non-empty. Add the buffer to the right list at getNodeBuffer(). That way in the final stage, you need to scan through all buffers instead of just the non-empty ones. But the overhead of that to should be minimal in practice, scanning some in-memory data structures is pretty cheap compared to building an index. That way you wouldn't need to maintain the lists during the index build, except for adding each buffer to correct lists in getNodeBuffer().

BTW, please use List for the linked lists. No need to re-implement the wheel.

Ok.

5) Unloading even last page of node buffer from main memory to the disk.
Imagine that that with levelstep = 1 each inner node has buffer. It
seems to me that keeping one page of each buffer in memory may be memory
consuming.

Open items I see at this moment:
1) I dislike my switching to buffering build method because it's based
on very unproven assumptions. And I didn't more reliable assumptions in
scientific papers while. I would like to replace it with something much
simplier. For example, switching to buffering build when regular build
actually starts to produce a lot of IO. For this approach implementation
I need to somehow detect actual IO (not just buffer read but miss of OS
cache).

Yeah, that's a surprisingly hard problem. I don't much like the method used in the patch either.

Is it possible to make buffering build a user defined option until we have a better idea?

2) I'm worrying about possible size of nodeBuffersTab and path items. If
we imagine extremely large tree with levelstep = 1 size of this
datastructures will be singnificant. And it's hard to predict that size
without knowing of tree size.

I'm not very worried about that in practice. If you have a very large index, you presumably have a fair amount of memory too. Otherwise the machine is horrendously underpowered to build or do anything useful with the index anyway. Nevertheless it would nice to have some figures on that. If you have, say, an index of 1 TB in size, how much memory will building the index need?

I think with points it would be about 1 million of buffers and about 100-300 megabytes of RAM depending on space utilization. It may be ok, because 1 TB is really huge index. But if maintenance_work_mem is low we can run out of it. Though maintenance_work_mem is quite strange for system with 1 TB indexes.

Miscellaneous observations:

* Please run pgindent over the code, there's a lot of spurious whitespace in the patch.
* How about renaming GISTLoadedPartItem to something like GISTBulkInsertStack, to resemble the GISTInsertStack struct used in the normal insertion code. The "loaded part" nomenclature is obsolete, as the patch doesn't explicitly load parts of the tree into memory anymore. Think about the names of other structs, variables and functions too, GISTLoadedPartItem just caught my eye first but there's probably others that could have better names.
* Any user-visible options need to be documented in the user manual.
* And of course, make sure comments and the readme are up-to-date.
* Compiler warning:

reloptions.c:259: warning: initializer-string for array of chars is too long
reloptions.c:259: warning: (near initialization for ‘stringRelOpts[0].default_val’)

I don't think there's a way to add an entry to stringRelOpts in a way that works correctly. That's a design flaw in the reloptions.c code that has never come up before, as there hasn't been any string-formatted relopts before (actually buffering option might be better served by an enum reloption too, if we had that). Please start a new thread on that on pgsql-hackers.

Ok.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: WIP: Fast GiST index build