Re: GIN improvements part 1: additional information

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: GIN improvements part 1: additional information
Дата
Msg-id CAPpHfds3QRf91-ubhHKrRg-MVKKzUoL=Zaqt-_OTM74j9ZZRAw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GIN improvements part 1: additional information  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-hackers
On Mon, Sep 16, 2013 at 11:43 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 15.09.2013 12:14, Alexander Korotkov wrote:
On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas<
hlinnakangas@vmware.com>  wrote:

There's a few open questions:

1. How are we going to handle pg_upgrade? It would be nice to be able to
read the old page format, or convert on-the-fly. OTOH, if it gets too
complicated, might not be worth it. The indexes are much smaller with the
patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
later. Still, I'd like to give it a shot.

2. The patch introduces a small fixed 32-entry index into the packed
items. Is that an optimal number?

3. I'd like to see some performance testing of insertions, deletions, and
vacuum. I suspect that maintaining the 32-entry index might be fairly
expensive, as it's rewritten on every update to a leaf page.

It appears that maintaining 32-entry index is really expensive because it
required re-decoding whole page. This issue is fixed in attached version of
patch by introducing incremental updating of that index. Benchmarks will be
posted later today..

Great! Please also benchmark WAL replay; you're still doing non-incremental update of the 32-entry index in ginRedoInsert.

A couple of quick comments after a quick glance (in addition to the above):

The varbyte encoding stuff is a quite isolated piece of functionality. And potentially useful elsewhere, too. It would be good to separate that into a separate .c/.h files. I'm thinking of src/backend/access/gin/packeditemptr.c, which would contain ginDataPageLeafReadItemPointer, ginDataPageLeafWriteItemPointer and ginDataPageLeafGetItemPointerSize functions. A new typedef for varbyte-encoded things would probably be good too, something like "typedef char *PackedItemPointer". In the new .c file, please also add a file-header comment explaining the encoding.

The README really needs to be updated. The posting tree page structure is a lot more complicated now, there needs to be a whole new section to explain it. A picture would be nice, similar to the one in bufpage.h.

It's a bit funny that we've clumped together all different kinds of GIN insertions into one WAL record type. ginRedoInsert does completely different things depending on what kind of a page it is. And the ginXlogInsert struct also contains different kinds of things depending on what kind of an insertion it is. It would be cleaner to split it into three. (this isn't your patch's fault - it was like that before, too.)

Apparently some bugs breaks index on huge updates independent on incremental update of the 32-entry. I'm in debugging now. That's why benchmarks are delayed.

------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: GIN improvements part 1: additional information
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)