Even if we use varbyte encoding, I wonder if it would be better to treat block + offset number as a single 48-bit integer, rather than encode them separately. That would allow the delta of two items on the same page to be stored as a single byte, rather than two bytes. Naturally it would be a loss on other values, but would be nice to see some kind of an analysis on that. I suspect it might make the code simpler, too.
Yeah, I had that idea, but I thought it wasn't the better option. I'll try to do some analysis.
The more I think about that, the more convinced I am that it's a good idea. I don't think it will ever compress worse than the current approach of treating block and offset numbers separately, and, although I haven't actually tested it, I doubt it's any slower. About the same amount of arithmetic is required in both versions.
Attached is a version that does that. Plus some other minor cleanup.
(we should still investigate using a completely different algorithm, though)
Yes, when I thought about that, I didn't realize that we could reserve fewer than 16 bits for the offset number.
I reran my benchmark and got the following results:
event | period
-----------------------+-----------------
index_build | 00:01:46.39056
index_build_recovery | 00:00:05
index_update | 00:06:01.557708
index_update_recovery | 00:01:23
search_new | 00:24:05.600366
search_updated | 00:25:29.520642
(6 rows)
label | blocks_mark
----------------+-------------
search_new | 847509920
search_updated | 883789826
(2 rows)
label | size
---------------+-----------
new | 364560384
after_updates | 642736128
(2 rows)
Speed is the same while the index size is smaller. With the previous format it was:
label | size
---------------+-----------
new | 419299328
after_updates | 715915264
(2 rows)
Good optimization, thanks. I'll try other datasets, but I expect similar results.
However, the patch didn't apply to HEAD. A corrected version is attached.