tbm_lossify causes "unbalanced" hashtables / bitmaps

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	tbm_lossify causes "unbalanced" hashtables / bitmaps
Дата	23 сентября 2016 г. 23:58:50
Msg-id	20160923205843.zcs533sqdzlh4cpo@alap3.anarazel.de обсуждение исходный текст
Ответы	Re: tbm_lossify causes "unbalanced" hashtables / bitmaps Re: tbm_lossify causes "unbalanced" hashtables / bitmaps
Список	pgsql-hackers

Дерево обсуждения

Hi,

Playing around with my hashtable implementation [1] and using it for
bitmapscans/tidbitmap.c I noticed something curious. When using small
work_mem (4MB in my case) for large-ish tables (~5GB), the performance
tanks. That's primarily caused by some issues in the hashtable code
(since fixed), but it made me look more at the relevant tidbitmap
code. Namely tbm_lossify():

static void
tbm_lossify(TIDBitmap *tbm)
{
...pagetable_start_iterate_random(tbm->pagetable, &i);while ((page = pagetable_iterate(tbm->pagetable, &i)) != NULL){
if (page->ischunk)        continue;            /* already a chunk header */
 
    /*     * If the page would become a chunk header, we won't save anything by     * converting it to lossy, so skip
it.    */    if ((page->blockno % PAGES_PER_CHUNK) == 0)        continue;
 
    /* This does the dirty work ... */    tbm_mark_page_lossy(tbm, page->blockno);
    if (tbm->nentries <= tbm->maxentries / 2)    {        /* we have done enough */        break;    }
}

tbm_mark_page_lossy() in turn deletes the individual page entry, and
adds one for the chunk (spanning several pages).

The relevant part is that the loop stops when enough page ranges have
been lossified.  This leads to some problems:

Because iteration (both in my implementation and dynahash) is
essentially in bucket order, the front of the hashtable will be mostly
empty, whereas later parts of the hashtable will be full (i.e. a lot of
collisions). The reason for that is that we'll find all parts of the
hashtable that are uncompressed and "early" in the hashspace, and
they'll possibly hash to something later in the table.  As the number of
entries in the hashtable doesn't increase, we'll never grow the
hashtable to decrease the number of collisions.  I've seen that cause
quite noticeable number of conflicts (which of course are worse in open
addressing table, rather than a separately chained one).

Secondly it'll lead to pretty random parts of the bitmap being
compressed. For performance it'd be good to compress "heavily used"
areas of the bitmap, instead of just whatever happens to be early in the
hash space.


I'm not entirely sure how to resolve that best, but it does seem worth
addressing.  One thing that might be nice is to record the last N
insertions once at 90% full or so, and then lossify in a more targeted
manner using those. E.g. lossify block ranges (256 long atm) that
contain a lot of individual entries or such.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20160727004333.r3e2k2y6fvk2ntup%40alap3.anarazel.de

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Joe Conway
Дата: 23 сентября 2016 г., 23:35:47
Сообщение: Re: PG 9.6.0 release schedule

Следующее

От: Andres Freund
Дата: 24 сентября 2016 г., 00:26:02
Сообщение: Re: tbm_lossify causes "unbalanced" hashtables / bitmaps

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

tbm_lossify causes "unbalanced" hashtables / bitmaps

Предыдущее

Следующее