Re: Reducing the size of BufferTag & remodeling forks

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Reducing the size of BufferTag & remodeling forks
Дата
Msg-id 20150912201941.GA8311@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Reducing the size of BufferTag & remodeling forks  (Simon Riggs <simon@2ndQuadrant.com>)
Ответы Re: Reducing the size of BufferTag & remodeling forks  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Reducing the size of BufferTag & remodeling forks  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> Why do we have to do buffer lookups using the full buffer tag?

We don't necessarily.

> Why not just use (relNode, blockNum) and resolve hash collisions, if any?

I tried that and unfortunately it came out as a negative - the number of
collision gets higher and collisions in a hashtable will often imply a
cache miss. You can even see the comparison function rising in the
profile.

I've actually changed course a bit and I'm trying something different: A
two level structure. One hashtable that maps (RelFileNode, ForkNumber)
to a 'open relation' data structure, and from there a radix tree over
just the block number. To avoid having to look up in the hashtable
frequently there's a pointer in RelationData to a fork's radix tree.

This seems to have several advantages:
* It doesn't require changes to the physical representation
* A radix tree allows ordered traversal, allowing for efficient prefetching et al.
* The ability to efficiently get all pages that belong to a relation makes dropping/truncating tables radically more
efficientthan now.
 
* A single lookup in the radix tree is on average significantly faster than in the hash table. I've measured ~350 vs 60
cyclesin a nonconcurrent workload and ~820 vs 72~ cycles in a concurrent workload, using a recorded buffer access
traces.
* Having a 'open relation' representation in shared memory also makes it much easier to implement more efficient
relationextension and truncation - we can have pointers in there that signal how far the relation has been extended and
howfar it has been initialized.
 

The biggest disadvantage is that it's a good bit more complex than what
we have right now. That we need dynamically many radix trees makes
memory management nontrivial.

Sounds halfway sane?

Andres



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: On-demand running query plans using auto_explain and signals
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: typo in create policy doc