Re: Reducing the size of BufferTag & remodeling forks
От | Andres Freund |
---|---|
Тема | Re: Reducing the size of BufferTag & remodeling forks |
Дата | |
Msg-id | 20150912201941.GA8311@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: Reducing the size of BufferTag & remodeling forks (Simon Riggs <simon@2ndQuadrant.com>) |
Ответы |
Re: Reducing the size of BufferTag & remodeling forks
Re: Reducing the size of BufferTag & remodeling forks |
Список | pgsql-hackers |
On 2015-09-12 13:12:26 +0100, Simon Riggs wrote: > Why do we have to do buffer lookups using the full buffer tag? We don't necessarily. > Why not just use (relNode, blockNum) and resolve hash collisions, if any? I tried that and unfortunately it came out as a negative - the number of collision gets higher and collisions in a hashtable will often imply a cache miss. You can even see the comparison function rising in the profile. I've actually changed course a bit and I'm trying something different: A two level structure. One hashtable that maps (RelFileNode, ForkNumber) to a 'open relation' data structure, and from there a radix tree over just the block number. To avoid having to look up in the hashtable frequently there's a pointer in RelationData to a fork's radix tree. This seems to have several advantages: * It doesn't require changes to the physical representation * A radix tree allows ordered traversal, allowing for efficient prefetching et al. * The ability to efficiently get all pages that belong to a relation makes dropping/truncating tables radically more efficientthan now. * A single lookup in the radix tree is on average significantly faster than in the hash table. I've measured ~350 vs 60 cyclesin a nonconcurrent workload and ~820 vs 72~ cycles in a concurrent workload, using a recorded buffer access traces. * Having a 'open relation' representation in shared memory also makes it much easier to implement more efficient relationextension and truncation - we can have pointers in there that signal how far the relation has been extended and howfar it has been initialized. The biggest disadvantage is that it's a good bit more complex than what we have right now. That we need dynamically many radix trees makes memory management nontrivial. Sounds halfway sane? Andres
В списке pgsql-hackers по дате отправления: