Re: [PATCH 04/16] Add embedded list interface (header only)

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [PATCH 04/16] Add embedded list interface (header only)
Дата
Msg-id 201206221112.47796.andres@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [PATCH 04/16] Add embedded list interface (header only)  (Peter Geoghegan <peter@2ndquadrant.com>)
Ответы Re: [PATCH 04/16] Add embedded list interface (header only)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote:
> On 20 June 2012 14:38, Andres Freund <andres@2ndquadrant.com> wrote:
> > It incurs a rather high performance overhead due to added memory
> > allocations and added pointer indirections. Thats fine for most of the
> > current users of the List interface, but certainly not for all. In other
> > places you cannot even have memory allocations because the list lives in
> > shared memory.
> Yes, in general lists interact horribly with the memory hierarchy. I
> think I pointed out to you once a rant of mine on -hackers a while
> back in which I made various points about just how badly they do these
> days.
Yes, but how is that relevant? Its still the best data structure for many use-
cases. Removing one of the two indirections is still a good idea, hence this 
patch ;)

> On modern architectures, with many layers of cache, the cost of the
> linear search to get an insertion point is very large. So this:
> 
> /*
>  * removes a node from a list
>  * Attention: O(n)
>  */
> static inline void ilist_s_remove(ilist_s_head *head,
>                                   ilist_s_node *node)
> 
> 
> is actually even worse than you might suspect.
O(n) is O(n), the constant is irrelevant. Anybody who uses arbitrary node 
removal in a single linked link in the fast path is deserves the pain ;)

> > Several of the pieces of code I pointed out in a previous email use
> > open-coded list implementation exactly to prevent those problems.
> 
> Interesting.
> 
> So, it seems like this list implementation could be described as a
> minimal embeddable list implementation that requires the user to do
> all the memory allocation, and offers a doubly-linked list too. Not an
> unreasonable idea. I do think that the constraints you have are not
> well served by any existing implementation, including List and Dllist.
Yep. Note though that you normally wouldn't do extra/manual memory allocation 
because you just use the already allocated memory of the struct where you 
embedded the list element into.

> Are you planning on just overhauling the Dllist interface in your next
> iteration?
It needs to be unified. Not yet sure whether its better to just remove Dllist 
or morph my code into it.

> All of the less popular compilers we support we support precisely
> because they pretend to be GCC, with the sole exception, as always, of
> the Microsoft product, in this case MSVC. So my position is that I'm
> in broad agreement that we should freely allow the use of inline
> without macro hacks, since we generally resists using macro hacks if
> that makes code ugly, which USE_INLINE certainly does, and for a
> benefit that is indistinguishable from zero, at least to me.
Tom already pointed out that not all compilers pretend to be gcc. I agree 
though that we should try to make all supported compilers support USE_INLINE. 
I think with some ugliness that should be possible at least for aCC. Will 
respond to Tom on that.

> Why are you using the stdlib's <assert.h>? Why have you used the
> NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if
> the header was intended to live in port, but it isn't, right?
That should probably be removed, yes. I did it that way that it could be 
tested independently of casserts because the list checking code turns some 
linear algorithms into quadratic ones which is noticeable even when --enable-
cassert is defined.

> Why have you done this:
> 
> #ifdef __GNUC__
> #define unused_attr __attribute__((unused))
> #else
> #define unused_attr
> #endif
> 
> and then gone on to use this unused_attr macro all over the place?
> Firstly, that isn't going to suppress the warnings on many platforms
> that we support, and we do make an effort to build warning free on at
> least 3 compilers these days - GCC, Clang and MSVC. Secondly,
> compilers give these warnings because it doesn't make any sense to
> have an unused parameter - so why have you used one? At the very
> least, if you require this exact interface, use compatibility macros.
> I can't imagine why that would be important though. And even if you
> did want a standard unused_attr facility, you'd do that in c.h, where
> a lot of that stuff lives.
If you look at the places its mostly used in functions like:

/** adds a node at the beginning of the list*/
static inline void ilist_d_push_front(ilist_d_head *head, ilist_d_node *node)
{node->next = head->head.next;node->prev = &head->head;node->next->prev = node;head->head.next =
node;ilist_d_check(head);
}
Where ilist_d_check doesn't do anything if assertions aren't enabled which gcc 
unfortunately groks and warns.

The other case is functions like:

static inline void ilist_s_add_after(unused_attr ilist_s_head *head,                                    ilist_s_node
*after,ilist_s_node *node)
 
{node->next = after->next;after->next = node;
}
Where it makes sense for the api to get the head element for consistency 
reasons. It very well would be possible to add a checking function in there 
too. Its just easier to make single linked lists work correctly than double 
linked ones, thats why there is no check so far ;)

> You haven't put a copyright notice or description in this new file,
> which is required.
good point.

> So, I infer that by embeddable you mean that the intended interface of
> this is that someone writes a struct that has as its first field a
> ilist_d_head or a ilist_s_head, and through the magic of C's
> guarantees about how those structures will be laid-out, type punning
> can be used to traverse the unknown structure, as opposed to doing
> that weird ListCell thing - IIRC, Berkeley sockets uses this kind of
> inheritance. This isn't really obvious from the code, and if you
> engaged me in conversation and asked what an embeddable list was, I
> wouldn't have been able to tell you off the top of my head before
> today. The lack of comments generally should be worked on.
It should contain an example + some comments, yes. Let me scribble you 
together an example from the code using this:

First some structs, ignore the specific contents:

typedef struct ApplyCacheChange
{XLogRecPtr lsn;enum ApplyCacheChangeType action;ApplyCacheTupleBuf* newtuple;ApplyCacheTupleBuf* oldtuple;HeapTuple
table;
/* * While in use this is how a change is linked into a transactions, * otherwise its the preallocated
list.*/ilist_d_nodenode;
 
} ApplyCacheChange;

typedef struct ApplyCacheTXN
{TransactionId xid;
...ilist_d_head changes;
.../* * our position in a list of subtransactions while the TXN is in * use. Otherwise its the position in the list of
preallocated* transactions. */ilist_d_node node;
 
} ApplyCacheTXN;

struct ApplyCache
{
....ilist_d_head cached_changes;size_t nr_cached_changes;
....
}

------------
Two example functions:

ApplyCacheChange*
ApplyCacheGetChange(ApplyCache* cache)
{ApplyCacheChange* change;
if (cache->nr_cached_changes){    cache->nr_cached_changes--;    change = ilist_container(ApplyCacheChange, node,
                     ilist_d_pop_front(&cache->cached_changes));}else{       ...    }
 
...
}

void
ApplyCacheAddChange(ApplyCache* cache, TransactionId xid, XLogRecPtr lsn,                   ApplyCacheChange* change)
{ApplyCacheTXN* txn = ApplyCacheTXNByXid(cache, xid, true);txn->lsn = lsn;ilist_d_push_back(&txn->changes,
&change->node);
}

-----------
Explanation:

In ApplyCacheGetChange we pop an item from a list:

change = ilist_container(ApplyCacheChange, node,                    ilist_d_pop_front(&cache->cached_changes));

If you put that into individual parts:

ilist_d_node *n = ilist_d_pop_front(&cache->cached_changes);

removes one element from the list and returns a ilist_d_node. Obviously thats 
not really all that interesting for us because we want the "content" thats 
been stored there.
So we use:
change = ilist_container(ApplyCacheChange, node, n);

What that does is to use the offsetof() macro/builtin to calculate the offset
of the member 'node' in the struct 'ApplyCacheChange' and do the pointer math 
to subtract that from 'n'.
Which means that, without any further pointer indirection you could access the 
contents of the list element.

The other required example is:
ilist_d_push_back(&txn->changes, &change->node);

We push a change on the list of changes. But as the ilist machinery only deals 
with 'ilist_d_nodes' we pass it '&change->node' as a parameter.

Does that explanation make sense?

Due do the ilist_container trickery we can can have multiple list membership 
nodes in one struct. The important part is that the code that uses the lists 
needs to know in which struct the list element is embedded.

> 
> I am not sure that I like this inconsistency:
> 
>     ilist_d_foreach(t, head)
>     ilist_d_remove(head, t);
> 
> In the first call, head is specified and then node. In the second,
> node and then head. This could probably stand to be made consistent.
I think thats fine because it looks more like the traditional for loops. But 
if others don't agree...

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: SP-GiST for ranges based on 2d-mapping and quad-tree
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [PATCH 04/16] Add embedded list interface (header only)