Re: Shared locking in slru.c
| От | Tom Lane |
|---|---|
| Тема | Re: Shared locking in slru.c |
| Дата | |
| Msg-id | 2237.1133814758@sss.pgh.pa.us обсуждение исходный текст |
| Ответ на | Shared locking in slru.c (Tom Lane <tgl@sss.pgh.pa.us>) |
| Список | pgsql-hackers |
I wrote:
> The way the attached patch attacks this is for the shared-lock access
> case to simply set the page's LRU counter to zero, without bumping up
> the LRU counters of the other pages as the normal adjustment would do.
> ...
> I'm not totally happy with this heuristic, though, and was
> wondering if anyone had a better idea. Anyone seen a lock-free
> data structure for LRU or approximately-LRU state tracking?
I've come up with what seems a slightly better way to do this. The idea
is to replace the current structure for tracking which page is the least-
recently-used with this:
typedef struct SlruSharedData
{ ...
/*---------- * We mark a page "most recently used" by setting * page_lru_count[slotno] =
++cur_lru_count; * The oldest page is therefore the one with the highest value of * cur_lru_count -
page_lru_count[slotno] *---------- */ int cur_lru_count;
...
int page_lru_count[NUM_SLRU_BUFFERS];
...
which makes the SlruRecentlyUsed macro look like
#define SlruRecentlyUsed(shared, slotno) \ do { \ int new_lru_count = (shared)->cur_lru_count; \
if(new_lru_count != (shared)->page_lru_count[slotno]) { \ (shared)->cur_lru_count = ++new_lru_count; \
(shared)->page_lru_count[slotno] = new_lru_count; \ } \ } while (0)
and SlruSelectLRUPage() has to look for the maximum value of
"cur_count - shared->page_lru_count[slotno]" rather than just
"shared->page_lru_count[slotno]" as before. This seems like a win
in any case since it takes cycles out of the commonly used path at
a small increase in the cost of SlruSelectLRUPage --- but in that
routine you are about to do I/O anyway, so a few extra subtractions
are negligible.
However, the real reason for doing this is that I think it's OK for
the proposed SimpleLruReadPage_ReadOnly routine to apply this form
of SlruRecentlyUsed even though it holds only a shared lock. Assuming
that int reads and writes are atomic, the only possible failures are
1. If a process running the macro is delayed, it might store a stale
(hence too small) value back into cur_lru_count or a page_lru_count
array element after someone else has incremented them to a larger value.
2. Two processes might read the same cur_lru_count value at the same
time, so that one of their increments is lost. This has the same end
effect as #1, though.
Given reasonable care in SlruSelectLRUPage (see attached excerpt), we
can defend against these scenarios and usually still make a reasonable
choice of which page to evict. In any case, the worst possible scenario
is that we make a nonoptimal choice of page to evict due to confused
lru_count state. This price seems well worth the chance to reduce
contention for shared memory.
Thoughts, objections?
regards, tom lane
/* * If we find any EMPTY slot, just select that one. Else locate the * least-recently-used slot
toreplace. * * Normally the page_lru_count values will all be different and so * there will be a
well-definedLRU page. But since we allow * concurrent execution of SlruRecentlyUsed() within *
SimpleLruReadPage_ReadOnly(),it is possible that multiple pages * acquire the same lru_count values. In that
casewe break ties by * choosing the furthest-back page. * * In no case will we select the slot
containinglatest_page_number * for replacement, even if it appears least recently used. * * Notice
thatthis next line forcibly advances cur_lru_count to a * value that is certainly beyond any value that will be
inthe * page_lru_count array after the loop finishes. This ensures that * the next execution of
SlruRecentlyUsedwill give us good data, * even if it's against a page that has the current counter value.
*/ cur_count = (shared->cur_lru_count)++; best_delta = -1; bestslot = 0; /* no-op, just
keepscompiler quiet */ best_page_number = 0; /* ditto */ for (slotno = 0; slotno < NUM_SLRU_BUFFERS;
slotno++) { int this_delta; int this_page_number;
if (shared->page_status[slotno] == SLRU_PAGE_EMPTY) return slotno; this_delta =
cur_count- shared->page_lru_count[slotno]; if (this_delta < 0) { /* *
Cleanup in case shared updates have caused cur_count * increments to get "lost". We back off the page
counts, * rather than trying to increase cur_count, to avoid any * question of infinite
loopsor failure in the presence of * wrapped-around counts. */
shared->page_lru_count[slotno]= cur_count; this_delta = 0; } this_page_number =
shared->page_number[slotno]; if ((this_delta > best_delta || (this_delta == best_delta &&
ctl->PagePrecedes(this_page_number, best_page_number))) && this_page_number !=
shared->latest_page_number) { bestslot = slotno; best_delta = this_delta;
best_page_number = this_page_number; } }
В списке pgsql-hackers по дате отправления: