Re: LWLocks in DSM memory

From: Thomas Munro
Subject: Re: LWLocks in DSM memory
Date:
Msg-id: CAEepm=2Cx3Wi5u0V+U16WHPG8eHBTQMk8pMsoDGrDZ-rX9GMgw@mail.gmail.com
In response to: LWLocks in DSM memory  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: LWLocks in DSM memory  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: LWLocks in DSM memory  (Robert Haas <robertmhaas@gmail.com>)
Re: LWLocks in DSM memory  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Sun, Jul 24, 2016 at 1:10 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> One solution could be to provide a non-circular variant of the dlist
> interface that uses NULL list termination.  I've attached a quick
> sketch of something like that which seems to work correctly.  It is
> only lightly tested so far and probably buggy, but shows the general
> idea.

On reflection, it wouldn't make much sense to put "noncircular" in the
names of interfaces, as that is an internal detail.  Maybe
"relocatable" or "position independent" or something like that, since
that's a user-visible property of the dlist_head.  Here is a version
of the patch that uses _relocatable.

> Any thoughts on this approach, or better ways to solve this problem?
> How valuable is the branch-free property of the circular push and
> delete operations?

I investigated this a bit.  If I do this on my laptop (clang, no
asserts, -O2), it takes 3895 milliseconds, or 4.8ns per push/delete
operation:

        dlist_init(&head);
        for (i = 0; i < 100000000; ++i)
        {
            dlist_push_head(&head, &nodes[0]);
            dlist_push_tail(&head, &nodes[1]);
            dlist_push_head(&head, &nodes[2]);
            dlist_push_tail(&head, &nodes[3]);
            dlist_delete(&nodes[2]);
            dlist_delete(&nodes[3]);
            dlist_delete(&nodes[0]);
            dlist_delete(&nodes[1]);
        }

The relocatable version takes 5907 milliseconds, or 7.4ns per
push/delete operation:

        dlist_init_relocatable(&head);
        for (i = 0; i < 100000000; ++i)
        {
            dlist_push_head_relocatable(&head, &nodes[0]);
            dlist_push_tail_relocatable(&head, &nodes[1]);
            dlist_push_head_relocatable(&head, &nodes[2]);
            dlist_push_tail_relocatable(&head, &nodes[3]);
            dlist_delete_relocatable(&head, &nodes[2]);
            dlist_delete_relocatable(&head, &nodes[3]);
            dlist_delete_relocatable(&head, &nodes[0]);
            dlist_delete_relocatable(&head, &nodes[1]);
        }

Those operations are ~50% slower.  So getting rid of dlist's clever
branch-free code generally seems like a bad idea.

Next I wondered if it would be a bad idea to use slower relocatable
dlist heads for all LWLocks.  Obviously LWLocks are used all over the
place and it would be bad to slow them down, but I wondered if maybe
dlist manipulation might not be a very significant part of their work.
So I put a LWLock and a counter in shared memory, and had N background
workers run a tight loop that locks, increments and unlocks
concurrently until the counter reaches 1 billion.  This creates
different degrees of contention and wait list sizes.  The worker loop
looks like this:

    while (!finished)
    {
        LWLockAcquire(&control->lock, LW_EXCLUSIVE);
        ++control->counter;
        if (control->counter >= control->goal)
            finished = true;
        LWLockRelease(&control->lock);
    }

I measured the following times for unpatched master, on my 4 core laptop:

16 workers = 73.067s, 74.869s, 75.338s
8 workers  = 65.846s, 67.622s, 68.039s
4 workers  = 68.763s, 68.980s, 69.035s <-- curiously slower here
3 workers  = 59.701s, 59.991s, 60.133s
2 workers  = 53.620s, 55.300s, 55.790s
1 worker   = 21.578s, 21.535s, 21.598s

With the attached patch I got:

16 workers = 75.341s, 77.445s, 77.635s <- +3.4%
8 workers  = 67.462s, 68.622s, 68.851s <- +1.4%
4 workers  = 64.983s, 65.021s, 65.496s <- -5.7%
3 workers  = 60.247s, 60.425s, 60.492s <- +0.7%
2 workers  = 57.544s, 57.626s, 58.133s <- +2.3%
1 worker   = 21.403s, 21.486s, 21.661s <- -0.2%

Somewhat noisy data and different effects at different numbers of workers.

I can post the source for those tests if anyone is interested.  If you
have any other ideas for access patterns to test, or clever ways to
keep push and delete branch-free while also avoiding internal pointers
back to dlist_head, I'm all ears.  Otherwise, if a change affecting
all LWLocks turns out to be unacceptable, maybe we would need to have
a different LWLock interface for relocatable LWLocks to make them
suitable for use in DSM segments.  Any thoughts?

--
Thomas Munro
http://www.enterprisedb.com

Attachments
