Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Claudio Freire
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Msg-id: CAGTBQpY4e3na-pnHAnpRapHDgi+EJAbkOgjmrVY_Nkgw5o+ZHQ@mail.gmail.com
In reply to: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

On Wed, Jan 15, 2014 at 3:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> But, still, the implementation is very similar to what postgres needs:
>> sharing a physical page for two distinct logical pages, efficiently,
>> with efficient copy-on-write.
>
> Agreed, except that KSM seems like it'd be slow/lazy about it and I'm
> guessing there's a reason the pagecache isn't included normally..

KSM actively scans memory for duplicate pages, and that scanning is
what makes it slow. What I'm proposing would reuse KSM's page-sharing
structures in the kernel, without any of the de-duplication logic.

>
>> So it'd be just a matter of removing that limitation regarding page
>> cache and shared pages.
>
> Any idea why that limitation is there?

No, but I'm guessing it's because nobody bothered to implement the
required copy-on-write in the page cache, which would be a PITA to
write - think of all the complexities with privilege checks and
everything - even though the benefits for many kinds of applications
would be important.

>> If you asked me, I'd implement it as copy-on-write on the page cache
>> (not the user page). That ought to be low-overhead.
>
> Not entirely sure I'm following this- if it's a shared page, it doesn't
> matter who starts writing to it, as soon as that happens, it needs to get
> copied.  Perhaps you mean that the application should keep the
> "original" and that the page-cache should get the "copy" (or, really,
> perhaps just forget about the page existing at that point- we won't want
> it again...).
>
> Would that be a way to go, perhaps?  This does go back to the "make it
> act like mmap, but not *be* mmap", but the idea would be:
> open(..., O_ZEROCOPY_READ)
> read() - Goes to PG's shared buffers, pagecache and PG share the page
> page fault (PG writes to it) - pagecache forgets about the page
> write() / fsync() - operate as normal

Yep.
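
For illustration only: O_ZEROCOPY_READ above is a proposed flag and, as
far as I know, not something open() accepts today. The closest behaviour
you can actually run is a private file mapping, which on Linux shares the
page-cache page until the first write and then takes a copy-on-write
fault. The sketch below uses Python's mmap.ACCESS_COPY (a MAP_PRIVATE
mapping on Unix); the function name cow_demo and the temporary file are
made up for the example, and it assumes a Unix-like system.

```python
import mmap
import os
import tempfile


def cow_demo():
    """Show copy-on-write between a private mapping and the file behind it."""
    fd, path = tempfile.mkstemp()
    try:
        # One page of 'A' bytes, flushed so the file has known contents.
        os.write(fd, b"A" * mmap.PAGESIZE)
        os.fsync(fd)

        # ACCESS_COPY requests a private (copy-on-write) mapping: reads see
        # the file's data, writes affect only this process's memory.
        with mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_COPY) as buf:
            before = bytes(buf[:4])
            buf[:4] = b"BBBB"              # first write: private copy is made
            private = bytes(buf[:4])
            disk_before = os.pread(fd, 4, 0)   # file itself is unchanged

            # "write() / fsync() - operate as normal": push the change
            # back explicitly, as the proposal expects the application to.
            os.pwrite(fd, buf[:], 0)
            os.fsync(fd)

        disk_after = os.pread(fd, 4, 0)
        return before, private, disk_before, disk_after
    finally:
        os.close(fd)
        os.unlink(path)
```

Note the difference from what is being discussed: here the process gets
the copy and the page cache keeps the original, whereas the idea in this
thread is for the application to keep the page and for the page cache to
drop it. The explicit pwrite()/fsync() step is the part that matches.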


