Report: Linux huge pages with Postgres

From: Tom Lane
Subject: Report: Linux huge pages with Postgres
Date:
Message-ID: 28895.1290886032@sss.pgh.pa.us
Replies: Re: Report: Linux huge pages with Postgres  (Robert Haas <robertmhaas@gmail.com>)
Re: Report: Linux huge pages with Postgres  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Report: Linux huge pages with Postgres  (Kenneth Marshall <ktm@rice.edu>)
Re: Report: Linux huge pages with Postgres  (Jonathan Corbet <corbet@lwn.net>)
List: pgsql-hackers

We've gotten a few inquiries about whether Postgres can use "huge pages"
under Linux.  In principle that should be more efficient for large shmem
regions, since fewer TLB entries are needed to support the address
space.  I spent a bit of time today looking into what that would take.
My testing was done with current Fedora 13, kernel version
2.6.34.7-61.fc13.x86_64 --- it's possible some of these details vary
across other kernel versions.

You can test this with fairly minimal code changes, as illustrated in
the attached not-production-grade patch.  To select huge pages we have
to include SHM_HUGETLB in the flags for shmget(), and we have to be
prepared for failure (due to permissions or lack of allocated
hugepages).  I made the code just fall back to a normal shmget on
failure.  A bigger problem is that the shmem request size must be a
multiple of the system's hugepage size, which is *not* a constant
even though the test patch just uses 2MB as the assumed value.  For a
production-grade patch we'd have to scrounge the active value out of
someplace in the /proc filesystem (ick).
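For illustration only, here is a minimal sketch of what that lookup and rounding might look like. It assumes the value is taken from the "Hugepagesize:" line of /proc/meminfo, which is one place the kernel reports it. The function names read_huge_page_size and round_up_to_page are hypothetical and are not part of the attached patch. Unlike the patch's arithmetic, the rounding here leaves a size that is already a multiple of the page size unchanged, and it does not guard against overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: scan a meminfo-style file for the "Hugepagesize:"
 * line and return the huge page size in bytes.  Returns 0 if the file
 * cannot be opened or the line is not present.  The path is a parameter
 * so the parser can be tried against a sample file.
 */
static size_t
read_huge_page_size(const char *path)
{
    FILE       *fp = fopen(path, "r");
    char        line[256];
    unsigned long kb = 0;

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof(line), fp) != NULL)
    {
        /* sscanf leaves kb untouched on lines that do not match */
        if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
            break;
    }
    fclose(fp);
    return (size_t) kb * 1024;
}

/*
 * Hypothetical helper: round size up to the next multiple of page_size.
 * A size that is already a multiple is returned unchanged.
 */
static size_t
round_up_to_page(size_t size, size_t page_size)
{
    size_t      rem;

    assert(page_size > 0);
    rem = size % page_size;
    return (rem == 0) ? size : size + (page_size - rem);
}
```

A caller would pass "/proc/meminfo" to read_huge_page_size() and skip the SHM_HUGETLB attempt entirely when it returns 0.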

In addition to the code changes there are a couple of sysadmin
requirements to make huge pages available to Postgres:

1. You have to configure the Postgres user as a member of the group
that's permitted to allocate hugepage shared memory.  I did this:
    sudo sh -c "id -g postgres >/proc/sys/vm/hugetlb_shm_group"
For production use you'd need to put this in the PG initscript,
probably, to ensure it gets re-set after every reboot and before PG
is started.

2. You have to manually allocate some huge pages --- there doesn't
seem to be any setting that says "just give them out on demand".
I did this:
    sudo sh -c "echo 600 >/proc/sys/vm/nr_hugepages"
which gave me a bit over 1GB of space reserved as huge pages.
Again, this'd have to be done over again at each system boot.
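As a rough check on the reservation arithmetic, the sketch below computes how many huge pages a segment of a given size occupies. It assumes the 2MB page size used in this test, and the helper name huge_pages_needed is hypothetical. With 2MB pages, 600 pages come to 1200MB, while a 1GB region by itself needs 512 pages, so the reservation leaves some headroom for whatever else lives in the shared memory segment.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of huge pages needed to hold segment_bytes,
 * rounding any partial page up to a whole one.
 */
static size_t
huge_pages_needed(size_t segment_bytes, size_t page_size)
{
    assert(page_size > 0);
    return segment_bytes / page_size + (segment_bytes % page_size != 0);
}
```

For example, huge_pages_needed(1073741824, 2097152) gives 512, and one byte more than that gives 513.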

For testing purposes, I figured that what I wanted to stress was
postgres process swapping and shmem access.  I built current git HEAD
with --enable-debug and no other options, and tested with these
non-default settings:
    shared_buffers         1GB
    checkpoint_segments    50
    fsync                  off
(fsync intentionally off since I'm not trying to measure disk speed).
The test machine has two dual-core Nehalem CPUs.  Test case is pgbench
at -s 25; I ran several iterations of "pgbench -c 10 -T 60 bench"
in each configuration.

And the bottom line is: if there's any performance benefit at all,
it's on the order of 1%.  The best result I got was about 3200 TPS
with hugepages, and about 3160 without.  The noise in these numbers
is more than 1% though.

This is discouraging; it certainly doesn't make me want to expend the
effort to develop a production patch.  However, perhaps someone else
can try to show a greater benefit under some other test conditions.

            regards, tom lane

*** src/backend/port/sysv_shmem.c.orig    Wed Sep 22 18:57:31 2010
--- src/backend/port/sysv_shmem.c    Sat Nov 27 13:39:46 2010
***************
*** 33,38 ****
--- 33,39 ----
  #include "miscadmin.h"
  #include "storage/ipc.h"
  #include "storage/pg_shmem.h"
+ #include "storage/shmem.h"


  typedef key_t IpcMemoryKey;        /* shared memory key passed to shmget(2) */
***************
*** 75,80 ****
--- 76,92 ----
      IpcMemoryId shmid;
      void       *memAddress;

+ #ifdef SHM_HUGETLB
+     /* request must be multiple of page size, else shmat() will fail */
+ #define HUGE_PAGE_SIZE (2 * 1024 * 1024)
+     size = add_size(size, HUGE_PAGE_SIZE - (size % HUGE_PAGE_SIZE));
+
+     shmid = shmget(memKey, size,
+                    SHM_HUGETLB | IPC_CREAT | IPC_EXCL | IPCProtection);
+     if (shmid >= 0)
+         elog(LOG, "shmget with SHM_HUGETLB succeeded");
+     else
+ #endif
      shmid = shmget(memKey, size, IPC_CREAT | IPC_EXCL | IPCProtection);

      if (shmid < 0)
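
For anyone who wants to try the same try-then-fall-back pattern outside the Postgres tree, a standalone sketch follows. It is not the patch itself. It uses IPC_PRIVATE instead of a real key and a literal 0600 mode as stand-ins, and the names create_segment and remove_segment are hypothetical. The caller is assumed to have already rounded size to a multiple of the huge page size.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: create a private SysV shared memory segment,
 * trying huge pages first and falling back to a normal segment.
 * Returns the shmid, or -1 if both attempts fail.  *used_huge is set
 * to 1 only when the SHM_HUGETLB request succeeded.
 */
static int
create_segment(size_t size, int *used_huge)
{
    int         shmid;

    assert(used_huge != NULL);
    *used_huge = 0;

#ifdef SHM_HUGETLB
    /* may fail due to permissions or lack of allocated huge pages */
    shmid = shmget(IPC_PRIVATE, size,
                   SHM_HUGETLB | IPC_CREAT | IPC_EXCL | 0600);
    if (shmid >= 0)
    {
        *used_huge = 1;
        return shmid;
    }
#endif

    /* ordinary request, used when huge pages are unavailable */
    return shmget(IPC_PRIVATE, size, IPC_CREAT | IPC_EXCL | 0600);
}

/* Hypothetical helper: mark the segment for removal; returns 0 on success. */
static int
remove_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Checking used_huge after the call plays the same role as the elog(LOG, ...) line in the patch: it tells you whether the huge page request actually took effect.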
