Re: Configuring for maximum memory usage

Поиск

Список

Период

Сортировка

От	Scott Carey
Тема	Re: Configuring for maximum memory usage
Дата	31 октября 2008 г. 15:49:34
Msg-id	a1ec7d000810310847j10060c84m40c6e080ada59289@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Configuring for maximum memory usage (Christiaan Willemsen <cwillemsen@technocon.com>)
Список	pgsql-performance

Дерево обсуждения

On Thu, Oct 30, 2008 at 2:06 PM, Christiaan Willemsen <cwillemsen@technocon.com> wrote:

Thanks guys,

Lots of info here that I didn't know about! Since I have one of the latest Opensolaris builds, I guess the write throttle feature is already in there. Sadly, the blog doesn't say what build has it included.

If I recall correctly, it went in at about build 89 or so (I think the bottom of the link I provided has a comment to that effect). So its in there now, but not in OpenSolaris 2008.05.

For writes, I do everything synchronized because we really need a consistent database on disk. We can see that during large inserts, the intend log is used a lot.

The DB synchronizes the WAL log automatically, and the table and index data are written non-synchronously until the commit at the end of a checkpoint, in which case sync is called on them. This keeps things consistent on disk. With ZFS, each block written is always consistent, with a checksum kept in the parent block. There are no partial page writes, ever. In theory, you can disable full page writes on the WAL log if there is a bottleneck there since ZFS guarantees fully transactional consistent state of the file system, even if you have a RAID controller or hardware failure that causes a partial write. But WAL log activity is probably not your bottleneck so turning off full page writes on the WAL log is not necessary.

What I'm going to te testing is a smaller shared_buffers value, and a large ARC cache, and exactly the other way around.

Another question: since we have huge tables with hundreds of millions or rows, we partitioned the database (it actually is creating the partitions dynamically now on inserts with very good performance :D ), but the question is: is the size of the partions important for the memory parameters in config file? How can we determine the optimal size of the partition. obviously, when doing selects, you want those preferably only needing a single partition for speed. At the moment, that is for the majority of situations the case. But there might be some other things to think about...

Kind regards,

Christiaan

On Oct 30, 2008, at 7:27 PM, Scott Carey wrote:

On Thu, Oct 30, 2008 at 9:55 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
On Thu, 2008-10-30 at 09:46 -0700, Scott Carey wrote:

>
> Remember that PostgreSQL doesn't cache anything on its own so
> if you do
> want to hit disk it has to be in file cache.
>
> By my understanding, this is absolutely false. Postgres caches pages
> from tables/indexes in shared_buffers. You can make this very large if
> you wish.

You can make it very large with a potentially serious performance hit.
It is very expensive to manage large amounts of shared buffers. It can
also nail your IO on checkpoint if you are not careful (even with
checkpoint smoothing). You are correct that I did not explain what I
meant very well because shared buffers are exactly that, shared
buffers.

You can slam your I/O by havnig too large of either OS file cache or shared_buffers, and you have to tune both.
In the case of large shared_buffers you have to tune postgres and especially the background writer and checkpoints.
In the case of a large OS cache, you have to tune parameters to limit the ammount of dirty pages there and force writes out smoothly.
Both layers attempt to delay writes for their own, often similar reasons, and suffer when a large sync comes along with lots of dirty data.

Recent ZFS changes have been made to limit this, (http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle)
in earlier ZFS versions, this is what usually killed databases -- ZFS in some situations would delay writes too long (even if "long" is 5 seconds) and get in trouble. This still has to be tuned well, combined with good checkpoint tuning in Postgres as you mention. For Linux, there are similar issues that have to be tuned on many kernels, or up to 40% of RAM can fill with dirty pages not written to disk.

Letting the OS do it doesn't get rid of the problem, both levels of cache share very similar issues with large sizes and dirty pages followed by a sync.

The buffer cache in shared_buffers is a lot more efficient for large scanning queries -- A select count(*) test will be CPU bound if it comes from shared_buffers or the OS page cache, and in the former case I have seen it execute up to 50% faster than the latter, by avoiding calling out to the OS to get pages, purely as a result of less CPU used.

However that isn't the exact same thing as a "cache" at least as I was
trying to describe it. shared buffers are used to keep track of pages
(as well as some other stuff) and their current status. That is not the
same as caching a relation.

It is not possible to pin a relation to memory using PostgreSQL.
PostgreSQL relies on the operating system for that type of caching.

The OS can't pin a relation either, from its point of view its all just a bunch of disk data blocks, not relations -- so it is all roughly equivalent. The OS can do a bit better job at data prefetch on sequential scans or other predictable seek sequences (ARC stands for Adaptive Replacement Cache) than postgres currently does (no intelligent prefetch in postgres AFAIK).

So I apologize if I made it sound like Postgres cached the actual relation, its just pages -- but it is basically the same thing as the OS cache, but kept in process closer to the code that needs it. Its a cache that prevents disk reads.

My suggestion for the OP is to try it both ways, and see what is better for his workload / OS / Hardware combination.

Joshua D. Drake

--

В списке pgsql-performance по дате отправления:

Предыдущее

От: Gregory Stark
Дата: 31 октября 2008 г., 08:53:44
Сообщение: Re: Index usage problem on 8.3.3

Следующее

От: Jeff Frost
Дата: 31 октября 2008 г., 18:06:17
Сообщение: Re: Index usage problem on 8.3.3

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Configuring for maximum memory usage

Предыдущее

Следующее