Re: postgres files in use not staying in linux file cache

From: Brio
Subject: Re: postgres files in use not staying in linux file cache
Date:
Msg-id: CAM+G8pT=y0CAEw-fkCuniATQW9iNCURQRY-d0gXVrd-eVBvVJA@mail.gmail.com
In response to: postgres files in use not staying in linux file cache (Brio <brianoraas@gmail.com>)
List: pgsql-performance
(Sorry, we forgot to cc the pgsql-performance list on our last exchange.)

Yes, I did see the original problem only when postgres was also accessing the file. But the issue is intermittent and I can't reproduce it on demand, so I'm only reporting what I saw a small number of times, which is not necessarily (or even likely) the whole story.

I've upgraded the kernel on my test machine, and I haven't seen the original problem. But I am seeing what looks like it might be the problem you describe, Jeff. Here's what I saw:

This machine has 64 GB of RAM. There was about 20 GB free, and the rest was mostly file cache, largely from our 1 TB database. I ran a script that did various reading and writing to the database, but mostly updated many rows over and over with new values. As this script ran, the cached memory slowly dropped and free memory increased; I now have 43 GB free!

I'd expect practically any activity to leave files in the cache, and no significant evictions to occur until memory runs low. What actually happens is that the cache grows gradually and then drops in chunks. I would think the only file activity that could evict pages from the cache is deleting files, which would only happen when dropping tables (my test script doesn't do that), plus WAL file recycling, which should keep a roughly constant amount of memory cached.
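
In case it helps anyone reproduce this, here's a rough sketch (purely illustrative, not the exact tool I used to watch the numbers) of a small C poller that prints the MemFree and Cached lines from /proc/meminfo once a second while the update script runs:

/* Rough sketch: poll MemFree and Cached from /proc/meminfo once a second.
 * Purely illustrative; stop it with Ctrl-C. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (f == NULL) {
            perror("fopen");
            return 1;
        }

        char line[256];
        while (fgets(line, sizeof(line), f) != NULL) {
            /* Print only the two fields we care about. */
            if (strncmp(line, "MemFree:", 8) == 0 ||
                strncmp(line, "Cached:", 7) == 0)
                fputs(line, stdout);
        }
        fclose(f);

        putchar('\n');
        sleep(1);
    }
}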

But if blocks that are written are evicted from the cache, that would explain it, so I'd like to test that. As a very basic test, I tried:

cd /path-to-nfs-mount
echo "foo" > foo.txt
sync   # this should force the write back to the server? I haven't really used it before
linux-fincore foo.txt

which shows the file is 100% cached.

Although you no longer have the Perl script you mentioned, could you give a basic description of what it did, so I can try to recreate it? I'm not familiar with Perl, but I've done plenty of C programming, so demonstrating this with the actual Linux APIs would be ideal.
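
In the meantime, here's the rough kind of C test I have in mind (a sketch only; the path and size are placeholders, and I haven't run this exact code): write a chunk of data to a file on the NFS mount, fsync() it so it gets written back to the server, then use mincore() to count how many of its pages are still resident in the page cache. If writeback really evicts the written blocks, the resident count should drop to (or near) zero right after the fsync:

/* Sketch of a write-back eviction test over NFS (path and size are
 * placeholders): write data, fsync() it to the server, then use
 * mincore() to see how many pages remain in the page cache. */
#define _DEFAULT_SOURCE   /* for the mincore() declaration */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char  *path = "/path-to-nfs-mount/cache-test.dat";  /* placeholder */
    const size_t len  = 1024 * 1024;                          /* 1 MB test file */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Write the data and force it back to the NFS server. */
    char *buf = malloc(len);
    memset(buf, 'x', len);
    if (write(fd, buf, len) != (ssize_t) len) { perror("write"); return 1; }
    if (fsync(fd) != 0) { perror("fsync"); return 1; }

    /* Map the file and ask the kernel which pages are resident in cache. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    long   page   = sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (mincore(map, len, vec) != 0) { perror("mincore"); return 1; }

    size_t resident = 0;
    for (size_t i = 0; i < npages; i++)
        if (vec[i] & 1)
            resident++;

    printf("%zu of %zu pages resident after fsync\n", resident, npages);

    munmap(map, len);
    close(fd);
    free(buf);
    free(vec);
    return 0;
}

Compile it with something like cc -o nfs-cache-test nfs-cache-test.c and point the path at the NFS mount. If the pages stay resident for a small file like this, I can scale it up and mix in reads to try to match whatever your script was doing.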

Thanks Jeff!



On Mon, Jun 23, 2014 at 3:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jun 18, 2014 at 11:18 PM, Brio <brianoraas@gmail.com> wrote:
> Hi Jeff,
>
> That is interesting -- I hadn't thought about how a read-only index scan
> might actually write the index.
>
> But to avoid effects like that, I dropped down to simply using "cat" on
> the file, and I saw the same problem there, with no writing back.

I thought that you saw the same problem with cat only when it was
running concurrently with the index scan, and when the index scan
stopped the problem in cat went away.

> So the problem really seemed to be in Linux, not Postgres.
>
> But why would dirty blocks of NetApp-served files get dropped from the Linux
> page cache as soon as they are written back to the NetApp? Is it a bug in
> the NetApp driver? Isn't the driver just NFS?

I don't know why it would do that; it never made much sense to me.
But that is what the experimental evidence indicated.

What I was using was NetApp on the back end and just the plain Linux
NFS driver on the client end, and I assume the problem was on the
client end.  (Maybe you can get a custom client driver from NetApp
designed to work specifically with their server, but if so, I didn't
do that.  For that matter, maybe the default Linux NFS driver has
simply improved.)

> That sounds like a serious
> issue. Is there any online documentation of bugs like that with NetApp?

Yes, it was a serious issue for one intended use.  But it was
partially mitigated by the fact that I would probably never run an
important production database over NFS anyway, out of corruption
concerns.  I was hoping to use it just for testing purposes, but this
limit made it rather useless for that as well.  I don't think it is a
NetApp-specific issue, and I didn't approach it from that angle; it's
just that NetApp didn't save me from it.

Cheers,

Jeff
