Thread: Overflow of bgwriter's request queue

Overflow of bgwriter's request queue

From
ITAGAKI Takahiro
Date:
Hi Hackers,

I encountered an overflow of bgwriter's file-fsync request queue. It occurred
during checkpoints. In such cases each backend calls fsync itself, uncoordinated,
so the checkpoint takes a long time and performance drops.
It seems to happen frequently on machines with a lot of memory and
poor disks.

I assume the cause of this problem is that AbsorbFsyncRequests is not
called for a long time during checkpoints. The attached patch is one
possible solution. It eliminates duplicate requests when the queue
is full, using a simple sort-and-unique technique.
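
As a rough illustration of the sort-and-unique idea described above (this is not the attached patch; the `FsyncRequest` struct, its fields, and both function names are hypothetical stand-ins), a full queue could be compacted along these lines:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one queued request: "this file segment needs fsync". */
typedef struct
{
    unsigned int relfile;       /* assumed identifier of the relation file */
    unsigned int segno;         /* segment number within that relation */
} FsyncRequest;

/* Total order on requests so that duplicates end up adjacent after sorting. */
int
request_cmp(const void *a, const void *b)
{
    const FsyncRequest *x = a;
    const FsyncRequest *y = b;

    if (x->relfile != y->relfile)
        return (x->relfile < y->relfile) ? -1 : 1;
    if (x->segno != y->segno)
        return (x->segno < y->segno) ? -1 : 1;
    return 0;
}

/*
 * Sort the queue in place, then keep one copy of each distinct request.
 * Returns the new queue length.
 */
size_t
compact_requests(FsyncRequest *queue, size_t n)
{
    size_t      kept;
    size_t      i;

    if (n == 0)
        return 0;

    qsort(queue, n, sizeof(FsyncRequest), request_cmp);

    kept = 1;
    for (i = 1; i < n; i++)
    {
        /* Copy an entry forward only if it differs from the last one kept. */
        if (request_cmp(&queue[kept - 1], &queue[i]) != 0)
            queue[kept++] = queue[i];
    }
    return kept;
}
```

The sort costs O(n log n) on each overflow, and in a real server it would presumably run while the lock protecting the queue is held.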


I hope this problem can be solved one way or another.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories


Re: Overflow of bgwriter's request queue

From
Tom Lane
Date:
ITAGAKI Takahiro <itagaki.takahiro@lab.ntt.co.jp> writes:
> I encountered an overflow of bgwriter's file-fsync request queue. It occurred
> during checkpoints. In such cases each backend calls fsync itself, uncoordinated,
> so the checkpoint takes a long time and performance drops.
> It seems to happen frequently on machines with a lot of memory and
> poor disks.

I can't help thinking that this is a situation that could only be got
into with a seriously misconfigured database --- per the comments for
ForwardFsyncRequest, we really don't want this code to run at all,
let alone run so often that a queue with NBuffers entries overflows.
What exactly are the test conditions under which you're seeing this
happen?

If there actually is a problem that needs to be solved, I think it'd be
better to try to do AbsorbFsyncRequests somewhere in the main checkpoint
loops.  I don't like the idea of holding the BgWriterCommLock long
enough to do a qsort ... especially not if this occurs only with very
large NBuffers settings.  Also, what if the qsort fails to eliminate any
duplicates, or eliminates only a few?  You could get into a scenario
where the qsort gets repeated every few ForwardFsyncRequest calls, in
which case it'd become a drag on performance itself.  (See also recent
discussion with Qingqing about converting BgWriterCommLock to a
spinlock.  Though I was against that because no performance problem had
been shown, it could still become something we want to do ... but
putting a qsort here would foreclose that option.)
        regards, tom lane


Re: Overflow of bgwriter's request queue

From
ITAGAKI Takahiro
Date:
I'm sorry if you have received this mail more than once. I sent it
earlier, but it seems not to have been delivered, so I am sending it again.


Tom Lane <tgl@sss.pgh.pa.us> wrote:

> > I encountered an overflow of bgwriter's file-fsync request queue.
> I can't help thinking that this is a situation that could only be got
> into with a seriously misconfigured database --- per the comments for
> ForwardFsyncRequest, we really don't want this code to run at all,
> let alone run so often that a queue with NBuffers entries overflows.
> What exactly are the test conditions under which you're seeing this
> happen?

It happened in two environments:

[1] TPC-C (DBT-2) / RHEL4 U1 (2.6.9-11)
    XFS, 8 S-ATA disks / 8GB memory (shmem=512MB)
[2] TPC-C (DBT-2) / RHEL4 U2 (2.6.9-22)
    XFS, 6 SCSI disks / 6GB memory (shmem=1GB)

I don't think this is such a bad configuration. There seems to be a problem
with the combination of XFS and heavy update workloads, but the total throughput
on XFS with my patch was better than on ext3.

I suspect that NBuffers is not a sufficient queue length. If all buffers
are dirty, ForwardFsyncRequest would be called more than NBuffers times
during BufferSync, so the queue could become full.


> If there actually is a problem that needs to be solved, I think it'd be
> better to try to do AbsorbFsyncRequests somewhere in the main checkpoint
> loops.  I don't like the idea of holding the BgWriterCommLock long
> enough to do a qsort ... especially not if this occurs only with very
> large NBuffers settings.

OK, I agree. I sent a patch to PATCHES that calls AbsorbFsyncRequests
in the loops of BufferSync and mdsync.
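
A toy model, not the patch itself, may help show why draining inside the loop matters. Everything here is an assumption made for illustration: the tiny `QUEUE_CAPACITY`, the `RequestQueue` counters, and the functions `forward_request` and `absorb_requests`, which only loosely stand in for ForwardFsyncRequest and AbsorbFsyncRequests.

```c
#include <stddef.h>

#define QUEUE_CAPACITY 8        /* stands in for NBuffers; tiny on purpose */

/* Hypothetical bookkeeping for a bounded request queue. */
typedef struct
{
    size_t      pending;        /* requests currently queued */
    size_t      overflows;      /* requests that found the queue full */
    size_t      absorbed;       /* requests drained by the checkpointer */
} RequestQueue;

/* A backend forwards a request; returns 0 if the queue was full. */
int
forward_request(RequestQueue *q)
{
    if (q->pending >= QUEUE_CAPACITY)
    {
        q->overflows++;         /* the backend would have to fsync itself */
        return 0;
    }
    q->pending++;
    return 1;
}

/* The checkpointing process drains everything queued so far. */
void
absorb_requests(RequestQueue *q)
{
    q->absorbed += q->pending;
    q->pending = 0;
}

/*
 * Simulated checkpoint loop: while each buffer is written, backends forward
 * per_buffer requests. With interval > 0 the queue is drained every
 * `interval` buffers; with interval == 0 it is drained only after the loop.
 */
void
checkpoint_loop(RequestQueue *q, size_t nbuffers, size_t per_buffer,
                size_t interval)
{
    size_t      i;
    size_t      j;

    for (i = 0; i < nbuffers; i++)
    {
        for (j = 0; j < per_buffer; j++)
            forward_request(q);
        if (interval > 0 && (i + 1) % interval == 0)
            absorb_requests(q);
    }
    absorb_requests(q);
}
```

In this model, running 32 buffers with one request each and no in-loop draining overflows the 8-slot queue 24 times, while draining every 4 buffers never overflows. The real trade-off is how often to drain versus the cost of each call.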


> Also, what if the qsort fails to eliminate any
> duplicates, or eliminates only a few?  You could get into a scenario
> where the qsort gets repeated every few ForwardFsyncRequest calls, in
> which case it'd become a drag on performance itself.

I now think the above solution is better than qsort, but qsort would also
work reasonably well. NBuffers is at least one thousand, while the number of files
that need fsync is at most in the hundreds, so duplicate elimination works well.
In fact, on my machine, the queue became full twice during a checkpoint, and
duplicate elimination reduced the queue length from 65536 to *32*.

---
ITAGAKI Takahiro
NTT Cyber Space Laboratories