Re: Improvement of checkpoint IO scheduler for stable transaction responses

From	Gavin Flower
Subject	Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date	3 July 2013, 19:23:57
Msg-id	51D47A17.6000809@archidevsys.co.nz
In reply to	Re: Improvement of checkpoint IO scheduler for stable transaction responses (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

<div class="moz-cite-prefix">On 04/07/13 01:31, Robert Haas wrote:<br /></div><blockquote
cite="mid:CA+TgmoZsh0zRdLoPh+PaGswMKqHRLZcAb89O+XRQLhSsjYOaYg@mail.gmail.com" type="cite"><pre wrap="">On Wed, Jul 3, 2013 at 4:18 AM, KONDO Mitsumasa <a class="moz-txt-link-rfc2396E" href="mailto:kondo.mitsumasa@lab.ntt.co.jp">&lt;kondo.mitsumasa@lab.ntt.co.jp&gt;</a> wrote:
</pre><blockquote type="cite"><pre wrap="">I changed segsize to 0.25GB and tested it. segsize is the maximum size of
a table's segment file; the default is 1GB, and it is set with a configure
option (./configure --with-segsize=0.25). I thought that a small segsize
would be good for the fsync phase and for the OS's background disk writes
during a checkpoint. I got significant improvements in the DBT-2 results!
</pre></blockquote><pre wrap="">
This is interesting.  Unfortunately, it has a significant downside:
potentially, there will be a lot more files in the data directory.  As
it is, the number of files that exist there today has caused
performance problems for some of our customers.  I'm not sure off-hand
to what degree those problems have been related to overall inode
consumption vs. the number of files in the same directory.

If the problem is mainly with the number of files in the same
directory, we could consider revising our directory layout.  Instead
of:

base/${DBOID}/${RELFILENODE}_{FORK}

We could have:

base/${DBOID}/${FORK}/${RELFILENODE}

That would move all the vm and fsm forks to separate directories,
which would cut down the number of files in the main-fork directory
significantly.  That might be worth doing independently of the issue
you're raising here.  For large clusters, you'd even want one more
level to keep the directories from getting too big:

base/${DBOID}/${FORK}/${X}/${RELFILENODE}

...where ${X} is two hex digits, maybe just the low 16 bits of the
relfilenode number.  But this would be not as good for small clusters
where you'd end up with oodles of little-tiny directories, and I'm not
sure it'd be practical to smoothly fail over from one system to the
other.

</pre></blockquote><font size="-1">16 bits ==> 4 hex digits<br /><br />Could you perhaps start with 1 hex digit, and automagically increase it to 2, 3, ... as needed?  There could be a status file at that level that would indicate the current number of hex digits, plus a temporary mapping file when in transition.<br /><br />Cheers,<br />Gavin</font>
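The arithmetic behind the layout discussed above can be illustrated with a minimal Python sketch. It is not PostgreSQL code: the function names, the fork name `main`, and the `max_per_dir` threshold are assumptions made for illustration, and the status file and temporary mapping file are not modelled. It only shows how a variable number of low-order hex digits of the relfilenode would select the `${X}` directory, and that 16 bits correspond to 4 hex digits.

```python
def fanout_dir(relfilenode: int, hex_digits: int) -> str:
    """Name of the ${X} directory: the low (4 * hex_digits) bits, in hex."""
    if hex_digits < 1:
        raise ValueError("hex_digits must be at least 1")
    mask = (1 << (4 * hex_digits)) - 1  # each hex digit covers 4 bits
    return format(relfilenode & mask, "0{}x".format(hex_digits))


def relation_path(db_oid: int, fork: str, relfilenode: int, hex_digits: int) -> str:
    """Path in the proposed base/${DBOID}/${FORK}/${X}/${RELFILENODE} layout."""
    return "base/{}/{}/{}/{}".format(
        db_oid, fork, fanout_dir(relfilenode, hex_digits), relfilenode
    )


def digits_needed(relation_count: int, max_per_dir: int = 1000) -> int:
    """Smallest digit count keeping the average directory at or below
    max_per_dir entries (max_per_dir is a hypothetical tuning knob)."""
    digits = 1
    while relation_count > max_per_dir * 16 ** digits:
        digits += 1
    return digits


if __name__ == "__main__":
    # The low 16 bits of a relfilenode need 4 hex digits, not 2.
    print(fanout_dir(0x1A2B3C, 2))
    print(fanout_dir(0x1A2B3C, 4))
    print(relation_path(16384, "main", 0x1A2B3C, 1))
    print(digits_needed(300000))
```

Under this scheme, growing from n to n+1 digits splits each directory into 16, so every file would move to a new directory; that is presumably the transition the temporary mapping file is meant to cover.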
