Re: Block at a time ...

From Scott Carey
Subject Re: Block at a time ...
Date
Msg-id 302806BE-C518-467F-B627-BA8D29B02297@richrelevance.com
In response to Re: Block at a time ...  (Craig James <craig_james@emolecules.com>)
List pgsql-performance
On Mar 22, 2010, at 4:46 PM, Craig James wrote:

> On 3/22/10 11:47 AM, Scott Carey wrote:
>>
>> On Mar 17, 2010, at 9:41 AM, Craig James wrote:
>>
>>> On 3/17/10 2:52 AM, Greg Stark wrote:
>>>> On Wed, Mar 17, 2010 at 7:32 AM, Pierre C <lists@peufeu.com> wrote:
>>>>>> I was thinking in something like that, except that the factor I'd use
>>>>>> would be something like 50% or 100% of current size, capped at (say) 1 GB.
>>>>
>>>> This turns out to be a bad idea. One of the first thing Oracle DBAs
>>>> are told to do is change this default setting to allocate some
>>>> reasonably large fixed size rather than scaling upwards.
>>>>
>>>> This might be mostly due to Oracle's extent-based space management but
>>>> I'm not so sure. Recall that the filesystem is probably doing some
>>>> rounding itself. If you allocate 120kB it's probably allocating 128kB
>>>> itself anyways. Having two layers rounding up will result in odd
>>>> behaviour.
>>>>
>>>> In any case I was planning on doing this a while back. Then I ran some
>>>> experiments and couldn't actually demonstrate any problem. ext2 seems
>>>> to do a perfectly reasonable job of avoiding this problem. All the
>>>> files were mostly large contiguous blocks after running some tests --
>>>> IIRC running pgbench.
>>>
>>> This is one of the more-or-less solved problems in Unix/Linux.  Ext* file systems have a "reserve" usually of 10%
>>> of the disk space that nobody except root can use.  It's not for root, it's because with 10% of the disk free, you
>>> can almost always do a decent job of allocating contiguous blocks and get good performance.  Unless Postgres has
>>> some weird problem that Linux has never seen before (and that wouldn't be unprecedented...), there's probably no
>>> need to fool with file-allocation strategies.
>>>
>>> Craig
>>>
>>
>> It's fairly easy to break.  Just do a parallel import with, say, 16 concurrent tables being written to at once.
>> Result?  Fragmented tables.
>
> Is this from real-life experience?  With fragmentation, there's a point of diminishing return.  A couple of
> head-seeks now and then hardly matter.  My recollection is that even when there are lots of concurrent processes
> running that are all making files larger and larger, the Linux file system still can do a pretty good job of
> allocating mostly-contiguous space.  It doesn't just dumbly allocate from some list, but rather tries to allocate
> in a way that results in pretty good "contiguousness" (if that's a word).
>
> On the other hand, this is just from reading discussion groups like this one over the last few decades; I haven't
> tried it...
>

Well, how fragmented is too fragmented depends on the use case and the hardware capability.  In real-world use, which
for me means about 20 phases of large bulk inserts a day and not a lot of updates or index maintenance, the system
gets somewhat fragmented, but it's not too bad.  I did a dump/restore in 8.4 with parallel restore and it was much
slower than usual.  I did a single-threaded restore and it was much faster.  The dev environments are on ext3 and we
see this pretty clearly -- but poor OS tuning can mask it (readahead parameter not set high enough).  This is CentOS
5.4/5.3; perhaps later kernels are better at scheduling file writes to avoid this.  We also use the deadline
scheduler, which helps a lot on concurrent reads but might be messing up concurrent writes.
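
As a quick sanity check on those two settings, something like the following reads them straight from sysfs.  This is
only a minimal sketch, not anything from the systems above: the 'sda' default and the sysfs paths are the generic
Linux ones, so adjust the device name to taste.

    #!/usr/bin/env python
    # Minimal sketch: report the readahead window and active I/O scheduler
    # for a Linux block device via sysfs.  'sda' is only an example device.
    import sys

    def report(dev="sda"):
        base = "/sys/block/%s/queue" % dev
        # readahead window in kilobytes (the "readahead parameter" mentioned above)
        with open(base + "/read_ahead_kb") as f:
            ra_kb = int(f.read().strip())
        # the active scheduler is shown in brackets, e.g. "noop [deadline] cfq"
        with open(base + "/scheduler") as f:
            sched = f.read().strip()
        print("%s: readahead = %d kB, scheduler = %s" % (dev, ra_kb, sched))

    if __name__ == "__main__":
        report(sys.argv[1] if len(sys.argv) > 1 else "sda")
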
On production with xfs this was also bad at first -- in fact worse, because xfs's default 'allocsize' setting is 64k.
So files were regularly fragmented in small multiples of 64k.  Changing the 'allocsize' parameter to 80MB made the
restore process produce files with fragment sizes of 80MB.  80MB is big for most systems, but this array does over
1000MB/sec sequential read at peak, and only 200MB/sec with moderate fragmentation.

It won't fail to allocate disk space due to any 'reservations' of the delayed allocation; it just means that it won't
choose to create a new file or extent within 80MB of another file that is open unless it has to.  This can cause
performance problems if you have lots of small files, which is why the default is 64k.
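
If you want to verify the effect, one way -- just a sketch, assuming e2fsprogs' filefrag is installed and usable on
the filesystem in question -- is to count extents per data file and work out the average fragment size:

    #!/usr/bin/env python
    # Sketch: estimate the average fragment (extent) size of data files with filefrag.
    # Assumes e2fsprogs' filefrag is available for the filesystem in question.
    import os, re, subprocess, sys

    def avg_extent_mb(path):
        # filefrag prints e.g. "base/16384/16385: 12 extents found"
        out = subprocess.check_output(["filefrag", path]).decode()
        m = re.search(r"(\d+) extents? found", out)
        if not m:
            return None
        extents = int(m.group(1))
        size_mb = os.path.getsize(path) / (1024.0 * 1024.0)
        return size_mb / max(extents, 1)

    if __name__ == "__main__":
        for p in sys.argv[1:]:
            mb = avg_extent_mb(p)
            if mb is not None:
                print("%s: ~%.1f MB per extent" % (p, mb))

Run it against the larger files under the database's base/ directory after a restore; with the 80MB allocsize
described above you would hope to see extents somewhere in that ballpark.
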



> Craig


