Discussion: AW: Re: New Linux xfs/reiser file systems
> > I think it's worth noting that Oracle has been petitioning the
> > kernel developers for better raw device support: in other words,
> > the ability to write directly to the hard disk, bypassing the
> > filesystem altogether.
>
> But there could be other reasons why Oracle would want to do
> raw stuff.

The reasons are:

1. Most Unixen now have raw devices that are shared between several machines. Oracle needs this for their shared-everything Parallel Server. Only 2 Unixen that I know of have shared filesystems (IBM gpfs and Sun Veritas), and both are rather new.

2. The allocation time for raw devices is by far better (near instantaneous) than creating preallocated files in a fs. Providing 1 TB of raw devices is a task of minutes; creating 1 TB filesystems with preallocated 2 GB files is a task of hours at best.

3. Absolute control over writes and page location (you don't want interleaved pages).

4. Efficient use of buffer memory. Usual use of filesystems buffers the disk pages twice: one copy in the db buffer pool, one in the OS file cache.

5. Async raw IO. Most Unixes provide async raw IO on raw devices; only some provide raw IO on filesystem files. (Async IO has 2 advantages: CPU work can be done while waiting for IO, and IO can complete within one OS timeslice (20 us). This is possible with modern disk systems that have large caches.)

Andreas
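The first advantage named in point 5 (CPU work proceeding while an IO is in flight) can be illustrated with a minimal sketch. This is not kernel async IO and not what Oracle does: a worker thread stands in for the asynchronous request, and the file name, page size, and function name are made up for the example.

```python
import os
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def write_while_computing(path, payload, work):
    """Start a write in a worker thread and checksum `work` in the meantime.

    Returns (bytes_written, checksum). The thread is only a stand-in for a
    real asynchronous IO request.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            # Hand the write to the worker; the call returns immediately.
            pending = pool.submit(os.pwrite, fd, payload, 0)
            # CPU work done while the write is outstanding.
            checksum = zlib.crc32(work)
            # Block only at the point where the IO result is needed.
            written = pending.result()
    finally:
        os.close(fd)
    return written, checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = bytes(8192)  # hypothetical 8 KB database page
        print(write_while_computing(os.path.join(tmp, "page.bin"), page, b"x" * 100000))
```

The point of the shape is that the caller waits only at `pending.result()`, so the checksum costs no extra wall-clock time as long as the write takes at least as long as the computation.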
> 2. The allocation time for raw devices is by far better (near
> instantaneous) than creating preallocated files in a
> fs. Providing 1 TB of raw devices is a task of minutes,
> creating 1 TB filesystems with preallocated 2 GB files is a
> task of hours at best.

Filesystem dependent, surely?

Veritas' VxFS can create filesystems quickly, and quickly preallocate space for the files. If you actually want to write data into the files, that would take longer. :)

Creating a 1TB UFS filesystem might take a while, and UFS doesn't support pre-allocation of space as far as I know, so creating 2GB files would take time too. Perhaps hours. :-(

> 3. absolute control over writes and page location (you don't want
> interleaved pages)

As well as a filesystem, most large systems I'm familiar with use volume management software (VxVM, LVM, ...) and their "disks" will be allocated space on disk arrays. These additional layers aren't arguments against simplifying the filesystem layer, but they sure will complicate measurement and tuning. :-)

Regards,

Giles
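The difference between asking the filesystem to reserve space and physically writing the data can be sketched roughly as follows. The sketch assumes a POSIX system where Python exposes `os.posix_fallocate`. The helper names, file names, and sizes are hypothetical. How fast the reservation is depends entirely on the filesystem, and where there is no native support the C library may fall back to writing the blocks itself.

```python
import os
import tempfile

CHUNK = 1 << 20  # slow path writes zeros 1 MiB at a time


def zero_fill(fd, size):
    """Slow path: push `size` zero bytes through the filesystem."""
    zeros = bytes(CHUNK)
    remaining = size
    while remaining > 0:
        # os.write may write fewer bytes than requested, so count what it did.
        remaining -= os.write(fd, zeros[:min(CHUNK, remaining)])


def preallocate(path, size):
    """Reserve `size` bytes for `path` and return the resulting file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if hasattr(os, "posix_fallocate"):
            # Ask the filesystem to allocate the blocks without us writing them.
            os.posix_fallocate(fd, 0, size)
        else:
            # No preallocation call available: the data has to be written.
            zero_fill(fd, size)
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # 4 MiB stands in for one of the 2 GB data files discussed above.
        print(preallocate(os.path.join(tmp, "datafile.dbf"), 4 * CHUNK))
```

At the sizes in the thread, 1 TB split into 2 GB files is 512 files, so whichever path the filesystem takes is repeated 512 times.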