Thread: optimal ZFS filesystem on JBOD layout


optimal ZFS filesystem on JBOD layout

From: Joachim Worringen
Date:
Greetings,

we are running a few databases, currently about 200GB in total and growing, for
data warehousing:
- new data via INSERTs for (up to) millions of rows per day; sometimes
with UPDATEs
- most data in a single table (=> 10 to 100s of millions of rows)
- queries SELECT subsets of this table via an index
- for effective parallelisation, queries create (potentially large)
non-temporary tables which are deleted at the end of the query => lots
of simple INSERTs and SELECTs during queries
- large transactions: they may contain millions of INSERTs/UPDATEs
- running version PostgreSQL 8.4.2

We are moving all this to a larger system - the hardware is available,
therefore fixed:
- Sun X4600 (16 cores, 64GB)
- external SAS JBOD with 24 2.5" slots:
   - 18x SAS 10k 146GB drives
   - 2x SAS 10k 73GB drives
   - 4x Intel SLC 32GB SATA SSD
- JBOD connected to Adaptec SAS HBA with BBU
- Internal storage via on-board RAID HBA:
   - 2x 73GB SAS 10k for OS (RAID1)
   - 2x Intel SLC 32GB SATA SSD for ZIL (RAID1) (?)
- OS will be Solaris 10 to have ZFS as filesystem (and dtrace)
- 10GigE towards client tier (currently, another X4600 with 32cores and
64GB)

What would be the optimal storage/ZFS layout for this? I checked
solarisinternals.com and some PostgreSQL resources and came to the
following concept - asking for your comments:
- run the JBOD without HW-RAID, but let all redundancy be done by ZFS
for maximum flexibility
- create separate ZFS pools for tablespaces (data, index, temp) and WAL
on separate devices (LUNs):
- use the 4 SSDs in the JBOD as Level-2 ARC cache (can I use a single
cache for all pools?) w/o redundancy
- use the 2 SSDs connected to the on-board HBA as RAID1 for ZFS ZIL
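The layout above could be sketched as zpool commands roughly like the following. This is only an illustration, not a recommendation: all device names (c1t*d0, c2t*d0) are placeholders, and the mirror groupings are one possible arrangement. Note that a cache (L2ARC) vdev belongs to exactly one pool, so the four JBOD SSDs cannot serve as a single shared cache for multiple pools.

```shell
# Sketch only -- device names are placeholders; check the output of
# `format` or `cfgadm` for the real ones.

# Data pool: striped mirrors from the 146GB drives (mirrors generally
# favor random database I/O over raidz).
zpool create dbdata \
  mirror c2t0d0 c2t1d0 \
  mirror c2t2d0 c2t3d0 \
  mirror c2t4d0 c2t5d0

# WAL on its own mirrored pair.
zpool create dbwal mirror c2t6d0 c2t7d0

# L2ARC: cache devices attach to a single pool, so they go where
# reads dominate (here: the data pool), without redundancy.
zpool add dbdata cache c2t8d0 c2t9d0 c2t10d0 c2t11d0

# Mirrored ZIL from the two internal SSDs, exported as plain disks
# rather than hidden behind the onboard RAID1, so ZFS does the mirroring.
zpool add dbdata log mirror c1t2d0 c1t3d0

# Match the ZFS recordsize to the 8K PostgreSQL block size on the
# data filesystem.
zfs set recordsize=8k dbdata
```

After creation, `zpool status` shows whether the log and cache vdevs were attached as intended.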

Potential issues that I see:
- the ZFS ZIL will not benefit from a BBU (as it is connected to the
backplane, driven by the onboard-RAID), and might be too small (32GB for
~2TB of data with lots of writes)?
- the pools on the JBOD might have the wrong size for the tablespaces -
like: using the 2 73GB drives as RAID 1 for temp might become too small,
but adding a 146GB drive might not be a good idea?
- with 20 spindles, does it make sense at all to use dedicated devices
for the tablespaces, or will the load be distributed well enough across
the spindles anyway?

thanks for any comments & suggestions,

   Joachim


Re: optimal ZFS filesystem on JBOD layout

From: Greg Smith
Date:
Joachim Worringen wrote:
> Potential issues that I see:
> - the ZFS ZIL will not benefit from a BBU (as it is connected to the
> backplane, driven by the onboard-RAID), and might be too small (32GB
> for ~2TB of data with lots of writes)?

This is a somewhat unpredictable setup.  The conservative approach would
be to break two disks out of the larger array for the ZIL, running
through the battery-backed cache, rather than using the SSD drives for
that.  The best way IMHO to use SSD for PostgreSQL is to put your large
indexes on it, so that even if the drive does the wrong thing on a write
and an index gets corrupted, you can always rebuild it rather than
suffer data loss.  Also, index writes really benefit from being on
something with low seek times, more so than the ZIL or WAL.
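Acting on this advice might look like the sketch below: create a PostgreSQL tablespace on an SSD-backed filesystem and move the large indexes onto it. All names here (the ssdpool pool, the dw database, the big_table_pkey index) are illustrative assumptions, not taken from the thread.

```shell
# Sketch, assuming an SSD-backed pool named ssdpool and a database dw.
zfs create -o mountpoint=/ssdpool/pg_idx ssdpool/pg_idx
chown postgres:postgres /ssdpool/pg_idx

# Create a tablespace on the SSD filesystem and move a large index there.
psql -d dw -c "CREATE TABLESPACE ssd_idx LOCATION '/ssdpool/pg_idx'"
psql -d dw -c "ALTER INDEX big_table_pkey SET TABLESPACE ssd_idx"

# If an index on the SSD is ever corrupted, it can be rebuilt from the
# table data rather than restored from backup:
psql -d dw -c "REINDEX INDEX big_table_pkey"
```

This is the point of Greg's argument: indexes are derived data, so an unreliable-on-write device costs at most a REINDEX, never a restore.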

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us