Re: Postgresql 9.4 and ZFS?

From: Joseph Kregloh
Subject: Re: Postgresql 9.4 and ZFS?
Date:
Msg-id: CAAW2xfeR9vwzx_NmEAJSUOSgZ0ya7UqKyhn=wFa8VHMfE8ojuw@mail.gmail.com
In reply to: Re: Postgresql 9.4 and ZFS?  (Benjamin Smith <lists@benjamindsmith.com>)
Responses: Re: Postgresql 9.4 and ZFS?
List: pgsql-general


On Wed, Sep 30, 2015 at 5:12 PM, Benjamin Smith <lists@benjamindsmith.com> wrote:
On Wednesday, September 30, 2015 09:58:08 PM Tomas Vondra wrote:
> On 09/30/2015 07:33 PM, Benjamin Smith wrote:
> > On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:
> >> I think this really depends on the workload - if you have a lot of
> >> random writes, CoW filesystems will perform significantly worse than
> >> e.g. EXT4 or XFS, even on SSD.
> >
> > I'd be curious about the information you have that leads you to this
> > conclusion. As with many (most?) "rules of thumb", the devil is
> > quite often in the details.
>
> A lot of testing done recently, and also experience with other CoW
> filesystems (e.g. BTRFS explicitly warns about workloads with a lot of
> random writes).
>
> >>> We've been running both on ZFS/CentOS 6 with excellent results, and
> >>> are considering putting the two together. In particular, the CoW
> >>> nature (and subsequent fragmentation/thrashing) of ZFS becomes
> >>> largely irrelevant on SSDs; the very act of wear leveling on an SSD
> >>> is itself a form of intentional thrashing that doesn't affect
> >>> performance since SSDs have no meaningful seek time.
> >>
> >> I don't think that's entirely true. Sure, SSD drives handle random I/O
> >> much better than rotational storage, but it's not entirely free and
> >> sequential I/O is still measurably faster.
> >>
> >> It's true that the drives do internal wear leveling, but it probably
> >> uses tricks that are impossible to do at the filesystem level (which is
> >> oblivious to internal details of the SSD). CoW also increases the amount
> >> of blocks that need to be reclaimed.
> >>
> >> In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x
> >> faster than ZFS. But of course, if the ZFS features are interesting
> >> for you, maybe it's a reasonable price.
> >
> > Again, the details would be highly interesting to me. What memory
> > optimization was done? Status of snapshots? Was the pool RAIDZ or
> > mirrored vdevs? How many vdevs? Was compression enabled? What ZFS
> > release was this? Was this on Linux, Free/Open/NetBSD, Solaris, or
> > something else?
>
> I'm not sure what you mean by "memory optimization" so the answer is
> probably "no".

I mean the full gamut:

Did you use an l2arc? Did you use a dedicated ZIL? What was arc_max set to?
How much RAM/GB was installed on the machine? How did you set up PG? (PG
defaults are historically horrible for higher-RAM machines)
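
To be concrete about what I mean, the knobs I'm asking about look roughly
like this on a ZFS-on-Linux box; the pool name, device paths and values
below are placeholders, not a recommendation:

    # add a dedicated ZIL (SLOG) and an L2ARC device to an existing pool
    zpool add tank log /dev/nvme0n1
    zpool add tank cache /dev/nvme1n1

    # cap the ARC (ZFS on Linux module parameter; ~32 GB here)
    echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max

    # postgresql.conf settings that usually need raising on a high-RAM box
    shared_buffers = 16GB
    effective_cache_size = 96GB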


In my testing with pgbench I actually saw a decrease in performance with a ZIL enabled. I ended up just keeping the L2ARC and dropping the ZIL. A ZIL won't give you any speed boost for a database workload; on a NAS sharing over NFS, for example, a ZIL works well. The ZIL is more about data protection than anything else.
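
For what it's worth, the comparison was just plain pgbench runs along these
lines; the scale factor, client count and duration here are illustrative
rather than the exact numbers I used:

    pgbench -i -s 1000 bench          # initialize a roughly 15 GB dataset
    pgbench -c 32 -j 8 -T 600 bench   # 32 clients, 8 threads, 10 minute run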

In production I run FreeBSD 10.1 with an NVMe mirror for the L2ARC; the rest of the storage is spinning drives, with a mix of per-filesystem compression settings. For example, archival tablespaces and the log folder sit on gzip-compressed datasets on an external array, while faster things like the xlog use lz4 on an internal array.
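
The per-dataset setup is nothing exotic, just the usual ZFS properties; the
dataset names below are examples rather than my actual layout:

    zfs set compression=gzip tank/pg_archive   # archival tablespaces, log folder
    zfs set compression=lz4 tank/pg_xlog       # xlog and other latency-sensitive data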

If you are interested I might still have the data from when I executed the tests.

-Joseph Kregloh

> FWIW I don't have much experience with ZFS in production, all I have is
> data from benchmarks I've recently done exactly with the goal to educate
> myself on the differences of current filesystems.
>
> The tests were done on Linux, with kernel 4.0.4 / zfs 0.6.4. So fairly
> recent versions, IMHO.
>
> My goal was to test the file systems under the same conditions and used
> a single device (Intel S3700 SSD). I'm aware that this is not a perfect
> test and ZFS offers interesting options (e.g. moving ZIL to a separate
> device). I plan to benchmark some additional configurations with more
> devices and such.

Also, did you try with/without compression? My information so far is that
compression significantly improves overall performance.
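
It's also easy to sanity-check what compression actually buys on an existing
dataset (the dataset name below is a placeholder):

    zfs get compression,compressratio tank/pgdata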

> > A 2x performance difference is almost inconsequential in my
> > experience, where growth is exponential. 2x performance change
> > generally means 1 to 2 years of advancement or deferment against the
> > progression of hardware; our current, relatively beefy DB servers
> > are already older than that, and have an anticipated life cycle of at
> > least another couple years.
>
> I'm not sure I understand what you suggest here. What I'm saying is that
> when I do a stress test on the same hardware, I do get ~2x the
> throughput with EXT4/XFS, compared to ZFS.

What I'm saying is only what it says on its face: a 50% performance difference
is rarely enough to make or break a production system; performance/capacity
reserves of 95% or more are fairly typical, which means the difference between
5% utilization and 10%. Even if latency rose by 50%, that's typically the
difference between 20 ms and 30 ms: not enough that anybody would notice over
the 'net for a SOAP/REST call, even if it's enough to make you want to
optimize things a bit.

> > // Our situation // Lots of RAM for the workload: 128 GB of ECC RAM
> > with an on-disk DB size of ~ 150 GB. Pretty much, everything runs
> > straight out of RAM cache, with only writes hitting disk. Smart
> > reports 4/96 read/write ratio.
>
> So your active set fits into RAM? I'd guess all your writes are then WAL
> + checkpoints, which probably makes them rather sequential.
>
> If that's the case, CoW filesystems may perform quite well - I was
> mostly referring to workloads with a lot of random writes to the device.

That's *MY* hope, anyway! :)

> > Query load: Constant, heavy writes and heavy use of temp tables in
> > order to assemble very complex queries. Pretty much the "worst case"
> > mix of reads and writes, average daily peak of about 200-250
> > queries/second.
>
> I'm not sure how much random I/O that actually translates to. According
> to the numbers I've posted to this thread few hours ago, a tuned ZFS on
> a single SSD device handles ~2.5k tps (with dataset ~2x the RAM). But
> those are OLTP queries - your queries may write much more data. OTOH it
> really does not matter that much if your active set fits into RAM,
> because then it's mostly about writing to ZIL.

I personally don't yet know how much sense an SSD-backed ZIL makes when the
storage media is also SSD-based.
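
At least it should be cheap to test either way, since a log device can be
attached to and detached from a live pool (device path is a placeholder):

    zpool add tank log /dev/nvme0n1    # attach a separate SLOG
    zpool remove tank /dev/nvme0n1     # detach it again after benchmarking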

> > 16 Core XEON servers, 32 HT "cores".
> >
> > SAS 3 Gbps
> >
> > CentOS 6 is our O/S of choice.
> >
> > Currently, we're running Intel 710 SSDs in a software RAID1 without
> > trim enabled and generally happy with the reliability and performance
> > we see. We're planning to upgrade storage soon (since we're over 50%
> > utilization) and in the process, bring the magic goodness of
> > snapshots/clones from ZFS.
>
> I presume by "software RAID1" you mean "mirrored vdev zpool", correct?

I mean "software RAID 1" with Linux/mdadm. We haven't put ZFS into production
use on any of our DB servers, yet.
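
The snapshot/clone workflow we're hoping to gain is essentially this (pool
and dataset names are placeholders):

    zfs snapshot tank/pgdata@pre_upgrade          # instant copy-on-write snapshot
    zfs clone tank/pgdata@pre_upgrade tank/pgtest # writable clone, e.g. for a test instance
    zfs rollback tank/pgdata@pre_upgrade          # or roll back to the most recent snapshot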

Thanks for your input.

Ben


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
