This has been a long while in the making—it’s test results time. To truly understand the fundamentals of computer storage, it’s important to explore the impact of various conventional RAID (Redundant Array of Inexpensive Disks) topologies on performance. It’s also important to understand what ZFS is and how it works. But at some point, people (particularly computer enthusiasts on the Internet) want numbers.
First, a quick note: This testing, naturally, builds on those fundamentals. We’re going to draw heavily on lessons learned as we explore ZFS topologies here. If you aren’t yet entirely solid on the difference between pools and vdevs or what ashift and recordsize mean, we strongly recommend you revisit those explainers before diving into testing and results.
And although everybody loves to see raw numbers, we urge an additional focus on how these figures relate to one another. All of our charts relate the performance of ZFS pool topologies at sizes from two to eight disks to the performance of a single disk. If you change the model of disk, your raw numbers will change accordingly—but for the most part, their relation to a single disk’s performance will not.
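To make that "relative to a single disk" idea concrete, here is a minimal sketch of the normalization. The function name and the sample figures are our own placeholders for illustration, not measurements from these tests.

```python
def relative_to_baseline(throughput_mib_s, baseline_mib_s):
    """Express each topology's throughput as a multiple of the single-disk baseline."""
    if baseline_mib_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return {name: value / baseline_mib_s for name, value in throughput_mib_s.items()}


# Placeholder figures for illustration only; these are NOT results from this article.
example = {"topology A": 200.0, "topology B": 350.0}
for name, ratio in relative_to_baseline(example, baseline_mib_s=100.0).items():
    print(f"{name}: {ratio:.2f}x a single disk")
```

Swapping in a faster or slower disk model changes the raw MiB/sec inputs, but the ratios are what the charts are meant to convey.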
Equipment as tested
We used the eight empty bays in our Summer 2019 Storage Hot Rod for this test. It’s got oodles of RAM and more than enough CPU horsepower to chew through these storage tests without breaking a sweat.
The Storage Hot Rod’s also got a dedicated LSI-9300-8i Host Bus Adapter (HBA) which isn’t used for anything but the disks under test. The first four bays of the chassis have our own backup data on them—but they were idle during all tests here and are attached to the motherboard’s SATA controller, entirely isolated from our test arrays.
How we tested
As always, we used fio to perform all of our storage tests. We ran them locally on the Hot Rod, and we used three basic random-access test types: read, write, and sync write. We ran each test with both 4K and 1M blocksizes, once with a single process at iodepth=1 and once with eight processes at iodepth=8.
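The test matrix described above works out to twelve runs per topology: three test types, two blocksizes, and two concurrency settings. The sketch below only builds the fio command lines; it does not run them. The text does not list the exact fio arguments, so the ioengine, size, and runtime values are assumptions, as is the choice to model "sync write" as a random write with --fsync=1.

```python
from itertools import product

# Test types from the text; mapping "sync write" to --fsync=1 is an assumption.
TEST_TYPES = {
    "read": ["--rw=randread"],
    "write": ["--rw=randwrite"],
    "sync-write": ["--rw=randwrite", "--fsync=1"],
}
BLOCK_SIZES = ["4k", "1m"]
CONCURRENCY = [(1, 1), (8, 8)]  # (numjobs, iodepth) pairs from the text


def build_fio_commands(size="4g", runtime=60, ioengine="posixaio"):
    """Return one fio argument list per combination in the test matrix.

    size, runtime, and ioengine are assumed defaults, not values from the article.
    """
    commands = []
    for (name, rw_args), bs, (jobs, depth) in product(
        TEST_TYPES.items(), BLOCK_SIZES, CONCURRENCY
    ):
        commands.append([
            "fio",
            f"--name={name}-{bs}-{jobs}x{depth}",
            f"--ioengine={ioengine}",
            *rw_args,
            f"--bs={bs}",
            f"--numjobs={jobs}",
            f"--iodepth={depth}",
            f"--size={size}",
            f"--runtime={runtime}",
            "--time_based",
            "--end_fsync=1",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_fio_commands():
        print(" ".join(cmd))
```

Generating the commands from a small table like this keeps every topology on exactly the same workload, which matters more for the comparison than the specific argument values chosen here.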
For all tests, we’re using ZFS on Linux 0.7.5, as found in main repositories for Ubuntu 18.04 LTS. It’s worth noting that ZFS on Linux 0.7.5 is two years old now—there are features and performance improvements in newer versions of OpenZFS that weren’t available in 0.7.5.
We tested with 0.7.5 anyway—much to the annoyance of at least one very senior OpenZFS developer—because when we ran the tests, 18.04 was the most current Ubuntu LTS and one of the most current stable distributions in general. In the next article in this series—on ZFS tuning and optimization—we’ll update to the brand-new Ubuntu 20.04 LTS and a much newer ZFS on Linux 0.8.3.
Initial setup: ZFS vs mdraid/ext4
When we tested mdadm and ext4, we didn’t really use the entire disk: we created a 1TiB partition at the head of each disk and used those 1TiB partitions. We also had to invoke arcane arguments (mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0) to keep ext4’s preallocation from contaminating our results.
Using these relatively small partitions instead of the entire disks was a practical necessity, since ext4 needs to grovel over the entire created filesystem and disperse preallocated metadata blocks throughout. If we had used the full disks, the usable space on the eight-disk RAID6 topology would have been roughly 65TiB—and it would have taken several hours to format, with similar agonizing waits for every topology tested.
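The "roughly 65TiB" figure can be checked with quick arithmetic: RAID6 gives up two disks' worth of space to parity, and each 12TB disk is a little under 11TiB. The sketch below is ours, uses a hypothetical helper name, and ignores filesystem and metadata overhead.

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte


def raid6_usable_tib(disk_count, disk_size_bytes):
    """Usable RAID6 capacity in TiB: two disks' worth of space goes to parity."""
    if disk_count < 4:
        raise ValueError("RAID6 needs at least four disks")
    return (disk_count - 2) * disk_size_bytes / TIB


print(f"{raid6_usable_tib(8, 12 * TB):.1f} TiB")  # prints "65.5 TiB"
```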
ZFS, happily, doesn’t need or want to preallocate metadata blocks—it creates them on the fly as they become necessary instead. So we fed ZFS each 12TB Ironwolf disk in its entirety, and we didn’t need to wait through lengthy formatting procedures—each topology, even the largest, was ready for use a second or two after creation, with no special arguments needed.
ZFS vs conventional RAID
A conventional RAID array is a simple abstraction layer that sits between a filesystem and a set of disks. It presents the entire array as a virtual “disk” device that, from the filesystem’s perspective, is indistinguishable from an actual, individual disk—even if it’s significantly larger than the largest single disk might be.
ZFS is an entirely different animal, and it encompasses functions that normally might occupy three separate layers in a traditional Unix-like system. It’s a logical volume manager, a RAID system, and a filesystem all wrapped into one. Merging traditional layers like this has caused many a senior admin to grind their teeth in outrage, but there are very good reasons for it.
ZFS offers an absolute ton of features, and users unfamiliar with them are highly encouraged to take a look at our 2014 coverage of next-generation filesystems for a basic overview as well as our recent ZFS 101 article for a much more comprehensive explanation.
Megabytes vs Mebibytes
As in the last article, our units of performance measurement here are kibibytes (KiB) and mebibytes (MiB). A kibibyte is 1,024 bytes, a mebibyte is 1,024 kibibytes, and so forth—in contrast to a kilobyte, which is 1,000 bytes, and a megabyte, which is 1,000 kilobytes.
Kibibytes and their big siblings have always been the standard units for computer storage. Prior to the 1990s, computer professionals simply referred to them as K and M, and used the inaccurate metric prefixes when they spelled them out. But most of the time, when your operating system refers to GB, MB, or KB (whether in terms of free space or amounts of RAM), it’s really referring to GiB, MiB, and KiB.
Storage vendors, unfortunately, eventually seized upon the difference between the two sets of units as a way to more cheaply produce “gigabyte” drives and then “terabyte” drives. As a result, a 500GB SSD is really only 465GiB, and 12TB hard drives like the ones we’re testing today are really only 10.9TiB each.
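For readers who want to check the conversion themselves, here is a small sketch of the arithmetic. The helper names are ours, chosen for illustration.

```python
def gb_to_gib(gigabytes):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return gigabytes * 10**9 / 2**30


def tb_to_tib(terabytes):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


print(f"500GB = {gb_to_gib(500):.1f}GiB")  # prints "500GB = 465.7GiB"
print(f"12TB  = {tb_to_tib(12):.1f}TiB")   # prints "12TB  = 10.9TiB"
```

The gap widens with each prefix: about 7 percent at the giga level and about 9 percent at the tera level, which is why the shortfall is more noticeable on today's large drives.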