Homemade Hybrid Disk Drives Part-2
A little hybrid, a little fusion; a digital chimera emerges.
Long ago in September 2015 my fast/cheap/good plan had at its center the Seagate-manufactured hybrid drives that had 4GB of MLC NAND embedded in the normal 2.5-inch HDD package with a 500GB platter set. The plan was to make a RAID array from them. That plan died with firmware issues and the fact that the NAND wasn't heavily parallelized, hurting its throughput. The alternative, using them in a Fusion Drive, fared no better: macOS Yosemite didn’t like something about those drives (Core Storage commands in macOS would fail), and the plan was finally scrapped.
I never completely abandoned the idea because hybrid drives use a read-only cache architecture based on frequency of access that works rather well. The general experience with them is positive, except that we’d like much bigger caches.
Bigger and Better Caches
There are some enterprise-grade hybrid drives that have 32GB caches, but they tend to have relatively small data capacity, all in the sub-terabyte range. I did a couple of Fusion Drive experiments and wrote an article (Fusion 4700) about a 250GB SSD bonded to a 4500GB 3-drive RAID-0 group. It worked well, but it drove Windows-7 (and me) crazy with ‘ghost’ drive letters. It was also a rather fragile combination of technologies in that any one of four elements (the SSD or the three HDDs) could completely break the Fusion Drive assembly.
The best attribute of Fusion Drives is that they’re a read/write solution that speeds both parts of the drive usage equation. By building the hard drive part of the Fusion Drive combo as a ‘fast enough’ element, it’ll have a fairly happy life. The downside is that the Core Storage function of macOS will fill the SSD to within 4GB of completely packed, and if the 'garbage collection' or TRIM functions of the SSD element cannot maintain full-speed writing under those conditions, then the value of the Fusion Drive diminishes as the SSD gets fuller.
A Little About Latency
A discussion of read/write bandwidth would not be complete without consideration of latency. This aspect of computing and network performance is so often distorted by confusing claims that it’s almost impossible to tear some people away from a wrong-headed view about what makes something ‘fast’. To say it all in one breath: a real-time data stream needs only enough bandwidth to transport the payload and protocol overhead without queuing, while a batch transfer shrinks its start-to-end time as bandwidth increases. The start-to-end time of a real-time data flow hardly changes once there is ‘enough’ bandwidth. Latency can be induced by device-driver CPU path length, large buffers, physical distance (think spaceflight) and asynchronous hardware interactions.
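To put rough numbers on the batch-versus-real-time distinction, here is a minimal Python sketch. The function names and every figure in it (a 10 GB batch copy, a 25 MB/s stream, 5 ms of latency, three link speeds) are made up for illustration and are not measurements from my machine.

```python
def batch_transfer_seconds(payload_bytes, bandwidth_bytes_per_s, latency_s):
    """Start-to-end time of a bulk copy: one latency hit plus time on the wire."""
    return latency_s + payload_bytes / bandwidth_bytes_per_s


def stream_keeps_up(stream_bytes_per_s, bandwidth_bytes_per_s):
    """A real-time stream only needs a link at least as fast as the stream."""
    return bandwidth_bytes_per_s >= stream_bytes_per_s


GB = 10**9
MB = 10**6

# Hypothetical numbers, chosen only to show the shape of the argument.
for link in (100 * MB, 500 * MB, 1000 * MB):
    batch = batch_transfer_seconds(10 * GB, link, latency_s=0.005)
    ok = stream_keeps_up(25 * MB, link)
    print(f"{link // MB:>5} MB/s link: 10 GB batch takes {batch:7.2f} s, "
          f"25 MB/s stream keeps up: {ok}")
```

The batch time keeps falling as the link gets faster, while the stream is already satisfied by the slowest link and gains nothing from the faster ones.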
RAM has fabulously low latency: around 14 nanoseconds for DDR3, compared to SSDs that are about 15000x slower at 200 microseconds of latency. The new NVMe PCIe drives on the market do better at about 60 microseconds, but that still places them about 5000x slower than DDR3 RAM. Spinning HDDs are way back in the pack at 600000x slower. However, it's not completely fair to look at the base technology's latency without some additional context.
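The multipliers quoted above are rounded. A quick Python check, using only the latency figures from the paragraph, shows the unrounded ratios and the HDD latency implied by the 600000x figure.

```python
DDR3_NS = 14            # RAM latency quoted in the text, in nanoseconds
SATA_SSD_NS = 200_000   # 200 microseconds
NVME_SSD_NS = 60_000    # 60 microseconds

for name, latency_ns in (("SATA SSD", SATA_SSD_NS), ("NVMe SSD", NVME_SSD_NS)):
    print(f"{name}: {latency_ns / DDR3_NS:,.0f}x slower than DDR3")

# Working backwards from the 600000x figure gives the implied HDD latency.
hdd_ns = 600_000 * DDR3_NS
print(f"HDD: {hdd_ns / 1_000_000:.1f} ms implied latency")
```

The ratios come out near 14,300x and 4,300x, and the HDD figure corresponds to about 8.4 milliseconds per access.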
Despite the cool 14-nanosecond (ns) number for RAM's own latency, there is more at play when the RAM is referenced. The L2 cache on the Westmere CPUs in this Mac Pro has about a 3ns latency (10 CPU cycles). Access to the RAM costs 40 CPU cycles plus another 60 or 100 nanoseconds, depending on whether the RAM is attached to the same socket or the request has to go across the QPI to the other socket's RAM. On a 3.33 GHz X5680 processor, that's 72ns to 112ns depending on where the data is. It gets far messier than that too, depending on the address boundaries of the data, the size of the data request, and the participation of the L3 CPU cache.
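The cycle counts convert to nanoseconds by dividing by the clock rate in GHz. This small Python sketch reproduces the 3ns, 72ns and 112ns figures from the numbers in the paragraph; the helper name is only for illustration.

```python
CLOCK_GHZ = 3.33  # Xeon X5680 clock rate from the text


def cycles_to_ns(cycles, clock_ghz=CLOCK_GHZ):
    """One cycle at N GHz lasts 1/N nanoseconds."""
    return cycles / clock_ghz


l2_ns = cycles_to_ns(10)                # L2 cache hit
local_ram_ns = cycles_to_ns(40) + 60    # RAM on the same socket
remote_ram_ns = cycles_to_ns(40) + 100  # RAM across the QPI link

print(f"L2: {l2_ns:.0f} ns, local RAM: {local_ram_ns:.0f} ns, "
      f"remote RAM: {remote_ram_ns:.0f} ns")
```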
Here's a terrific overview article about Westmere CPU cache and RAM interactions:
In case you can't sleep, another great Intel PDF report about SSD behavior can be found at the link immediately below. Viewer discretion is advised for the technically queasy.
A Fairly Optimal Solution
With due consideration of all these moving parts, my HHDD solution checks all the boxes. The jewel that makes this all possible is a large 16GB read/write L1 cache in RAM that implements a 10-second lazy-write timeout. This solution allocates 1/3 of the 48GB of RAM in the Mac Pro, giving it something to do rather than just sitting there, flaunting its potential.
Keep in mind that the small 4K cluster size, which is necessary to help with the small writes where HDDs are weak (and SSDs can always use this help), also requires lots of management overhead in the RAM cache. There's 3GB of RAM cache overhead for the 4K cluster size. Yup, 3GB is a lot, and entirely impractical unless you have my situation with gobs of RAM otherwise doing nothing. By using bigger 64K clusters, the management overhead dropped to less than 200MB.
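The drop from 3GB to under 200MB is what you'd expect if the bookkeeping cost is proportional to the number of clusters being tracked. That proportionality is my assumption, not something taken from the PrimoCache documentation, and the helper below is hypothetical. A minimal Python sketch of the scaling:

```python
GIB = 1024**3
MIB = 1024**2
KIB = 1024


def scaled_overhead(overhead_bytes, old_cluster_bytes, new_cluster_bytes):
    """Assumes overhead is proportional to the number of clusters tracked,
    so it shrinks by the same factor that the cluster size grows."""
    return overhead_bytes * old_cluster_bytes // new_cluster_bytes


overhead_4k = 3 * GIB
overhead_64k = scaled_overhead(overhead_4k, 4 * KIB, 64 * KIB)
print(f"64K clusters: about {overhead_64k / MIB:.0f} MB of overhead")
```

Sixteen times fewer clusters means one sixteenth of the overhead, roughly 192MB, which lines up with the observed "less than 200MB".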
This RAM cache is partnered with a 10x larger 167GB read-only L2 cache partition on a RAID-0 SSD pair, which has a throughput near 1GByte/sec. The sizes chosen for the L1 and L2 caches are somewhat arbitrary, based on my observations and anticipations of the action going on in my Mac Pro. Another positive attribute of the L1 RAM cache is that its speed is available at low queue depths. Despite the pretty benchmarks, workstation disk workloads have over 80 percent of their I/O at a queue depth of 1 or 2, and 90 percent under a queue depth of 4. As for the bigger cache, the L2 cache is 'persistent' across reboots, so it's able to speed up anything that was done in any session from any of the 3 drives that I've assigned to its pool. I'll be monitoring the hit rate and the churn to see if the size must be adjusted.
There are two obvious issues with this solution. The first is that it can only be done in Windows. I’m using a product called PrimoCache from Romex Software to build these intelligent caches, and I have not found an equivalent product for macOS. The second issue is that the RAM cache is volatile, and it widens the window of exposure to a system crash. An SSD or HDD holds unwritten data in its controller's DRAM for only half a second or so, and may have capacitor-based power protection for it. With the 10-second lazy-write in L1, it's possible to lose/corrupt more data than before. Unplanned power-off concerns are readily covered by the UPS that supports my systems, so it's really a matter of Windows stability, and the underlying reliability of the Mac Pro hardware.
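To make the exposure window concrete, here is a toy write-back cache in Python. It is not how PrimoCache is implemented. The class, its method names, and the rule that a block is flushed once it has been dirty for the full delay are all assumptions made for illustration, and time is passed in by hand so the behavior is easy to follow.

```python
class LazyWriteCache:
    """Toy write-back cache: writes land in RAM and reach the disk after a delay."""

    def __init__(self, backing, flush_delay_s=10.0):
        self.backing = backing            # a dict standing in for the disk
        self.flush_delay_s = flush_delay_s
        self.dirty = {}                   # block -> (data, time first made dirty)

    def write(self, block, data, now):
        # Keep the original dirty timestamp if the block is rewritten before flushing.
        first_dirty = self.dirty[block][1] if block in self.dirty else now
        self.dirty[block] = (data, first_dirty)

    def read(self, block):
        if block in self.dirty:
            return self.dirty[block][0]
        return self.backing.get(block)

    def tick(self, now):
        """Flush every block that has been dirty for at least flush_delay_s."""
        due = [b for b, (_, t) in self.dirty.items() if now - t >= self.flush_delay_s]
        for b in due:
            self.backing[b] = self.dirty.pop(b)[0]
        return due

    def crash(self):
        """Simulate a crash: whatever is still dirty never reaches the disk."""
        lost = sorted(self.dirty)
        self.dirty.clear()
        return lost


disk = {}
cache = LazyWriteCache(disk)
cache.write(1, b"a", now=0.0)
cache.write(2, b"b", now=6.0)
cache.tick(now=10.0)      # block 1 has aged 10 s and is written to the disk
lost = cache.crash()      # block 2 was only 4 s old and is gone
print(disk, lost)
```

Anything written in the last 10 seconds before a crash is at risk, which is why the stability of Windows and the hardware matters more with this setup than with a bare drive.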
Keeping my stable and mostly happy Win-7 Pro in place, I set out to install Win-10 Pro (again) on my 2010 Mac Pro.
A Smooth Windows 10 Installation
The emotional scars of my data disaster of June 2, 2017 (a day that will live in infamy) caused by the Anniversary Edition Update have begun to heal. As therapy, I decided to try it from scratch, and was rewarded with a surprising result. First, I removed/disconnected all other drives from the machine, leaving only the 500GB Seagate Momentus XT hybrid SSHD in place. Then I downloaded the Win-10 1803 Creator Edition ISO image, burned it to a DVD and booted it from a USB 2.0 Samsung DVD reader. Everything installed on the Seagate drive as if it were a normal PC. I was shocked, and still feel as if I’ve accidentally discovered some covered-up secret that I’m not supposed to know.
I used Macrium Reflect Free to clone the Seagate to a 500GB Crucial MX500 SSD, and all is well. Every subsequent product installation has gone perfectly. The only hardware that didn’t get detected and supplied with a driver was the native Bluetooth of the Mac Pro and the NewerTech MaxPower RAID card. By the way, this RAID card is physically a HighPoint RocketRAID 2721, but has unique firmware that needs a driver that's well-hidden on the Internet. Grrrr. Moving on, I plan to displace the native Mac Pro Bluetooth with a USB Bluetooth 4.0 adapter made by IOGear.
After re-installing all the data drives, there were two special changes I had to make. One was in the power settings of Win-10, to disable Fast Boot (the option Windows itself labels "fast startup"). If you don’t disable Fast Boot, Win-10 only hibernates when shut down, leaving zillions of files in a state that obligates Win-7 to run CHKDSK for integrity. The other change was a registry edit to ensure that Windows interprets the hardware clock as UTC, so that it shows the correct local time.
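For anyone who wants to script these two changes, here is a hedged Python sketch that only builds the text of a .reg file. It doesn't touch the registry. I'm assuming the usual values for these tweaks: RealTimeIsUniversal for the UTC clock and HiberbootEnabled for fast startup. The paragraph above doesn't name the exact key, and the fast startup change there was made through the power settings rather than the registry. The helper name reg_dword is hypothetical.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"


def reg_dword(key_path, name, value):
    """Format one DWORD value as a .reg file section."""
    return f'[{key_path}]\n"{name}"=dword:{value:08x}\n'


tweaks = "\n".join([
    REG_HEADER,
    "",
    # Assumed value: treat the hardware clock as UTC, as macOS does.
    reg_dword(r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation",
              "RealTimeIsUniversal", 1),
    # Assumed value: turn fast startup off so shutdown is a real shutdown.
    reg_dword(r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Power",
              "HiberbootEnabled", 0),
])

print(tweaks)
```

Review the output before importing it with regedit, and back up the registry first. A wrong edit here can be unpleasant to undo.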
You may notice that the naming of my data drives has a drive letter as a prefix and suffix. Doing that helps to reduce confusion, as I freely access these HFS+ and NTFS drives from the "opposite" operating system. I have NTFS for Mac and HFS for Windows installed. These products are from Paragon Software, and have been trouble-free.
With Win-10 Pro in place, the packaging and configuration of the disk pools can now proceed. See Part-3.
- Ted Gary of TedLand
May 17, 2018