I remember trying to build my first NAS. It’s difficult. Most blog posts are written from a business use point-of-view, most forum posts you can’t trust: are they out of date, based on myths or written by someone with no clue?
So I was surprised when I read The ‘Hidden’ Cost of Using ZFS for Your Home NAS, it’s well written, without the usual storage snake oil. But most people won’t need large storage arrays at home.
Having helped my fifth friend configure as NAS (who’ve all also worked in storage), I realised there’s a bit of a need for a simple guide, which just outlines some of the considerations, without too much hand-holding. That way, it won’t go out of date and you don’t have to find exactly the same hardware.
This one is long.
The challenge: You have less than 4 TB of data you want to store on a NAS for home entertainment purposes (entry-level, home). You decide to use FreeNAS, because it has a great web interface which is easy to use, and the tech behind it (ZFS) is excellent. It’s a professional product, so it has all the bells and whistles. It runs FreeBSD (close enough to Linux, right?) and you can put a Minecraft server on it. Awesome.
You don’t fancy paying for a ready-made NAS box like Synology, because they’re usually anaemic and a bit locked down. You could go DIY and build a small server, but last-generation HP microservers have steep discounts, enterprise hardware and usually support ECC memory. The last-gen ones are still very good, don’t be fooled. So a HP microserver makes sense from a time, money and quality perspective, a rare combo. Those often don’t come with disks or much memory.
More RAM is always better, and you don’t need to buy the HP RAM, so stick at least 8 GiB in there. ECC RAM if you can, it’s usually as expensive as normal RAM, but normal will do. FreeNAS apparently works well with ECC RAM in protecting data integrity, but from what I read nobody can figure out if there’s any real danger of undetected data corruption with normal RAM (real means more probable than e.g. winning the lottery in my opinion - again, for a business that might not be good enough).
Erm, but how many disks do you put in? I’ve seen so many people put off by the complexity of this. It’s a hard topic, and forums can be really misleading on this one.
First, read about the 3-2-1 rule for backup. Remember this, it’s important. Now, think about your backup solution first. rsync.net is great and ZFS-compatible, but not cheap. You might just say “fuck it” and ignore off-site backups, because you’re just that cool.
You can also go for a hybrid approach: Really important things (e.g. photos or tax returns) go into Dropbox/Google Drive/<cloud provider>, and the laptop with <cloud provider> is also backed up on the NAS. Less important data (e.g. copies of music CDs or film DVDs) goes on the NAS. Such an approach is fine in my opinion. It’s safer than most people’s data, low cost, and doesn’t really have a weak point except the less important stuff. If <cloud provider> shuts down, find a new one. If the house burns down or floods, buy a Netflix and Spotify account. But again, we’re talking probabilities here. How much “certainty” do you want?
Where were we? How many disks? Do you care about redundancy of the data on the NAS? If not, just put as many disks as you need in there. Remember, redundancy just means “can I access my data uninterrupted once a disk dies?” and is not to be confused with backup. So having no redundancy is completely viable for home use, don’t let anybody tell you otherwise. You need good backup solution anyway and backing up is more important than redundancy.
That said, disks are kind of cheap, and redundancy is an easy way to have an extra copy of the data. If you remember the 3-2-1 rule, redundancy can add 1 to the first part (three copies), but but never to the second (different format) or third part (off-site backup). Or maybe, you’re going down the hybrid approach and need a bit of reassurance.
So redundancy is nice. How much data have you got? If you have more data than will fit on a large hard-drive (as of 2016, ~3 or 4 TB), then you don’t have much choice. Just go for RAIDZ2 with 4 or more disks, which is out of scope for this post. Less data than 3 TB for the foreseeable future means three options: a mirror (2 disks), a RAIDZ1 array (3 disks) or a RAIDZ2 array (4 disks).
My advice is just use a 2 disk mirror, unless you don’t mind buying 4 drives at once ($$$). I hear some people shouting at the screen already.
“But Toby, a mirror doesn’t provide the same level of protection as four drives in RAIDZ2!” That is correct, but it’s also usually irrelevant. First, ZFS provides many benefits over an external hard-drive, and a mirror is a huge step up in terms of reliability [citation needed]. So fault tolerance isn’t the only consideration here. Second, off-site backups. Yes I know I’m repeating myself but if your house and NAS burns down, all RAID levels are equal. Remember, the ‘R’ in RAID is redundancy, not “protection”, that’d be PAID. Third, two mirrors are more flexible that one RAIDZ2 array. Two mirrors?
If you’re short on cash, or if you don’t need that much space yet, just get one mirror. You can always add another mirror later. The later part is the interesting part. If you have to buy two more drives a short amount of time later, you’ve not lost much. Two mirrors have a similar fault tolerance to RAIDZ21, and you’ve maybe gotten another paycheck to spread the cost. But 2 or 3 years later, and chances are drives have gotten bigger/cheaper. So now, your second mirror can be bigger/cheaper that mirror #1.
Apparently, you can also grow mirrors by replacing the older drives with newer ones one-by-one and rebuilding each time (so twice). At the end, ZFS will expand the mirror to the new size.
In my opinion, this flexibility makes two mirrors preferable to RAIDZ2 and RAIDZ1.
“But Toby, RAIDZ1 gets me more storage space than a mirror but is cheaper than RAIDZ2!” RAIDZ1 does seem like a sweet deal. You should know that businesses don’t use RAIDZ1 any more; the idea is that with modern disks and RAIDZ1, it’s just too dangerous that another drive will die during a rebuild and ruin the redundancy. This is true for many drives, but it probably won’t matter with three disks, and anyway we’re not a business. People who tell you not to use RAIDZ1 usually don’t have off-site backups in place, which you do, right?
My argument against RAIDZ1 is more practical. Your microserver only has 4 bays. RAIDZ1 has worse rebuild and fault tolerance than RAIDZ2 or two mirrors, and less flexibility than two mirrors. The only time I can see RAIDZ1 being useful is if you’re so limited that you can’t afford 4 drives soon, but you know you’ll need the all the space sooner. If you’re that strapped for cash and desperate for space, a NAS probably isn’t for you.
TL;DR: Use mirrors!
It’s slightly worse: If one drive dies, then it’s partner can’t die. Arguably, the partner is now more likely to die because it’s being read for the rebuild. But rebuilding mirrors is often faster than arrays. It’s complicated. You have backups, right? So stop worrying. ↩