I've worked for a while now on a "low cost" server solution that can handle up to 40 SATA hard drives. I use it for my primary online data storage needs, and also use a USB removable storage solution from TQ for things that don't necessarily need to be online all the time.
In the process of building the storage rig/server, I had to figure out, and come to terms with, a few things.
#1 - Keep It Simple! Keep It Simple! Keep It Simple! Keep It Simple!
#2 - RAID 5 or similar RAID configurations don't make sense in these types of applications (low usage, online/near-line storage). The thing that took me forever to come to terms with is that if I lose a hard drive, I lose its contents and have to go back to the originals. For whatever reason, this was extremely tough for me with my server background and constant worrying about losing data at the server level, but in this case it just doesn't apply, and the costs of keeping the safety net outweigh any benefits. Also, performance isn't an issue, as individual drives have more than enough performance for home-based media needs.
FWIW, I was tempted, and for a while did use RAID5-type solutions. The problem is that there was no RAID controller I was aware of when building the server that will power down the drives when not in use. This may have changed recently with the constant push for server-side power efficiency, but I still don't want 5 to 15 drives powering up and generating unnecessary heat whenever one file is accessed in the RAID.
I'm currently using 40 1TB drives, and having them all up and running in various RAID5 arrays would be a huge power waste and heat generator. RAID also locks the hard drives into the RAID controller's formatting, which means changing out RAID arrays or RAID controllers is a major pain. Having just recently upgraded from 500GB to 1TB drives, it was much easier to swap the drives in/out one at a time as needed, versus having to deal with large RAID5-type setups and needing to set up an array using hard drive slots that don't exist, etc.
Lastly, rebuilding large RAID5 arrays using SATA drives takes forever.
#3 - Drive letters would be an issue, so I needed a workaround. In my case, I chose to use Windows Server and its ability to mount a drive to a folder, then share out the master folder. For instance, set up a share called HDBANK (for Hard Drive Bank) and then create folders for each hard drive (i.e. HD00, HD01, ...). You can then use Disk Manager to link each hard drive to an HDxx folder. This way, you only need one share and/or one drive letter for all the drives on the storage server. I used Windows as it was available to me, but Linux has similar mount-point capabilities and should work just as well.
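One nice side effect of the mount-folder layout is that you can script against it. Here's a minimal sketch of a free-space report across all the mounted drives; the HDBANK root path and the "HDxx" naming are just the illustrative conventions from above, not anything Windows requires.

```python
import shutil
from pathlib import Path

def report_free_space(hdbank_root):
    """Return {folder_name: free_bytes} for each HDxx mount folder
    under the HDBANK share root.

    shutil.disk_usage works the same on a folder that's a volume
    mount point as it does on a drive letter, so each HDxx entry
    reports the free space of the drive mounted there.
    """
    results = {}
    for folder in sorted(Path(hdbank_root).iterdir()):
        # Only look at the HDxx mount folders, skip anything else.
        if folder.is_dir() and folder.name.startswith("HD"):
            usage = shutil.disk_usage(folder)
            results[folder.name] = usage.free
    return results

# Example usage on the server itself (path is illustrative):
#   for name, free in report_free_space(r"D:\HDBANK").items():
#       print(f"{name}: {free / 1e9:.1f} GB free")
```

Handy when you have 40 drives and want to know at a glance which one still has room for the next batch of files.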
#4 - The drive setup/format had to use a standard format (FAT, NTFS, etc.). No special formatting, not part of any RAID, not tied to any RAID controller, etc. I needed the ability to pull a drive and read it on any system whenever I wanted. I ended up using NTFS, as it was the path of least resistance for me.
#5 - Heat is the #1 enemy for large storage solutions. Not only in the longevity of the hard drive(s), but in requiring more power for the hard drives, more power to cool the room, bigger power supplies, and so on. All of which ultimately leads to higher utility bills. As such, I needed to choose hard drives based on heat and power usage instead of performance - this is something that I'm not particularly used to or good at.
I ultimately ended up using Western Digital's 1TB drives. They run noticeably cooler (you can actually touch them after you power them off) and draw less power when operating compared to other 1TB drives, plus they include aggressive power management as part of the drive's firmware.
With those basic ideas in mind, here's my setup:
1 - Windows Server 2008 (was 2003 until a few days ago)
1 - Silicon Image eSATA PCIe Controller w/ 4 External Ports
4 - USB to SATA Cables
8 - 5 Port Multipliers
8 - 5 Bay Storage Removable Racks from Super Micro
2 - External 750 Watt Supplies for the 40 External Hard Drives
The storage setup is like this:
SI eSATA Card
#1 -> 5 Port Multiplier -> 5 Bay Hard Drive Storage Rack from Super Micro
...
...
#4 -> 5 Port Multiplier -> 5 Bay Hard Drive Storage Rack from Super Micro
USB to SATA Cable
-> 5 Port Multiplier -> 5 Bay Hard Drive Storage Rack from Super Micro
...
...
-> 5 Port Multiplier -> 5 Bay Hard Drive Storage Rack from Super Micro
FYI - I've tried using multiple SI eSATA cards without luck (driver issues). As such, to get the extra storage I had to go with the slower USB-to-SATA cable solution for the second set of 20 hard drives.
Things I’m still considering or will experiment with…
To me, a perfect solution would be racks and racks of hard drives connected to an eSATA card and an uber relay card of some type. Essentially all the hard drives would be off, but from a program you could trip a relay and power up any hard drive. My concern is that you'd also need to switch the SATA connections, and I suspect the integrity of the high-speed serial links would suffer through a typical relay switch, but I don't have the experience to create a custom FPGA-type solution or ??? for it. You could use port multipliers like I have above, but since I ran into problems with multiple SI cards, and almost every card that supports port multipliers seems to use SI's chips, it would be hard to scale above 20 drives. I've also yet to find a 100% reliable eSATA hot-plug card/solution. Until eSATA hot plug becomes as dependable as USB's, I always worry that I'm going to lose data when hot-swapping a drive connected to an eSATA port. USB to SATA is fine, but the performance stinks in comparison to eSATA.
Now, with iSCSI coming down into the price range of mere mortals, it may make sense to have multiple low-cost Linux boxes as iSCSI hosts for 4 or more hard drives each, and either access them directly or combine the iSCSI targets/drives via a server share. I'm currently looking into this as a solution. My biggest worry is, again, power management. Can you even power down an iSCSI target/drive when not in use? Will the initiator tolerate the 5 to 10 second delay of waiting for a drive to power up, or will it error out, etc.? If this works, then I think this solution would be the ultimate in scalability and, if combined with Wake-on-LAN or line-voltage relay power boards, could allow for tons of "online storage" with essentially little to no power or heat issues. FYI - Intel just released a new storage box that can run Linux or a host of other OSes. Looks great, and almost no noise if combined with low power/noise hard drives.
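The Wake-on-LAN half of that idea is easy to script, at least. The magic packet format is just 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, broadcast over UDP. Here's a minimal sketch; the MAC address in the comment is a placeholder, and whether the NIC in a cheap iSCSI box actually honors WOL is a separate question.

```python
import socket

def make_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by
    the target MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP. Port 9 (discard) is the
    common convention; port 7 also works with many NICs."""
    packet = make_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

# Example usage (placeholder MAC):
#   wake("00:11:22:33:44:55")
```

A cron job or a small watcher script could then wake a storage box on demand and let it power itself back down when idle.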
Hope this helps! For me, it's about the process. If you know anyone who builds kit cars, kit planes, etc., most of them enjoy the building, not the driving or flying. It's the same for me: I enjoy building the solutions. Storing lots of stuff is just an excuse to build them in the first place.