Saturday, January 27, 2018

Set up a Linux File Server


The Problem

Managing digital files isn’t easy these days. Improvements to media fidelity and the sheer volume of available content have arguably outpaced improvements in storage technology and affordability.
There are two important kinds of storage drives: hard disk drives (HDD) and solid state drives (SSD).
HDD’s have a magnetically sensitive disk spinning inside them. Specific data can only be read or written when the portion of the disk it resides on is physically underneath a drive head. Since data is typically spread out across the disk, HDD's are slow.
Operating systems — especially commercial ones — have gotten big. Consequently, systems that run operating systems off of HDD’s are also slow. In fact, old Macs with just HDD’s can be almost unusably slow with the latest macOS installed.
SSD’s, on the other hand, have no mechanical parts, so they can reach any piece of data almost immediately. You will probably want at least one of these in your system.
Unfortunately, they are significantly more expensive than HDD’s of the same capacity at the present time.
If you’re lucky, you can afford to buy an SSD that’s big enough to store all your files. But most of us can't. So unless you're sufficiently comfortable, patient and trusting to keep all of your files in the Cloud, that means putting together a system including both SSD's and HDD's (or, a hybrid or Fusion drive.)

Solution 1: String Together Lots of External HDD's

The first draft of the solution might be to add HDD's to a single computer with an internal SSD.
Unfortunately, if you're a Mac user with recent hardware and you're too squeamish to void your warranty by cracking (and in most cases, I do mean cracking) them open, you'll need external drive enclosures for those HDD's.
But there are problems with these things.
  1. They are noisy.
  2. They must be attached with a forest of cables (See my blog post on Meta-Towers for a partial workaround)
  3. Unless you leave your system on all the time, they have to be turned on and off separately from the computer they're paired with.
Mac SSD's have been enjoying the new Apple File System (APFS) since the release of macOS High Sierra. But Mac HDD's are currently still stuck on the old HFS+ file system. HFS+ lacks checksumming to guard against file corruption.
If you aren't a Mac user, file management is easier. But you'll still encounter frustrations if you want to use your files remotely, from other devices (including from other computers.)

Solution 2: File Servers

One solution is to set up a file server.
A file server is a computer that lets other devices read and edit its files remotely, using high-level networking protocols.
There are several options for file server software. I find that the best ones are significantly easier to set up on Linux. I'll discuss the options in the next section.
Linux and the EXT4 file system are great at reliably and flexibly serving files. They are also free!
(Note that, unlike macOS, no mainstream Linux file system offers a Versions feature that consistently lets you revert to earlier versions of a file.)
Your file server could also be your primary computer system. But most people don't want Linux for their primary systems, and dual boot can be difficult to set up and keep set up.
So I recommend getting a cheap, expandable Linux tower with lots of internal drive bays. I am very happy with my own zaReason Limbo. I actually have mine rigged up to share a display with my Mac mini, and to share a clicky Unicomp keyboard via a USB 2.0 switch.
It's up to you how many internal HDD's to get and how big they should be. One 3 TB HDD holds pretty much all my permanent files except for archived iMovie projects, star catalogues, and planetary object video. (If you want to get technical, I suppose I also have Nintendo games on other drives.)
You'll need to pick a Linux distribution and a desktop environment. Ubuntu is my distribution of choice, and I use KDE as my main desktop environment (I got them packaged together as "kubuntu".) But I also like Gnome.
You'll want to set up mounts for all your installed HDD's so they're available immediately upon startup. You can do this by editing /etc/fstab, or by writing startup scripts that invoke udisksctl mount.
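As a rough sketch of the /etc/fstab route, the snippet below prints an entry for one data drive. The fstab_entry helper, the UUID, and the /mnt/media mount point are all placeholders I made up for illustration; look up your drive's real UUID with lsblk -f. It assumes an ext4 drive, and the nofail option is there so the machine still boots if the drive is missing.

```shell
# Print an /etc/fstab line for an ext4 data drive (hypothetical helper).
# $1 = filesystem UUID (from `lsblk -f`), $2 = mount point
fstab_entry() {
    printf 'UUID=%s  %s  ext4  defaults,nofail  0  2\n' "$1" "$2"
}

# Placeholder UUID and mount point; substitute your own.
fstab_entry 0a1b2c3d-1111-2222-3333-444455556666 /mnt/media
```

Once the line looks right, append it to /etc/fstab as root, create the mount point, and run sudo mount -a to try it out without rebooting.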

File Server Software

Save yourself a headache and install file server software using a package manager (kubuntu includes Muon). And be sure to keep up to date with the latest versions, to protect against security exploits.
After installation, you'll need to locate the configuration file for your new file server software. Typically, you can accomplish this with Linux's locate shell command, or find / -name ….conf. If there's more than one search result, you might need to experiment to determine which is the one your service is using.
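Here is one way the find approach might look, narrowed to /etc (where packaged configuration usually lives) so it finishes quickly. The find_conf helper is a name I invented for this sketch, and /etc as the starting point is an assumption about your distribution.

```shell
# Search a directory tree for a config file by name (hypothetical helper).
# $1 = file name, $2 = directory to search (defaults to /etc)
find_conf() {
    # Hide "permission denied" noise and don't fail on unreadable directories
    find "${2:-/etc}" -type f -name "$1" 2>/dev/null || true
}

find_conf smb.conf   # Samba
find_conf afp.conf   # Netatalk
```

If Samba is already installed, running testparm -s also reports which configuration file it loaded, which can settle any doubt about which copy is live.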
Package managers usually configure sensible and secure defaults. But just be aware that a misconfigured file server could in principle expose all your files to the local network, or even the Internet, without requiring a password or anything! So take the time to review your configurations and understand what they mean and do.
Your package manager should set up your file server software as a Linux service (usually, a systemd service) that you can stop, start and check the status of with the service shell command.
The file server packages discussed below aren't mutually exclusive. If you want, you could install all of them.

SMB via Samba

Samba is a free and open source implementation of Server Message Block (SMB) protocol. SMB has traditionally been associated with Microsoft Windows, but today is pretty much everywhere.
As of SMB version 3.0, Apple now recommends its use for file sharing rather than AFP (see below). But many users complain about SMB 3.0's performance.

Samba Configuration

Samba's configuration file is typically called smb.conf and its service is typically called smb (or smbd on Debian-based distributions such as Ubuntu).
Once you've found the configuration file, edit it.
Under the [global] section, make sure passdb backend = tdbsam
(It's fine if there's also a colon and a file path after this.)
TDB is Samba's lightweight local database; the tdbsam backend keeps account details and password hashes in it, and it performs well enough for our purposes.
Remove the ; comment character in front of the [homes] section. Do the same for browsable, and change the value to yes. Also remove the comment character from read only, and change it to no, if necessary.
Save the file and restart your Samba (again, probably "smb") service.
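For reference, the edits described above amount to something like the fragment below. This sketch writes it to a scratch file rather than touching the live smb.conf; the draft path is an arbitrary choice of mine, and a packaged smb.conf will contain many more settings than these. The testparm step only runs if Samba is installed.

```shell
# Write the settings described above to a scratch file for review.
# The draft path is arbitrary; nothing here touches the live smb.conf.
draft="${TMPDIR:-/tmp}/smb.conf.draft"
cat > "$draft" <<'EOF'
[global]
    passdb backend = tdbsam

[homes]
    # "browsable" is an accepted synonym for "browseable"
    browseable = yes
    read only = no
EOF

# testparm ships with Samba and checks a config file for syntax errors.
if command -v testparm >/dev/null 2>&1; then
    testparm -s "$draft" || echo "testparm reported problems in $draft"
fi
```

Running testparm against your real smb.conf before restarting the service is a cheap way to catch typos.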
Unfortunately, Samba is not well integrated into Linux's own authentication mechanism. You'll need to create Samba user accounts. You will typically accomplish this with the smbpasswd -a command.
You can avoid the need to reconfigure if you use the same names for your Linux and Samba accounts. 
You may find it helpful to troubleshoot your Samba configuration by installing and running smbclient.
smbclient -U your_user_name //your_host_name/your_user_name -d 3
where -d 3 indicates the debug level (the higher the level, the more verbose.)

AFP via Netatalk

Netatalk is a free and open source implementation of Apple Filing Protocol (AFP) which has been supported on Macs since version 9 of the classic Mac OS. As a result, Netatalk can probably talk to any Mac you still have in operation.

Configuration

Netatalk typically has a configuration file named "afp.conf" and its service is typically called "netatalk".
Configuring Netatalk is easier than Samba.
Create or uncomment a homes section in the configuration file.
[Homes]
      basedir regex = /home
You can also create sections for any other important directories.
[Another]
      path = /my_user_account_name/media/my_volume
      valid users = my_user_account_name
By default, recent versions of Netatalk allow the use of DHX2 and DHCAST128 encryption. That's fine, just as long as cleartext isn't enabled. You can check on this using the asip-status.pl script.
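As a complement to asking the running server, you could also check the configuration file itself. The sketch below assumes Netatalk 3's "uam list" option and the uams_clrtxt.so module name for cleartext logins; check_cleartext is a helper name I made up, and the demo runs against a throwaway sample file rather than a real afp.conf.

```shell
# Report whether an afp.conf explicitly enables the cleartext login module.
# $1 = path to afp.conf. Assumes the Netatalk 3 "uam list" option and the
# uams_clrtxt.so module name.
check_cleartext() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    if grep -Eq '^[[:space:]]*uam list[[:space:]]*=.*uams_clrtxt\.so' "$1"; then
        echo "cleartext logins enabled"
        return 1
    fi
    echo "no cleartext module listed"
}

# Demo against a throwaway sample; point it at your real afp.conf instead.
sample="${TMPDIR:-/tmp}/afp.conf.sample"
printf '[Global]\nuam list = uams_dhx.so uams_dhx2.so\n' > "$sample"
check_cleartext "$sample"
```

This only inspects what the file says, so it doesn't replace a check against the live service.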
Unlike with Samba, you'll be able to login with your existing Linux user account.
Run tail -f on Netatalk's log file (its location should be set in the configuration) and then restart the Netatalk ("netatalk") service. If you don't see anything worrying in the logs, then you should be good to go.

Other Options

There are still other file server options.

NFS

Network File System (NFS) protocol originated at Sun Microsystems, but today is viewed as the native file server protocol of Linux.
NFS isn't quite as widely supported outside the macOS and Unix family of computers, but it has been battle-tested and is pretty fast.

SFTP via OpenSSH Server

SSH File Transfer Protocol (SFTP) sits on top of the SSH protocol and service.
SSH and SFTP let you use public key authentication to securely serve files without ever needing to type a password.
On Mac, the excellent Transmit app ships with Finder plugins to seamlessly mount SFTP file systems as volumes.

WebDAV

Web Distributed Authoring and Versioning (WebDAV) piggybacks file sharing atop a webserver. It can be installed into Apache webserver (httpd).

Connecting to Your File Server

Please note that you will not need to enable anything in the Sharing control panel on your Mac… these options are solely for incoming connection requests.
To initiate an outgoing connection to your file server, pull down the Go menu in the Finder and select "Connect to Server…"
Here's how to construct your connection string for each protocol:
Protocol    Format
Samba       smb://server_name/share_name
AFP         afp://server_name/optional_share_name
SFTP        sftp://server_name/optional_directory_path
NFS         nfs://server_name/optional_directory_path
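If you connect often, it may be handy to build these strings in a script. The helper below is purely illustrative: server_url, my_server, and my_share are made-up names, and it does nothing beyond gluing the pieces together in the formats listed above.

```shell
# Build a "Connect to Server" URL (hypothetical helper).
# $1 = protocol (smb, afp, sftp, nfs), $2 = server name,
# $3 = optional share name or directory path
server_url() {
    printf '%s://%s/%s\n' "$1" "$2" "${3:-}"
}

server_url smb my_server my_share
server_url afp my_server
```

On a Mac, passing the result to the open command in Terminal should hand the URL to the Finder, much like typing it into the dialog.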
Once in, things may appear a little weirder than you're used to. For example, old custom Finder icon files appear as "I7CIPB~N" over SMB. But for the most part, everything should be OK. 
Some folks in the online community recommend improving SMB 3.0 performance by reconfiguring your Mac to change the SIGNING_ON client setting to FALSE. But this could expose you to eavesdropping attacks if someone infiltrates your network, so I recommend against it. If you're concerned about performance, don't initiate SMB 3.0 connections from macOS Sierra or High Sierra.
Also keep in mind that if you need to do batch operations on a lot of large files, you always have the option of doing so on your Linux server, without using a file sharing protocol. For this, you could shell in with ssh or mirror its screen with Remote Desktop Client (RDC). (Because of Linux's relative lack of reliance on modifier key / mouse combinations, RDC from Mac to Linux works a lot better than the other way around.)
Or, of course, you could use your Linux box directly, like a normal system.

Search

Oh wait, what about search? I don't think about it often, but when I do, I consider it to be pretty much the most useful thing on my computer.
Linux is well supported in this area by Recoll and Gnome Tracker. They install easily with a package manager. Both can handle important common file types like PDF, ePub, and HTML.
Recent versions of Gnome Tracker can even be integrated with Netatalk and Samba! Unfortunately, there don't seem to be package manager packages for these versions at the present time, so you'd have to compile them manually, from source code.
But if Linux isn't your primary operating system, then chances are you use some native apps with idiosyncratic file formats. On Mac, these apps often ship with plugins that allow Mac's search tool, Spotlight, to index their files. (I understand there is something similar called Wox on Windows.)
Linux won't be able to exploit these custom formats.
The solution is to let Spotlight index your remote volumes, just like it indexes the files residing on your Mac's internal drive! Sure, the initial index will be slow to build over your network. But that index will live on your primary system's SSD, so the actual searches will be fast.
Unfortunately, indexing of remote drives from macOS has been slightly broken since the release of macOS Sierra. But a simple fix, creating /private/var/db/Spotlight-V100/Volumes/ if it is missing, will allow you to enable a remote volume:
mdutil -i on /Volumes/volume_name
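Put together, the fix might look like the sketch below, meant to be run in Terminal on the Mac. The enable_remote_index function name is my own invention; the directory and the mdutil call come from the steps above, and it assumes the share is already mounted under /Volumes. The final mdutil -s just prints the volume's indexing status.

```shell
# Enable Spotlight indexing on a mounted network volume (run on the Mac).
# $1 = volume name as it appears under /Volumes (hypothetical helper).
enable_remote_index() {
    volume="/Volumes/$1"
    [ -d "$volume" ] || { echo "not mounted: $volume" >&2; return 1; }
    # Recreate the directory Spotlight expects, if it is missing
    sudo mkdir -p /private/var/db/Spotlight-V100/Volumes
    # Turn indexing on for the volume, then show its status
    sudo mdutil -i on "$volume"
    mdutil -s "$volume"
}

# Example (replace volume_name with your share's name):
# enable_remote_index volume_name
```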
Spotlight indexes are protocol and volume specific. So you won't be able to connect via, say, AFP and exploit an index made over SMB.

I kind of like maintaining Spotlight and Gnome Tracker indexes. That way, I can validate them against each other.

Backup

You'll want to schedule regular backups of your file server. I find the easiest way to do this is with an HDD dock. They make it easy to rotate backup drives.
On Linux, Back in Time is a good backup solution.
For what it's worth, I've found that initial backups complete much faster on EXT4 than on HFS+.
Whatever you pick, don't forget to keep using it and every now and then test with a spare drive to make sure you're really backing up what you believe you are.
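One low-tech way to spot-check a backup is to compare a directory against its copy on the backup drive. The sketch below uses plain diff for that; verify_backup is a made-up helper, and the demo uses throwaway directories, since where your backup tool keeps its snapshots depends on how you configured it.

```shell
# Compare a source tree with its backup copy (hypothetical helper).
# $1 = original directory, $2 = the same directory inside the backup
verify_backup() {
    # -r recurses into subdirectories; -q names differing files
    # without printing their contents
    if diff -rq "$1" "$2"; then
        echo "backup matches"
    else
        echo "backup differs" >&2
        return 1
    fi
}

# Demo with throwaway directories; substitute your real paths.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/note.txt"
cp "$src/note.txt" "$dst/note.txt"
verify_backup "$src" "$dst"
```

Files changed since the last backup will naturally show up as differences, so this works best right after a backup run.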

Serving Other Things

If you're new to Linux, you'll find it's good at serving lots of other things, too.
You might want to install a dedicated media server like Plex. There's a client app for Plex on Apple TV, Roku, and Amazon Fire.
If you want to serve webpages to your local network, try Apache Webserver (httpd), which you may already have if you went the WebDAV route for file serving. Nginx is another good option.
You may even want to run a database server. There are many, many options. PostgreSQL and MySQL are good go-to picks.
