Last year, I eagerly anticipated the release of Google Drive. I had complained a lot about my experiences with other synchronization software, and fully expected Google to knock this one out of the park. It's an application that should play directly to Google's strengths: systems software, storage, and scalable distributed systems.
In spring 2012, I made the leap and moved all my personal cloud storage over to Google, but I ran into too many technical problems and gave up. (Some of these problems I'll detail below.) Now, a year later, I wanted to check in again and see how things have improved.
I'm afraid I have to report that there are still major deal-breakers for me, and perhaps for other "power users":
- Scalability isn't there. For example, if you try the simple benchmark of adding a folder with thousands of very small files (see the sketch after this list), you'll see that maximum throughput tops out at a few files per second.
- Getting stuck ("Unable to sync") seems common.
- Symlinks are still ignored.
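To make the scalability point concrete, here's roughly the benchmark I mean, as a small Python sketch. The target path and file count are placeholders; point it at a folder inside whatever directory your client syncs:

```python
import os
import time

# Placeholder path inside your synced folder; adjust to taste.
TARGET = os.path.expanduser("~/Google Drive/bench")
NUM_FILES = 5000
PAYLOAD = b"x" * 64  # a few bytes per file; contents don't matter

os.makedirs(TARGET, exist_ok=True)
start = time.time()
for i in range(NUM_FILES):
    with open(os.path.join(TARGET, "file_%05d.txt" % i), "wb") as f:
        f.write(PAYLOAD)
print("created %d files in %.1fs" % (NUM_FILES, time.time() - start))
# Now watch the sync client: at a few files per second, these 5000
# files will take on the order of half an hour to finish uploading.
```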
It's surprising to me when syncing solutions do not aggregate metadata for changed files before communicating over the wire, as rsync does. The Google Drive API seems to encourage per-file remote operations. I've heard there is some support for batching, but I'm not sure whether that is specific to certain Google APIs or generic across them. It would sure help here.
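For what it's worth, the google-api-python-client library does expose a generic batch mechanism, so something like the following sketch should be possible (Drive v2 here; the file IDs are made up, the auth setup is omitted, and I haven't checked what per-batch limits apply):

```python
from googleapiclient.discovery import build
from googleapiclient.http import BatchHttpRequest

# `authorized_http` stands in for an OAuth-authorized httplib2.Http
# instance; the auth dance is omitted here.
service = build("drive", "v2", http=authorized_http)

def on_result(request_id, response, exception):
    # Called once per queued request when the batch response arrives.
    if exception is not None:
        print("request %s failed: %s" % (request_id, exception))

batch = BatchHttpRequest(callback=on_result)
for file_id, new_title in [("FILE_ID_1", "a.txt"), ("FILE_ID_2", "b.txt")]:
    # Queue a per-file metadata update; nothing hits the wire yet.
    batch.add(service.files().patch(fileId=file_id, body={"title": new_title}))

# All queued requests go out in a single multipart HTTP round trip.
batch.execute(http=authorized_http)
```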
Of course, these services all work great for storing small numbers of medium-sized files. Maybe there's no desire or need to support scaling and more intensive use? Yet I think even non-techie users may end up with large numbers of small files, even if they don't create them directly (e.g. in my Aperture library). For myself, I ultimately want something closer to a distributed file system. For example, I like to edit files within a git checkout locally on my laptop and have them synced to a server where I run the code. This requires three things:
- Cross platform -- Linux/Mac in my case.
- Low latency -- file edits should appear quickly on the other side (a crude way to measure this is sketched after this list).
- Equally good treatment of large numbers of small files and small numbers of large files.
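To put a number on the latency requirement, here's a crude two-sided probe: stamp a timestamp into the synced folder on one machine and poll for it on the other. The probe path is a placeholder, and it assumes both clocks are NTP-synchronized:

```python
import os
import sys
import time

# Placeholder: some path inside the synced tree on both machines.
# Delete any stale probe file before a run.
PROBE = os.path.expanduser("~/sync/latency-probe.txt")

if sys.argv[1] == "write":
    # Run on machine A: stamp the current time into the synced folder.
    with open(PROBE, "w") as f:
        f.write("%f" % time.time())
elif sys.argv[1] == "read":
    # Run on machine B *before* triggering the write on machine A.
    while True:
        try:
            with open(PROBE) as f:
                sent = float(f.read())
            break
        except (IOError, ValueError):  # not there yet, or partially written
            time.sleep(0.05)
    print("propagation latency: %.2fs" % (time.time() - sent))
```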
Alas, in spite of the massive increase in the number of cloud-based directory synchronization options,
none seem to meet all three of these criteria. Still. I'll go through a list of disqualifying points at the end of this post.
The end result is that I still use the same solution I did ten years ago: I run "unison -repeat 2" to link working copies on different machines. The only thing missing is convenient file-system watching via inotify (i.e. OS-driven notification of changes rather than scanning). This is the killer feature that many of the newer cloud offerings have over unison, and it is the key to low latency, as well as to the always-on usage model that Dropbox-style systems employ. Unison has rudimentary support for integrating with a file-system watcher, and I've sporadically had that functionality working, but it was fragile and hard to set up last time I tried it.
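Until that stabilizes, the watching can be bolted on externally. Below is a rough sketch using the third-party watchdog library (not part of unison; `pip install watchdog`) that fires a batch unison run whenever something in the tree changes, debounced so a burst of edits triggers a single sync. The root path and profile name are placeholders:

```python
import subprocess
import threading
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

ROOT = "/home/me/src/project"   # placeholder: the tree to watch
PROFILE = "project"             # placeholder: a configured unison profile
DEBOUNCE_SECS = 0.5

class SyncOnChange(FileSystemEventHandler):
    def __init__(self):
        self._timer = None
        self._lock = threading.Lock()

    def on_any_event(self, event):
        # Coalesce a burst of events into a single unison run.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(DEBOUNCE_SECS, self._sync)
            self._timer.start()

    def _sync(self):
        subprocess.call(["unison", PROFILE, "-batch"])

observer = Observer()
observer.schedule(SyncOnChange(), ROOT, recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```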