Thursday, April 26, 2012

Dropbox wiki gone -- Why we little people must Clone the Cloud

Clone the cloud.  It's better for everyone.

With GitHub or Google Drive this happens transparently: the user has a full copy of what's on the server.  GitHub even does this for wiki and web page data, which is great, and some people create static HTML blogs using GitHub and tools like Jekyll.


Unfortunately, most text that's poured into comments, forums, wikis, and blogs seems to have a dubious shelf life.  Case in point: Dropbox took down their wiki a while back, which left a link in one of my previous posts dead.  Poof.  In this particular case I kept a copy in the text file I pasted it from, so I am republishing it below.

"Data liberation" doesn't quite fix the problem, either.  Some companies are better than others about providing optional downloads of your data.  Google, in particular, is very good.  But laborious opt-in downloads (that most people don't use) aren't good enough.  Always-on synchronization or mirroring is called for.

I'm optimistic.  With the mass movement towards synchronization-based approaches (SkyDrive, iCloud, Google Drive, etc.) and VCS hosting (GitHub, Bitbucket), I hope we are moving in the right direction.  And why not?  Everyone wins:
  • For the cloud service provider it means an extra backup copy, which reduces the risk of angry customers if a software failure (or, more likely, an account security breach) destroys data.  It also provides local caching, which always makes good engineering sense.
  • For the user it means being able to access your data on the plane, and knowing that it will live on when the company in question decides to retire the cloud service you've been using.


Reproduced Dropbox Wiki page on Symlinks

Current Behavior

Dropbox Mac and Linux clients currently follow symbolic links. This means that the links turn into directories on other Dropbox clients (but remain as symbolic links on the original client). This behavior can be useful for bringing directories into your Dropbox, but can cause serious problems when the source and destination of the link are both within Dropbox. In this scenario the data becomes duplicated on other hosts and the subsequent semantics of the arrangement are unclear and perhaps unspecified.
Consider a folder foo inside your Dropbox, and a symbolic link bar pointing to it.

Dropbox/
    foo/
    bar -> foo

Let's say we introduce foo and bar on our Mac laptop, let it sync, and then switch to our Linux desktop. On the Linux machine, foo and bar appear to be completely separate directories. Indeed, if we put the Mac laptop to sleep, these directories remain independent. We can then make changes on the Linux desktop. But when the Mac laptop reconnects it of course still has bar as a symbolic link, and views the directories as one. It will (hopefully) serve as a tunnel between the two directories, reconciling the changes that were made to each. Then the reconciled directories are pushed back to the server and appear identical again on the Linux desktop.
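
The duplication described above can be reproduced without Dropbox at all. The following is a minimal sketch, assuming that a plain directory copy which follows symbolic links is a fair stand-in for what a second client receives; it does not use any Dropbox API, and the names make_original, replicate_following_links, and demo are made up for illustration.

```python
import os
import shutil
import tempfile


def make_original(root):
    """Create Dropbox/foo and a symlink Dropbox/bar -> foo under root."""
    dropbox = os.path.join(root, "Dropbox")
    os.makedirs(os.path.join(dropbox, "foo"))
    with open(os.path.join(dropbox, "foo", "note.txt"), "w") as f:
        f.write("hello\n")
    os.symlink("foo", os.path.join(dropbox, "bar"))
    return dropbox


def replicate_following_links(src, dst):
    """Copy src to dst, following symlinks (an assumed stand-in for syncing)."""
    shutil.copytree(src, dst, symlinks=False)
    return dst


def demo():
    """Build the 'Mac' tree, copy it to a 'Linux' tree, then edit the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        original = make_original(os.path.join(tmp, "mac"))
        replica = replicate_following_links(
            original, os.path.join(tmp, "linux", "Dropbox")
        )
        # A change made inside foo on the replica does not show up in bar,
        # because bar is now an independent directory there.
        with open(os.path.join(replica, "foo", "new.txt"), "w") as f:
            f.write("added on the replica\n")
        return {
            "original_bar_is_link": os.path.islink(os.path.join(original, "bar")),
            "replica_bar_is_link": os.path.islink(os.path.join(replica, "bar")),
            "replica_bar_has_note": os.path.exists(
                os.path.join(replica, "bar", "note.txt")
            ),
            "replica_bar_sees_new_file": os.path.exists(
                os.path.join(replica, "bar", "new.txt")
            ),
        }


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```

On the copy, bar holds its own note.txt but never sees new.txt, which is the "completely separate directories" situation on the Linux desktop.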

Note that if there are conflicts, how those changes are applied may depend on the order in which the Dropbox client happens to sync foo vs. bar on the Mac client. Yet conflicts are the bread and butter of synchronization solutions, so this is not unexpected.

These "delayed mirroring" semantics may be acceptable, but as of this writing (Feb 2011) they are implemented inconsistently. If both clients are online and a new file is introduced within foo or bar on the Linux client, it may or may not appear in its counterpart directory. The use of file system monitoring is probably responsible for the discrepancies, and forcing Dropbox to restart (and re-index) will fix transient problems.

A Workaround
Of course you can always simply avoid putting folders with symlinks in Dropbox, but if you must open this Pandora's box, then you can either live with the situation described above or you can go around Dropbox and try to synchronize the symlinks yourself. Fortunately, you can do this because Dropbox does not damage symlinking arrangements on a particular client once they exist.
Usually, a path's status as symlink or folder does not change often, so it is sufficient to perform a non-Dropbox synchronization only occasionally and allow Dropbox to perform the day-to-day synchronization. One option for manually synchronizing directories that respects symbolic links is the Unison file synchronizer.
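
Before running an occasional out-of-band synchronization, it helps to know which paths have drifted. The sketch below is not Unison and does not call it; it is a hand-rolled, read-only check, assuming both Dropbox folders are reachable from one machine (for example over a network mount). The function names collect_symlinks and missing_symlinks are hypothetical.

```python
import os
import tempfile


def collect_symlinks(root):
    """Map each symlink's path (relative to root) to its raw link target."""
    links = {}
    # followlinks=False: symlinked directories are listed but not descended into.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                links[os.path.relpath(path, root)] = os.readlink(path)
    return links


def missing_symlinks(reference_links, other_root):
    """List relative paths that are symlinks in the reference tree
    but are not symlinks under other_root."""
    return sorted(
        rel
        for rel in reference_links
        if not os.path.islink(os.path.join(other_root, rel))
    )


def demo():
    """Compare a tree with bar -> foo against one where bar is a real folder."""
    with tempfile.TemporaryDirectory() as tmp:
        mac = os.path.join(tmp, "mac", "Dropbox")
        linux = os.path.join(tmp, "linux", "Dropbox")
        os.makedirs(os.path.join(mac, "foo"))
        os.symlink("foo", os.path.join(mac, "bar"))
        os.makedirs(os.path.join(linux, "foo"))
        os.makedirs(os.path.join(linux, "bar"))  # link became a directory
        reference = collect_symlinks(mac)
        return reference, missing_symlinks(reference, linux)


if __name__ == "__main__":
    reference, drifted = demo()
    print("symlinks on the reference client:", reference)
    print("paths that are no longer symlinks elsewhere:", drifted)
```

The check only reports differences. Repairing them (replacing a duplicated directory with a link) is deliberately left to a tool such as Unison or to a manual step, since doing it automatically risks deleting data.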

Possible Future Solutions to Consider

A votebox entry on symbolic links can be found here.

One proposal would be a hybrid approach: follow symbolic links that point outside of the Dropbox folder, but preserve those that point inside it.
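
The hybrid rule comes down to one test per link: does its resolved target fall under the Dropbox root? The following is a minimal sketch of that test, not anything Dropbox implements; symlink_policy and the "preserve"/"follow" labels are invented here for illustration.

```python
import os
import tempfile


def symlink_policy(dropbox_root, link_path):
    """Return 'preserve' if the link resolves to a path inside dropbox_root,
    otherwise 'follow'."""
    root = os.path.realpath(dropbox_root)
    resolved = os.path.realpath(link_path)  # resolves the link itself as well
    inside = os.path.commonpath([root, resolved]) == root
    return "preserve" if inside else "follow"


def demo():
    """Classify one internal link (bar -> foo) and one external link."""
    with tempfile.TemporaryDirectory() as tmp:
        dropbox = os.path.join(tmp, "Dropbox")
        os.makedirs(os.path.join(dropbox, "foo"))
        os.makedirs(os.path.join(tmp, "elsewhere"))
        internal = os.path.join(dropbox, "bar")
        external = os.path.join(dropbox, "ext")
        os.symlink("foo", internal)
        os.symlink(os.path.join("..", "elsewhere"), external)
        return symlink_policy(dropbox, internal), symlink_policy(dropbox, external)


if __name__ == "__main__":
    print(demo())
```

Resolving both paths with os.path.realpath before comparing avoids being fooled by relative targets or by a Dropbox folder that is itself reached through a symlink.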

The problem with Windows support remains a major one. (Again, Unison can serve as a point of reference here.) But even if internal symbolic links inside a Dropbox are unusable on Windows clients, this may be preferable to the correctness and data integrity problems that can result from the current behavior.
