In the process of moving all of my new code to GitHub, I started thinking about how I wanted to manage things like my IPython notebooks. Previously, with my code on Bitbucket, I created a single hg repo for all of my notebooks on our cluster. Since each running instance of the notebook server only serves the notebooks in a single directory, I did that to avoid having to run a separate notebook process for each of the notebooks I was working on.

It occurred to me today that there is another possible solution to this, and it's the one I’ll be using for all of my new code (which will be on GitHub). The process looks like this, and it seems much less messy from a VCS standpoint.

  1. Launch a notebook process on the cluster (or use an existing one)
  2. Create a new notebook file
  3. Create a new directory on the cluster to house the notebook file (with touched .gitignore and files)
  4. Move the notebook to this directory
  5. Create a new repo on GitHub
  6. git add ., git commit, git remote add origin, git push origin master, etc.
  7. Symlink the .ipynb file back into the directory from where the notebook is running.
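The filesystem half of that list (steps 3, 4, and 7) can be sketched in a few lines of Python. This is only a rough sketch, not a polished tool: the function name `relocate_notebook` and the paths in the usage note are made up for illustration, and the git steps are left to be run by hand inside the new directory.

```python
import shutil
from pathlib import Path


def relocate_notebook(notebook: Path, repo_dir: Path) -> Path:
    """Move `notebook` into `repo_dir` and leave a symlink at its old path.

    Hypothetical helper: covers steps 3, 4, and 7 only, not the git steps.
    """
    # Use absolute paths so the symlink still resolves if the working
    # directory changes later.
    notebook = notebook.absolute()
    repo_dir = repo_dir.absolute()

    # Step 3: create the directory that will become the repo,
    # with an empty .gitignore in it.
    repo_dir.mkdir(parents=True, exist_ok=True)
    (repo_dir / ".gitignore").touch()

    # Step 4: move the notebook file into that directory.
    target = repo_dir / notebook.name
    shutil.move(str(notebook), str(target))

    # Step 7: symlink the notebook back to where the notebook server expects it.
    notebook.symlink_to(target)
    return target
```

Something like `relocate_notebook(Path("notebooks/demo.ipynb"), Path("repos/demo"))` would then be followed by the usual git commands from step 6, run from inside `repos/demo`.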

Seems to work great. Note that deleting the notebook from the Dashboard will only delete the symlink, not the notebook itself. Maybe a safety feature, maybe an annoyance, but only time will tell. Simple enough.
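The symlink behavior is easy to check outside the notebook. Here is a small sketch that assumes the Dashboard's delete amounts to removing the file entry it sees (I haven't looked at what it does internally); the function name and file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def delete_link_keeps_target() -> bool:
    """Remove a symlink and report whether the real file survived."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # The "real" notebook, living in its own repo directory.
        real = root / "repo" / "demo.ipynb"
        real.parent.mkdir()
        real.write_text("{}")

        # The symlink sitting in the directory the notebook server runs from.
        link = root / "demo.ipynb"
        link.symlink_to(real)

        # Removing the link deletes only the link, never its target.
        link.unlink()
        return real.exists() and not link.exists()
```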