I worked out how to retroactively annex a large file that had been checked into a git repo some time ago. I thought this might be useful for others, so I am posting it here.

Suppose you have a git repo where somebody checked in a large file you would like to have annexed. There are a bunch of commits after it and you don't want to lose history, but you also don't want everybody to have to retrieve the large file when they clone the repo. This rewrites history as if the file had been annexed when it was originally added.

The command works for me; it relies on the current behavior of git filter-branch, which is to use a directory named .git-rewrite at the top of the repository. If that behavior changes, you can specify the directory to use with the -d option. A rough sketch of the general shape of such a rewrite is given at the end of this page.

Based on the hints given here, I've worked on a filter to both annex files and add urls via filter-branch. The filter-branch script above is very specific, but I think a few of its ideas can be used in general; the general structure is roughly the one sketched at the end of this page.

One thing I noticed is that git-annex needs to checksum each file even if it was previously annexed (rather obviously, since there is no general way to tell whether a file is the same as the old one without checksumming). But in the specific case where we are replacing files that are already in git, we do actually have the sha1 checksum of each file in question, which could be used. So, trying to work with this, I wrote a filter script that starts out annexing everything in the first commit and continuously writes out sha1 / filename / git-annex-object triplets to a global file. When it then starts on the next commit, it compares the sha1s in the index with those in the global file, and any matches are manually symlinked directly to the corresponding git-annex object without checksumming. I've done a few tests and this seems to be considerably faster than letting git-annex checksum everything. The test repo is a git-svn import of the free software game Red Eclipse; there are approximately 3500 files (images, maps, models, etc.). Hope it might be useful for someone else wrestling with filter-branch and git-annex.

I recently had the need of re-kind-of-annexing an unusually large repo (one of the largest). Attaching the link here, as I feel it might be helpful for very large projects where git-filter-branch can become prohibitively slow.

Hmm, guys. Are you serious with these scripts? Wow, scary.

Dilyin's comment is scary. It suggests bad things can happen, but it is not very clear what. Bloated history is one thing. An obviously broken repo is bad, but it can be slowly recovered from remotes.

More common than it seems

There's a case probably more common than people actually report: mistakenly doing git add instead of git annex add and realizing it only after a number of commits. Doing git annex add at that point leaves the file duplicated, once in regular git and once in the annex. Extra wish: when doing git annex add of a file that is already present in git history, git-annex could notice and say so.

Can anyone elaborate on the scripts provided here: are they safe? What can happen if they are improperly used, or in corner cases?

Despite the warning, I'm not dead yet. There's much more to do than the one-liner. There I also did git annex init, and git-annex found its state branches. There, the filter-branch operation finished in 90s on the first try and 149s on the second.

Practicing reduction on the clone

This produced no visible benefit:

    time git gc --aggressive
    time git repack -a -d

Even cloning and retrying on the clone did not help. Joey, does this seem okay?
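For readers looking for the general shape referred to above, here is a minimal sketch, not the original one-liner (which is not reproduced on this page). It assumes filter-branch's default temporary directory (.git-rewrite/t at the top of the repo), that git annex init has already been run, and that path/to/bigfile is a placeholder for the file you want to annex. The symlink fix-up in particular depends on the git and git-annex versions in use, so treat this as a starting point and try it on a throwaway clone first.

    # Hypothetical sketch: retroactively annex one file throughout history.
    # "path/to/bigfile" is a placeholder; run only on a disposable clone.
    git filter-branch --tag-name-filter cat --tree-filter '
      FILE=path/to/bigfile
      if [ -f "$FILE" ] && [ ! -L "$FILE" ]; then
          # Hand the content to the annex; the file becomes a symlink.
          git annex add "$FILE"
          # The tree-filter runs under .git-rewrite/t, below the real worktree,
          # so the symlink git-annex wrote may carry a spurious leading ../../;
          # strip it so the committed link is valid in a normal checkout.
          ln -sf "$(readlink "$FILE" | sed -e "s|^\.\./\.\./||")" "$FILE"
      fi
    ' -- --all

After the rewrite, filter-branch keeps backups of the old refs under refs/original/, so the old copies of the large file only stop bloating the repository once those refs are removed and the repo is repacked, and other clones still need to re-clone or hard-reset onto the rewritten history.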
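The sha1-caching filter described above is not included on this page either; the following is a rough, hypothetical reconstruction of the idea, with the cache file name and details chosen here rather than taken from the original script. It keeps a scratch file mapping git blob sha1s to the symlink targets git-annex produced, and only calls git annex add for blobs it has not seen before; the symlink-path caveat from the previous sketch applies here too, and paths with unusual characters are not handled.

    # Hypothetical sketch of the sha1-caching idea; not the original script.
    # /tmp/annex-sha1-cache holds lines of the form "<blob-sha1> <link-target>".
    : > /tmp/annex-sha1-cache
    git filter-branch --tag-name-filter cat --tree-filter '
      CACHE=/tmp/annex-sha1-cache
      git ls-files -s | while read -r mode sha1 stage path; do
          # Only regular files; symlinks (mode 120000) are already annexed.
          case "$mode" in 100644|100755) ;; *) continue ;; esac
          target=$(grep "^$sha1 " "$CACHE" | head -n 1 | cut -d" " -f2-)
          if [ -n "$target" ]; then
              # Same blob as in an earlier commit: reuse the recorded annex
              # object directly instead of checksumming the content again.
              ln -sf "$target" "$path"
          else
              # New content: annex it and remember the symlink target.
              git annex add "$path"
              echo "$sha1 $(readlink "$path")" >> "$CACHE"
          fi
      done
    ' -- --all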