11 March 2008

SVN Externals are Evil; Use Piston or Braid

I've recently spent a considerable amount of time rectifying problems caused by SVN externals. One of the codebases I work on had been developed with a large number of Rails plugins pulled in as SVN externals. In general, this was a good approach, since these were external or shared code. I think it is at least better than checking the code in directly, as you have a more precise record of where it came from. I should also note that our externals were all set to specific tags, or to branches specific to our code (i.e. not to trunk, where you'd be getting updates outside your control). Sounds good, so what about this "evil"?

The problem comes when you need to make changes to the code of an external. You might think: well, go change the original code and then adjust your tag. In some cases you can't do that - maybe it's not code you have commit rights to, or you're making a change that's specific to your app and can't be done another way, or, as was often the case for me, you're on a much older version, and the trunk and other tags have major differences that you don't want to integrate.

Thus, what I needed to do was remove this as an external, and check the code in directly. Another approach would be to branch it from where you were and modify that, etc. I wasn't able to do that due to various Subversion permissions (probably not a common case, but I had no choice). This action itself (remove external, add code) is not a real problem in SVN. But, it IS a problem when you go to update. A simple "svn up" on other machines failed. That is pathetic. Instead, what I had to do was go delete the existing (svn externaled) directories, then do "svn up". This of course broke our continuous integration server, and I also had to go manually fix this up on machines I was deploying to. Crappy, but if that was the end of it, I'd probably not be as unhappy...

When it comes to merging these kinds of changes into branches, watch out! This is where SVN just flails. First, if you happen to use svnmerge.py to manage your branch merging, forget it. It just can't deal with it, and will leave you with a partially completed merge. Doing it manually, even with options like --ignore-ancestry, does not work either. I had to do something similar to the "svn up" fix: go in and delete all the directories that were previously svn externals, and then do my merge. And note, do NOT delete the parent directories. For example, if all of your Rails app's plugins were externals, do not just go and nuke "vendor/plugins". SVN will then be totally confused and simply fail. Nope, you need to specifically delete each offending svn external directory. I make extensive use of branches (I do most of my daily work on a branch), so you can multiply these problems by the number of branches you need to merge to.
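The manual cleanup described above (delete each former external directory, leave the parent alone, then run "svn up" or the merge) can be sketched as a small script. This is only a rough sketch under some assumptions: it expects the older svn:externals line format of `localdir [-r REV] URL`, the function names are made up for illustration, and it removes every external still recorded on the parent directory, leaving it to SVN to fetch again any that are still externals.

```python
import shutil
import subprocess
from pathlib import Path


def parse_externals(prop_text):
    """Return the local directory names from old-style svn:externals text.

    Assumption: each definition line looks like `localdir [-r REV] URL`.
    Blank lines and lines starting with '#' are skipped.
    """
    names = []
    for line in prop_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.append(line.split()[0])
    return names


def remove_external_dirs(parent, names):
    """Delete each named directory under `parent`, never `parent` itself."""
    parent = Path(parent)
    removed = []
    for name in names:
        target = parent / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(str(target))
    return removed


def recorded_externals(parent):
    """Ask the (not yet updated) working copy which externals it lists on `parent`."""
    result = subprocess.run(
        ["svn", "propget", "svn:externals", str(parent)],
        check=True, capture_output=True, text=True,
    )
    return parse_externals(result.stdout)
```

On a stale working copy, something like `remove_external_dirs("vendor/plugins", recorded_externals("vendor/plugins"))` followed by a plain "svn up" mirrors what I did by hand. Passing an explicit list of names instead is the safer choice if only a few externals were converted.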

Having said all that, this problem isn't really all that illogical. I don't know how SVN works internally, but the whole svn:externals thing seems a bit like a hack, or at least not a first-class citizen in SVN land. SVN merge or update should be able to see: hey, you were up to date (for your current revision) on directory X, but this update is going to replace that with new code under the same directory name. But it doesn't, maybe because it doesn't look at the externals in relation to the rest of the working copy. I don't know, and I don't care, since it's broken, and my fix is that I'm moving to Git soon enough :) Also, as another point of view, I know Perforce handles this kind of thing just fine: we used remote mounted Perforce depots all the time at Adobe, and made seriously extensive use of branches (in fact, we required working on a branch).

Now that I've spent entirely too much time on the build-up, what's the solution? Simple: use Piston (or Braid if using Git). Piston does not use svn:externals; instead it checks the code in directly, yet maintains a link to the external source it came from. My take is that this is probably how svn:externals should've worked (I presume that constantly updating an external is actually a rarely desired trait). You import the external code using Piston, and it will pull the latest code from whatever SVN URL you supply. You could use trunk, or you could as usual use a tag or branch. But then it's fixed - it will not change whenever you do "svn update". Instead, it is up to you to explicitly tell Piston to update it. This avoids svn externals as far as your daily operations go, and also causes zero problems for merges. It does more, though.

The second benefit of Piston is that you can then modify the external code, but still bring down updates from the external source, allowing a synergy between using external code and your app's specific needs. This is exactly what I needed on a couple of plugins we use, where the upstream plugin code had diverged significantly from the version in our codebase, so I couldn't move to a newer version, but I still needed to make some changes.
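To make the Piston workflow a bit more concrete, here is a minimal sketch that only builds (and optionally runs) the Piston command lines. The argument order is my assumption of the usual form (`piston import URL DIR`, `piston switch URL DIR`, `piston update DIR`), so check it against your installed version. The helper names and the example URL are hypothetical.

```python
import subprocess


def piston_command(action, target_dir, url=None):
    """Build the argv for a Piston call; nothing is executed here.

    Assumed argument order: `piston import URL DIR`, `piston switch URL DIR`,
    `piston update DIR`.
    """
    if action in ("import", "switch"):
        if url is None:
            raise ValueError(f"'{action}' needs a repository URL")
        return ["piston", action, url, target_dir]
    if action == "update":
        return ["piston", "update", target_dir]
    raise ValueError(f"unsupported action: {action}")


def run(argv, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical plugin URL and path, shown as a dry run only.
    run(piston_command(
        "import",
        "vendor/plugins/some_plugin",
        url="http://svn.example.com/some_plugin/tags/1.0",
    ))
    run(piston_command("update", "vendor/plugins/some_plugin"))
```

The dry-run default is deliberate: both import and update touch the working copy, so it seems safer to look at the command before letting it run.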

To summarize, the evil is SVN itself not handling changing of externals (i.e. to/from an external) in basic operations like updates and merges, which may cause a lot of manual work on your end, and break automated builds or similar. The solution: use Piston or Braid and get the best of everything.


Justin George said...

My trick, when I had to do this, was to create a new repo, check in the forked plugin code there, and simply change the URL to point to the new repo (presumably on a tag).

Then you've effectively moved the code into your control, but you don't have to worry about plugin changes polluting your main app's repo.

Chris said...

That's actually what we had previously - and the svn externals were all to other repos in our SVN (mix of our own plugins, and others). But, that doesn't get around the general issues caused by svn externals. If you ever have to remove that external and put code in with the same dir name, SVN just can't seem to cope very well. The Piston approach seems to be a much nicer approach, and you could even combine it with what you mention, which is what I do - we still maintain the code as SVN repos of their own, but instead of pointing to it as an external, I use Piston. Then, if I update that code, and make a new tag or whatever, I can simply move to the new code via a "piston switch" or "piston update" (if on trunk or on a non-tagged branch, etc.).

Even cooler, is the ability with Piston to make changes, and still be able to update while preserving those changes. I just used that today. I have a plugin that's Pistonized. I needed to make a small tweak to it, which I had done. Then today, I needed to update that plugin code, and just used Piston to "switch" to a new tag, but it took care of merging the changes I'd made previously with the new code. Very nice. Externals won't do that.

Anonymous said...

We have a suite of large C++ projects, with many shared modules between them. We have been using svn:externals to assemble a project tree from modules, but this creates its own headaches when (for example) you want to tag a release (comprising all modules). So thanks for the tip on Piston, it looks like it may do what we need.

Ultimately we will be migrating to Mercurial and using its Forest extension to manage this complexity. Subversion simply has too many limitations for us, and Mercurial is faster and more flexible. We tried git but it is way too complex.

Chris said...

Anonymous: I think your move to Mercurial will be great. I am actually almost 100% moved to Git now. I see people say it's complex, but honestly, I don't get that. It seems, for standard workflows, to be almost no different than using SVN or others. And, for things like branching and merging, it is far easier. There are a lot of more complex, powerful things you can do too, that take a bit more study, but for your typical day-to-day work, it seems definitely no harder.

With Git, what I use now is its "submodules", which is a built in feature, that is roughly comparable to svn:externals. This works for similar cases as svn:externals, but in cases where I need to modify the external, I then just clone the external item's Git repository, and then submodule that clone. This allows me to make code changes at will, but still get the benefit of the submodule where it's just referencing the "external" code. Submodules can use branches and so on, so you have a lot of choices on the approaches here.

Finally, throw GitHub on top of it all and you really have a stellar system. I use GitHub for all my projects, yet none of them are public projects (yet). It's still a great additional layer on top, plus the coordination with other people's Git/GitHub repos is really great.

Anyway, either way you slice it, the word is out, and SVN's days are numbered (although at this time, that's still a high number ;-)

Phil said...

SVN externals are evil when you need to modify some code in them and you don't have sufficient permissions.

I'm also looking at moving to Git but as yet have not made the transition.

pisi said...

your piston link may be invalid. the new one could be http://piston.rubyforge.org/.