17 March 2008

Another Reason to NOT Put Seed Data in Rails Migrations

I discussed an approach I took recently to getting standard or seed data into your app. While I've used Migrations quite successfully in the past for this, I am no longer doing so. And, on an older app, I just got bit by it. So, here's another reason not to do it...

Now that I've run into this, it's extremely obvious, but: If you change model code for a model which is used to create records in previous migrations, you can easily break those prior migrations. This won't matter when you have existing databases you are migrating, but it will matter if you need to create a database and migrate it from scratch (maybe your Continuous Integration server does that for example, or you are simply setting up a new DB in your development environment).

For example, the case that bit me: I recently changed a model, one that had some data created by migrations, to specify "acts_as_list". To support that, I created a new migration that added the position column, an attribute that gets filled in automatically when you create objects of that type. However, when recreating the database and running up through the older migrations, the older seed-data migration failed: the position column did not yet exist, yet the model's code was trying to populate it.
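To make the ordering problem concrete, here is a minimal plain-Ruby simulation of it. Nothing here is Rails API: `FakeTable`, `Role`, `SEED_ROLES`, `ADD_POSITION` and `migrate` are hypothetical stand-ins, and `Role.create` only imitates what a model does once "acts_as_list" starts filling in a position.

```ruby
# Hypothetical stand-in for a database table: it only tracks columns and rows.
class FakeTable
  attr_reader :columns, :rows

  def initialize(*columns)
    @columns = columns
    @rows = []
  end

  def add_column(name)
    @columns << name
  end

  # Reject writes to columns that do not exist yet, as a real database would.
  def insert(attrs)
    unknown = attrs.keys - @columns
    raise ArgumentError, "unknown column(s): #{unknown.join(', ')}" unless unknown.empty?
    @rows << attrs
  end
end

# Stand-in for the *current* model code: every create now also sets a position.
class Role
  def self.create(table, attrs)
    table.insert(attrs.merge(position: table.rows.size + 1))
  end
end

# Two migrations, in the order they were written.
SEED_ROLES   = ->(table) { Role.create(table, name: "admin") } # older: seed data
ADD_POSITION = ->(table) { table.add_column(:position) }       # newer: schema change

# Run migrations in order and report what happened.
def migrate(migrations, table = FakeTable.new(:name))
  migrations.each { |migration| migration.call(table) }
  "ok: #{table.rows.size} row(s), columns #{table.columns.join(', ')}"
rescue ArgumentError => e
  "failed: #{e.message}"
end

# Fresh database: the old seed migration runs today's model code too early.
puts migrate([SEED_ROLES, ADD_POSITION])

# Existing database: the seed row was written long ago, only the new step runs.
existing = FakeTable.new(:name)
existing.insert(name: "admin")
puts migrate([ADD_POSITION], existing)
```

The first call fails and the second succeeds, which matches the situation above: databases that are already past the seed migration never notice, while a from-scratch build does.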

Luckily I was actually adding an administrative interface to manage CRUD and other ops on this particular model, and as part of that, no longer needed the seed data anyway, so was able to just nuke that from the older migrations (and luckily no tests depended on it, and production and staging systems were well past those migrations).

12 comments:

Phil said...

What I like to do is just stub out the parts of the model that the migrations rely on. This is good practice anyway, even for migrations that do not contain seed data, but if you are able to be consistent about it I believe it solves the seed data problem.
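A rough sketch of why stubbing works, in plain Ruby so it runs without Rails. `FakeRecord` is a hypothetical stand-in for ActiveRecord::Base, and `Role` and `SeedRoles` are made-up names; the point is only that a class nested inside the migration shadows the application's model of the same name.

```ruby
# Hypothetical stand-in for ActiveRecord::Base. Rows are hashes kept per table.
class FakeRecord
  TABLES = Hash.new { |tables, name| tables[name] = [] }

  # "SeedRoles::Role" and "Role" both map to the "roles" table.
  def self.table_name
    name.split("::").last.downcase + "s"
  end

  def self.create(attrs)
    TABLES[table_name] << attrs
    attrs
  end
end

# The app's current model. The overridden create imitates behaviour added
# after the seed migration was written (for example by acts_as_list).
class Role < FakeRecord
  def self.create(attrs)
    super(attrs.merge(position: TABLES[table_name].size + 1))
  end
end

class SeedRoles
  # Skeleton model: same table, none of the app model's later behaviour.
  class Role < FakeRecord; end

  def self.up
    # Inside SeedRoles, the constant Role resolves to SeedRoles::Role.
    Role.create(name: "admin")
  end
end

SeedRoles.up
p FakeRecord::TABLES["roles"]
```

The seeded row carries no position, because the migration never touched the top-level `Role`. In a real migration the skeleton would subclass ActiveRecord::Base instead of `FakeRecord`.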

Chris said...

I've seen that solution/technique as well (model stubbing in the migration). But it just doesn't rub me the right way. It's not DRY, and it seems like a hack.

With Rails 2, what I've been doing, as per my other post, is creating the data in an initializer. I could do a similar approach with Rails 1, but since I only have one app left on Rails 1, and haven't needed it, I don't.

I also like the DB fixtures approach (don't have the link handy at the moment). This is actually very nice, and the only downside is that you'd have to add the code to your Capistrano deployment scripts to ensure it got run appropriately on deploy, but that wouldn't be much work obviously.

Anonymous said...

I'm with Phil on this one, I prefer to add skeleton classes to migrations to solve this.

The DRY violation is actually an illusion. The old migrations do not belong with your current source. They relate to your old source, which is stored as a previous version in your source control system. Nobody ever said source control was not DRY because it kept historical records.

A solution to this is to store migration steps on a per-branch basis, and this is how I did it in a migration system I built a few years back. That way there is no doubt which version of the code the models relate to.

Chris said...

I agree that the old migrations don't belong with current code; that was part of my point. You have one facility, DB migrations, that handles one area of change in your app, but it isn't in sync with the other code areas of your app that it has a relationship with.

The skeleton approach is certainly a reasonable way to go, it just doesn't seem to sit well with me, I think because I feel like it's misleading (this isn't necessarily rational or whatever, just a "feeling" if you will). I'm also, at least lately, feeling I like the idea of having my standard/seed data live outside migrations, and keeping a narrower purpose to migrations (of just being the DB schema).

I think it will be interesting to see over the next year or so how migrations and seed data, etc. develop in the Rails space. There are a lot of solutions out there right now, none seem perfect (as if anything ever is).

Thanks for the feedback guys.

Anonymous said...

I see your point, but to me, moving seed data outside migrations feels like it is being treated as second class, when in fact it is as essential to the app as the schema.

I tend to categorise data into two forms: static and transient. Static data is stuff that the code depends on, ie it has some identifying column that the code searches for. This is the ONLY stuff I put in migrations. Transient data is everything else, stuff that the app merely processes and presents, nothing that is involved in the app logic.

I'm busy working on a contract now, but I hope when I've finished I'll have enough time to write a migration system. I can't stand AR migrations - in fact I'm not a big fan of AR at all. There's definitely a need for something better.

Chris said...

Ash, that makes sense, agreed. In general the area needs help, and as you said, other approaches/solutions will be interesting to see. Not something I have time to do right now either.

As for AR in general, I'm mostly ok with it, in that it makes things easy, but performance could be improved. I'm hoping to do a small project in Merb soon, and I'm thinking when I do that I'll try out a different ORM/data/model solution just to get another viewpoint. AR though, after having used Hibernate, is just infinitely more pleasant and pretty much removes the need for a person :)

Anonymous said...

I wanna use Merb soon too :) If you are trying Merb, why not give DataMapper a shot? You could always use AR Migrations to do the database and DataMapper as an ORM. DataMapper has an identity map which is pretty cool, it means an object is only loaded once from the database (better performance, and avoids annoying issues where you are working on two different objects that you think are the same thing).
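For anyone who hasn't met the identity map idea, here is a minimal plain-Ruby sketch of it. This is not DataMapper's API: `Person` and `PersonRepository` are hypothetical names, and the hash passed in stands in for a database table.

```ruby
# Hypothetical record type for the sketch.
Person = Struct.new(:id, :name)

class PersonRepository
  attr_reader :loads

  def initialize(rows)
    @rows = rows          # stand-in for a database table: id => name
    @identity_map = {}    # id => Person object already loaded
    @loads = 0            # number of times we "hit the database"
  end

  # Return the already-loaded object if there is one, otherwise load it once.
  def find(id)
    @identity_map.fetch(id) do
      @loads += 1
      @identity_map[id] = Person.new(id, @rows.fetch(id))
    end
  end
end

repo = PersonRepository.new(1 => "Ash")
a = repo.find(1)
b = repo.find(1)
a.name = "Ashley"

puts a.equal?(b) # → true (same object, not just equal values)
puts b.name      # → Ashley
puts repo.loads  # → 1
```

Both lookups hand back the same object, so a change made through one reference is visible through the other, and the backing store is only read once.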

Chris said...

Yep, DataMapper is on the list. I also want to look at Thrift, and Amazon's SimpleDB (which I now have access to). It'll be an exploratory project, to learn about these technologies and pros and cons. Performance and system resource utilization will be key things I'm looking at because I want to understand how to maximize a deployment for something like Facebook where you may have a ton of traffic, and you may or may not have a ton of income off it (if you do ads right or whatever, then great, but we'll see).

Chris said...

So, for those of you using migrations to add standard data, how do you deal with this for the test environment? If I run "rake test" in a Rails 2.x app, it blows away all data in test, and thus I don't have my standard data.

Steve McCaine said...

Interesting points. I prefer to add skeleton classes to migrations to solve this.

kurtis

http://kurtis647.blogspot.com/

Chris said...

@steve Ya, I'm leaning towards taking that approach next time as well. After various discussion here and with some other folks (thanks for all the comments folks!) that seems to be the best route. I don't particularly like it, but it seems to be the most reliable solution.