04 February 2009

MySQL Performance Issues and acts_as_versioned

Recently we ran into an interesting performance issue with MySQL. We have an automated process we run at night a few nights a week that does data harvesting for hotel rates and such. This data is versioned so that we can look at historical values. However, this script had begun to really crawl. Originally it took a couple hours to run. But it had gotten to the point where it could take almost a day.



I tracked this down to being a SQL MAX call used by acts_as_versioned to determine the next version for one of these records. The problem is that it had to sift through nearly 10 million records. In testing this on my local machine, just one of these SQL queries could take 45 seconds! Think about doing this across oh say 100,000 hotels, ya, not good.



The good folks at GitHub ran into this same thing (with a table of 36M records) on nearly the same day. Their approach is similar to the approach I'll be taking on another table (which isn't currently affecting us this way, but will have different benefits), which was to split it into two tables, one with older data. I could have done this, and would have, but the reality was that we simply didn't need to keep these versions, as we weren't using the data. So, luckily, I was able to just no longer version this particular model, and throw out that table. After doing that, I ran the script and it took just over an hour. Yea!



So, this is something to note if you use acts_as_versioned with models that have frequent changes and a decent number of those models to begin with (think multipliers). One of the things I'll be looking into in the future is whether that MAX needs to get done, or whether acts_as_versioned can be smarter about how it does it. On first glance you'd think you could just use the version number on the original model itself, but that number isn't guaranteed to be the latest number, since you can rollback versions and so on.