03 February 2008

Delete an S3 Bucket Containing Thousands of Files

A backup bucket I had on Amazon S3 (via Jungle Disk) had gotten out of control and was costing me too much, so I decided to kill it off completely and take a different approach. That meant deleting the bucket. I thought, easy, I'll just use Interarchy, pick the bucket, hit delete and be done. Nope. Interarchy kept choking on some of the files, even after a few tries. I turned to Transmit, which failed as well. So instead I wrote a few lines of Ruby using the aws-s3 gem, and I was done, with one caveat.

I did all of this in IRB, which assumes you have the aws-s3 gem installed. The first step is setting up the connection and then finding the bucket in question:


require 'aws/s3'

AWS::S3::Base.establish_connection!(:access_key_id => 'put-yer-key-here', :secret_access_key => 'put-ye-ole-secret-key-here')

# Find the bucket the blunt way, by getting a list of the few I have, then picking the particular one (for me it was the second one in the array):

buckets = AWS::S3::Service.buckets
evil_bucket = buckets[1]


Then you need to blow everything away. According to the docs, you should be able to delete the bucket by passing the ":force => true" option, but that didn't work for me: it complained that the bucket didn't exist. So instead I decided to delete everything in the bucket using the Bucket#delete_all call. That appeared to work, but the bucket wasn't actually empty. It turns out the library only pulls down 1000 files at a time (this is a standard S3 limit on a listing call, although you'd think the library would realize this and loop until it was truly done), so each call was only deleting 1000 files. The trick, then, was to loop until the bucket was really empty:


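# Each delete_all pass only removes what one listing call returns,
# so keep going until the bucket reports empty.
while !evil_bucket.empty?
  puts "."   # one dot per pass, to show progress
  evil_bucket.delete_all
end

# Only now can the bucket itself be removed.
evil_bucket.delete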


I have the puts in there simply to observe progress (and it's also sort of fun to see how many files there really were). Obviously this is quick and dirty, but it wound up being far more effective than either client, and nearly as simple as I had expected Interarchy to be. One note: if you do have a lot of files, this does not go quickly. It was taking about a minute per 1000 files (i.e. per delete_all call) on my system.
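
To see why a single delete_all is not enough, here is a minimal sketch that runs without touching S3. FakeBucket and drain are hypothetical names invented for this illustration and are not part of the aws-s3 gem. The sketch only assumes what is described above: each listing call sees at most 1000 keys, so each delete_all removes at most that many.

```ruby
# Hypothetical stand-in for a bucket whose listing returns at most
# PAGE_SIZE keys per call, mimicking the listing limit described above.
class FakeBucket
  PAGE_SIZE = 1000

  def initialize(total_keys)
    @keys = (1..total_keys).map { |i| "file-#{i}" }
  end

  # Only the first page of keys is visible per listing call.
  def objects
    @keys.first(PAGE_SIZE)
  end

  def empty?
    objects.empty?
  end

  # Deletes only what one listing call can see.
  def delete_all
    @keys.shift(PAGE_SIZE)
    self
  end

  # Number of keys still in the (fake) bucket.
  def remaining
    @keys.size
  end
end

# Same shape as the loop above; returns the number of passes it took.
def drain(bucket)
  passes = 0
  until bucket.empty?
    bucket.delete_all
    passes += 1
  end
  passes
end

bucket = FakeBucket.new(2500)
bucket.delete_all
puts bucket.remaining   # keys left behind after a single call
puts drain(bucket)      # extra passes needed to finish the job
```

With 2500 keys, one call leaves 1500 behind, which matches the "appeared to work, but wasn't actually empty" behavior. Looping on empty? is what finishes it off.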

4 comments:

Saurabh said...

Chris:

If you only need to do something like these deletes one time, and do not need reusable code in an automated program, you can try our product, Bucket Explorer. It should be able to help with these S3 operations.

Saurabh

Henrik N said...

Thanks, had the same situation.

Picked the bucket with this instead:

AWS::S3::Service.buckets.find {|b| b.name =~ /foo/ }

since picking by index is scary.

Chris said...

Good call, Henrik. Yes, if you have a lot of buckets that'll be more effective for sure (I had fewer than 10 buckets), and it's obviously a more robust and flexible solution.

Jonathan S said...

Or AWS::S3::Bucket.find('foo') -- let the web service do the work.