Backing up your MySQL database to S3
Here’s a simple recipe, with complete code available on Github (written in Ruby), to automatically back up your MySQL database to Amazon’s S3 storage service, with regular incremental backup.
This article appears in The S3 Cookbook, an e-book written by Scott Patten. It is is based on the automatic MySQL backup in EC2 on Rails.
Solution
You might expect that you could simply upload the MySQL database files to S3. That could work if all your tables were MyISAM tables (assuming you did LOCK TABLES and FLUSH TABLES to make sure the database files were in a consistent state), but it won’t work for InnoDB tables. A more general approach is to use the mysqldump tool to back up the full contents of the database and then use MySQL’s binary log for incremental backups.
The binary log contains all changes to the database that are made since the last full backup, so to restore the database you first restore the full backup (the output from mysqldump) and then apply the changes from the binary log.
Here are Ruby scripts for doing the full backup, incremental backup, and restore.
The full backup script uses mysqldump to do the initial full backup and uploads it’s output to S3. It assumes the bucket is empty.
#!/usr/bin/env ruby
require "common"
begin
FileUtils.mkdir_p @temp_dir
# assumes the bucket's empty
dump_file = "#{@temp_dir}/dump.sql.gz"
cmd = "mysqldump --quick --single-transaction --create-options -u#{@mysql_user} --flush-logs --master-data=2 --delete-master-logs"
cmd += " -p'#{@mysql_password}'" unless @mysql_password.nil?
cmd += " #{@mysql_database} | gzip > #{dump_file}"
run(cmd)
AWS::S3::S3Object.store(File.basename(dump_file), open(dump_file), @s3_bucket)
ensure
FileUtils.rm_rf(@temp_dir)
end
Once the full backup has been done, the following script can be run frequently (perhaps every 5 or 10 minutes) to rotate the binary log and upload it to S3. It must be run by a user that has read access to the MySQL binary log (see the Discussion section for details on configuring the MySQL binary log path).
#!/usr/bin/env ruby
require "common"
begin
FileUtils.mkdir_p @temp_dir
execute_sql "flush logs"
logs = Dir.glob("#{@mysql_bin_log_dir}/mysql-bin.[0-9]*").sort
logs_to_archive = logs[0..-2] # all logs except the last
logs_to_archive.each do |log|
# The following executes once for each filename in logs_to_archive
AWS::S3::S3Object.store(File.basename(log), open(log), @s3_bucket)
end
execute_sql "purge master logs to '#{File.basename(logs[-1])}'"
ensure
FileUtils.rm_rf(@temp_dir)
end
The following script restores the full backup (mysqldump output) and the subsequent binary log files. It assumes the database exists and is empty.
#!/usr/bin/env ruby
require "common"
# Retrieve a single file from S3
def retrieve_file(file)
key = File.basename(file)
AWS::S3::S3Object.find(key, @s3_bucket)
open(file, 'w') do |f|
AWS::S3::S3Object.stream(key, @s3_bucket) do |chunk|
f.write chunk
end
end
end
# List the files matching filename_prefix in the S3 bucket
def list_keys(filename_prefix)
AWS::S3::Bucket.objects(@s3_bucket, :prefix => filename_prefix).collect{|obj| obj.key}
end
# Retrieve the files matching filename_prefix in the S3 bucket
def retrieve_files(filename_prefix, local_dir)
list_keys(filename_prefix).each do |k|
file = "#{local_dir}/#{File.basename(k)}"
retrieve_file(file)
end
end
begin
FileUtils.mkdir_p @temp_dir
# download the dump file from S3
file = "#{@temp_dir}/dump.sql.gz"
retrieve_file(file)
# restore the dump file
cmd = "gunzip -c #{file} | mysql -u#{@mysql_user} "
cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
cmd += " #{@mysql_database}"
run cmd
# download the binary log files
retrieve_files("mysql-bin.", @temp_dir)
logs = Dir.glob("#{@temp_dir}/mysql-bin.[0-9]*").sort
# restore the binary log files
logs.each do |log|
# The following will be executed for each binary log file
cmd = "mysqlbinlog --database=#{@mysql_database} #{log} | mysql -u#{@mysql_user} "
cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
run cmd
end
ensure
FileUtils.rm_rf(@temp_dir)
end
The previous three scripts (full_backup.rb, incremental_backup.rb, and restore.rb) all include config.rb which contains all user-specific configuration and common.rb which defines some common functions:
@mysql_database = "your-mysql-database" @mysql_user = "root" @mysql_password = "password" @s3_bucket = "your-s3-bucket-name" @aws_access_key_id = "your-aws-access-key" @aws_secret_access_key = "your-aws-secret-access-key-id" @mysql_bin_log_dir = "/var/lib/mysql/binlog" @temp_dir = "/tmp/mysql-backup"
require "config"
require "rubygems"
require "aws/s3"
require "fileutils"
def run(command)
result = system(command)
raise("error, process exited with status #{$?.exitstatus}") unless result
end
def execute_sql(sql)
cmd = %{mysql -u#{@mysql_user} -e "#{sql}"}
cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
run cmd
end
AWS::S3::Base.establish_connection!(:access_key_id => @aws_access_key_id, :secret_access_key => @aws_secret_access_key, :use_ssl => true)
# It doesn't hurt to try to create a bucket that already exists
AWS::S3::Bucket.create(@s3_bucket)
Discussion
To enable binary logging make sure that the MySQL config file (my.cnf) has the following line in it:
log_bin = /var/db/mysql/binlog/mysql-bin
The path (/var/db/mysql/binlog) can be any directory that MySQL can write to, but it needs to match the value of @mysql_bin_log_dir in config.rb.
Note for EC2 users: The root volume (”/”) has limited space, it’s a good idea to use /mnt for your MySQL data files and logs.
The MySQL user needs to have the “RELOAD” and the “SUPER” privileges, these can be granted with the following SQL commands (which need to be executed as the MySQL root user):
GRANT RELOAD ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password'; GRANT SUPER ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';
(Replace user_name with the value of @mysql_user in config.rb).
You’ll probably want to perform the full backup on a regular schedule, and the incremental backup on a more frequent schedule, but the relative frequency of each will depend on how large your database is, how frequently it’s updated, and how important it is to be able to restore quickly. This is because for a large database mysqldump can be slow and can increase the system load noticeably, while rotating the binary log is quick and inexpensive to perform. But if your database changes normally contain many updates (as opposed to just inserts) it can be slower to restore from the binary logs.
To have the backups run automatically you could add something like the following to your crontab file, adjusting the times as necessary:
# Incremental backup every 10 minutes */10 * * * * root /usr/local/bin/incremental_backup.rb # Full backup every day at 05:01 1 5 * * * root /usr/local/bin/full_backup.rb
Before this can work however, two small details must be taken care of, which have been left as an exercise for the reader:
- When the full backup runs it should delete any binary log files that might already exist in the bucket. Otherwise the restore will try to restore them even though they’re older than the full backup.
- The execution of the scripts should not overlap. If the full backup hasn’t finished before the incremental starts (or vice versa) the backup will be in an inconsistent state.
Marston A.:
Hey Paul,
Great article. I’m wondering how easy/hard it would be to roll encryption into this (PGP, maybe openssl etc) to encrypt the tar dumps before putting them on S3.
18 February 2009, 1:09 pmXavier Shay:
I’ve used this code as a base for a rails plugin that provides rake tasks: http://github.com/xaviershay/db2s3/tree/master
Major differences are using ruby’s Tempfile rather than managing your own temp dir, externalizing config so you can store it safely in source control, and some specs to make sure it all works.
6 March 2009, 10:31 pmTony Primerano:
Some notes.
In my.cnf I had log_bin only which results in the binary logs being named mysqld.bin thus I needed to edit incremental_backup.rb and restore.rb
I was concerned that there was only one full backup in s3. Isn’t it possible for my DB to be damaged and then overwrite the good copy in s3? Assuming the dump is done daily I moved the last dumpfile before pushing the latest one.
last_dump = File.basename(dump_file) + (Date.today-1).wday.to_s
# allows for 7 days of dumps to be in S3
begin
AWS::S3::S3Object.rename(File.basename(dump_file), last_dump, @s3_bucket)
rescue AWS::S3::NoSuchKey
puts last_dump + ” did not exist. continuing”
end
If someone accidentally deleted the database before the full backup would we push an empty dump file? Just in case I also check the size of the dump and only push to s3 if the size is greater than an arbitrary 1000 bytes
I also added code to remove the binary logs from S3 on full backup.. I have not added locking yet to prevent the scripts from overlapping.. Any reason why you didn’t share the whole solution? :-)
Another question: what happens if AWS::S3::S3Object.store fails. Seems like that would be grounds for a retry. (another reason why I like having 7 days of backups in s3. hopefully one is good. )
30 June 2009, 3:59 pmPaul Dowman:
Tony:
Thanks for pointing out that the full backup should delete the binary logs. That’s a bug, I’ll try to find time to fix it.
You’re right, when any of the S3 calls fail we should retry. I recently switched the backup feature in EC2 on Rails (which this example code is based on) to use the right_aws S3 library instead of aws/s3 because it automatically retries a few times with an increasing pause between failures.
Regarding the other issues: I look at this as just mirroring the MySQL data on S3, not archiving it. If you screw up your database in any way then as you pointed out you’ll be backing up your screwed up version. I think it’s a good idea to archive it as well, and I archive mine regularly (but to a different S3 bucket entirely).
I’m not sure what you mean about having “log_bin only”.
6 July 2009, 12:43 pmTony Primerano:
By “log_bin only” I mean that I didn’t specify a path
Instead of
log_bin = /var/db/mysql/binlog/mysql-bin
I just have
log_bin
This resulted in the path and filename being different.
Thanks for the code. This is was a great starting point. Is the updated code available outside of installing ec2onrails? I recently moved off AWS and I now use Mosso for everything.
9 July 2009, 7:19 pmPaul Dowman:
I’ve been meaning to make it into a RubyGem, but haven’t had time.
11 July 2009, 6:32 amTony:
Hi Paul,
Thanks for your useful script first,but I have a question with it
See:
@mysql_database = “your-mysql-database”
Is it means that you can only backup a single database with this script?
If I want to backup all the databases,what should I change with your script?
Thanks
12 July 2009, 9:58 pmPaul Dowman:
That’s correct, this will just do a single database, but I think it would be a pretty simple change to do multiple databases. Check the documentation for the mysqldump command, I think it can take an “–all-databases” parameter.
13 July 2009, 7:03 amEliot Sykes:
mysqldump also takes a –databases option which lets you name each database to dump. For 3 databases named db_name1, db_name2, and db_name3:
mysqldump –databases db_name1 db_name2 db_name3
28 July 2009, 3:18 amEliot Sykes:
That should be two hypens before the databases option, not a single long dash, “- -” without the space between.
28 July 2009, 3:19 amJeevan:
Nice article.
shell script option http://chetanjm.wordpress.com/2009/07/16/simple-s3-backup-database-and-code-script-using-s3cmd/
25 August 2009, 6:52 amJulien:
Awesome! I’ve been looking for that…
23 September 2009, 1:17 pmJust one little question though : I have a Master Master replication. Is there any chance that this breaks the replication? (I am thinking that flushing the logs might be bad). If so, is there any solution to that?