Backing up your MySQL database to S3

Here’s a simple recipe, with complete code available on Github (written in Ruby), to automatically back up your MySQL database to Amazon’s S3 storage service, with regular incremental backup.

This article appears in The S3 Cookbook, an e-book written by Scott Patten. It is is based on the automatic MySQL backup in EC2 on Rails.

Solution

You might expect that you could simply upload the MySQL database files to S3. That could work if all your tables were MyISAM tables (assuming you did LOCK TABLES and FLUSH TABLES to make sure the database files were in a consistent state), but it won’t work for InnoDB tables. A more general approach is to use the mysqldump tool to back up the full contents of the database and then use MySQL’s binary log for incremental backups.

The binary log contains all changes to the database that are made since the last full backup, so to restore the database you first restore the full backup (the output from mysqldump) and then apply the changes from the binary log.

Here are Ruby scripts for doing the full backup, incremental backup, and restore.

The full backup script uses mysqldump to do the initial full backup and uploads it’s output to S3. It assumes the bucket is empty.

full_backup.rb:

#!/usr/bin/env ruby

require "common"

begin
  FileUtils.mkdir_p @temp_dir

  # assumes the bucket's empty
  dump_file = "#{@temp_dir}/dump.sql.gz"

  cmd = "mysqldump --quick --single-transaction --create-options -u#{@mysql_user}  --flush-logs --master-data=2 --delete-master-logs"
  cmd += " -p'#{@mysql_password}'" unless @mysql_password.nil?
  cmd += " #{@mysql_database} | gzip > #{dump_file}"
  run(cmd)

  AWS::S3::S3Object.store(File.basename(dump_file), open(dump_file), @s3_bucket)
ensure
  FileUtils.rm_rf(@temp_dir)
end

Once the full backup has been done, the following script can be run frequently (perhaps every 5 or 10 minutes) to rotate the binary log and upload it to S3. It must be run by a user that has read access to the MySQL binary log (see the Discussion section for details on configuring the MySQL binary log path).

incremental_backup.rb:

#!/usr/bin/env ruby

require "common"

begin
  FileUtils.mkdir_p @temp_dir
  execute_sql "flush logs"
  logs = Dir.glob("#{@mysql_bin_log_dir}/mysql-bin.[0-9]*").sort
  logs_to_archive = logs[0..-2] # all logs except the last
  logs_to_archive.each do |log|
    # The following executes once for each filename in logs_to_archive
    AWS::S3::S3Object.store(File.basename(log), open(log), @s3_bucket)
  end
  execute_sql "purge master logs to '#{File.basename(logs[-1])}'"
ensure
  FileUtils.rm_rf(@temp_dir)
end

The following script restores the full backup (mysqldump output) and the subsequent binary log files. It assumes the database exists and is empty.

restore.rb:

#!/usr/bin/env ruby

require "common"

# Retrieve a single file from S3
def retrieve_file(file)
  key = File.basename(file)
  AWS::S3::S3Object.find(key, @s3_bucket)

  open(file, 'w') do |f|
    AWS::S3::S3Object.stream(key, @s3_bucket) do |chunk|
      f.write chunk
    end
  end
end

# List the files matching filename_prefix in the S3 bucket
def list_keys(filename_prefix)
  AWS::S3::Bucket.objects(@s3_bucket, :prefix => filename_prefix).collect{|obj| obj.key}
end

# Retrieve the files matching filename_prefix in the S3 bucket
def retrieve_files(filename_prefix, local_dir)
  list_keys(filename_prefix).each do |k|
    file = "#{local_dir}/#{File.basename(k)}"
    retrieve_file(file)
  end
end

begin
  FileUtils.mkdir_p @temp_dir

  # download the dump file from S3
  file = "#{@temp_dir}/dump.sql.gz"
  retrieve_file(file)

  # restore the dump file
  cmd = "gunzip -c #{file} | mysql -u#{@mysql_user} "
  cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
  cmd += " #{@mysql_database}"
  run cmd

  # download the binary log files
  retrieve_files("mysql-bin.", @temp_dir)
  logs = Dir.glob("#{@temp_dir}/mysql-bin.[0-9]*").sort

  # restore the binary log files
  logs.each do |log|
    # The following will be executed for each binary log file
    cmd = "mysqlbinlog --database=#{@mysql_database} #{log} | mysql -u#{@mysql_user} "
    cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
    run cmd
  end
ensure
  FileUtils.rm_rf(@temp_dir)
end

The previous three scripts (full_backup.rb, incremental_backup.rb, and restore.rb) all include config.rb which contains all user-specific configuration and common.rb which defines some common functions:

config.rb:

@mysql_database = "your-mysql-database"
@mysql_user = "root"
@mysql_password = "password"
@s3_bucket = "your-s3-bucket-name"
@aws_access_key_id = "your-aws-access-key"
@aws_secret_access_key = "your-aws-secret-access-key-id"
@mysql_bin_log_dir = "/var/lib/mysql/binlog"
@temp_dir = "/tmp/mysql-backup"

common.rb:

require "config"
require "rubygems"
require "aws/s3"
require "fileutils"

def run(command)
  result = system(command)
  raise("error, process exited with status #{$?.exitstatus}") unless result
end

def execute_sql(sql)
  cmd = %{mysql -u#{@mysql_user} -e "#{sql}"}
  cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
  run cmd
end

AWS::S3::Base.establish_connection!(:access_key_id => @aws_access_key_id, :secret_access_key => @aws_secret_access_key, :use_ssl => true)

# It doesn't hurt to try to create a bucket that already exists
AWS::S3::Bucket.create(@s3_bucket)

Discussion

To enable binary logging make sure that the MySQL config file (my.cnf) has the following line in it:

log_bin = /var/db/mysql/binlog/mysql-bin

The path (/var/db/mysql/binlog) can be any directory that MySQL can write to, but it needs to match the value of @mysql_bin_log_dir in config.rb.

Note for EC2 users: The root volume (”/”) has limited space, it’s a good idea to use /mnt for your MySQL data files and logs.

The MySQL user needs to have the “RELOAD” and the “SUPER” privileges, these can be granted with the following SQL commands (which need to be executed as the MySQL root user):

GRANT RELOAD ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';
GRANT SUPER ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';

(Replace user_name with the value of @mysql_user in config.rb).

You’ll probably want to perform the full backup on a regular schedule, and the incremental backup on a more frequent schedule, but the relative frequency of each will depend on how large your database is, how frequently it’s updated, and how important it is to be able to restore quickly.  This is because for a large database mysqldump can be slow and can increase the system load noticeably, while rotating the binary log is quick and inexpensive to perform. But if your database changes normally contain many updates (as opposed to just inserts) it can be slower to restore from the binary logs.

To have the backups run automatically you could add something like the following to your crontab file, adjusting the times as necessary:

crontab:

# Incremental backup every 10 minutes
*/10 * * * *  root  /usr/local/bin/incremental_backup.rb
# Full backup every day at 05:01
1 5 * * *  root  /usr/local/bin/full_backup.rb

Before this can work however, two small details must be taken care of, which have been left as an exercise for the reader:

  1. When the full backup runs it should delete any binary log files that might already exist in the bucket. Otherwise the restore will try to restore them even though they’re older than the full backup.
  2. The execution of the scripts should not overlap. If the full backup hasn’t finished before the incremental starts (or vice versa) the backup will be in an inconsistent state.

12 Comments

  1. Marston A.:

    Hey Paul,

    Great article. I’m wondering how easy/hard it would be to roll encryption into this (PGP, maybe openssl etc) to encrypt the tar dumps before putting them on S3.

  2. Xavier Shay:

    I’ve used this code as a base for a rails plugin that provides rake tasks: http://github.com/xaviershay/db2s3/tree/master

    Major differences are using ruby’s Tempfile rather than managing your own temp dir, externalizing config so you can store it safely in source control, and some specs to make sure it all works.

  3. Tony Primerano:

    Some notes.

    In my.cnf I had log_bin only which results in the binary logs being named mysqld.bin thus I needed to edit incremental_backup.rb and restore.rb

    I was concerned that there was only one full backup in s3. Isn’t it possible for my DB to be damaged and then overwrite the good copy in s3? Assuming the dump is done daily I moved the last dumpfile before pushing the latest one.

    last_dump = File.basename(dump_file) + (Date.today-1).wday.to_s
    # allows for 7 days of dumps to be in S3
    begin
    AWS::S3::S3Object.rename(File.basename(dump_file), last_dump, @s3_bucket)
    rescue AWS::S3::NoSuchKey
    puts last_dump + ” did not exist. continuing”
    end

    If someone accidentally deleted the database before the full backup would we push an empty dump file? Just in case I also check the size of the dump and only push to s3 if the size is greater than an arbitrary 1000 bytes

    I also added code to remove the binary logs from S3 on full backup.. I have not added locking yet to prevent the scripts from overlapping.. Any reason why you didn’t share the whole solution? :-)

    Another question: what happens if AWS::S3::S3Object.store fails. Seems like that would be grounds for a retry. (another reason why I like having 7 days of backups in s3. hopefully one is good. )

  4. Paul Dowman:

    Tony:

    Thanks for pointing out that the full backup should delete the binary logs. That’s a bug, I’ll try to find time to fix it.

    You’re right, when any of the S3 calls fail we should retry. I recently switched the backup feature in EC2 on Rails (which this example code is based on) to use the right_aws S3 library instead of aws/s3 because it automatically retries a few times with an increasing pause between failures.

    Regarding the other issues: I look at this as just mirroring the MySQL data on S3, not archiving it. If you screw up your database in any way then as you pointed out you’ll be backing up your screwed up version. I think it’s a good idea to archive it as well, and I archive mine regularly (but to a different S3 bucket entirely).

    I’m not sure what you mean about having “log_bin only”.

  5. Tony Primerano:

    By “log_bin only” I mean that I didn’t specify a path
    Instead of
    log_bin = /var/db/mysql/binlog/mysql-bin
    I just have
    log_bin

    This resulted in the path and filename being different.

    Thanks for the code. This is was a great starting point. Is the updated code available outside of installing ec2onrails? I recently moved off AWS and I now use Mosso for everything.

  6. Paul Dowman:

    I’ve been meaning to make it into a RubyGem, but haven’t had time.

  7. Tony:

    Hi Paul,

    Thanks for your useful script first,but I have a question with it
    See:
    @mysql_database = “your-mysql-database”

    Is it means that you can only backup a single database with this script?

    If I want to backup all the databases,what should I change with your script?

    Thanks

  8. Paul Dowman:

    That’s correct, this will just do a single database, but I think it would be a pretty simple change to do multiple databases. Check the documentation for the mysqldump command, I think it can take an “–all-databases” parameter.

  9. Eliot Sykes:

    mysqldump also takes a –databases option which lets you name each database to dump. For 3 databases named db_name1, db_name2, and db_name3:

    mysqldump –databases db_name1 db_name2 db_name3

  10. Eliot Sykes:

    That should be two hypens before the databases option, not a single long dash, “- -” without the space between.

  11. Jeevan:

    Nice article.

    shell script option http://chetanjm.wordpress.com/2009/07/16/simple-s3-backup-database-and-code-script-using-s3cmd/

  12. Julien:

    Awesome! I’ve been looking for that…
    Just one little question though : I have a Master Master replication. Is there any chance that this breaks the replication? (I am thinking that flushing the logs might be bad). If so, is there any solution to that?

Leave a comment