Backing up your MySQL database to S3

Here's a simple recipe, with complete code available on GitHub (written in Ruby), to automatically back up your MySQL database to Amazon's S3 storage service, with regular incremental backups.

This article appears in The S3 Cookbook, an e-book written by Scott Patten. It is based on the automatic MySQL backup in EC2 on Rails.

Solution

You might expect that you could simply upload the MySQL database files to S3. That could work if all your tables were MyISAM tables (assuming you did LOCK TABLES and FLUSH TABLES to make sure the database files were in a consistent state), but it won't work for InnoDB tables. A more general approach is to use the mysqldump tool to back up the full contents of the database and then use MySQL's binary log for incremental backups.

The binary log contains all changes made to the database since the last full backup, so to restore the database you first restore the full backup (the output from mysqldump) and then apply the changes from the binary log.

Here are Ruby scripts for doing the full backup, incremental backup, and restore.

The full backup script uses mysqldump to do the initial full backup and uploads its output to S3. It assumes the bucket is empty.

full_backup.rb:

#!/usr/bin/env ruby

require "common"

begin
  FileUtils.mkdir_p @temp_dir

  # assumes the bucket's empty
  dump_file = "#{@temp_dir}/dump.sql.gz"

  cmd = "mysqldump --quick --single-transaction --create-options -u#{@mysql_user}  --flush-logs --master-data=2 --delete-master-logs"
  cmd += " -p'#{@mysql_password}'" unless @mysql_password.nil?
  cmd += " #{@mysql_database} | gzip > #{dump_file}"
  run(cmd)

  AWS::S3::S3Object.store(File.basename(dump_file), open(dump_file), @s3_bucket)
ensure
  FileUtils.rm_rf(@temp_dir)
end

Once the full backup has been done, the following script can be run frequently (perhaps every 5 or 10 minutes) to rotate the binary log and upload it to S3. It must be run by a user that has read access to the MySQL binary log (see the Discussion section for details on configuring the MySQL binary log path).

incremental_backup.rb:

#!/usr/bin/env ruby

require "common"

begin
  FileUtils.mkdir_p @temp_dir
  execute_sql "flush logs"
  logs = Dir.glob("#{@mysql_bin_log_dir}/mysql-bin.[0-9]*").sort
  logs_to_archive = logs[0..-2] # all logs except the last
  logs_to_archive.each do |log|
    # The following executes once for each filename in logs_to_archive
    AWS::S3::S3Object.store(File.basename(log), open(log), @s3_bucket)
  end
  execute_sql "purge master logs to '#{File.basename(logs[-1])}'"
ensure
  FileUtils.rm_rf(@temp_dir)
end

The following script restores the full backup (mysqldump output) and the subsequent binary log files. It assumes the database exists and is empty.

restore.rb:

#!/usr/bin/env ruby
    
require "common"

# Retrieve a single file from S3
def retrieve_file(file)
  key = File.basename(file)
  # raises an error if the object doesn't exist in the bucket
  AWS::S3::S3Object.find(key, @s3_bucket)

  open(file, 'w') do |f|
    AWS::S3::S3Object.stream(key, @s3_bucket) do |chunk|
      f.write chunk
    end
  end
end

# List the files matching filename_prefix in the S3 bucket
def list_keys(filename_prefix)
  AWS::S3::Bucket.objects(@s3_bucket, :prefix => filename_prefix).collect{|obj| obj.key}
end

# Retrieve the files matching filename_prefix in the S3 bucket
def retrieve_files(filename_prefix, local_dir)
  list_keys(filename_prefix).each do |k|
    file = "#{local_dir}/#{File.basename(k)}"
    retrieve_file(file)
  end
end

begin
  FileUtils.mkdir_p @temp_dir

  # download the dump file from S3
  file = "#{@temp_dir}/dump.sql.gz"
  retrieve_file(file)

  # restore the dump file
  cmd = "gunzip -c #{file} | mysql -u#{@mysql_user} "
  cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
  cmd += " #{@mysql_database}"
  run cmd

  # download the binary log files
  retrieve_files("mysql-bin.", @temp_dir)
  logs = Dir.glob("#{@temp_dir}/mysql-bin.[0-9]*").sort

  # restore the binary log files
  logs.each do |log|
    # The following will be executed for each binary log file
    cmd = "mysqlbinlog --database=#{@mysql_database} #{log} | mysql -u#{@mysql_user} "
    cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
    run cmd
  end
ensure
  FileUtils.rm_rf(@temp_dir)
end
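
Since restore.rb assumes the database already exists and is empty, you may need to (re)create it before running the script; for example (using the database name from config.rb):

mysql -uroot -p -e 'DROP DATABASE IF EXISTS `your-mysql-database`; CREATE DATABASE `your-mysql-database`'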

The previous three scripts (full_backup.rb, incremental_backup.rb, and restore.rb) all require common.rb, which defines some common functions and in turn loads config.rb, which contains all user-specific configuration:

config.rb:

@mysql_database = "your-mysql-database"
@mysql_user = "root"
@mysql_password = "password"
@s3_bucket = "your-s3-bucket-name"
@aws_access_key_id = "your-aws-access-key"
@aws_secret_access_key = "your-aws-secret-access-key-id"
@mysql_bin_log_dir = "/var/lib/mysql/binlog"
@temp_dir = "/tmp/mysql-backup"

common.rb:

require "config"

require "rubygems"
require "aws/s3"
require "fileutils"

def run(command)
  result = system(command)
  raise("error, process exited with status #{$?.exitstatus}") unless result
end

def execute_sql(sql)
  cmd = %{mysql -u#{@mysql_user} -e "#{sql}"}
  cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
  run cmd
end

AWS::S3::Base.establish_connection!(:access_key_id => @aws_access_key_id, :secret_access_key => @aws_secret_access_key, :use_ssl => true)

# It doesn't hurt to try to create a bucket that already exists
AWS::S3::Bucket.create(@s3_bucket)

Discussion

To enable binary logging, make sure that the MySQL config file (my.cnf) has the following line in it:

log_bin = /var/lib/mysql/binlog/mysql-bin

The path (/var/lib/mysql/binlog) can be any directory that MySQL can write to, but it needs to match the value of @mysql_bin_log_dir in config.rb.
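
For reference, the setting belongs in the [mysqld] section of my.cnf. Once MySQL is restarted it will write numbered log files (mysql-bin.000001, mysql-bin.000002, ...) plus a mysql-bin.index file into that directory; the numbered files are what the mysql-bin.[0-9]* globs in the scripts above pick up:

[mysqld]
log_bin = /var/lib/mysql/binlog/mysql-bin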

Note for EC2 users: the root volume ("/") has limited space, so it's a good idea to use /mnt for your MySQL data files and logs.

The MySQL user needs the "RELOAD" and "SUPER" privileges, which can be granted with the following SQL commands (executed as the MySQL root user):

GRANT RELOAD ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';
GRANT SUPER ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';

(Replace user_name and password with the values of @mysql_user and @mysql_password from config.rb.)

You'll probably want to run the full backup on a regular schedule and the incremental backup more frequently, but how often to run each depends on how large your database is, how frequently it's updated, and how important it is to be able to restore quickly. For a large database, mysqldump can be slow and can noticeably increase system load, while rotating the binary log is quick and inexpensive. On the other hand, if your changes typically contain many updates (as opposed to just inserts), restoring from a long series of binary logs can be slow.

To have the backups run automatically you could add something like the following to your crontab file, adjusting the times as necessary:

# Incremental backup every 10 minutes
*/10 * * * *  root  /usr/local/bin/incremental_backup.rb

# Full backup every day at 05:01
1 5 * * *  root  /usr/local/bin/full_backup.rb

Before this can work, however, two small details must be taken care of. They're left as an exercise for the reader, but a rough sketch of one possible approach follows the list:

  1. When the full backup runs it should delete any binary log files that might already exist in the bucket. Otherwise the restore will try to restore them even though they're older than the full backup.
  2. The execution of the scripts should not overlap. If the full backup hasn't finished before the incremental starts (or vice versa) the backup will be in an inconsistent state.
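
Here's an untested sketch of one way to handle both points, in the same style as the scripts above. The with_lock helper and the /tmp/mysql-backup.lock path are illustrative names, not part of the original scripts:

# common.rb addition (sketch): run the given block while holding an
# exclusive lock on a shared lock file, so full and incremental
# backups never run at the same time.
def with_lock(lock_file = "/tmp/mysql-backup.lock")
  File.open(lock_file, File::RDWR | File::CREAT) do |f|
    f.flock(File::LOCK_EX)   # blocks until any other backup run finishes
    yield
  end
end

# full_backup.rb addition (sketch): before uploading the new dump, delete
# any binary logs left in the bucket from the previous backup cycle.
AWS::S3::Bucket.objects(@s3_bucket, :prefix => "mysql-bin.").each do |obj|
  AWS::S3::S3Object.delete(obj.key, @s3_bucket)
end

Each script's begin ... ensure block would then simply be wrapped in a with_lock { ... } call.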