Movin’ on… (status of EC2 on Rails)

I began a fun project a couple of years ago: EC2 on Rails. It became quite widely used, people contributed some great code, and a small community developed.

I had a great vision for what it should become. But since I was very busy with a start-up (which we successfully sold last August), I struggled to find the time to work on it. I did a lot of work that I never ended up releasing because I couldn’t find the time for testing and fixing the last few small bugs (though it has been in use in production with great success).

The hardest thing to find time for was always documentation and communication of the status, so today I’m taking the time to clarify since I get asked a lot:

I won’t be working on it any more.

But open source is a wonderful thing and anyone who wants to keep using it can fork it and do so.

Thanks to everyone who contributed features and fixes.

I apologize for letting it languish for so long. I had the best intentions of finding some more time, but now that I have a four-week-old baby I know that it’s impossible.

A great success in its day

It felt great to be sitting in a session at RailsConf 2008 and hear the presenter recommend EC2 on Rails.

When I first created EC2 on Rails it was the first and only Rails AMI, and in fact it was the first public Ubuntu AMI that I know of (though Eric Hammond went on to create what later became the definitive Ubuntu public AMI and Canonical eventually produced official Ubuntu images).

In spite of the sparse documentation it was simple enough that many people used it either as-is or as a starting point for their own custom setup.

I think there’s still a great need for a simple open-source Rails server image, but now there are at least a couple of options, and the choices for all components of the Rails production stack have improved hugely.

Some of the custom functionality is now available via other projects like Marc-André Cournoyer’s mysql_s3_backup.

I’d do a few things differently

If I had the time to continue working on it I’d make some major changes in the architecture:

  • I’d use Chef to configure the image instead of a build script. This would allow running instances to be upgraded more easily, allow greater customization by the user, and allow the sharing of common customizations.
  • I’d stop using Capistrano for deployment, or at least move all the code that’s inside Capistrano recipes into scripts that exist on the server. (Chef-deploy looks promising but I haven’t had a chance to play with it yet).
  • I would provide better support for elastic clusters (i.e. adding and removing instances from the cluster).

I have a lot of thoughts on how those things could be achieved. Feel free to get in touch with me if you are building something similar and want to chat about it.

The new and improved but unreleased version

The unreleased version (available on GitHub) has been substantially rewritten. It is now based on Nginx and Phusion Passenger, and uses the awesome Varnish proxy for balancing across multiple instances (optionally with HTTP caching). As I mentioned, it’s being used in production with great success, but there are still a few minor known issues and probably some untested areas.

Please feel free to fork it and give it new life.

Backing up your MySQL database to S3

Here’s a simple recipe, with complete code available on GitHub (written in Ruby), to automatically back up your MySQL database to Amazon’s S3 storage service, with regular incremental backups.

This article appears in The S3 Cookbook, an e-book written by Scott Patten. It is based on the automatic MySQL backup in EC2 on Rails.

Solution

You might expect that you could simply upload the MySQL database files to S3. That could work if all your tables were MyISAM tables (assuming you did LOCK TABLES and FLUSH TABLES to make sure the database files were in a consistent state), but it won’t work for InnoDB tables. A more general approach is to use the mysqldump tool to back up the full contents of the database and then use MySQL’s binary log for incremental backups.

The binary log contains all changes made to the database since the last full backup, so to restore the database you first restore the full backup (the output from mysqldump) and then apply the changes from the binary log.

Here are Ruby scripts for doing the full backup, incremental backup, and restore.

The full backup script uses mysqldump to do the initial full backup and uploads its output to S3. It assumes the bucket is empty.

full_backup.rb:

#!/usr/bin/env ruby

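# load common.rb from this script's own directory, whatever the working directory is
require File.expand_path("common", File.dirname(__FILE__))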

begin
  FileUtils.mkdir_p @temp_dir

  # assumes the bucket's empty
  dump_file = "#{@temp_dir}/dump.sql.gz"

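  # --single-transaction takes a consistent InnoDB snapshot without locking tables;
  # --flush-logs, --master-data=2 and --delete-master-logs rotate the binary log,
  # record its position as a comment in the dump, and purge the older log files
  cmd = "mysqldump --quick --single-transaction --create-options -u#{@mysql_user} --flush-logs --master-data=2 --delete-master-logs"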
  cmd += " -p'#{@mysql_password}'" unless @mysql_password.nil?
  cmd += " #{@mysql_database} | gzip > #{dump_file}"
  run(cmd)

  AWS::S3::S3Object.store(File.basename(dump_file), open(dump_file), @s3_bucket)
ensure
  FileUtils.rm_rf(@temp_dir)
end

Once the full backup has been done, the following script can be run frequently (perhaps every 5 or 10 minutes) to rotate the binary log and upload it to S3. It must be run by a user that has read access to the MySQL binary log (see the Discussion section for details on configuring the MySQL binary log path).

incremental_backup.rb:

#!/usr/bin/env ruby

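# load common.rb from this script's own directory, whatever the working directory is
require File.expand_path("common", File.dirname(__FILE__))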

begin
  FileUtils.mkdir_p @temp_dir
  execute_sql "flush logs"
  logs = Dir.glob("#{@mysql_bin_log_dir}/mysql-bin.[0-9]*").sort
  logs_to_archive = logs[0..-2] # all logs except the last
  logs_to_archive.each do |log|
    # The following executes once for each filename in logs_to_archive
    AWS::S3::S3Object.store(File.basename(log), open(log), @s3_bucket)
  end
  execute_sql "purge master logs to '#{File.basename(logs[-1])}'"
ensure
  FileUtils.rm_rf(@temp_dir)
end

The following script restores the full backup (mysqldump output) and the subsequent binary log files. It assumes the database exists and is empty.

restore.rb:

#!/usr/bin/env ruby

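# load common.rb from this script's own directory, whatever the working directory is
require File.expand_path("common", File.dirname(__FILE__))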

# Retrieve a single file from S3
def retrieve_file(file)
  key = File.basename(file)
  # find raises an error if the key doesn't exist, before the local file is created
  AWS::S3::S3Object.find(key, @s3_bucket)

  open(file, 'w') do |f|
    AWS::S3::S3Object.stream(key, @s3_bucket) do |chunk|
      f.write chunk
    end
  end
end

# List the files matching filename_prefix in the S3 bucket
def list_keys(filename_prefix)
  AWS::S3::Bucket.objects(@s3_bucket, :prefix => filename_prefix).collect{|obj| obj.key}
end

# Retrieve the files matching filename_prefix in the S3 bucket
def retrieve_files(filename_prefix, local_dir)
  list_keys(filename_prefix).each do |k|
    file = "#{local_dir}/#{File.basename(k)}"
    retrieve_file(file)
  end
end

begin
  FileUtils.mkdir_p @temp_dir

  # download the dump file from S3
  file = "#{@temp_dir}/dump.sql.gz"
  retrieve_file(file)

  # restore the dump file
  cmd = "gunzip -c #{file} | mysql -u#{@mysql_user} "
  cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
  cmd += " #{@mysql_database}"
  run cmd

  # download the binary log files
  retrieve_files("mysql-bin.", @temp_dir)
  logs = Dir.glob("#{@temp_dir}/mysql-bin.[0-9]*").sort

  # restore the binary log files
  logs.each do |log|
    # The following will be executed for each binary log file
    cmd = "mysqlbinlog --database=#{@mysql_database} #{log} | mysql -u#{@mysql_user} "
    cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
    run cmd
  end
ensure
  FileUtils.rm_rf(@temp_dir)
end

The previous three scripts (full_backup.rb, incremental_backup.rb, and restore.rb) all load common.rb, which defines some common functions and in turn loads config.rb, which contains all user-specific configuration:

config.rb:

@mysql_database = "your-mysql-database"
@mysql_user = "root"
@mysql_password = "password"
@s3_bucket = "your-s3-bucket-name"
@aws_access_key_id = "your-aws-access-key"
@aws_secret_access_key = "your-aws-secret-access-key"
@mysql_bin_log_dir = "/var/lib/mysql/binlog"
@temp_dir = "/tmp/mysql-backup"

common.rb:

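# load config.rb from this file's own directory, whatever the working directory is
require File.expand_path("config", File.dirname(__FILE__))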
require "rubygems"
require "aws/s3"
require "fileutils"

def run(command)
  result = system(command)
  raise("error, process exited with status #{$?.exitstatus}") unless result
end

def execute_sql(sql)
  cmd = %{mysql -u#{@mysql_user} -e "#{sql}"}
  cmd += " -p'#{@mysql_password}' " unless @mysql_password.nil?
  run cmd
end

AWS::S3::Base.establish_connection!(:access_key_id => @aws_access_key_id, :secret_access_key => @aws_secret_access_key, :use_ssl => true)

# It doesn't hurt to try to create a bucket that already exists
AWS::S3::Bucket.create(@s3_bucket)
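
One caveat about the scripts above: they place the MySQL password between single quotes inside a shell command, which breaks if the password itself contains a single quote. Here is a minimal sketch of one way around that, using Ruby’s standard Shellwords module. The helper name mysql_credential_args is hypothetical and not part of the original code.

```ruby
require "shellwords"

# Hypothetical helper: builds the -u/-p arguments for mysql or mysqldump
# with each value shell-escaped, instead of wrapping the password in '...'.
def mysql_credential_args(user, password)
  args = ["-u#{Shellwords.escape(user)}"]
  args << "-p#{Shellwords.escape(password)}" unless password.nil?
  args.join(" ")
end

# Example with a password containing a quote and spaces
puts "mysql #{mysql_credential_args("root", "it's a secret")} -e \"flush logs\""
```

The returned string could replace the hand-built -u and -p fragments in run and execute_sql, assuming the rest of each command stays as it is.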

Discussion

To enable binary logging, make sure that the MySQL config file (my.cnf) has the following line in it:

log_bin = /var/lib/mysql/binlog/mysql-bin

The path (/var/lib/mysql/binlog) can be any directory that MySQL can write to, but it needs to match the value of @mysql_bin_log_dir in config.rb.
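
Since a mismatch between the two settings is easy to miss, a small check can be run before scheduling the backups. The following is only a sketch: the helper bin_log_dir_from_cnf is hypothetical, and the sample my.cnf text and directory are placeholder values standing in for your real files.

```ruby
# Hypothetical helper: returns the directory part of the log_bin setting
# found in the given my.cnf text, or nil if no log_bin line is present.
def bin_log_dir_from_cnf(cnf_text)
  cnf_text.each_line do |line|
    match = line.match(/^\s*log[_-]bin\s*=\s*(\S+)/)
    return File.dirname(match[1]) if match
  end
  nil
end

# Sample values for illustration; in practice you would read the real
# my.cnf with File.read and compare against @mysql_bin_log_dir from config.rb.
sample_cnf = <<~CNF
  [mysqld]
  log_bin = /var/lib/mysql/binlog/mysql-bin
CNF
configured_dir = "/var/lib/mysql/binlog"

if bin_log_dir_from_cnf(sample_cnf) == configured_dir
  puts "binary log directory matches config.rb"
else
  puts "mismatch: my.cnf and config.rb point at different directories"
end
```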

Note for EC2 users: the root volume (“/”) has limited space, so it’s a good idea to use /mnt for your MySQL data files and logs.

The MySQL user needs the “RELOAD” and “SUPER” privileges. These can be granted with the following SQL commands (which need to be executed as the MySQL root user):

GRANT RELOAD ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';
GRANT SUPER ON *.* TO 'user_name'@'%' IDENTIFIED BY 'password';

(Replace user_name with the value of @mysql_user in config.rb).

You’ll probably want to perform the full backup on a regular schedule and the incremental backup on a more frequent one. The relative frequency of each depends on how large your database is, how frequently it’s updated, and how important it is to be able to restore quickly. For a large database mysqldump can be slow and can noticeably increase the system load, while rotating the binary log is quick and inexpensive. But if your changes normally contain many updates (as opposed to just inserts), restoring from the binary logs can be slower.

To have the backups run automatically you could add something like the following to the system crontab file (/etc/crontab, whose entries include a user field), adjusting the times as necessary:

crontab:

# Incremental backup every 10 minutes
*/10 * * * *  root  /usr/local/bin/incremental_backup.rb
# Full backup every day at 05:01
1 5 * * *  root  /usr/local/bin/full_backup.rb

Before this can work, however, two small details must be taken care of, which have been left as an exercise for the reader:

  1. When the full backup runs it should delete any binary log files that might already exist in the bucket. Otherwise the restore will try to restore them even though they’re older than the full backup.
  2. The execution of the scripts should not overlap. If the full backup hasn’t finished before the incremental starts (or vice versa) the backup will be in an inconsistent state.
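
As one possible starting point for these two exercises, here is a rough sketch. It is not part of the original scripts: the helper names with_backup_lock and binary_log_keys and the lock file path are hypothetical. It relies on File#flock from Ruby’s standard library to stop two runs from overlapping, and on a plain filter to pick the binary log keys out of a bucket listing.

```ruby
require "tmpdir"

# Hypothetical helper: runs the block only if no other backup holds the lock.
# With LOCK_NB, flock returns false instead of waiting when the lock is taken.
def with_backup_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock|
    unless lock.flock(File::LOCK_EX | File::LOCK_NB)
      warn "another backup is still running, skipping this run"
      return false
    end
    yield
    true
  end
end

# Hypothetical helper: selects the binary log keys from a list of bucket keys.
def binary_log_keys(keys)
  keys.select { |key| key =~ /\Amysql-bin\.\d+\z/ }
end

lock_path = File.join(Dir.tmpdir, "mysql-backup.lock")
with_backup_lock(lock_path) do
  # In full_backup.rb the list would come from the bucket, and each stale key
  # could then be removed with AWS::S3::S3Object.delete(key, @s3_bucket).
  keys = ["dump.sql.gz", "mysql-bin.000001", "mysql-bin.000002", "mysql-bin.index"]
  puts binary_log_keys(keys).inspect
end
```

Both the full and the incremental script would need to wrap their work in the same lock file for this to prevent overlap. Skipping a run when the lock is busy is a design choice; waiting for the lock instead (by dropping LOCK_NB) is equally plausible.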

EC2 on Rails version 0.9.9.1 released

I finally found the time to release a new version of EC2 on Rails. It fixes some bugs, updates some software (Rails 2.2, Rubygems 1.3.1, Ubuntu 8.04.2 LTS), and includes public images for the European EC2 region.

For a full list of changes see the change log.

My next priorities are:

  1. Integrating other people’s changes, especially Adam Greene’s huge changes (support for EBS and much more).
  2. Improving (or you could say fixing!) multi-instance support. It should be as easy to manage your app running on an EC2 cluster as it is on a single instance. I’m now using this myself so I finally have the motivation! :-)
  3. General robustness improvements.
  4. Documentation.

Please report any bugs using the RubyForge bug tracker or by email.

EC2 on Rails version 0.9.9 released

I released version 0.9.9 of EC2 on Rails a few weeks ago and announced it on the mailing list but forgot to mention it here. The main change was switching to version 8.04 (”Hardy”) of Ubuntu. See the change log for full details.

EC2 on Rails version 0.9.8 available

UPDATE: it’s 0.9.8.1 now; there was a small update to the RubyGem. The new gem uses the same AMIs.

EC2 on Rails version 0.9.8 is now available (or will be in a few hours when the RubyForge servers are synced). This is a recommended update for everyone.

It includes some major new features:

  • monit monitoring daemon: monitors mysqld, apache, memcached, mongrels, system load and free drive space
  • incremental MySQL backup (important for large databases)
  • Apache SSL support
  • a local Postfix SMTP server enabled by default

And most importantly this fixes the problem with broken Ubuntu package updates which was caused by a missing repository in the list of repositories.

As I mentioned yesterday, the base image is now built using Eric Hammond’s EC2 Ubuntu script.

My priorities now are:

  1. Release an update based on Ubuntu 8.04 Hardy (this version is still using Ubuntu 7.10 Gutsy because I wanted to provide a reliable update as quickly as possible due to bug #20040. But now that the base image is built with Eric Hammond’s script it should be easy to update to Hardy.)
  2. Create complete documentation.
  3. Release a 100% bug-free version 1.0 with the current feature-set. Please help by reporting any bugs you find, either using the RubyForge bug tracker or by email.