
The S3 Cookbook

Did you know that you can enable access logging on S3? Did you know that you can add arbitrary metadata to objects in S3? Did you know that you can serve compressed content from S3?

The S3 Cookbook, an e-book written by Scott Patten, has easy-to-follow recipes to do those and about 60 other things (including one that I contributed for backing up a MySQL database to S3). In addition to the recipes, it also has chapters on S3’s architecture, authenticating S3 requests, and an overview of the S3 API.

You can check out the full table of contents and download a sample chapter.

It’s published on Sopobo, a new platform for authors to self-publish technical books (that also happens to be created by Scott!). Sopobo includes tools for readers to interact with each other and with the author. If I were writing a book that I planned to shop around to a few publishers I’d seriously consider putting it on there first to get some feedback (and make a few bucks) before it got picked up.

How to convert from Subversion to Git

When I was moving EC2 on Rails to Git I found several posts that explained how to convert a repo from svn to Git. But none of them included converting your svn tags to Git tags, so here’s yet another how-to guide. (Git experts please comment if I’m doing anything dumb.)

1. Install Git

First, you’ll need Git installed with git-svn included (git-svn will actually allow you to push changes back to the original Subversion repository, but for our purposes we’re assuming that this is a one-time conversion).

If you’re using OS X you should already be using MacPorts, so just do:

prompt> sudo port install git-core +svn

Or, on Ubuntu or Debian Linux:

prompt> sudo apt-get install git-svn

2. Create the authors file

Next, create a text file that maps Subversion committers to Git authors so the names and email addresses will be correct in the history. Save it as authors.txt:

pdowman = Paul Dowman <paul@hellospambot.com>
svnuser2 = Another User <anotheruser@whatever.com>
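
If you have a lot of committers, you can generate a first draft of this file instead of typing it. Here's a rough Ruby sketch that reads the output of svn log --quiet and prints one line per committer. The method name authors_template and the example.com email domain are placeholders I made up for illustration, and you'll still need to fill in the real names and addresses by hand.

```ruby
# Build a starting point for authors.txt from the output of `svn log --quiet`.
# The display name and email it emits are placeholders to edit by hand.
def authors_template(svn_log_text, email_domain = "example.com")
  usernames = svn_log_text.each_line.map do |line|
    # Revision lines look like:
    #   r42 | pdowman | 2008-04-01 10:00:00 -0400 (Tue, 01 Apr 2008)
    next unless line =~ /\Ar\d+ \| /
    fields = line.split(" | ")
    fields[1].strip if fields.length >= 3
  end

  # One entry per committer, sorted so the file is easy to scan.
  usernames.compact.uniq.sort.map do |user|
    "#{user} = #{user} <#{user}@#{email_domain}>"
  end.join("\n")
end
```

To use it, save the log with svn log --quiet <svn repo url> > svn.log, then print authors_template(File.read("svn.log")) and redirect the output to authors.txt.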

3. Clone the repository

Now run the command that will import your svn repo into a local Git repo. I’m assuming your svn repo had the standard layout of /trunk, /tags and /branches.

prompt> git svn clone <svn repo url> --no-metadata -A authors.txt -t tags -b branches -T trunk <destination dir name>

Now running git log should show all your commit history with the correct authors.

4. Convert branches to tags

There’s one more thing. All your tags are now remote branches, not tags, in your Git repo. So you’ll need to convert them manually (or write a script to do it if you have a lot, I’ll leave that as an exercise for the reader). For each Subversion tag (i.e. Git remote branch) you’ll add it as a Git tag, then delete the remote branch. List them with:

prompt> git branch -r

Then for each tag listed do:

prompt> git tag tagname tags/tagname
prompt> git branch -r -d tags/tagname
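
If you have too many tags to do by hand, the loop is easy to script. This is only a sketch, in Ruby, and it assumes the remote branches are listed as tags/tagname exactly as shown above. Some git-svn versions list them under a different prefix, in which case pass that prefix as the second argument. The method name tag_conversion_commands is my own invention. It only builds the commands and doesn't run anything, so you can check them first.

```ruby
require 'shellwords'

# Turn the output of `git branch -r` into the git commands that convert
# each tags/* remote branch into a real Git tag. Nothing is executed here.
def tag_conversion_commands(branch_listing, prefix = "tags/")
  branch_listing.each_line.flat_map do |line|
    ref = line.strip
    next [] unless ref.start_with?(prefix)

    tag = ref[prefix.length..-1]
    next [] if tag.empty?

    [
      "git tag #{Shellwords.escape(tag)} #{Shellwords.escape(ref)}",
      "git branch -r -d #{Shellwords.escape(ref)}"
    ]
  end
end
```

Calling puts tag_conversion_commands(`git branch -r`) from inside the new repo prints the commands, and once they look right you can pipe that output to sh.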

You now have a local Git repository with all your history and tags. If you don’t need to share it with anyone else then you’re done.

5. Push to a public repo (optional)

If you want to publish to a public repository (for example Github), you’ll need to add it as a remote repo and then push to it.

prompt> git remote add origin git@github.com:userid/project.git
prompt> git push origin master --tags

Your next stop should probably be the Git tutorial for Subversion users. Enjoy!

Got Git

I’ve moved EC2 on Rails from Subversion to Git. It’s hosted on Github at pauldowman/ec2onrails.

Most Ruby developers are familiar with Git by now. (If you’re not: it’s a distributed version control system that was created in 2005 by Linus Torvalds for Linux kernel development). Within the last year almost every Ruby-based open-source project has switched to Git (including Rails itself). And in fact they’re almost all hosted on Github!

At first I found it funny that even though they were moving to a distributed version control system, everyone decided to keep their repositories in the exact same place. Being distributed, a Git repository doesn’t need to be hosted unless you’re sharing it. You can publish it easily on any web server, and RubyForge (which has always been the most popular place to host Ruby projects) supports Git.

But after playing with it a bit I can see why everyone is choosing Github. For one thing they got the project hosting part right, with a simple clean UI and cool features like an API and hooks for all kinds of services.

But the really cool thing about Github is that it provides a social environment. You can watch projects as you’d expect, but you can also follow people, send them messages, and easily send them pull requests (to integrate changes you’ve made). It’s great for discovering interesting and new projects: just follow friends and people whose work you like to see what they’re watching, creating and forking.

Now that EC2 on Rails is on Github it’s more likely that other people will want to build it themselves so I’ll try to make that easier with a one-step script at the root of the project. Feel free to fork it, implement changes and send me patches or pull requests.

And find me on Github!

A rock-solid setup for sending SMTP mail from your EC2 web server

(None of this is EC2-centric, but it’s particularly needed on EC2.)

A frequent topic of discussion on the EC2 forums is how to send email reliably, efficiently, and especially without it being marked as spam. I found that even with a valid SPF record most mail sent from an EC2 instance was marked as spam or silently discarded.

This is probably partly because of the lack of matching reverse DNS records. But spam filters can be a bit arbitrary, and the easiest fix is to relay outgoing mail through a good SMTP provider. (I don’t recommend relaying outbound mail through Google Apps: according to many people on their forums they have a limit of 500 messages/day, although I couldn’t find that published anywhere. UPDATE: The info is here, thanks John Ward.)

I have tried a couple of SMTP providers, and I recommend AuthSMTP. They are reliable, have good service, and our mail that’s delivered through them almost never gets marked as spam. Also, they have monthly quotas rather than daily, so you have a chance to increase it before you hit the limit.

Rather than deliver directly to the AuthSMTP mail server from your web app it’s a good idea to deliver to a local queueing mail server, which will forward via the AuthSMTP gateway. Your web app will deliver mail to localhost (or perhaps a dedicated instance if you prefer), port 25.

This has several advantages:

  1. Your web server can finish the request more quickly.
  2. There’s less chance that the mail server will be unavailable. At least the mail will be queued locally until the remote server becomes available again. AuthSMTP has proven to be quite reliable, but it has been unavailable on a couple of occasions.
  3. AuthSMTP limits the number of concurrent connections that you can make. You can easily configure your local mail server to limit the number of outgoing connections to the gateway.

Configuration

I recommend using Postfix: it’s fast, reliable and, most importantly, easy to configure. Your Linux distribution will almost certainly have a Postfix package available (it comes pre-installed on EC2 on Rails). On Debian or Ubuntu install with:

sudo aptitude install postfix

Here’s the config file, /etc/postfix/main.cf:

myhostname = www.YOURDOMAIN.com
mydomain = YOURDOMAIN.com
myorigin = $mydomain

smtpd_banner = $myhostname ESMTP $mail_name
biff = no
append_dot_mydomain = no

alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mydestination = localdomain, localhost, localhost.localdomain
mynetworks = 127.0.0.0/8
mailbox_size_limit = 0
recipient_delimiter = +

# SECURITY NOTE: Listening on all interfaces. Make sure your firewall is
# configured correctly
inet_interfaces = all

relayhost = [mail.authsmtp.com]
smtp_connection_cache_destinations = mail.authsmtp.com
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:YOUR_AUTHSMTP_USER_ID:YOUR_AUTHSMTP_PW
smtp_sasl_security_options = noanonymous

default_destination_concurrency_limit = 4

soft_bounce = yes

How simple is that?! Have you ever seen a sendmail config file?

soft_bounce is important because it means that postfix will queue the messages if they’re bounced by the remote gateway for any reason (this is only if it’s bounced by the gateway, not if it’s bounced by the destination server). This would usually be caused by some configuration problem like an authentication failure. If the message is bounced by the eventual destination server (e.g. the mailbox doesn’t exist or is full), or if the destination server can’t be contacted, your local server won’t know about it because the message has already been accepted by the gateway. (It’s probably a good idea to keep track of bounced messages returned by the eventual destination server, see “Don’t spoof the From field” below.)

default_destination_concurrency_limit is so you stay within AuthSMTP’s concurrent connection limit. If you have Postfix running on multiple instances you’ll need to adjust this accordingly.
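
One simple way to adjust it is to split the provider's limit evenly across your instances. Here's the arithmetic as a tiny Ruby sketch. The method name is made up, and total_limit is whatever your provider actually allows (I'm not quoting AuthSMTP's number here). Note that it never goes below 1, so with more instances than allowed connections you could still exceed the limit.

```ruby
# Split a provider-wide connection limit evenly across Postfix instances.
# total_limit is an assumption you supply: check your provider's real limit.
def per_instance_concurrency(total_limit, instance_count)
  raise ArgumentError, "instance_count must be positive" unless instance_count > 0

  # Integer division rounds down; each instance gets at least one connection.
  [total_limit / instance_count, 1].max
end
```

For example, a total limit of 4 shared by 2 instances gives each one default_destination_concurrency_limit = 2.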

To see mail that’s stuck in the queue:

mailq

Postfix will automatically try to resend it, but you can force it to be sent immediately using:

sudo postqueue -f

Monitoring

Of course you need to know if anything goes wrong with mail delivery, and it won’t be in your web app’s log. I use scripts in /etc/cron.hourly to check logs and mail me the output if there are errors. But when it comes to mail delivery failures you might have a bit of a chicken-and-egg problem: you can’t use Postfix to send the mail if Postfix is having problems. Here’s a simple Ruby script to send emergency mail via a different mail server. It’s configured to use Google Apps (you’ll need to create a new account to send the mail from); if you don’t use Google Apps you can easily change it to use a different mail server.

Save this as /usr/local/bin/emergency_mail_sender:

#!/usr/bin/env ruby

# This is a simple script to send mail via an alternate server when there are
# errors with the normal queueing mail sender
# The subject is the first command-line arg and the body is received on stdin

#################################

from_address             = "admin_mail_sender@YOURDOMAIN.com"
to_address               = "admin@YOURDOMAIN.com"
smtp_server              = "smtp.gmail.com"
smtp_port                = 587
smtp_mail_from_domain    = "YOURDOMAIN.com"
smtp_account_name        = "admin_mail_sender@YOURDOMAIN.com"
smtp_password            = "YOUR_PASSWORD"
smtp_authentication_type = :plain
debug                    = false

#################################

subject = ARGV[0]
body = $stdin.read

require 'rubygems'
require 'net/smtp'
require 'tlsmail'

exit if body.nil? || body == ""

msgstr = <<END_OF_MESSAGE
Subject: #{subject}

#{body}
END_OF_MESSAGE

Net::SMTP.enable_tls(OpenSSL::SSL::VERIFY_NONE)
smtp = Net::SMTP.new(smtp_server, smtp_port)
smtp.set_debug_output $stderr if debug
smtp.start(smtp_mail_from_domain, smtp_account_name, smtp_password, smtp_authentication_type) do |s|
  s.send_message msgstr, from_address, to_address
end

Here’s a script that cron can run every hour to check for mail delivery problems. It uses the emergency_mail_sender script to notify you of any problem. It works on Ubuntu (with the logtail package installed) but might not work on other systems. Save this as /etc/cron.hourly/check_mail_logs and make it executable:

#!/bin/sh
hostname=`hostname -s`
mailer=/usr/local/bin/emergency_mail_sender
/usr/sbin/logtail -f/var/log/mail.warn | $mailer "$hostname: mail warnings"
/usr/sbin/logtail -f/var/log/mail.err | $mailer "$hostname: mail errors"
/usr/sbin/logtail -f/var/log/syslog | grep 'status=' | egrep -v 'status=sent' | $mailer "$hostname: undelivered mail"

SPF

Here’s your SPF record:

v=spf1 include:authsmtp.com include:aspmx.googlemail.com ~all

If you’re not using Google Apps to send mail for your domain remove include:aspmx.googlemail.com. If you want to create your own SPF record there’s a good SPF record generator at spfwizard.com.
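
If you'd rather build the record yourself, the format is simple enough to assemble in a few lines. This Ruby sketch only handles include: mechanisms and the final "all" qualifier, which is all the record above uses. The method name spf_record is my own.

```ruby
# Assemble the value of an SPF TXT record from a list of included domains.
# Only covers include: mechanisms plus the trailing "all" qualifier.
def spf_record(include_domains, all_qualifier = "~all")
  parts = ["v=spf1"]
  parts += include_domains.map { |domain| "include:#{domain}" }
  parts << all_qualifier
  parts.join(" ")
end
```

Calling spf_record(["authsmtp.com", "aspmx.googlemail.com"]) returns the record shown above, and dropping the second domain gives the version for sites that don't send through Google Apps.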

Don’t spoof the From field

You should only send mail from somebody@yourdomain.com. If you try to send mail from somebody@pauldowman.com, for example, the receiver will see that pauldowman.com has an SPF record, and that it doesn’t authorize your mail server. Then into the spam folder you go.

To get around this you can send from something like noreply@yourdomain.com, and set the Reply-To header to somebody@pauldowman.com. You can even set the name in the From field, for example: “Paul Dowman via yoursite” <noreply@yourdomain.com>. The Reply-To header will make sure that most people’s replies go to the correct address, but a few will inevitably end up at noreply@yourdomain.com, so it’s probably a good idea to set up an autoresponder at that address, or at least make sure the message bounces so the user eventually realizes the mistake.
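
Here's roughly what those headers look like when you build the message by hand in Ruby. It's a minimal sketch: the method name build_message is made up, and it does no escaping or validation of its inputs, so don't feed it untrusted strings as-is.

```ruby
# Build a plain-text message that is sent from your own domain but asks
# mail clients to direct replies to the real user.
# No escaping or validation is done on the arguments.
def build_message(display_name, from_address, reply_to, to_address, subject, body)
  <<MESSAGE
From: "#{display_name}" <#{from_address}>
Reply-To: #{reply_to}
To: #{to_address}
Subject: #{subject}

#{body}
MESSAGE
end
```

The returned string can be passed to Net::SMTP's send_message, with noreply@yourdomain.com as the envelope sender so it matches your SPF record.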

EC2 on Rails now with multiple instance support, Ubuntu 7.10, 64-bit version, Capistrano tasks

I’ve been working hard on EC2 on Rails, and version 0.9.5 is now available. Since my last post here there have been some major changes:

Capistrano tasks

There is now a rubygem available that provides Capistrano tasks to manage the instance. There are tasks to set the server’s timezone, install packages and rubygems, backup, restore, create and delete the database, set the MySQL root password, and more. To use these in your Rails project type:

> sudo gem install ec2onrails

Put Capfile in the root of your Rails folder, and put deploy.rb in the config folder.

Then, from the root of your project type:

> cap ec2onrails:setup

This automatically sets your server’s timezone, installs any custom rubygems and Ubuntu packages, and creates your database for you. You can now deploy your rails app as you normally would:

> cap deploy:migrations

Another useful task for testing is:

> cap ec2onrails:restore_db_and_deploy

This recreates the database, restores data from an S3 bucket (specified in your deploy.rb), and deploys the app. I use this to prepare a staging server with the current production data and current production version of the app. After running this task I have an exact copy of my production server. I then deploy the latest version to this server before deploying it to production. This is a good way to be really sure your production deployment won’t fail (especially your migrations).

To see a list of all available Capistrano tasks:

> cap -T

New Ubuntu version

It’s now built with Ubuntu 7.10 “Gutsy”.

Support for new instance types

There are both i386 and x86_64 versions available to support the new EC2 instance types. So you can now use large and extra-large instances.

Multiple instances

The earlier versions only worked if your rails app was running on a single server. That was lame! Now you can have multiple instances using any combination of these roles: web server, app server, primary database. I’m working on adding a MySQL slave role and eventually a Memcache role.

For full instructions and details see the project web site.