[nycbug-talk] Amazon Web Services replacing tradition?

Steven Kreuzer skreuzer at exit2shell.com
Mon May 5 17:25:54 EDT 2008

On Fri, May 02, 2008 at 12:37:07PM -0400, Matt Juszczak wrote:
> I feel like this is mostly on-topic because it deals with the underlying 
> hardware that the operating system we use on a day to day basis 
> would/could run on.
> I'm noticing more and more organizations, start-ups especially, switching 
> their infrastructure over to virtualization, such as Amazon EC2/S3.
> While I think these sorts of setups (launching/removing instances on the 
> fly, scripting the launch of new infrastructure, etc.) have their purpose, 
> I don't think they are a catch-all for every sort of possible setup.
> For instance, I can't see how the slow disk I/O of S3 and the lack of 
> ability to specify the physical location of servers (on a topology level) 
> could be good for databases and database replication?
> So, are these things the wave of the future, and will dedicated/co-located 
> data center setups fade, or will these types of solutions find their niche 
> but expand no further?
> Just wondering what everyone's thoughts are.

I think AWS is a nice crutch that lets you avoid worrying too much
about the underlying infrastructure and focus on building your
application and the community around it. As it starts to gain
traction, you can quickly and easily provision additional virtual
machines to help meet the demand. What is also really nice is that if
demand increases for a short period of time and then decreases (e.g.
getting Slashdotted or Dugg), you can simply spin up new
instances and then take them down once the traffic subsides. All that
really happens is that your bill from Amazon will be slightly larger
than the previous month's, but you will have managed to keep your site up
and online, which I figure is more important.
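To make the "spin up, then take down" idea concrete, here is a minimal sketch of only the decision part: how many instances to run for an observed request rate. It assumes a hypothetical per-instance capacity (REQS_PER_INSTANCE) and a hypothetical minimum fleet size; both would have to come from your own load testing. It deliberately does not call any EC2 API; actually launching or terminating instances is left to whatever tooling you use.

```python
import math

# Hypothetical: requests/second one instance can handle. Measure your own app.
REQS_PER_INSTANCE = 50.0
# Hypothetical floor so a single instance failure doesn't take the site down.
MIN_INSTANCES = 2


def instances_needed(current_rps: float, headroom: float = 0.25) -> int:
    """Instances required for the observed request rate, padded with
    `headroom` spare capacity (0.25 means 25% extra)."""
    if current_rps < 0 or headroom < 0:
        raise ValueError("request rate and headroom must be non-negative")
    target = current_rps * (1.0 + headroom) / REQS_PER_INSTANCE
    return max(MIN_INSTANCES, math.ceil(target))


def scaling_action(running: int, current_rps: float) -> int:
    """Positive result: instances to launch. Negative: instances to terminate."""
    return instances_needed(current_rps) - running


if __name__ == "__main__":
    # Normal day vs. a Slashdot-style spike, under the assumed capacity figure.
    print(scaling_action(running=2, current_rps=60))
    print(scaling_action(running=2, current_rps=2000))
```

The headroom factor is there so you start launching before the existing instances are saturated; once the spike passes, the same function returns a negative number and you shut the extras down.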

As for the physical location of your database servers, that really
should not make a difference. For the most part, your database will
always be the bottleneck regardless of how close you put it to your
web server. Adding 50ms to the time it takes to open a
connection to the database is nothing compared to how long it takes
to execute a query, parse the results, build the page and then send it
back to the end user. If you have to keep going back to the database to
render every page, your site is not going to scale regardless. Your best
course of action is to prefetch as much as possible, cache as much
as humanly possible, and put those caches as close to your clients as you can. If
you can tailor your site to use prefetched content, you can deploy an EC2
machine to run a batch job for a few hours at a time for less than a buck,
which is a lot more cost-effective than purchasing a server for $3,000
and having to pay for electricity and cooling for something that is
going to be idle most of the time.
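The cost argument is easy to check with a few lines of arithmetic. In this sketch every rate is an assumption: HOURLY_RATE, HOURS_PER_RUN, RUNS_PER_YEAR and POWER_COOLING_PER_MONTH are placeholders, not Amazon's or anyone's actual prices; only the $3,000 server price comes from the text above. Plug in your own numbers.

```python
# All figures except SERVER_PRICE are hypothetical placeholders.
HOURLY_RATE = 0.10              # assumed $/hour for one on-demand instance
HOURS_PER_RUN = 4               # "a few hours" per batch run
RUNS_PER_YEAR = 365             # assumed: one batch run per day

SERVER_PRICE = 3000.0           # purchase price from the post
POWER_COOLING_PER_MONTH = 50.0  # assumed electricity + cooling for one box


def cost_per_run() -> float:
    return HOURLY_RATE * HOURS_PER_RUN


def ec2_cost(years: float) -> float:
    """Pay only for the hours the batch job actually runs."""
    return cost_per_run() * RUNS_PER_YEAR * years


def owned_cost(years: float) -> float:
    """Up-front hardware plus power/cooling for the whole period,
    whether or not the box is busy."""
    return SERVER_PRICE + POWER_COOLING_PER_MONTH * 12 * years


if __name__ == "__main__":
    for years in (1, 3):
        print(years, round(ec2_cost(years), 2), round(owned_cost(years), 2))
```

Under these assumptions a run costs about 40 cents and a year of daily runs about $146, versus $3,600 for the first year of an owned server; the gap only narrows if the job runs most of the day.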

For caching, look into deploying something like memcached, Varnish and
nginx early on, as opposed to waiting until you hit capacity issues and
scrambling to find that silver bullet that makes all your problems go away.
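The usual way to put something like memcached in front of a database is the cache-aside pattern: check the cache, and only on a miss hit the database and store the result with an expiry. The sketch below uses a small in-process TTLCache class as a stand-in for memcached so it runs on its own; the class and the `load_from_db` loader are hypothetical, and a real memcached client would replace TTLCache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for memcached: get/set with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached(cache: TTLCache, key: str, ttl: float, loader: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or call `loader` on a miss
    and store its result. Note: a loader returning None is never cached."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    def load_from_db():  # hypothetical expensive query
        return {"name": "alice"}

    cache = TTLCache()
    print(cached(cache, "user:1", 60, load_from_db))
```

Injecting the clock makes expiry testable without sleeping; the same shape works for caching whole rendered pages, which is where the big wins usually are.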

Personally, I find S3 to be the most interesting offering from Amazon,
mainly because, from my initial research, it seems like it may be cheaper
to serve static content (images, CSS) through them than to do it on your
own, especially if you happen to have a site that pushes a lot of static
content, such as a photo sharing community.

A bonus side effect is that by using S3, you are splitting components
across domains, which allows you to maximize the browser's ability to
perform parallel downloads. That alone can help you achieve quicker page
loads. In addition, S3 will also act as a cookie-free domain for your
content. When the browser makes a request for static content and sends
cookies along with the request, the server has no use for
those cookies; they just increase network traffic for no good reason.
Another benefit of hosting static content on a cookie-free domain is
that some proxies might refuse to cache components that are
requested with cookies.
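One way to get both effects is to rewrite asset URLs so they point at a small set of static hostnames on a separate domain. The hostnames below are hypothetical (the example.net domain is used so that cookies scoped to a main site at example.com would not be sent), and the sketch assumes each host serves the same files. Hashing the path, rather than picking a host at random, keeps each asset on one hostname so browser and proxy caches are not defeated.

```python
import zlib
from typing import Sequence

# Hypothetical static hosts on a different registrable domain than the
# main site, so cookies set for the main site are not sent to them.
STATIC_HOSTS = ("s1.example.net", "s2.example.net")


def static_url(path: str, hosts: Sequence[str] = STATIC_HOSTS) -> str:
    """Map an asset path to one of several cookie-free hosts.

    zlib.crc32 is stable across processes (unlike Python's built-in hash()
    for strings, which is randomized per run), so a given asset always
    maps to the same hostname."""
    if not hosts:
        raise ValueError("need at least one static host")
    clean = path.lstrip("/")
    index = zlib.crc32(clean.encode("utf-8")) % len(hosts)
    return f"https://{hosts[index]}/{clean}"


if __name__ == "__main__":
    for asset in ("/css/site.css", "/img/logo.png", "/js/app.js"):
        print(static_url(asset))
```

Two or three hosts is usually plenty; every extra hostname costs the browser another DNS lookup and connection setup, which eats into the gain from parallel downloads.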

The thing that seems like a huge issue to me is that they don't offer an
SLA with their services. In the past, they have had outages lasting more than 4
hours. They can decide to take it offline for days at a time and you
really have no recourse.
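To put a 4-hour outage in SLA terms, it helps to convert downtime into an availability percentage. This sketch assumes a 30-day (720-hour) month as the measurement period; real SLAs define their own periods and exclusions.

```python
HOURS_PER_MONTH = 30 * 24  # assumed 30-day measurement period


def availability(downtime_hours: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Fraction of the period the service was up."""
    if period_hours <= 0 or not 0 <= downtime_hours <= period_hours:
        raise ValueError("need 0 <= downtime <= period and period > 0")
    return 1.0 - downtime_hours / period_hours


def allowed_downtime_hours(target: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Downtime budget implied by an availability target such as 0.999."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return (1.0 - target) * period_hours


if __name__ == "__main__":
    print(f"{availability(4):.4%}")               # one 4-hour outage in a month
    print(round(allowed_downtime_hours(0.999), 2))  # hours allowed at "three nines"
```

A single 4-hour outage leaves you at roughly 99.44% for that month, while a 99.9% target only allows about 0.72 hours (around 43 minutes) of downtime.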

However, since they are just using Xen, it allows you to deploy an image
of your choice, configured to your needs. If for whatever reason you
need to migrate off EC2, the process would be fairly painless (unlike
App Engine, which is basically Google somehow making vendor lock-in cool).

I think EC2 and S3, when deployed in a well-thought-out manner, can end up
being an extremely beneficial and cost-effective way to meet the
capacity demands of your site.

Steven Kreuzer
