Getting Started with Amazon Web Services and Fully Automated Resource Provisioning in 15 Minutes

While waiting for a new project I wanted to learn something useful. On many projects we need to assess and test the performance of the application being developed, yet only rarely is there enough hardware for generating a realistic load, so I decided to learn more about provisioning virtual machines on demand in the cloud, namely in Amazon Web Services (AWS). I've learned a lot about the tools available for working with AWS, about automating the setup of resources (machine instances, security groups, databases etc.), and about the automatic customization of virtual machine instances in the AWS cloud. I'd like to present a brief introduction to AWS and a succinct overview of the tools and automation options. If you are familiar with AWS/EC2, you may want to skip the introduction and jump directly to the automation section.



Why AWS?



Amazon is a leading provider of "infrastructure as a service" and is constantly adding new services to its offerings. AWS allows you to create virtual machines on demand, load-balance them, connect them to a "database as a service" (with several advantages over a manually managed database), and use various other services such as notifications, e-mail, and queuing. You get access to built-in monitoring, and you can deploy applications to their "platform as a service" built on top of all that while retaining control over the lower-level resources.

Follow the official AWS Blog to be informed of new services, features etc.

Getting started with AWS



To create an AWS account you'll need a phone and a credit card (which will be charged if you use any of the paid services or exceed any of the free usage limits). Be careful during the sign-up process as the UI isn't exactly error-proof. It might take up to two hours before your account becomes fully functional.

The next thing to do is to browse through the AWS Management Console, which lets you create and configure various services and resources, the most interesting being the Elastic Compute Cloud (EC2), where you can start new virtual machines. The Management Console is quite self-explanatory, though not as user-friendly as I might wish. You might want to check these screenshots showing how to create an EC2 instance in the Management Console.

Brief Overview



The core rule of AWS is that you only pay for what you use, i.e. for the hours your instances run and for the traffic - see the AWS Simple Monthly Calculator.

The most important resource is EC2, which allows you to create virtual machines, called "instances." Instances come in different types, varying in memory and computational power. By default they are transient and are discarded (terminated) once you stop using them. You may also have an instance backed by the Elastic Block Storage (EBS), which enables you to stop the instance and start it again with all state and changes preserved; Amazon charges $0.10/GB/month for that. You can also mount EBS storage as a volume on your instance if you need to persist only some data. There is no quick way to re-create a terminated instance; you have to go through the wizard again - that's where the command-line tools and automation come in handy.
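
For illustration, creating an extra EBS volume and attaching it to a running instance could look like this with the command-line tools introduced below (the volume/instance IDs and the device name are placeholders):

ec2-create-volume --size 10 -z eu-west-1c
ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf

Inside the instance you would then create a filesystem on /dev/sdf and mount it.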

When setting up your EC2 instances, you'll likely also need to assign them to the same security group and configure which of the group's ports are open and to whom (by default you cannot even SSH in).

If you want to learn all about EC2, go to the Amazon EC2 User Guide.

Aside from EC2 there are many other interesting services, such as Elastic Beanstalk (PaaS, currently for Java webapps, using Tomcat) and the distributed storage S3. There are also supporting services such as Amazon CloudWatch, a (performance) monitoring tool for your AWS infrastructure (which can be complemented by New Relic monitoring for even more insight into the application).

Leveraging the Amazon Free Tier



Amazon provides new customers with a certain amount of resources for free for one year; you only need to pay if you consume more than that. It includes, for example, a non-stop running micro EC2 instance backed by EBS (i.e. it is persistent, you can stop and start it again), 15 GB of traffic, 10 GB of EBS storage, 5 GB of S3 storage, 10 CloudWatch metrics etc. It unfortunately doesn't include the Amazon-managed MySQL/Oracle database (RDS), though it does provide 1 GB of space in Amazon SimpleDB (a NoSQL key-value store).

This means that you can have a constantly running EC2 Micro instance (613 MB memory) for free. You can use it as your base in the cloud, for example because the traffic between two EC2 instances is faster/cheaper and because it has full access to machines within the same security group.

The best choice is likely to base your instance on the Amazon Linux AMI, which is a variation of RedHat Linux optimized for AWS, equipped with most of the AWS API command-line tools and with CloudInit for automated system setup (described later). I'd recommend browsing through its user guide, which describes what tools are available and how to use CloudInit.

What about Automation?



The AWS Management Console is great when you do something for the first time, but the wizards are too time-consuming for repeated tasks, especially if you need to set up more than a single instance - say an RDS database instance, an EC2 machine instance, and the corresponding security groups, or a number of identical instances. We will look into ways to automate this.

Aside from setting up the infrastructure, you usually also need to customize the EC2 instances (at least by installing and starting the software you need them for). You can log into them via SSH, but wouldn't it be great to automate that too, especially if you need multiple similar instances?

Note that I'm concerned here only with automating the work of the AWS user. It is also possible to configure AWS to start new EC2 instances automatically when needed (e.g. when the load exceeds a limit), but that is a different story.

Overview:

  • Infrastructure provisioning automation:
    • AWS API command-line tools (or AWS Java API or third-party tools/libraries)
    • AWS CloudFormation
  • Instance OS & SW setup automation:
    • Canonical CloudInit (Ubuntu and Amazon Linux AMIs) - perhaps leveraging Puppet or Chef
    • Creating a customized AMI


Automating Infrastructure Provisioning



There are two prominent options for creating your EC2 instances and other resources without the AWS Management Console: the AWS API command-line tools and AWS CloudFormation.

AWS API command-line tools



Amazon offers command-line tools for most of its services, such as EC2 and RDS.

EC2: Robert Sosinski published very good instructions for getting started with the Amazon EC2 command-line tools (they aren't specific to Mac OS X despite the title) back in 2008, but they are still valid, so just follow them; there is no point in repeating them here (basically: download, unpack, set environment variables, provide credentials). Alternatively, you can go to the download page and follow the official instructions. I'd recommend creating one folder to hold all the tools => $AWS_FOLDER/ec2/ etc. instead of ~/.ec2/.
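
A minimal sketch of the resulting environment setup, assuming you unpacked the tools under $AWS_FOLDER (the paths and key file names are illustrative; the tools also need JAVA_HOME pointing to a Java runtime):

export AWS_FOLDER=~/aws
export EC2_HOME=$AWS_FOLDER/ec2
export PATH=$PATH:$EC2_HOME/bin
export EC2_PRIVATE_KEY=$AWS_FOLDER/pk-XXXXXXXX.pem
export EC2_CERT=$AWS_FOLDER/cert-XXXXXXXX.pem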

If you want to use a different AWS region than the default us-east-1, you also need to set the environment variable EC2_URL; see the list of regional endpoints or the output of ec2-describe-regions. Example (my URL has ec2 in the middle, contrary to the list of endpoints, but it evidently works too):

export EC2_URL=https://eu-west-1.ec2.amazonaws.com


Authentication setup for the other tools: While the documentation for the EC2 tools describes only authentication via the X.509 certificate (the environment variables EC2_PRIVATE_KEY and EC2_CERT), the other tools (at least RDS and CloudFormation) support uniform authentication via the environment variable AWS_CREDENTIAL_FILE, pointing to a file containing your AWS Access Key Id and Secret Key (which you can find in your AWS account under Security Credentials - Access Keys). The configuration is described in the tools' readme files.
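
The credential file referenced by AWS_CREDENTIAL_FILE contains just two properties; a sketch with placeholder values (see the tools' readme for the exact format):

AWSAccessKeyId=<your access key id>
AWSSecretKey=<your secret key>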

RDS: The setup of the RDS command-line tools is quite similar to EC2: just download them and add the environment variables as described in the included readme.txt.

As with EC2, you may want to change your default RDS region:

export RDS_URL=https://eu-west-1.rds.amazonaws.com


Examples from the Vaadin Test Setup


My original plan was to try the performance testing described in Vaadin Scalability Testing with Amazon Web Services, which unfortunately proved impossible because the test application failed to run. Along the way I automated the individual setup steps, as shown below. You may want to check that blog post to understand the context.

I didn't need to create a security group and allow access to it via the command line, as I had already done that via the Management Console. You could open the SSH port for your security group (here <group-name>) as follows:

ec2-authorize <group-name> -p 22


Create two EC2 instances:

ec2-run-instances ami-1a0f3d6e -t m1.large -k VaadinAS --instance-count 2 -z eu-west-1c -g quick-start-1


  • -k specifies the name of an existing key pair (the Management Console offers to create one the first time you create an instance) that will be associated with the instance to make password-less SSH login possible
  • -z specifies the availability zone (AZ) within the region (you can see the available ones when creating an instance in the Mgmt Console); it's likely better to have all resources in the same AZ
  • -g specifies an existing security group (again created in the Console); the default is "default", I believe


The ec2-run-instances command also supports the --user-data or --user-data-file option to pass setup instructions to CloudInit, as described later on.
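
For example, launching an instance with a CloudInit script could look like this (the file name is illustrative):

ec2-run-instances ami-1a0f3d6e -t m1.large -k VaadinAS --user-data-file ./my-init-script.sh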

To log into an instance you will need its public domain name/IP (printed when the command finishes), the user name, which depends on the AMI used (easiest: right-click the instance in the Mgmt Console and select "Connect" to get a complete SSH connect command), and the key file (./VaadinAS.pem in my case). Thus I would log into my first instance as follows (provided that I've already opened port 22 in the security group):

ssh -i VaadinAS.pem ubuntu@ec2-46-137-136-253.eu-west-1.compute.amazonaws.com


Create an RDS instance running MySQL (it may take a few minutes before its startup finishes):

rds-create-db-instance quicktickets --allocated-storage 5 -c db.m1.large -e MySQL5.1 -u quicktickets -p V3ryS3cr3t -z eu-west-1c --backup-retention-period 0 --db-name quicktests


  • quicktickets will be the name of the instance
  • the maximal size will be 5 GB (can be changed later)
  • -c - it will be based on the db.m1.large instance class
  • -e - the DB engine is MySQL, -u sets the username (quicktickets), -p the password (V3ryS3cr3t)
  • -z eu-west-1c puts it into the same AZ as the EC2 instances
  • --backup-retention-period 0 - don't keep backups (the default is 1 day)
  • --db-name quicktests - the database name, needed for connecting to it


Next I need to make the DB accessible from my EC2 instances (which are in the security group quick-start-1):

rds-authorize-db-security-group-ingress default --ec2-security-group-name quick-start-1 --ec2-security-group-owner-id <your-AWS-account-ID>


  • You can find your AWS Account ID in your AWS account under Security Credentials


To find out the hostname of the instance, execute rds-describe-db-instances, which will also tell you whether it is still launching or already running.
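
For example:

rds-describe-db-instances quicktickets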

Now you can connect to the DB from an EC2 instance in the security group:

mysql -h quicktickets.cpokd2djuazy.eu-west-1.rds.amazonaws.com -u quicktickets --password=V3ryS3cr3t quicktests


AWS CloudFormation



CloudFormation is a new (2/2011) free service from Amazon that enables you to describe the resources you want and their dependencies in a text format and to use this "template" to instantiate them ("to create a stack"), either via the AWS Management Console or via the CloudFormation command-line tools. You can also share your templates and use and combine templates created by others. The templates also support the UserData attribute, which you can use to pass setup instructions to CloudInit, as described later on. Check out this screenshot-based post about setting up a CloudFormation stack via the Management Console.

An example template file:

    
    {  "AWSTemplateFormatVersion": "2010-09-09",
      "Description" : "One EC2 instance with a security group open for SSH",

    "Parameters": { "KeyName": { "Description" : "Name of an existing EC2 KeyPair to enable SSH access", "Type": "String" }, "InstanceType": { "Default": "m1.large", "Type": "String" } },

    "Resources": {

    "EC2SecurityGroup": { "Properties": { "SecurityGroupIngress": [ { "FromPort": "22", "CidrIp": "0.0.0.0/0", "ToPort": "22", "IpProtocol": "tcp" } ], "GroupDescription": "SSH access" }, "Type": "AWS::EC2::SecurityGroup" },

    "Ec2Instance": { "Properties": { "SecurityGroups": [{"Ref": "EC2SecurityGroup"}], "ImageId": { "Fn::FindInMap": ["AWSRegionArch2AMI", {"Ref": "AWS::Region"}, "64" ] }, "UserData": { "Fn::Base64": { "Fn::Join": ["", [ "#!/bin/bash -v\n", "# you init bash script here...\n" ]]} }, "KeyName": { "Ref": "KeyName" }, "InstanceType": { "Ref": "InstanceType" } }, "Type": "AWS::EC2::Instance" } },

    "Mappings": { "AWSInstanceType2Arch" : { "m1.large" : { "Arch" : "64" }, "m1.xlarge" : { "Arch" : "64" }, ... } },

    "Outputs" : { "InstanceId" : { "Description" : "InstanceId of the newly created EC2 instance", "Value" : { "Ref" : "Ec2Instance" } }, "AZ" : { "Description" : "Availability Zone of the newly created EC2 instance", "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "AvailabilityZone" ] } }, "PublicIP" : { "Description" : "Public IP address of the newly created EC2 instance", "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "PublicIp" ] } } } }


  • Parameters: as you can see, you can define parameters (with defaults) whose values may be supplied when a new stack is created from the template
  • Resources: the template defines two resources, a security group and an EC2 instance (which uses a mapping, because AMI IDs differ per region)
  • UserData: setup instructions can be supplied to CloudInit via the base64-encoded UserData property
  • Outputs: you can define what information should be available via the DescribeStacks function (command line: cfn-describe-stacks)


CloudFormation lets you define any resource (EC2 instances, RDS instances, load balancers, security groups, ...), their dependencies, and, via CloudInit, various boot-time actions such as SW installation. The templates are valid JSON documents.
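
Creating a stack from a template file via the command-line tools could then look like this (a sketch; the stack name is illustrative, see the tools' documentation for the exact parameter syntax):

cfn-create-stack VaadinTestStack --template-file my-template.json --parameters "KeyName=VaadinAS"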

Example: Using CloudFormation and Cloud-Init to install and start a RoR application (featuring the WaitCondition) - it is not too long and describes the individual sections of the template file. You can also browse through the public template files, for example this one: Single EC2 Instance web server with Amazon RDS database instance.

In June 2011 Amazon also launched CloudFormer, a prototype tool that enables you to create CloudFormation templates from the existing AWS resources in your account.

If you still desire more information, read the CloudFormation User Guide.

Customizing an Instance with CloudFormation Metadata and Helper Scripts


From Bootstrapping Applications via AWS CloudFormation:

AWS CloudFormation allows you to define the set of packages, files and operating system services through metadata in a template. In addition, it provides helper functions to interpret the metadata and act on it, installing packages, creating files and starting or restarting services on the instance. The AWS CloudFormation scripts build on the basic CloudInit functionality and enable you to create a common, simple CloudInit startup script that is data-driven through metadata. You describe what needs to be installed on the host in metadata and AWS CloudFormation takes care of the how.


See that document for instructions on how to use the metadata and the helper scripts such as cfn-init, which installs packages, downloads and unpacks archives, starts services, and creates files based on data in the metadata section. It also mentions the integration of CloudFormation with Chef or Puppet, which is described in more detail in the whitepapers Integrating AWS CloudFormation with Opscode Chef and Integrating AWS CloudFormation with Puppet. If you intend to use CloudFormation then you should absolutely read this 22-page guide.

(Note: cfn-init supports downloading and unpacking archives, which can be used e.g. to fetch the latest source code of your application, provided on demand by GitHub.)
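
For illustration, a minimal sketch of such metadata attached to an EC2 instance resource (the package and service names are just examples); cfn-init reads this section when invoked from the instance's UserData script:

"Metadata": {
  "AWS::CloudFormation::Init": {
    "config": {
      "packages": { "yum": { "httpd": [] } },
      "services": { "sysvinit": { "httpd": { "enabled": "true", "ensureRunning": "true" } } }
    }
  }
}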

You can see an example of their usage in this template leveraging metadata & helper scripts.

Other Alternatives



  • The AWS Java API (the command-line tools use it; it's based on web service calls)
  • Third-party tools/libraries, e.g. the Ruby gem Fog
  • Chef + Knife, Puppet (I believe they provide their own wrappers for the AWS web service calls and leverage CloudInit)


Automating EC2 Instance OS/SW Setup



To customize the software inside your EC2 instances and its configuration, you can either create a customized AMI or use Canonical's CloudInit with AMIs that support it (Amazon Linux, Ubuntu, and maybe others). If you use CloudFormation, you have yet another possibility based on CloudInit, described in the CloudFormation section above.

Canonical CloudInit and Instance User Data



You can pass arbitrary text data to a new instance via the User Data attribute (up to 16 KB); the data is then available from within the instance at http://169.254.169.254/latest/user-data (you can also access various metadata in a similar way). CloudInit is a Linux utility developed by Canonical, the company behind Ubuntu, that reads this data and processes any instructions embedded in it at boot time (approximately when rc.local runs). For example, if the data starts with #! then it is run as a shell script under root.
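
From within a running instance you can check what was passed, e.g.:

curl http://169.254.169.254/latest/user-data
curl http://169.254.169.254/latest/meta-data/public-hostname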

CloudInit accepts different types of instructions in the user data, distinguished by the first line: a script (#!...), cloud-config data, i.e. packages to install etc. (#cloud-config), URLs of files to process (#include ...), #upstart-job to add something to /etc/init (run on each boot), and more. It can even handle gzip-compressed user data and multi-part data combining several of these instruction types (see cloud-utils and the command write-mime-multipart).
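
A sketch of combining a cloud-config part with a shell script part into one user data file (the file names are illustrative):

write-mime-multipart --output=combined-userdata.txt cloud-config.yaml:text/cloud-config setup.sh:text/x-shellscript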

The type #cloud-config is quite useful as it offers a simpler way to install packages and execute commands than a bash script. It contains YAML-formatted instructions, e.g. "runcmd" to run a command-line tool or "packages" to install packages via the OS's package manager. Example: Installing Jenkins CI with #cloud-config.
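
A minimal sketch of such user data (the package name is just an example):

#cloud-config
packages:
 - tomcat6
runcmd:
 - [ service, tomcat6, start ]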

CloudInit isn't very well documented yet; you may occasionally need to read the Python source code. If something goes wrong, check the log in the instance's /var/log/cloud-init.log.

Aside from the official documentation, you might want to see Xebia's CloudInit introduction presentation and read the section on CloudInit in the Amazon Linux AMI user guide.

Creating a Customized Amazon Machine Image



CloudInit installs and configures software at launch time, so the instance takes longer to become fully available. If that is a problem, you may prefer to create your own customized Amazon Machine Image (AMI) with all the software already installed and configured. This is described e.g. in this brief how-to on creating a new AMI from an existing one (2007) and in the official AMI customization docs; you may also want to have a look at the EC2 AMI command-line tools. You'd then create new EC2 instances based on the customized AMI.
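
For an EBS-backed instance that you have customized, creating the new AMI can be as simple as this (the instance ID and names are placeholders):

ec2-create-image i-xxxxxxxx --name "my-customized-ami" --description "Base image with our SW pre-installed"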

Some Related Stuff



If your EC2 instances need to communicate and use a technology that requires them to be on the same subnet, then you can use the Amazon Virtual Private Cloud (VPC; free) and even connect it to your data center via VPN ($0.05/h). This may be necessary, for example, for running multiple JMeter instances.

Regarding JMeter, Jörg Kalsbach has created an AMI that simplifies the creation of JMeter master/slave farms (3/2010): JMeter In The Cloud - A cloud based load testing environment (read the doc). (The trick is that the master instance starts the slave instances and thus knows their IPs. I guess something similar could be done with CloudFormation, Auto Scaling, and user data/CloudInit.)

Summary



AWS is a dynamically developing platform with continually improving tooling and a growing range of services. It's very easy to get started using the web-based Management Console, but it soon becomes more convenient to move to a more automated interface, such as the command-line tools, or even to CloudFormation for setting up a whole infrastructure stack. The support for customizing instances, either by creating custom images or at launch time via CloudInit and/or CloudFormation's metadata and scripts, is very good, and people have already been combining it with their favorite DevOps tools Chef and Puppet.

I'd recommend starting with AWS using the Management Console and then switching to the command-line tools and CloudInit once you become comfortable with the concepts and usage. If you need to provision multiple resources repeatedly, you should use CloudFormation with its metadata and helper scripts (perhaps also leveraging Puppet/Chef).

Published originally at blog.iterate.no.

