Working with R and Bioconductor in the cloud (Amazon EC2)

Some Bioconductor-based projects are computationally demanding and require substantial resources. If a powerful workstation is not available, a good option is to run R and Bioconductor at scale on Amazon Web Services (AWS). Setting up an R and Bioconductor instance in the cloud is straightforward: software engineers at Bioconductor.org developed (and maintain) an Amazon Machine Image (AMI) optimized for running Bioconductor on the Amazon Elastic Compute Cloud (EC2). A good overview of how to use this AMI is available at the following URL: https://www.bioconductor.org/help/bioconductor-cloud-ami/

This post only summarizes the key steps required to quickly start an R instance. For more details, please refer to the bioconductor.org website.

  • Sign in to AWS at the following URL: http://aws.amazon.com/
  • Click EC2, then Key Pairs. Make sure you have a key pair whose private key (.pem file) is stored on your local machine; otherwise, create a new key pair by following the instructions and save the resulting .pem file locally (AWS offers it for download only once, when the key pair is created).
  • Click the “AMI with SSH” link. On the CloudFormation page that opens, make sure that the option “Specify an Amazon S3 template URL” is selected, then click “Next”.
  • On the next page, define the following fields:
    • StackName: choose a name for the Bioconductor stack. The name must be unique among the stacks in your account and region.
    • BiocVersion: it is usually best to select the recommended version, unless you need a specific Bioconductor release.
    • Instance Type: this defines the resources (CPU cores, RAM) available to you on EC2. See https://aws.amazon.com/ec2/pricing/ for the cost and resources of each instance type. For compute-intensive jobs, compute-optimized instances such as C3 and C4 are usually recommended; if you need a lot of RAM, a memory-optimized R3 instance may be a better fit.
    • KeyName: the name of the key pair used for ssh access. Use the same key pair as in the previous step; the name must match exactly the one shown in the AWS console (https://console.aws.amazon.com/).
  • Proceed with the creation of the stack. Note that once you click “Create”, Amazon starts billing your account.
  • Monitor the new instance in your EC2 console (https://console.aws.amazon.com/ec2). Once the stack is running, copy the public IP address of your instance and paste it into a new browser tab. Log in to RStudio with the default credentials of the Bioconductor AMI: username ubuntu, password bioc.
  • The AMI ships R with Bioconductor and a rich selection of packages. However, you may want to install additional packages before starting the analysis. In particular, some parallelization packages (such as foreach and doParallel) are not installed by default, while parallel itself ships with base R and does not need to be installed. Parallel code lets you use all the CPU cores of the EC2 instance. For example, run the following in the R console:

install.packages(c("devtools", "doParallel", "foreach"))
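How many workers to register with doParallel depends on the instance type you picked. As a minimal sketch, the shell snippet below (run in a terminal on the instance, e.g. over ssh) reports the vCPUs and RAM of the machine and suggests a worker count. Keeping one core free for the interactive R/RStudio session is an assumed rule of thumb, not a Bioconductor recommendation, and the snippet assumes a Linux host that provides nproc and /proc/meminfo.

```shell
#!/usr/bin/env bash
# Report the compute resources of this (Linux) machine.
# nproc prints the number of processing units available to the current process.
cores=$(nproc)

# MemTotal in /proc/meminfo is given in kB (KiB); convert to GiB with one decimal.
mem_gib=$(awk '/^MemTotal:/ { printf "%.1f", $2 / 1024 / 1024 }' /proc/meminfo)

# Assumed rule of thumb: keep one core for the interactive R/RStudio session,
# but always use at least one worker.
if [ "$cores" -gt 1 ]; then
  workers=$((cores - 1))
else
  workers=1
fi

echo "vCPUs:             ${cores}"
echo "RAM (GiB):         ${mem_gib}"
echo "Suggested workers: ${workers}"
```

The suggested number can then be passed to R, for example doParallel::registerDoParallel(cores = 7) on an 8-vCPU instance, before running a foreach(...) %dopar% loop. From inside R, parallel::detectCores() gives the same core count.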

  • To move data to and from the EC2 server, you can use ssh-based tools such as scp.
  • To log in to the remote server, run the following command in a terminal, replacing the path/name of the key-pair (.pem) file and the host name of your Amazon server (its public DNS name is shown in the AWS console):

ssh -i RStudio_instance.pem ubuntu@ec2-52-207-229-205.compute-1.amazonaws.com
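If ssh refuses the key with a "WARNING: UNPROTECTED PRIVATE KEY FILE!" message, the .pem file is readable by other users, and restricting it to owner-only read access fixes that. The sketch below wraps this in a hypothetical helper, connect_ec2, plus a second hypothetical helper, lookup_host, that asks the AWS CLI for the instance's public DNS name instead of copying it from the console. It assumes the AWS CLI is installed and configured, and relies on the aws:cloudformation:stack-name tag that CloudFormation adds to the resources it launches. KEY_FILE, HOST and the stack name are placeholders.

```shell
#!/usr/bin/env bash
# Placeholders: replace with your own key file and instance host name.
KEY_FILE="${KEY_FILE:-RStudio_instance.pem}"
HOST="${HOST:-ec2-52-207-229-205.compute-1.amazonaws.com}"

# Hypothetical helper: print the public DNS name of the running instance(s)
# launched by the given CloudFormation stack. Requires a configured AWS CLI.
lookup_host() {
  if [ $# -lt 1 ]; then
    echo "usage: lookup_host STACK_NAME" >&2
    return 2
  fi
  aws ec2 describe-instances \
    --filters "Name=tag:aws:cloudformation:stack-name,Values=$1" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text
}

# Hypothetical helper: secure the key file, then open an ssh session as the
# "ubuntu" user. Any extra arguments are passed to ssh (e.g. a remote command).
connect_ec2() {
  # ssh rejects private keys that group/other users can read.
  chmod 400 "$KEY_FILE" || return 1
  ssh -i "$KEY_FILE" "ubuntu@${HOST}" "$@"
}
```

For example, HOST=$(lookup_host mybioc) followed by connect_ec2 opens an interactive shell, while connect_ec2 nproc runs a single command on the instance and returns (mybioc stands for whatever you entered as StackName).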

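  • To shut the instance down from the inside, log in to the server via ssh (as shown above) and type the command below. Unlike a plain halt on systemd-based Ubuntu releases, poweroff actually powers off the virtual machine. Depending on the instance's shutdown behavior setting, this either stops or terminates the instance. A stopped instance still incurs charges for its EBS storage, and the CloudFormation stack stays in place, so deleting the stack from the CloudFormation console is the surest way to end all billing.

sudo poweroff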
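Because the instance was launched through a CloudFormation stack, deleting the stack removes the resources it created (unless the template tells CloudFormation to retain them), which is more reliable than shutting the machine down from the inside. A minimal sketch, assuming a configured AWS CLI; teardown_stack is a hypothetical helper name, and the stack name is whatever you entered as StackName.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the CloudFormation stack created from the
# Bioconductor template, which also terminates the EC2 instance it launched.
teardown_stack() {
  if [ $# -lt 1 ] || [ -z "$1" ]; then
    echo "usage: teardown_stack STACK_NAME" >&2
    return 2
  fi
  # Ask CloudFormation to delete the stack and its resources.
  aws cloudformation delete-stack --stack-name "$1" || return 1
  # Poll until the deletion has finished (the waiter gives up with an error
  # if the stack is still not gone after its maximum number of attempts).
  aws cloudformation wait stack-delete-complete --stack-name "$1"
}
```

Running teardown_stack mybioc from your local machine does the same as selecting the stack in the CloudFormation console and clicking Delete.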

  • To copy data from the EC2 server to your local machine, use the scp command from a terminal on your local machine:

scp -i RStudio_instance.pem 'ubuntu@ec2-52-207-229-205.compute-1.amazonaws.com:batch_to_move/*' dest/
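Copying works the same way in the other direction, and the -r option copies whole directories. The hypothetical helpers below, push_to_ec2 and pull_from_ec2, wrap both directions. KEY_FILE and HOST are placeholders. Remote paths without a leading / are relative to the ubuntu user's home directory, and remote paths containing wildcards should be quoted so that your local shell does not try to expand them.

```shell
#!/usr/bin/env bash
# Placeholders: replace with your own key file and instance host name.
KEY_FILE="${KEY_FILE:-RStudio_instance.pem}"
HOST="${HOST:-ec2-52-207-229-205.compute-1.amazonaws.com}"

# Hypothetical helper: upload a local file or directory ($1) to the instance.
# The optional remote destination ($2) defaults to an empty path, which scp
# interprets as the ubuntu user's home directory.
push_to_ec2() {
  if [ $# -lt 1 ]; then
    echo "usage: push_to_ec2 LOCAL_PATH [REMOTE_DIR]" >&2
    return 2
  fi
  scp -i "$KEY_FILE" -r "$1" "ubuntu@${HOST}:${2:-}"
}

# Hypothetical helper: download a remote path ($1, may contain wildcards)
# into a local directory ($2, default: the current directory).
pull_from_ec2() {
  if [ $# -lt 1 ]; then
    echo "usage: pull_from_ec2 REMOTE_PATH [LOCAL_DIR]" >&2
    return 2
  fi
  scp -i "$KEY_FILE" -r "ubuntu@${HOST}:$1" "${2:-.}"
}
```

For example, push_to_ec2 ~/project/data uploads a (hypothetical) local data directory, and pull_from_ec2 'batch_to_move/*' dest/ is equivalent to the scp command above.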

That’s it: you are good to go. Just remember to terminate your instance at the end of the analysis; otherwise, Amazon will keep billing your account.
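Before walking away, it is worth confirming that nothing is still running. A minimal sketch, assuming the AWS CLI is installed and configured: the hypothetical helper list_running prints the running instances in the CLI's current region together with the CloudFormation stack (if any) that launched each one, so a forgotten instance stands out. It only checks one region at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list running EC2 instances in the current AWS CLI region,
# printing the instance ID, instance type, and the value of the
# aws:cloudformation:stack-name tag ("None" for instances not created by a stack).
list_running() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].[InstanceId, InstanceType, Tags[?Key=='aws:cloudformation:stack-name'] | [0].Value]" \
    --output text
}
```

An empty result means no instances are running in that region; add --region to the aws call (or change the default with aws configure) to check another one.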

About Author

Damiano
Postdoc Research Fellow at Northwestern University (Chicago)
