Sean Collins

sean [at] seanmcollins [dot] com

GPG Key ID: 0xf60f564978913931

sean [at] coreitpro [dot] com

GPG Key ID: 0xA1D7E590


High Performance Computation with Amazon EC2

We’ve been doing quite a lot of work in computational chemistry, specifically virtual screening and docking, to validate a set of compounds identified by a client as potentially binding to a particular receptor. This type of work is very computationally intensive, and a typical run can consume a multi-CPU cluster for days on end. As this project expanded, we simply ran out of CPU cycles. The option of expanding the compute server or buying a cluster was evaluated and rejected: it was hard to justify that expense unless the server was running nearly 24/7, which never happens.

As a result, I prepared and deployed Linux instances on Amazon EC2 with the software required for the experiments, and created management tools to allow quick provisioning and distribution of jobs.

Results:

  • Task: Run tests on multiple compounds, sequentially, on one system.
  • Benchmark: A local 2.13 GHz quad-core, 4 GB RAM rack-mounted server running CentOS 5.4.
  • Time to completion: 17.5 hours
  • Amazon EC2: High-CPU instance, 8 cores, 7 GB RAM, CentOS 5.4
  • Time to completion: 8.75 hours
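The halving of wall-clock time tracks the doubling of core count from 4 to 8. A quick check of the implied speedup, assuming the docking workload scales roughly linearly with cores:

```python
# Wall-clock times reported above (hours)
local_hours = 17.5   # 4-core local server
ec2_hours = 8.75     # 8-core EC2 High-CPU instance

speedup = local_hours / ec2_hours
print(f"Speedup on EC2: {speedup:.2f}x")   # 2.00x

# Cores doubled from 4 to 8; efficiency of the additional cores
efficiency = speedup / (8 / 4)
print(f"Parallel efficiency: {efficiency:.0%}")   # 100%
```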

Running tests on multiple compounds, on multiple systems

  • Background: “Ligand set was 10,000 compounds from Zinc (subset 3) and same receptor. Use EC2 instances for load balancing across the instances and collate results.”
  • Time to completion: 32.8 hours
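The load balancing mentioned in the background note amounts to partitioning the ligand set across instances and collating the per-instance outputs. A minimal sketch of that split-and-collate step (the round-robin chunking, instance count, and score format here are illustrative assumptions, not the tooling actually used):

```python
# Hypothetical sketch: split a ligand list across N EC2 instances and
# merge the per-instance (ligand, score) results into one ranked list.

def partition(ligands, n_instances):
    """Round-robin split so each instance gets a near-equal share."""
    chunks = [[] for _ in range(n_instances)]
    for i, ligand in enumerate(ligands):
        chunks[i % n_instances].append(ligand)
    return chunks

def collate(results_per_instance):
    """Merge per-instance results, best (lowest) docking score first."""
    merged = [r for results in results_per_instance for r in results]
    return sorted(merged, key=lambda pair: pair[1])

# Toy ligand set standing in for the 10,000 Zinc (subset 3) compounds
ligands = [f"ZINC{i:05d}" for i in range(10)]
chunks = partition(ligands, n_instances=3)
print([len(c) for c in chunks])   # [4, 3, 3]
```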

Summary:

By using Linux images on Amazon EC2, B-Tech Consulting was able to conduct experiments while radically reducing the total cost of ownership of its IT resources. The first run on EC2 used $7 worth of compute time, while the second, multi-machine run cost less than $40.
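For context, the $7 figure together with the 8.75-hour run time implies an hourly rate of about $0.80 for the single High-CPU instance:

```python
# Implied hourly cost of the single-instance EC2 run reported above
run_cost = 7.00    # dollars
run_hours = 8.75

rate = run_cost / run_hours
print(f"Implied rate: ${rate:.2f}/hour")   # $0.80/hour
```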

B-Tech Consulting Newsletter: Virtual screening of chemical libraries using Amazon EC2 Cloud Computing