
Science and Technology Innovation Journal


Science and Technology Innovation Authors: Salvatore Genovese, Jason Stowe, Bryan O'Rourke, Dmitriy Stepanov, John Savageau



Specialized HPC Clusters in the Cloud

A new frontier for life sciences and beyond

Hundreds of life science labs in the U.S. use next-generation sequencing, bioinformatics, proteomics and molecular modeling to identify the genes behind many diseases, including diabetes, cancer and Alzheimer's disease, and the potential drug targets that could lead to cures.

As modern scientific instruments produce more data, the demand for compute power to analyze it is increasing dramatically. Currently, life science researchers in bioinformatics, next-generation sequencing and molecular modeling need to spend tens to hundreds of thousands of dollars on server clusters to run their scientific calculations.

High performance computing (HPC) has come a long way for life sciences. Twenty years ago, expensive parallel supercomputers were required to render proteins in three dimensions and run software that helped researchers understand their shapes. Now 3D rendering can be done on graphics cards in workstations, laptops and even phones.

It is important to note that there are two types of HPC. There's the sprinter type, where users try to run a highly parallel application, and then there's the marathon runner type, in which applications are pleasantly parallel (also called embarrassingly parallel): the work breaks into tasks that run independently of one another. For sprinter applications, latency is of key importance and performance must be optimized at every level to get results. Currently these applications are best run on a single multi-core server in the cloud, although infrastructure from various providers may eventually allow this use case to run across many servers. For marathon applications, also called high throughput computing, many commodity servers can run jobs faster by taking advantage of the parallel nature of the work.
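As a rough illustration of the marathon type, the sketch below spreads a batch of independent tasks across a pool of workers. The `gc_content` function and the sample sequences are hypothetical stand-ins for a real per-sample analysis; the only point is that no task depends on another, so adding workers shortens the run.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Hypothetical per-task work: fraction of G and C bases in one sequence."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def run_independent_tasks(sequences, workers=4):
    """Run one task per sequence; tasks share no state, so they can run in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(gc_content, sequences))


results = run_independent_tasks(["GGCC", "ATAT", "GATC"])
```

A thread pool keeps the sketch short. On a real cluster, each task would more likely be a separate job dispatched to a separate server by a scheduler.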

For both types of application, compute clusters built from many commodity servers have replaced expensive parallel supercomputers, but the data and problems being solved have grown to demand increased compute capacity. This leaves companies with large capital investments in fixed-size clusters that have all the traditional challenges of maximizing utilization, minimizing operational costs and shortening time-to-result for users.

Rise of Cloud HPC Clusters-as-a-Service
Cloud computing promises to help solve these issues. It makes provisioning servers easier and more cost-efficient. The cloud delivers virtualized servers and storage via the Internet, at a large scale, and you're billed only for what you use. However, getting started in the cloud is not easy. For example, Amazon EC2 requires programming against its application programming interfaces (APIs) to provision nodes, plus administrators and security staff to manage the servers.

This provisioning challenge has led to cloud HPC clusters, built upon infrastructure providers like Amazon EC2. Instead of building out a datacenter, procuring servers, network equipment and racks, and hiring IT personnel, companies can tap into automatically provisioned compute clusters as a service.

Cloud HPC cluster users can start up clusters without having to worry about putting in place various applications, operating systems, security, encryption and other software. Scientists can create clusters that automatically add servers when work is added and turn the servers off when the work is completed. This enables life science researchers to run calculations only when they need compute power.
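The add-servers-when-busy, stop-when-idle behavior described above can be sketched as a simple sizing rule. This is a minimal sketch under stated assumptions, not any provider's actual policy: the function name, its parameters and the cap on servers are all hypothetical, and the work is assumed to split evenly across cores.

```python
import math


def target_server_count(queued_core_hours: float,
                        cores_per_server: int,
                        desired_hours: float,
                        max_servers: int) -> int:
    """Return how many servers to keep running for the work currently queued.

    All parameters are illustrative: queued_core_hours is the outstanding work,
    desired_hours is how quickly the user wants it finished, and max_servers
    is a budget cap.
    """
    if queued_core_hours <= 0:
        return 0  # no work left: turn every server off
    cores_needed = queued_core_hours / desired_hours
    servers = math.ceil(cores_needed / cores_per_server)
    return min(servers, max_servers)
```

For instance, `target_server_count(4000, 8, 4, 200)` asks for 125 eight-core servers, and the same call with an empty queue returns 0, so nothing is billed while no work is waiting.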

HPC Before Cloud Computing
To understand these costs, let's look at HPC clusters before cloud computing. When buying a cluster, end users would size it to complete their largest set of calculations in a desired time. For example, if a 20,000 compute-hour calculation for a quarterly process needed to finish in a day, the cluster might be sized to 1,000 cores. On the storage side, the same sizing would occur to ensure that enough space existed to hold the working data and final results of the calculations.
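The sizing example can be checked with a few lines of arithmetic. The sketch below assumes the work divides perfectly across cores, which real workloads only approximate; the function names are illustrative.

```python
import math


def cores_for_deadline(core_hours: float, deadline_hours: float) -> int:
    """Smallest whole number of cores that finishes the work by the deadline."""
    return math.ceil(core_hours / deadline_hours)


def hours_to_finish(core_hours: float, cores: int) -> float:
    """Wall-clock hours for the work on a fixed number of cores."""
    return core_hours / cores


min_cores = cores_for_deadline(20_000, 24)   # minimum cores for a one-day deadline
hours_at_1000 = hours_to_finish(20_000, 1_000)  # run time on the 1,000-core cluster
```

Under these assumptions the minimum is 834 cores, and the 1,000-core cluster from the example finishes in 20 hours, leaving some headroom inside the one-day target.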

Purchasing a pre-cloud cluster required large up-front capital expenditures for the machines needed to do calculations and the storage for the results, as well as lengthy procurement and provisioning processes. In addition, IT staff were required to maintain the cluster and ensure that its operating systems and applications were up to date. When a cluster first becomes operational, it is not used to full capacity, and from then on researchers have only a fixed-size cluster for their calculations.

After the cluster is in production, researchers have a fixed number of cores to run their research. If they have a 4,000 compute-hour calculation to run on a 40-core cluster, it will always take 100 hours at best to get the result.

Once these clusters are purchased, they are typically used only about 30 percent of the time. For example, they might run only during the day or when an instrument produces data. The larger the cluster, the faster the calculations run, but the more money and manpower are wasted when the cluster is 70 percent idle. Renting servers from the cloud could solve these problems, but it requires programming, needs IT experience to maintain, and comes with severe security concerns.

Increasing Data, Computation and Time-to-Results
Modern scientific instruments, like mass spectrometers for proteomics or next-generation genomic sequencers, are compounding this problem. They require large quantities of compute power. The data generated when sequencing a human genome is more than one trillion bytes, or a terabyte, in size.

This scale of data puts the project in a unique place: it is large enough to be unwieldy to analyze for the labs that generate it, but small enough that with some data scheduling it can be moved over the Internet. As this data comes off an instrument, it needs to be processed on varying numbers of computers, with results returned as quickly as possible.

This bursty availability of data from instruments poses a problem for traditional, fixed clusters that cannot grow or shrink to run the calculations efficiently. It also increases costs. A traditional, in-house compute cluster with 30-percent utilization costs roughly three times as much per calculation as a fully utilized cluster. However, fully utilized clusters are up to 10 times slower to complete the calculations because the cluster is not large enough to run them as fast as possible. For drug discovery processes, clinical trial design or bioinformatics, this 10-times slower time to result translates to slower time-to-market, which also costs money.
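The threefold figure follows from spreading a fixed cluster cost over only the hours that do useful work. The sketch below makes that assumption explicit: the cluster costs the same whether busy or idle, so the cost per useful calculation scales with one over the utilization. The function name is illustrative.

```python
def cost_multiplier(utilization: float) -> float:
    """Cost per useful compute-hour relative to a fully utilized cluster.

    Assumes the cluster's total cost is fixed regardless of how busy it is.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return 1.0 / utilization


multiplier_at_30_percent = cost_multiplier(0.30)
```

At 30-percent utilization the multiplier is about 3.3, in line with the "three times" quoted above.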

Changing the Math for Compute and Storage Costs
For compute clusters as a service, the math is different: having 40 processors work for 100 hours costs the same as having 1,000 processors run for 4 hours. Yet with 1,000 processors, the results of most life science calculations would come back the same day, rather than four days later. This kind of disruptive decrease in time to result can shorten the time needed to develop products, discover drugs or isolate important genes in a genome. In this example, the results come back 25 times sooner at no additional cost.
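The comparison can be written out as a short calculation. The hourly price below is a hypothetical placeholder, not a quoted rate; the sketch also assumes billing is strictly per processor-hour and that the job scales evenly across processors.

```python
def run_cost(processors: int, hours: float, price_per_core_hour: float) -> float:
    """Pay-per-use cost: processor-hours consumed times the hourly price."""
    return processors * hours * price_per_core_hour


PRICE = 0.10  # hypothetical price per processor-hour, for illustration only

small_cluster_cost = run_cost(40, 100, PRICE)    # 4,000 processor-hours
large_cluster_cost = run_cost(1_000, 4, PRICE)   # also 4,000 processor-hours
speedup = 100 / 4                                # wall-clock hours, small vs. large
```

Both runs consume 4,000 processor-hours, so they cost the same at any price, while the larger cluster returns the result 25 times sooner.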

This key shift in high-performance calculations also applies to storage. A hard disk capable of storing a terabyte can be bought for $150 at a local office store. However, filers with redundancy, de-duplication and hundreds of terabytes of storage can cost $12,000 or more per terabyte. For large capacities and reliability, traditional filers cost at least 10 times more per terabyte than hard drives bought off the shelf. In the cloud, all storage is redundant and highly available, and the cost per terabyte goes down at large scales.

Improving Time-to-Market
These advantages create strong incentives to improve time-to-market and reduce costs. As an example, Varian Inc. is a producer of scientific instruments and ion traps. Researchers at the company run calculation-intensive Monte Carlo simulations to help develop better future products. In one instance, a simulation for a mass spectrometer was scheduled to take several thousand compute hours and nearly six calendar weeks on an internal pool of processors. With product design and conference deadlines looming, the company needed to get results faster.

Rather than purchasing a traditional cluster, Varian Inc. was able to run this calculation using a cloud HPC cluster service on Amazon EC2 that helps companies run calculations easily and securely. The elastic cluster added nodes to run its calculations, and stopped the servers when no more work was left to compute. Using a service to automate provisioning, security, encryption, administration and support made the cluster cost-effective and easy to use. With the cloud HPC cluster, this six-week calculation ran in less than one day.

Applications for Life Sciences
For researchers in life sciences, including bioinformatics, proteomics and computational chemistry, these clusters can support all the applications that users expect on internal clusters, with minimal installation effort. Both open source and proprietary software applications can be run in the cloud, giving domain scientists access to a full range of pre-installed, domain-appropriate tools. Pipelines for standard applications like Gromacs, Bowtie, Velvet, OMSSA, Tandem, HMMER and BLAST are some examples of what can be run in the cloud. BLAST, for instance, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences. These applications must generally be tuned to work optimally in the cloud, which has a more flexible architecture than fixed internal clusters.

As an example, Schrödinger, a leading supplier of molecular-simulation and computational-chemistry software to the pharmaceutical industry, made its Glide docking program available on the cloud. Glide is used for virtual screening, a process that determines potential drug candidates from a large database of compounds based upon their fit with a given target site.

Shortening Product Pipelines: 1.5 Years of Drug Target Screens in 1.5 Days
Molecular modeling and simulation are central to drug discovery, although they are often rate-limiting. Local computational resources are often insufficient to perform the massive burst-mode computations needed to bring a drug project forward in a timely fashion.

Recently Schrödinger decided to show how on-demand availability of large, secure and trouble-free cloud computational resources can fill this gap. As test data, it screened 1.8 million candidate compounds against a target site to find potential matches. Using a 600-processor cloud HPC cluster, 18 months' worth of screening was completed in 36 hours.
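The headline ratio can be reproduced with simple arithmetic. The sketch below assumes a 365-day year for the 18-month baseline; the article does not say what hardware that baseline assumed, so the figures describe elapsed time only.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: non-leap year

baseline_days = 1.5 * DAYS_PER_YEAR      # 18 months of screening
cloud_days = 36 / HOURS_PER_DAY          # 36 hours on the cloud cluster
speedup = baseline_days / cloud_days     # reduction in elapsed time

cloud_core_hours = 600 * 36              # processor-hours consumed in the cloud run
```

Under that assumption the elapsed time drops by a factor of 365, and the cloud run consumes 21,600 processor-hours.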

Other Benefits: Audit, Disaster Recovery and Security
Cloud HPC clusters enable scientists to consume compute power only when they need it and to pay only for what they use. The scalability of the cloud allows them to size HPC clusters to their jobs to minimize the time to result. Storage in the cloud can get cheaper with economies of scale, rather than more expensive as with traditional filers.

Another benefit is that cloud HPC clusters are virtualized. This means that it is possible to provision repeatable clusters that have standardized images for qualification purposes and reliable application environments every time they are provisioned. Disaster recovery scenarios are easier to manage because the entire cluster environment is repeatable through virtualization.

Cloud HPC clusters are the same every time they are provisioned from their virtual machine images. Security can be handled in a consistent way, with guaranteed encryption and encryption-key management for sensitive data and applications, whether at rest on disk or in transit over the network between cluster servers. As an example, hard disks containing user data can be encrypted using the Advanced Encryption Standard (AES) with 128-bit, 192-bit or 256-bit keys. Data communicated to the cluster via Web services uses the same SSL encryption that protects credit-card information for holiday purchases.

Is the Future of HPC Cloudy?
HPC's future is clear: it will be in the clouds. Calculations play an important role in helping researchers efficiently design better drugs, run more efficient clinical trials and develop better crops.

Scientific instruments are generating ever more data, large yet still transferable, and all of it requires analysis. There is increased pressure to lower costs and speed up product development timelines for crops, clinical trials and cures for diseases. These factors have led to the creation of cloud HPC clusters that have helped pharmaceutical companies perform calculations that lead to better scientific results.

More Stories By Jason Stowe

Jason Stowe is a seasoned entrepreneur, and the founder and CEO of Cycle Computing, the leader in Condor Grid and Cloud Computing Solutions. In 2005, Jason started Cycle, an employee-owned company, to help clients easily use open-source Condor to provide more innovative grid functionality and reduce costs. Not having investors lets Jason and the Cycle-team focus on customers' needs and execution, rather than hype.

Starting with three initial Fortune 100 clients in Insurance, Financial Services, and Defense, Cycle has grown to deploy production grids at Fortune 500s, SMBs, government research, and academic institutions alike, for a wide variety of industries and applications.

For over a year, Cycle's CycleCloud service has provided the same production-quality grids on demand in the cloud, and is used for computations in bioinformatics, statistics, product and hardware simulation, and financial risk analysis, among others.

Jason attended Carnegie Mellon and Cornell Universities, and volunteered/guest lectured for the Entrepreneurship program at Cornell's Johnson Business School.
