Clinical Research Firm Achieves Secure, Elastic, High Performance Computing AWS Environment with Flux7

Profile

This firm is a privately held clinical research organization that provides research services to its clients across the entire drug development process. Its trusted research is used by leading companies in the pharmaceutical, biotechnology, and medical device industries, as well as by government and academic organizations.

Challenge

This research firm approached the AWS experts at Flux7 with two distinct goals for a new AWS infrastructure. First, the company wanted to update the system its internal team of research scientists used for data analysis, because the team’s large data demands had outgrown the on-premises system. Second, the company had bid on a federal government project that, if won, would require cloud capabilities such as a secure, elastic, high performance computing environment. Because the federal award could be announced at any time, the firm and Flux7 faced a looming deadline.

In addition, to meet the compliance needs of its pharmaceutical and government clients, this research firm needed its new environment to be FISMA and NIST 800-53 compliant and to have security best practices built in. For both its internal and external audiences, the firm required the new environment to offer the scalability and high availability needed to meet both audiences’ big data needs. The environment also needed to support two technologies the company uses regularly: Galaxy, an open-source, web-based platform for data-intensive biomedical research, and RStudio’s data analysis software.

Solution

The Flux7 DevOps team worked closely with this clinical research company to fast-track the new, compliant AWS environment running Galaxy and RStudio. The teams started by using the Flux7 Enterprise DevOps Framework as a foundation to create a high performance computing environment for this company’s research scientists, and an almost identical environment for a federal agency to explore data from real studies.

To start, they created a service-agnostic landing zone into which the firm’s services will deploy. To build the landing zone, the teams created an AWS CloudFormation template that defines the VPC, subnets, NAT gateways, internet gateways, security groups, and network ACLs (NACLs). In this way, the firm can track changes and reproduce the same landing zone in its production, development, and staging environments, improving consistency and reliability.
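As a rough illustration of how one template definition can be reproduced across environments, the Python sketch below generates a minimal CloudFormation template per environment. It is not the firm's actual template: the function name, the environment CIDR ranges, and the reduced resource set (a VPC, one subnet, and an internet gateway only) are assumptions made for the example.

```python
import json


def landing_zone_template(env_name, vpc_cidr, subnet_cidr):
    """Return a minimal CloudFormation template (as a dict) for one environment.

    Hypothetical helper: a real landing zone would also define NAT gateways,
    security groups, and network ACLs.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Landing zone sketch for {env_name}",
        "Resources": {
            "Vpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": vpc_cidr,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                    "Tags": [{"Key": "Environment", "Value": env_name}],
                },
            },
            "InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
            "GatewayAttachment": {
                "Type": "AWS::EC2::VPCGatewayAttachment",
                "Properties": {
                    "VpcId": {"Ref": "Vpc"},
                    "InternetGatewayId": {"Ref": "InternetGateway"},
                },
            },
            "Subnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {
                    "VpcId": {"Ref": "Vpc"},
                    "CidrBlock": subnet_cidr,
                    "MapPublicIpOnLaunch": False,
                },
            },
        },
    }


# Assumed CIDR ranges, chosen only so the three environments do not overlap.
ENVIRONMENTS = {
    "development": ("10.10.0.0/16", "10.10.1.0/24"),
    "staging": ("10.20.0.0/16", "10.20.1.0/24"),
    "production": ("10.30.0.0/16", "10.30.1.0/24"),
}

templates = {
    name: landing_zone_template(name, vpc_cidr, subnet_cidr)
    for name, (vpc_cidr, subnet_cidr) in ENVIRONMENTS.items()
}

if __name__ == "__main__":
    # Print one template as JSON, ready to be committed to version control.
    print(json.dumps(templates["staging"], indent=2))
```

Because every environment comes from the same function, the resource layout is identical and only the parameters differ, which is what makes changes trackable and environments reproducible.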

CloudFormation templates were also used to define the infrastructure for RStudio and Galaxy, ensuring the right number of instances, S3 bucket configurations, and more. All CloudFormation templates are stored in the company’s new AWS CodeCommit repository.

Based on work by Matt Chambers at Vanderbilt University, the Flux7 team built a single Galaxy cluster within AWS (one that can be easily reproduced across its environments) using CfnCluster, Amazon’s framework for deploying and maintaining high performance computing clusters on AWS. With CfnCluster, the Flux7 team was able to create a single Galaxy cluster for the company that scales based on the number of jobs in the queue. This was important to the scientific team because:

  1. All scientists could now use the same cluster and not be concerned about tracking and managing an unwieldy number of clusters;
  2. Scaling for jobs, rather than CPU usage or other compute metrics, ensures scalability that directly addresses their needs;
  3. CfnCluster supports spot instances, allowing the team to take advantage of steep discounts on resources.
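To make the idea of scaling on queue depth concrete, here is a minimal Python sketch of the sizing arithmetic. It is an illustration only and not CfnCluster's actual scaling logic; the function name and the parameters (jobs per node, node limits) are assumptions.

```python
import math


def desired_nodes(pending_jobs, busy_nodes, jobs_per_node, max_nodes, min_nodes=0):
    """Estimate how many compute nodes a queue-driven cluster should run.

    Hypothetical rule: keep every node that is already busy, add enough nodes
    to absorb the pending jobs, then clamp to the configured limits.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    extra = math.ceil(pending_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, busy_nodes + extra))


if __name__ == "__main__":
    # Nine queued jobs, two busy nodes, four jobs per node, ten nodes at most.
    print(desired_nodes(pending_jobs=9, busy_nodes=2, jobs_per_node=4, max_nodes=10))
```

With an empty queue and no busy nodes, the result falls back to `min_nodes`, which is how a cluster of this kind can shrink to little or nothing when idle.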

Together, these steps let the firm’s scientists easily share their work and results through Galaxy in a scalable environment that significantly lowers costs while raising the research team’s productivity.

Security & Compliance

For security, Flux7 implemented AWS CIS Framework recommendations and NIST 800-53 controls at a moderate baseline. In addition, it ensured security best practices were built in through:

  • Data encryption in transit and at rest.
  • Disaster preparedness with EC2 instance backups, which are taken by a Lambda function and deleted after a defined retention period.
  • CloudWatch monitoring for defined metrics and the collection of aggregate application logs for review and debugging, as needed.
  • Site-to-site IPSec tunnels to enable secure communication between the Internal Production, Internal Service, and Internal Staging accounts and the on-premises network.
  • IAM and CloudTrail-ensured separation of duties.
  • An Amazon Inspector review of all EC2 instances for vulnerable software.
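The backup retention rule in the list above can be sketched in a few lines of Python. This is a hedged illustration of the selection logic only, not the Lambda function Flux7 deployed: the record layout (`id`, `created`), the function name, and the 14-day retention period are all assumptions, and a real function would read the backup list from the EC2 API rather than from an in-memory list.

```python
from datetime import datetime, timedelta, timezone


def expired_backups(backups, now, retention_days):
    """Return the ids of backups older than the retention period.

    `backups` is a list of dicts with hypothetical keys "id" and "created"
    (a timezone-aware datetime). A backup created exactly at the cutoff
    is kept.
    """
    cutoff = now - timedelta(days=retention_days)
    return [b["id"] for b in backups if b["created"] < cutoff]


# Made-up sample data for the sketch.
NOW = datetime(2018, 6, 30, tzinfo=timezone.utc)
SAMPLE_BACKUPS = [
    {"id": "backup-old", "created": NOW - timedelta(days=30)},
    {"id": "backup-edge", "created": NOW - timedelta(days=14)},
    {"id": "backup-new", "created": NOW - timedelta(days=1)},
]

if __name__ == "__main__":
    # With an assumed 14-day retention, only the 30-day-old backup is selected.
    print(expired_backups(SAMPLE_BACKUPS, NOW, retention_days=14))
```

Keeping the selection rule in a pure function like this makes the retention policy easy to test separately from the code that actually deletes resources.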

Through these mechanisms and more, Flux7 and the research firm built controls into the architecture to ensure the new environment is secure and meets NIST 800-53 and FISMA standards.

Ensuring Success

In addition to a secure, scalable AWS environment, knowledge transfer was a key success factor, because the company wanted to manage its own infrastructure efficiently going forward. Flux7 therefore taught the firm’s DevOps and security teams ‘how to fish’ along the way, so that they could effectively manage and extend the infrastructure themselves. The teams then worked together to train the research scientists on their new Galaxy environment.

Benefits

Flux7 enabled this company’s scientists, as well as its potential federal government customer, to effectively analyze critical data in a new high performance computing environment. The environment, built on the Flux7 Enterprise DevOps Framework, also meets the FISMA and NIST compliance objectives of the company’s internal and external customers.

Moreover, now fully committed to AWS, the firm has an architecture optimized to maximize AWS benefits and reduce maintenance. In the past, the firm had to configure its environment manually and, if it ran out of capacity, bring it all down and rebuild it. With the new Galaxy cluster on AWS, the company has a fully scalable cluster whose resources spin up and down based on the jobs in the queue, maximizing scientist productivity, reducing costs, and ensuring long-term success.

Business Needs

  • High performance computing environment
  • Secure and meets NIST 800-53 and FISMA standards
  • Self-service IT environment for research scientists

Solution

  • AWS CloudFormation
  • AWS CodeCommit
  • CfnCluster
  • Galaxy