Q&A with Flux7’s New HPC Consulting Practice Lead

By Flux7 Labs
October 3, 2019

This week Flux7 is proud to announce our new consulting practice dedicated to helping organizations take advantage of cloud-based High Performance Computing (HPC) to accelerate innovation. Our new HPC practice helps enterprises innovate at scale and minimize time to market. Leading the new practice is Flux7 Solution Architect Derek Magill, who joins us today to share more about himself and about the opportunities and challenges of cloud-based HPC.


Q: Derek, can you start by telling us a little about yourself and your experience in HPC?


A: Sure. I have worked in the semiconductor industry for more than 20 years, supporting Engineering IT in some of the largest HPC clusters in the industry. I’ve spent the better part of my career working in engineering environments, optimizing infrastructure to handle the varied and demanding workloads you see in engineering. Whether optimizing for the short, bursty workloads of simulation and verification or the longer, CPU-intensive back-end runs that can go for days, I’ve spent a lot of time balancing the demands of engineers eager to have their jobs run against those of finance teams seeking high utilization of assets.

 

Specifically, I spent 16 years at Texas Instruments, where I focused on our initial deployment of Linux-based HPC to an engineering environment (yes, I’m that old!). At Qualcomm I initially focused on scaling our EDA license infrastructure for one of the largest, most demanding compute environments in the semiconductor industry. I then began working to help bring engineering workloads to the public cloud in order to deal with our frequent burst capacity needs. It was through this experience that I learned some hard-earned lessons about HPC workloads in the cloud, and the challenges as well as the opportunities that go with them.

 

Q: Tell us more. How would you characterize the challenges of moving to the cloud?

 

A: Interestingly, a lot of the challenges stem from the financial side. For example, some companies have significant investments in the form of data centers, equipment, tools, people, and expertise to manage their HPC environments. And those investments don’t just go away because you’re moving to the cloud. This often leads to a TCO analysis of cloud HPC. Companies tend to reduce the cloud to simply compute and storage, but there’s so much more to it than that. The ability to automate and turn your infrastructure into code isn’t just a buzzword; it’s a reality. But it’s hard to put that into a spreadsheet.

 

Q: In your view, what should be included in that analysis? 

 

A: Elasticity and flexibility are often overlooked in this equation. On-premises compute power is not elastic. As a result, invariably you will find times when your demand for compute outstrips — sometimes far outstrips — your ability to meet that demand with on-premises resources. The result is often two-fold.

 

First, you spend a couple of weeks scrounging around to see if you have additional internal resources to meet the need, or you try to barter with groups who have dedicated resources. If you can’t find anything under the proverbial couch cushions, then second, you have to put together a financial justification. While the process is a little different for everyone, it boils down to getting approval, cutting a PO, and then (and only then) beginning the 8-12 week wait as a vendor puts together and ships your order.

 

One of the great things about the cloud is the ability to shortcut a significant part of that process and meet the need on a much shorter timescale. While you may still need approval to run a certain number of cloud resources at a given price, you can get going much more quickly, cutting 2-3 months from the process and giving your engineers the relief they need to hit their tight timeline to get a chip out with the right PPA (Power, Performance, and Area).

 

The bottom line of all of this is that time to market is incredibly important. Enterprises should be asking themselves questions like, ‘What is the value of pulling in a schedule by several weeks to a month?’

 

Q: What about HPC-related security and risks in the cloud? 

 

A: There is rightful concern in the industry about securing intellectual property (both your own and third-party IP). While it’s always right and good to be protective of IP, in my experience, much of the concern is based on stories people have read in the press of other companies who did things like leave an S3 bucket wide open, exposing thousands of social security or credit card numbers. However, these breaches are not the fault of the cloud provider; they are configuration errors. The cloud actually has better security controls and inspection methods than traditional on-premises environments; it’s simply a matter of knowing how to use them.
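As one illustration of those controls, AWS provides guardrails such as S3 Block Public Access that prevent exactly the “wide open bucket” misconfiguration described above. The sketch below only builds the configuration dictionary you would pass to boto3; the bucket name in the comment is hypothetical, and actually applying it requires AWS credentials.

```python
# Sketch: the S3 Block Public Access settings that guard against a
# publicly exposed bucket. Applying them for real would look like:
#   boto3.client("s3").put_public_access_block(
#       Bucket="my-hpc-results",  # hypothetical bucket name
#       PublicAccessBlockConfiguration=block_public_access_config())

def block_public_access_config():
    """Return a configuration that disables every form of public access."""
    return {
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore any existing public ACLs
        "BlockPublicPolicy": True,      # reject public bucket policies
        "RestrictPublicBuckets": True,  # restrict public cross-account access
    }

config = block_public_access_config()
print(all(config.values()))  # True: every public-access path is blocked
```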


Q: How about compliance and governance?

 

A: Information governance is an absolutely critical component of any modern HPC environment. After all, data is what fuels our compute cycles; data is the most critical piece. Yet many companies have grown up with a collegial environment where they widely and openly shared data. The rise of third-party IP has put additional compliance burdens on IT departments that didn’t exist 20 years ago, complicating information governance efforts. Here again, it comes back to balance — a balance between productivity and access. Applying the Principle of Least Privilege, combined with information governance policies and built-in automation and agility, can help you achieve an effective and still productive balance.

 

Q: Any additional opportunities that cloud-based HPC provides that we haven’t discussed yet? 

 

A: Absolutely. I see four main areas of opportunity for enterprises looking at cloud-based HPC. 1) As I mentioned earlier, cloud HPC can really cut into those long procurement cycles, getting simulations and verifications running sooner and leading to chips produced on time with proper quality metrics.


2) Data centers are expensive, and eventually they need to be refreshed in some way. One day you will go to your CFO to ask for money for more physical space, more power, a core network upgrade, an HVAC upgrade, or some other big-ticket item. And one day, your CFO will ask you what your plan for the cloud is. It’s better to have an answer than a blank stare. The good news is that the cloud removes a lot of your burden and responsibility, as cloud providers take it upon themselves to update and provision capacity, allowing you, the IT department, to be the customer for a change.

 

3) There are a million TCO analyses out there for HPC workloads, and they tend to say that cloud is more expensive than on-premises. And there’s a lot of truth in them. But cloud can be less expensive, depending on your workload and usage model. Taking advantage of the spot market for simulation and verification loads, for instance, can let you run more simulations for less compute spend.
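The arithmetic behind that spot-market point can be sketched quickly. The prices below are hypothetical placeholders, not actual quotes — real spot prices vary by instance type, region, and time of day.

```python
# Hypothetical prices for illustration only; actual spot prices fluctuate.
ON_DEMAND_PRICE = 1.00  # $/instance-hour, hypothetical
SPOT_PRICE = 0.30       # $/instance-hour, hypothetical

def simulations_per_budget(budget, price_per_hour, hours_per_sim=1.0):
    """How many simulation runs a fixed budget buys at a given hourly price."""
    return int(budget // (price_per_hour * hours_per_sim))

budget = 1000.0
on_demand_runs = simulations_per_budget(budget, ON_DEMAND_PRICE)
spot_runs = simulations_per_budget(budget, SPOT_PRICE)
print(on_demand_runs, spot_runs)  # 1000 vs 3333 runs for the same spend
```

Under these assumed prices, the same budget buys more than three times as many verification runs on the spot market — the trade-off being that spot capacity can be reclaimed, which is why it suits retryable simulation jobs rather than long, uninterruptible ones.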

 

4) Cloud HPC gives teams much-needed agility. For example, it enables teams to take a hybrid approach, bursting into the cloud when on-premises capacity is running at full utilization and going back to on-premises when the short-term need has subsided. In this way, teams can meet ebbs and flows with cloud capacity, serving the company’s needs with greater agility and speed.
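That hybrid “burst when full, return when idle” pattern boils down to a simple placement check in the scheduler. The threshold and capacity numbers below are hypothetical placeholders for illustration.

```python
def place_job(pending_cores, onprem_free_cores,
              onprem_total_cores=1000, burst_threshold=0.9):
    """Decide whether a job runs on-premises or bursts to the cloud.

    Burst to the cloud when the on-prem cluster is near full utilization
    or simply lacks the free cores the job needs. All capacities and the
    threshold here are hypothetical placeholders.
    """
    utilization = 1.0 - onprem_free_cores / onprem_total_cores
    if pending_cores <= onprem_free_cores and utilization < burst_threshold:
        return "on-premises"
    return "cloud"

print(place_job(pending_cores=64, onprem_free_cores=500))  # on-premises
print(place_job(pending_cores=64, onprem_free_cores=20))   # cloud
```

When the short-term crunch passes and free on-prem cores recover, new jobs naturally land back on-premises, so the cloud spend tracks the ebbs and flows rather than running continuously.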

 

Q: Can you give us an example of a workload that would be ideal for the cloud?

 

A: AI and ML are two significant areas I’d highlight here. While these are not engineering use cases per se, they have an incredible amount of potential in optimizing how much compute you use. They do so by analyzing the terabytes of data your compute produces and giving you insights into how to target your jobs in a way that would be impossible using simple human analysis. Many on-premises setups have a hard time with ML and AI due to the challenges involved with tooling, such as Python versions and packages. On top of that, many shops don’t keep up with the latest GPUs or have the specialized processors that accelerate ML frameworks like TensorFlow the way cloud providers do. And even if you do have that gear in house, you probably aren’t updating it as often as cloud providers are.

 

Q: Our last question is a little self-serving. Can you tell us a little about the new HPC Consulting Practice?

A: I’m thrilled to be leading the HPC practice at Flux7 and look forward to working with firms across industries to build and extend their cloud HPC initiatives. The new practice will help companies remain competitive by taking advantage of the flexibility and elasticity that the cloud offers, while giving access to the latest technologies such as specialized compute instances, a myriad of high-performance storage options, 25 Gbps and even 100 Gbps networking, network fabric adapters, high-performance data analytics, AI, ML, and deep learning. Truly, the list is dizzying.

Selecting the right choices to build and deploy systems for these workloads requires the right skills and a specialized approach, and that’s where the experience of the entire Flux7 team comes in: helping companies access these capabilities at scale, automating functions, and allowing enterprise IT organizations to focus on deployment rather than development of these critical capabilities.

 

At the end of the day, Flux7’s HPC Consulting practice helps organizations take advantage of high-performance infrastructure and compute power to speed time to result, accelerate insights and lead to better business results.


Interested in learning more?

Attend our AWS HPC and Spot Instances Immersion Day

Join Flux7 for our AWS HPC Spot Instances Immersion Day on October 11 in Houston, Texas. This one-day technical workshop is designed to provide hands-on training in AWS HPC services including Batch, ParallelCluster, Elastic Fabric Adapter (EFA), FSx for Lustre, and others. To learn more and register, please visit: https://www.flux7.com/event-hpc-workshop-houston-all-audiences/

 

And to learn more about Flux7 HPC Services, please visit: https://www.flux7.com/services/high-performance-computing-services/

Technology is always changing. Stay in the loop with the Flux7 Blog

Written by Flux7 Labs

Flux7 is the only Sherpa on the DevOps journey that assesses, designs, and teaches while implementing a holistic solution for its enterprise customers, thus giving its clients the skills needed to manage and expand on the technology moving forward. Not a reseller or an MSP, Flux7 recommendations are 100% focused on customer requirements and creating the most efficient infrastructure possible that automates operations, streamlines and enhances development, and supports specific business goals.
