Creating Kubernetes-Based UberCloud HPC Application Clusters Using Containers
UberCloud provides all the necessary automation for integrating cloud-based self-service HPC application portals into enterprise environments. Because the IT landscapes of large organizations differ significantly, we are continuously challenged to provide the necessary flexibility within our own solution stack. Hence we continuously evaluate newly adopted tools and technologies for their readiness to interact with UberCloud's technology.
Kubernetes has recently been adopted not just for enterprise workloads but for all sorts of applications, whether at the edge, for AI, or for HPC, and it is a strong focus for us. We have created hundreds of Kubernetes clusters on various cloud providers hosting HPC applications such as Ansys, COMSOL, OpenFOAM, and many more. We can deploy fully configured HPC clusters dedicated to a single engineer on Google's GKE or Azure's AKS within minutes. We also use Amazon's EKS, but deploying an EKS cluster currently takes significantly longer than on the other platforms (around 3x). While GKE has been excellent and my favorite service (due to its deployment speed and its good APIs), AKS has become really strong over the last months. Azure has implemented many features that are relevant for us (such as spot instances and placement groups), and AKS cluster allocation is now even almost one minute faster than GKE: 3:30 min. from zero to a fully configured AKS cluster. Great!
One challenge remains when managing HPC applications in dedicated Kubernetes clusters: how do you manage fleets of clusters distributed across multiple clouds? At UberCloud we are building simple tools which take HPC application start requests and turn them into fully automated cluster creation and configuration jobs. One very popular approach is to put this logic behind a self-service portal where the user selects the application they want to use. Another is to create those HPC applications based on events in workflows, CI/CD, and GitOps pipelines. Use cases include automated application testing, running automated compute tasks, cloud bursting, infrastructure-as-code integrations, and more. To support those tasks we have developed a container which turns an application and infrastructure description into a managed Kubernetes cluster, independent of where the job runs and of the cloud provider and region in which the cluster is created.
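To make the idea concrete, here is a minimal sketch of what such an application and infrastructure description could look like and how it might be turned into a cloud-agnostic cluster creation request. All field names, the function name, and the machine type are hypothetical placeholders, not UberCloud's internal format:

```python
import json

def make_cluster_request(app, cloud, region, machine_type, nodes):
    """Build a cloud-agnostic cluster creation request (hypothetical schema)."""
    return {
        "application": app,    # HPC application to deploy, e.g. "openfoam"
        "provider": cloud,     # "gke", "aks", or "eks"
        "region": region,
        "nodePools": [
            # one pool for compute nodes, one for the remote desktop
            {"name": "compute", "machineType": machine_type, "count": nodes},
            {"name": "desktop", "machineType": machine_type, "count": 1},
        ],
    }

request = make_cluster_request("openfoam", "aks", "westeurope",
                               "Standard_HB120rs_v3", 4)
print(json.dumps(request, indent=2))
```

A backend job can consume such a request and drive the respective cloud provider's API, regardless of which pipeline or portal produced it.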
Due to the flexibility of containers, UberCloud's cluster creation container can be used in almost all modern environments which support containers. We use it as a Kubernetes job and as a CI/CD task. When the job finishes, the engineer has access to a fully configured HPC desktop with an HPC cluster attached.
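Running the creation container as a Kubernetes Job could look roughly like the manifest below, shown here as a Python dict. The image name and environment variables are assumptions for illustration only:

```python
# Sketch of a Kubernetes Job wrapping the cluster creation container.
# Image name and env variable names are hypothetical placeholders.
creation_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "create-hpc-cluster"},
    "spec": {
        "backoffLimit": 0,               # do not retry a failed creation
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "cluster-creator",
                    "image": "ubercloud/cluster-creator:latest",  # hypothetical
                    "env": [
                        {"name": "CLOUD_PROVIDER", "value": "aks"},
                        {"name": "REGION", "value": "westeurope"},
                        {"name": "APPLICATION", "value": "openfoam"},
                    ],
                }],
            }
        },
    },
}
```

The same container, with the same environment variables, can run unchanged as a CI/CD task, which is what makes the approach portable across environments.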
Another integration we just tested is Argo. Argo is a popular workflow engine targeted at and running on top of Kubernetes. We have a test installation running on GKE. As the UberCloud HPC cluster creation is fully wrapped inside a container running a single binary, the configuration required to integrate it into an Argo workflow is minimal. It just requires a simple cluster workflow definition with a single task.
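That single-task workflow could look roughly as follows, shown as a Python dict mirroring the Argo Workflow YAML structure (the image and environment variable names are hypothetical):

```python
# Minimal Argo Workflow with a single task that runs the cluster
# creation container. Image and env names are hypothetical placeholders.
workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "create-hpc-cluster-"},
    "spec": {
        "entrypoint": "create-cluster",
        "templates": [{
            "name": "create-cluster",
            "container": {
                "image": "ubercloud/cluster-creator:latest",  # hypothetical
                "env": [
                    {"name": "CLOUD_PROVIDER", "value": "aks"},
                    {"name": "REGION", "value": "westeurope"},
                ],
            },
        }],
    },
}
```

Because the whole creation logic lives in one binary inside one container, the workflow needs no sidecars, volumes, or inter-task wiring; a single container template is enough.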
After the workflow (task) finishes, the engineer automatically gets access to the freshly created remote visualization application running on a newly allocated AKS cluster which spans two node pools and has GUI-based remote Linux desktop access set up.
Destruction of UberCloud's HPC application clusters running on Kubernetes can be integrated into Argo in a very similar way. All that is required is to run the container wrapping UberCloud's uc tool (our internal command line tool for managing Kubernetes based deployments) with the right parameters, given as environment variables, in a dedicated workflow.
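A matching destruction workflow might look like the sketch below. The image name, command, and environment variable names are assumptions for illustration; the real uc tool's parameters are internal:

```python
# Sketch of a destruction workflow: the uc tool container selects the
# target cluster via environment variables. All names are hypothetical.
destroy_workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "destroy-hpc-cluster-"},
    "spec": {
        "entrypoint": "destroy-cluster",
        "templates": [{
            "name": "destroy-cluster",
            "container": {
                "image": "ubercloud/uc:latest",        # hypothetical
                "command": ["uc"],
                "args": ["delete"],                    # hypothetical subcommand
                "env": [
                    {"name": "CLUSTER_NAME", "value": "engineer-42-aks"},
                ],
            },
        }],
    },
}
```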
The overall AKS cluster creation, configuration, and deployment of our services and HPC application containers took just a couple of minutes; the same holds for cluster destruction. More complex workflows can add tasks between those calls which send computational tasks to the cluster when computation is automated and the cluster is not used interactively by an engineer. This solution targets IT organizations which are challenged by rolling out HPC applications for their engineers while being required to work with modern cloud-based technologies.
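Such a more complex pipeline can be sketched as sequential Argo steps chaining creation, a batch solver run, and destruction. Again, all images and template names below are hypothetical placeholders:

```python
# Sketch of a create -> compute -> destroy pipeline as Argo steps.
# All image and template names are hypothetical placeholders.
pipeline = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hpc-batch-"},
    "spec": {
        "entrypoint": "pipeline",
        "templates": [
            {"name": "pipeline", "steps": [
                [{"name": "create", "template": "create-cluster"}],
                [{"name": "solve", "template": "run-solver"}],
                [{"name": "destroy", "template": "destroy-cluster"}],
            ]},
            {"name": "create-cluster",
             "container": {"image": "ubercloud/cluster-creator:latest"}},
            {"name": "run-solver",
             "container": {"image": "ubercloud/solver-submit:latest"}},
            {"name": "destroy-cluster",
             "container": {"image": "ubercloud/uc:latest",
                           "command": ["uc"], "args": ["delete"]}},
        ],
    },
}
```

Because each step is just another container, the same pattern extends to automated testing or cloud bursting by swapping out the middle step.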