XLcloud Use Case - Compute Plants (HPC Clouds for Research & Industry)

The Context

Research organizations and industry alike want to make a major, disruptive change by outsourcing their HPC (High Performance Computing) infrastructures, which they consider too heavy and complex to manage and unable to deliver an acceptable ROI. This shift in HPC usage shows up as growing demand from the HPC community for public or private HPC clouds, also known as "compute plants", hosted and administered by service providers in IaaS, PaaS or SaaS mode.
These HPC clouds are slowly appearing on the market but face several difficulties: the complexity of integrating HPC applications (especially parallelism, interconnects, load management and performance), the wide range of business domains covered, data size, license management and the need for transparent remote 3D visualization. Indeed, once a computation result is available and stored in the cloud, its size (from GB to TB) and network latency require that the result files stay in the cloud and are post-processed (viewed) interactively and remotely.
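To make the data-size argument concrete, the short Python sketch below estimates how long it would take to download a result file instead of viewing it remotely. The function name `transfer_time_hours`, the 80% link efficiency and the example sizes and link speeds are assumptions chosen for illustration, not measured Extreme Factory figures.

```python
# Hypothetical sketch: time needed to download a simulation result instead of
# post-processing it remotely. The sizes, link speeds and efficiency factor
# are illustrative assumptions, not measured Extreme Factory figures.


def transfer_time_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Return the hours needed to move `size_gb` gigabytes (decimal) over a
    link of `link_mbps` megabits per second, assuming only `efficiency` of
    the nominal bandwidth is usable (hypothetical default of 80%)."""
    size_megabits = size_gb * 1000 * 8          # GB -> MB -> megabits
    usable_mbps = link_mbps * efficiency        # effective throughput
    return size_megabits / usable_mbps / 3600   # seconds -> hours


if __name__ == "__main__":
    for size_gb in (10, 100, 1000):             # 10 GB, 100 GB, 1 TB results
        for link_mbps in (20, 100):             # hypothetical WAN link speeds
            hours = transfer_time_hours(size_gb, link_mbps)
            print(f"{size_gb:>5} GB over {link_mbps:>3} Mb/s: {hours:7.1f} h")
```

Under these assumptions, a 1 TB result over a 100 Mb/s link takes about 28 hours to download, whereas a remote visualization session only needs a few Mb/s of sustained bandwidth.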

Description

Extreme Factory is an HPC-as-a-Service provider which helps end users access a large number of pre-installed HPC applications (high performance parallel digital simulation) and the related 3D viewing applications. Access is provided through a secured web browser or SSH command-line connections; the high performance clustering architecture and load balancing are automated and hidden from customers.

xfdc.jpg

The goal of this use case is to make it possible to progressively migrate and extend Extreme Factory offers to additional vertical market segments (today: HPC only; tomorrow: CAD, gaming, rendering, VoD…) and service layers (today: SaaS/HPCaaS; tomorrow: high performance IaaS and PaaS).
In order to minimize investment and operating costs, we need to build the next generation of Extreme Factory on top of new building blocks provided by XLcloud:

  • A real cloud management system (CMS), providing easy and modern ways to provision and automate our compute and visualization clusters, storage systems and networks
  • APIs for cluster management and end-user software
  • Powerful additional functionality such as rich accounting, resource reservation, ERP integration, energy optimization…

Risks and obstacles

  1. Average enterprise WANs and regional public networks suffer from latencies of tens to hundreds of milliseconds, which prevents fluid remote user interaction with 3D models
  2. So far, no fast, reliable and supported 3D streamer has been identified as suitable for HPC visualization (we are developing our own)
  3. Based on TurboVNC + VirtualGL, fluid visualization currently requires a sustained bandwidth of 5 Mb/s or more per user session and, ideally, latency below 100 ms. This is incompatible with most customers' Internet access points and with most enterprise global networks. Only large national or international research organisations benefit from networks this large
  4. Windows and Direct3D are poorly handled and not portable, which adds complexity to 3D graphics clusters
  5. X11/3D servers are evolving but do not keep pace with the growing Windows 3D application market
  6. GPU virtualization depends primarily on market leaders (namely nVidia with its Grid/VGX product line), which are slow to provide all the building blocks. Major virtualization software vendors are only starting to base new offerings on this technology, but a good performance/cost ratio has not yet been reached
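Risk 3 can be turned into a simple capacity check. The following Python sketch, a hypothetical helper rather than part of Extreme Factory or XLcloud, uses the thresholds quoted above (about 5 Mb/s sustained per TurboVNC + VirtualGL session and latency below 100 ms) to estimate how many fluid sessions a given link could carry. The `Link` type, the `max_fluid_sessions` function and the example links are illustrative assumptions; the check deliberately ignores jitter, packet loss and protocol overhead.

```python
# Hypothetical capacity check for remote 3D visualization sessions, based on
# the thresholds quoted in risk 3. Link figures are illustrative assumptions.

from dataclasses import dataclass

PER_SESSION_MBPS = 5.0   # sustained bandwidth per TurboVNC + VirtualGL session
MAX_LATENCY_MS = 100.0   # latency target for fluid interaction


@dataclass
class Link:
    name: str
    bandwidth_mbps: float  # bandwidth available for visualization traffic
    latency_ms: float      # observed latency between client and cloud


def max_fluid_sessions(link: Link) -> int:
    """Return how many concurrent fluid sessions `link` can sustain,
    or 0 if its latency does not meet the target."""
    if link.latency_ms >= MAX_LATENCY_MS:
        return 0
    return int(link.bandwidth_mbps // PER_SESSION_MBPS)


if __name__ == "__main__":
    links = [
        Link("office DSL line", 8, 40),
        Link("enterprise WAN", 50, 150),
        Link("research network", 1000, 20),
    ]
    for link in links:
        print(f"{link.name}: {max_fluid_sessions(link)} fluid session(s)")
```

With these hypothetical figures, an 8 Mb/s office line supports a single session, a 50 Mb/s WAN with 150 ms latency supports none, and a 1 Gb/s research network could carry about 200 sessions, consistent with the observation that only large research organisations currently have adequate networks.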

State of the art

Bull and its wholly owned subsidiary Serviware already host production HPC and visualization jobs for several industrial customers on their Extreme Factory platform, a first successful HPC-as-a-Service offering. However, the platform is not based on a real Cloud Management System (CMS). Its current state is as follows:

  1. "Partial" HPC cloud infrastructure: several physical clusters are isolated from one another, are resizable, and can be provisioned/deprovisioned according to each customer's activity and orders.
  2. "Partial" virtualization: compute nodes are still physical because of several technical issues (GPU and InfiniBand virtualization were not available at the time of writing). Service nodes (scheduler, monitoring, directory, deployment server...) are virtual and highly available.
  3. Web portal for compute jobs and remote visualization. The end-user experience relies on 3D streaming and remote desktop technologies that are developed by very small communities and lack support, documentation and stability. At the time of writing, 3D streaming is preferably achieved via TurboVNC/VirtualGL and the new Bull XRV technology (eXtreme Remote Visualizer, which prodie) released in 2013.

ExtremeFactoryComputingStudio.jpg

XRV.jpg


This wiki is licensed under a Creative Commons 2.0 license