HPC-savvy members of HPC**2

Karsten Gaier: HPC executive, entrepreneur

The key to doing pre/post-processing in the Cloud: Click-to-Photon: Measuring local and remote display latencies

Nick Mayes: FORTRAN code & optimization, parallel computing, storage technologies

  • A UK research facility was using a symmetric multiprocessor supercomputer to model coolant flows in a nuclear reactor, but was using only one CPU because the code was written in serial FORTRAN. Two days’ work analysing the data dependencies and the scope for parallelism was enough to exploit all the CPUs effectively. The delighted customer saw run times reduced dramatically and responded with the gift of a bottle of fine wine. The sting in the tail/tale is that another 3 days were spent tracking down a memory leak in the I/O library.
  • A defence customer was using the finite difference time domain method to model electromagnetic phenomena, coded in serial FORTRAN. A team of three coding experts ported this code, using MPI spatial decomposition, to a massively parallel architecture and ultimately benchmarked it on 1024 CPUs. This successful proof-of-concept resulted in the customer contracting for time on a national supercomputer resource, with the code becoming the fastest in its field in the UK.
  • A benchmarking exercise for academia required a sequential molecular dynamics model to be modified to run on an MPP prototype. The code was modified with manual data decomposition and the insertion of shmem function calls, creating the pseudo-shared memory parallelism that a parallel FORTRAN compiler could have provided, had one existed. The benchmark was successful for the vendor.
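The spatial and data decomposition mentioned in the projects above can be illustrated with a minimal Python sketch. It is not the customers' FORTRAN code: the three-point averaging stencil, the function names and the one-cell halo are assumptions chosen for illustration. Each "rank" owns a contiguous block of cells and reads one halo cell from each neighbour, which in a real MPI or shmem code would arrive as a message or a remote get.

```python
def block_bounds(n, nranks, rank):
    """Return the [start, stop) index range owned by `rank` for n cells."""
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def serial_step(u):
    """One three-point averaging step with fixed end values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def decomposed_step(u, nranks):
    """Same step, computed block by block using one halo cell per side."""
    n = len(u)
    result = [0.0] * n
    for rank in range(nranks):
        start, stop = block_bounds(n, nranks, rank)
        # Halo cells: copies of neighbour data (hypothetical stand-in for messages).
        lo = max(start - 1, 0)
        hi = min(stop + 1, n)
        local = u[lo:hi]
        for i in range(start, stop):
            if i == 0 or i == n - 1:
                result[i] = u[i]  # global boundary values stay fixed
            else:
                j = i - lo
                result[i] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
    return result
```

Because each block only needs its own cells plus the two halo cells, the decomposed step reproduces the serial result while letting the blocks be updated independently.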

Joseph Pareti: HPC on Linux clusters, parallel computing using MPI and OpenMP

  • Supporting an aerospace customer migrating CFD applications from an SMP system to a Linux cluster:
  1. Golden image cloning and cluster-wide OS installation using a proprietary tool, after limitations were found in an open source tool available at the time.
  2. Installation and customization of a batch scheduler system.
  3. Benchmarking the customer’s main application, Fluent, and resolving specific issues with Fluent reporting intervals that caused inconsistent timings (depending on the OS, at the time of benchmarking Fluent was waiting to coalesce ACKs prior to sending packets).
  4. Demonstrating overall shorter time-to-solution than on the legacy platform, and running the acceptance tests.
  5. Additional services: on-premises customer workshops to train the users, and custom scripts integrating the batch queuing system with the application programs.
  • Working on a parallel computing application for a research institute: porting and performance tuning

A quantum mechanics application required substantial changes to its communication patterns, using MPI-1, in order to overlap communication and computation as much as possible. VAMPIRE traces identified the bottlenecks, which were then mitigated by optimizing point-to-point communications, restructuring the loop constructs for boundary exchanges, and optimizing the selection and usage of collective communications such as allreduce.
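One common way to overlap a boundary exchange with computation is to start the exchange, update the interior points that need no neighbour data, and only then wait for the halos and finish the edge points. The Python sketch below shows that ordering only; it is not the application's code. A worker thread stands in for non-blocking MPI point-to-point calls, and the stencil, `fetch_halos` and `update_block` are hypothetical names introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_halos(left_block, right_block):
    """Stand-in for a non-blocking boundary exchange with two neighbours."""
    return left_block[-1], right_block[0]


def update_block(block, left_block, right_block):
    """Three-point average of `block`, overlapping halo fetch with interior work."""
    n = len(block)
    new = [0.0] * n
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the exchange (analogous to posting non-blocking sends/receives).
        pending = pool.submit(fetch_halos, left_block, right_block)
        # 2. Update interior points, which need no neighbour data.
        for i in range(1, n - 1):
            new[i] = (block[i - 1] + block[i] + block[i + 1]) / 3.0
        # 3. Wait for the halos (analogous to a wait call), then finish the edges.
        left_halo, right_halo = pending.result()
    padded = [left_halo] + block + [right_halo]
    new[0] = (padded[0] + padded[1] + padded[2]) / 3.0
    new[n - 1] = (padded[n - 1] + padded[n] + padded[n + 1]) / 3.0
    return new
```

In Python the thread gives no real speed-up; the point is the loop restructuring, which splits the update into an interior part that can run while data is in flight and an edge part that runs after it has arrived.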

The target benchmark is an iterative algorithm for a problem in quantum mechanics. The solution is sought in a region of four-dimensional space-time by an iterative method that uses a Chebyshev approximation of the inverse square root function.
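As a scalar illustration of a Chebyshev approximation of the inverse square root, the sketch below fits a Chebyshev series to 1/sqrt(x) and evaluates it with Clenshaw's recurrence. The interval [0.1, 1.0], the degree and the function names are assumptions made for this example; the source does not say what the approximation is applied to in the benchmark, and the real code may differ substantially.

```python
import math


def chebyshev_coeffs(f, a, b, degree):
    """Chebyshev coefficients of f on [a, b], sampled at Chebyshev nodes."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    # Map nodes from [-1, 1] to [a, b] before sampling f.
    values = [f(0.5 * (b - a) * t + 0.5 * (b + a)) for t in nodes]
    coeffs = []
    for j in range(n):
        s = sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append(2.0 * s / n)
    coeffs[0] *= 0.5  # the constant term carries a factor 1/2
    return coeffs


def chebyshev_eval(coeffs, a, b, x):
    """Evaluate the Chebyshev series at x in [a, b] with Clenshaw's recurrence."""
    t = (2.0 * x - a - b) / (b - a)
    d1 = d2 = 0.0
    for c in reversed(coeffs[1:]):
        d1, d2 = 2.0 * t * d1 - d2 + c, d1
    return t * d1 - d2 + coeffs[0]


def inv_sqrt(x):
    return 1.0 / math.sqrt(x)


if __name__ == "__main__":
    c = chebyshev_coeffs(inv_sqrt, 0.1, 1.0, 40)
    for x in (0.1, 0.5, 1.0):
        print(x, chebyshev_eval(c, 0.1, 1.0, x), inv_sqrt(x))
```

The approximation needs only additions and multiplications once the coefficients are known, and the interval must stay away from zero, where the inverse square root is singular.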

For parallelization, the problem is mapped onto a virtual three-dimensional process grid (the Z direction was not distributed). This process grid is in turn mapped onto the available hardware processors, taking into account the higher intra-node bandwidth between processors relative to the message-passing network bandwidth.
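The mapping described above might look roughly like the following Python sketch. It assumes the distributed directions are X, Y and T, a row-major rank ordering, periodic boundaries, and consecutive ranks placed on the same node; none of these details are given in the source, and all function names are hypothetical. A real MPI code could obtain a similar layout from the Cartesian topology routines such as MPI_Cart_create.

```python
def rank_to_coords(rank, dims):
    """Row-major coordinates of `rank` in a process grid; last axis varies fastest."""
    coords = []
    for d in reversed(dims):
        rank, c = divmod(rank, d)
        coords.append(c)
    return tuple(reversed(coords))


def coords_to_rank(coords, dims):
    """Inverse of rank_to_coords, with periodic wrap-around in every direction."""
    rank = 0
    for c, d in zip(coords, dims):
        rank = rank * d + (c % d)
    return rank


def local_extent(lattice, dims):
    """Sub-lattice size per process; the Z extent is kept whole (not distributed)."""
    x, y, z, t = lattice
    px, py, pt = dims
    return (x // px, y // py, z, t // pt)


def node_of(rank, ranks_per_node):
    """Consecutive ranks share a node, so fastest-axis neighbours tend to be on-node."""
    return rank // ranks_per_node
```

With a hypothetical 2 x 2 x 4 process grid and four ranks per node, all neighbours along the fastest-varying direction sit on the same node, so that boundary exchange uses the faster intra-node path while the other two directions cross the network.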

The result was a performance improvement on the order of 10-20% over the as-is program.

Oliver Schreiber, Ph.D.: flight simulator, CFD applications, number crunching and visualization, rendering

  • At Boeing/McDonnell Douglas-Long Beach, led two flight simulator software projects on 8-processor Onyx systems using OpenGL/Performer/Sirius, including a PMMW (Passive Millimeter Wave) enhanced vision system pilot-in-the-loop real-time simulation with noise, jitter, motion blur and variable aperture
  • At Boeing North American Downey, developed Guidance/Navigation & Control embedded real-time simulation software with Ada/MATRIXx/MATLAB for the US Propulsion Module and the International Space Station
  • For the Ph.D. thesis, developed a time-marching, multi-process, interactive Computational Fluid Dynamics code with concurrent 3D graphical visualization and user interface to predict rotor-fuselage interactions. This led to five publications and SGI Technology VP’s Graphics/Parallel Processing Demo, ’97 Dev. Forum. In a departure from the usual batch computing and post-processing, graphics rendering runs concurrently with the simulation's parallel computational threads and processes. Geometry and simulation conditions can be modified interactively, down to mouse picking and dragging of individual vertices or simulation objects, with effects visible immediately.

Henry Strauß: project management, HPC business, architecture design

  • Managed a joint project for the development of a special-purpose HPC system, securing public sector funding and bringing partners from different sides on board
  • Helped a big car manufacturer move its HPC infrastructure from SMPs to clusters, pointing out the potential pitfalls to take into account and ultimately achieving a much better ROI, i.e. a competitive advantage
  • Handled multiple HPC procurement processes, including benchmarking and system design, for customers in academia and industry