Optimization techniques

In the Chair for Compiler Construction we are interested in optimization methods, in general. This includes traditional compiler optimizations, optimizations for programs written as dataflow and process networks, optimizations for energy efficiency (e.g., in the HAEC project), optimization for domain-specific languages and for new emerging technologies. As such, this research line permeates all projects performed at the chair. 

Autotuning for DSLs in Particle-based Simulations

In recent years, computer simulations have become a cornerstone of scientific computing, standing alongside theory and experiment. Applications ranging from weather prediction to systems biology and astrophysics now harness advanced computational power—from distributed clusters to GPUs. Despite remarkable progress in hardware and the development of powerful libraries and tools, programming for high-performance computing (HPC) remains a complex challenge.

Particle methods, in particular, are widely used for simulations and are supported by various libraries that help computational scientists build simulations more easily. Among these, OpenFPM [1] stands out as a highly efficient library. However, its intricate C++ template-based syntax presents a steep learning curve. To mitigate this complexity, we designed OpenPME [2], a Problem-Solving Environment (PSE) that introduces a domain-specific language (DSL) built atop a versatile domain metamodel capable of expressing a wide range of numerical simulations.

Although OpenPME successfully provides a more user-friendly interface, further performance optimization remains essential. Our goal is to fine-tune hyperparameters available at the DSL level for particle discretization methods (SPH, PSE, and DC-PSE), identifying optimal configurations for specific simulation needs. The key tuning parameters include:

  • spatial sampling (h)
  • time step size (dt)
  • smoothing kernel width (ε)
  • cutoff radius (rc)

We developed a data-driven, regression-based autotuner composed of three main steps [3]. The first step uses domain knowledge about theoretical convergence to predict the time step size, while fixing the other parameters. An accuracy model is built using linear least-squares regression. In the second step, we search for the optimal spatial sampling h, assisted by the accuracy model and the predicted dt. For the remaining parameters, no analytical performance model is available therefore we explore the full range for ε and use a bisection search for rc. The search concludes when the configuration yielding the lowest runtime is identified.

As shown in the figure for the nonlinear Gray-Scott reaction-diffusion system use-case, our approach identifies significantly faster configurations compared to general-purpose autotuners—even when those autotuners are allowed 8x or 16x more optimization time. On average, our model-based autotuning outperforms generic approaches by a factor of 4.2x, effectively balancing accuracy and runtime.

References:

[1] P. Incardona, A. Leo, Y. Zaluzhnyi, R. Ramaswamy, and I. F. Sbalzarini, “Openfpm: Ascalable open framework for particle and particle-mesh codes on parallel computers,” Computer Physics Communications, vol. 241, pp. 155–177, 2019.

[2] Nesrine Khouzami, Lars Schütze, Pietro Incardona, Landfried Kraaz, Tina Subic, Jeronimo Castrillon, Ivo F. Sbalzarini, "The OpenPME Problem Solving Environment for Numerical Simulations", In Proceeding: International Conference on Computational Science (ICCS’21)

[3] Nesrine Khouzami, Friedrich Michel, Pietro Incardona, Jeronimo Castrillon, Ivo F. Sbalzarini, "Model-based Autotuning of Discretization Methods in Numerical Simulations of Partial Differential Equations", In Journal of Computational Science, vol. 57, pp. 1–11, Dec 2021

Go back