Optimization techniques

In the Chair for Compiler Construction we are interested in optimization methods, in general. This includes traditional compiler optimizations, optimizations for programs written as dataflow and process networks, optimizations for energy efficiency (e.g., in the HAEC project), optimization for domain-specific languages and for new emerging technologies. As such, this research line permeates all projects performed at the chair.

Autotuning for DSLs in Particle-based Simulations

In recent years, computer simulations have become a cornerstone of scientific computing, standing alongside theory and experiment. Applications ranging from weather prediction to systems biology and astrophysics now harness advanced computational power—from distributed clusters to GPUs. Despite remarkable progress in hardware and the development of powerful libraries and tools, programming for high-performance computing (HPC) remains a complex challenge.

Particle methods, in particular, are widely used for simulations and are supported by various libraries that help computational scientists build simulations more easily. Among these, OpenFPM [1] stands out as a highly efficient library. However, its intricate C++ template-based syntax presents a steep learning curve. To mitigate this complexity, we designed OpenPME [2], a Problem-Solving Environment (PSE) that introduces a domain-specific language (DSL) built atop a versatile domain metamodel capable of expressing a wide range of numerical simulations.

Although OpenPME successfully provides a more user-friendly interface, further performance optimization remains essential. Our goal is to fine-tune hyperparameters available at the DSL level for particle discretization methods (SPH, PSE, and DC-PSE), identifying optimal configurations for specific simulation needs. The key tuning parameters include:

spatial sampling (h)
time step size (dt)
smoothing kernel width (ε)
cutoff radius (rc)

We developed a data-driven, regression-based autotuner composed of three main steps [3]. The first step uses domain knowledge about theoretical convergence to predict the time step size, while fixing the other parameters. An accuracy model is built using linear least-squares regression. In the second step, we search for the optimal spatial sampling h, assisted by the accuracy model and the predicted dt. For the remaining parameters, no analytical performance model is available therefore we explore the full range for ε and use a bisection search for rc. The search concludes when the configuration yielding the lowest runtime is identified.

As shown in the figure for the nonlinear Gray-Scott reaction-diffusion system use-case, our approach identifies significantly faster configurations compared to general-purpose autotuners—even when those autotuners are allowed 8x or 16x more optimization time. On average, our model-based autotuning outperforms generic approaches by a factor of 4.2x, effectively balancing accuracy and runtime.

References:

[1] P. Incardona, A. Leo, Y. Zaluzhnyi, R. Ramaswamy, and I. F. Sbalzarini, “Openfpm: Ascalable open framework for particle and particle-mesh codes on parallel computers,” Computer Physics Communications, vol. 241, pp. 155–177, 2019.

[2] Nesrine Khouzami, Lars Schütze, Pietro Incardona, Landfried Kraaz, Tina Subic, Jeronimo Castrillon, Ivo F. Sbalzarini, "The OpenPME Problem Solving Environment for Numerical Simulations", In Proceeding: International Conference on Computational Science (ICCS’21)

[3] Nesrine Khouzami, Friedrich Michel, Pietro Incardona, Jeronimo Castrillon, Ivo F. Sbalzarini, "Model-based Autotuning of Discretization Methods in Numerical Simulations of Partial Differential Equations", In Journal of Computational Science, vol. 57, pp. 1–11, Dec 2021

Compiler-Guided Optimization Techniques for Emerging In-Memory Technologies

As we improve our understanding of new technologies and materials, we see how new application domains open up. We also get a better grasp of the potential benefit that these technologies can bring for already established application domains. To accelerate progress in these domains, programming and design tools are needed, which is the matter of this research line at the Chair for Compiler Construction.

These novel architectures promise significant gains in energy efficiency and performance but require intelligent software support to fully exploit their capabilities.

At the our chair, we have designed optimization techniques tailored to different categories of such systems. For example, while many existing methods for Racetrack memories rely on additional hardware to reduce shift operations, these approaches often increase chip area, latency, and energy consumption, and they fail to adapt to the application's memory behavior. We at the chair for compiler construction have developed a set of data placement techniques for RTMs that maximize the likelihood that consecutive references access the same or nearby memory locations at runtime, thereby minimizing the number of shifts. We have formulated the data placement problem in RTMs as an integer linear program (ILP) and have developed a novel heuristic called ShiftsRedcue that provides a near-optimal solution [1].

In conventional CMOS-based Von-Neumann machines, increasing the off-chip bandwidth is becoming increasingly expensive and is strictly constrained by the chip package and system models. On the contrary, non-Von-Neumann system models like computing-near-memory (CNM) and computing-in-memory computing (CIM) have shown great promise by outperforming the conventional Von-Neumann system models by orders of magnitude, both in terms of latency and energy consumption. The idea consists in bringing computations closer to the data or processing data where it makes more sense. At the chair for compiler construction, we are exploring CNM and CIM systems based on various memory technologies, for different use cases. We are also developing tools and software methods for the design space exploration and effective utilization of these systems [2, 3].

References:

Asif Ali Khan, Hamid Farzaneh, Karl Friedrich Alexander Friebel, Clément Fournier, Lorenzo Chelini, and Jeronimo Castrillon. "CINM (Cinnamon): A compilation infrastructure for heterogeneous compute in-memory and compute near-memory paradigms." ASPLOS 2024.
Fazal Hameed, Asif Ali Khan and Jeronimo Castrillon, "ALPHA: A Novel Algorithm-Hardware Co-design for Accelerating DNA Seed Location Filtering", in IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2021.3093840.
A. A. Khan, F. Hameed, T. Shahroodi, A. K. Jones and J. Castrillon, "Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory", IEEE Computer Architecture Letters, 2024.

Go back