Jobs
Student Projects
At the Compiler Design Lab, we can offer a range of bachelor and master thesis topics. A selection of currently available topics can be found below.
For further topics or Hiwi jobs, please contact Sebastian Hack.
Formal Verification of the Preservation of Security Properties through Compiler Passes
Compiler passes or optimisations often don't preserve important security properties which are widely used to prevent side-channel attacks.
There are many open questions about which optimisations do preserve which security properties.
Hyper-simulations (a hyper-simulation is basically a two-dimensional simulation) provide the means to formally verify the preservation of security properties.
Security properties to be examined include the constant-time property, non-interference, equal execution time (with some dedicated timing model) and non-malleable information flow.
In particular, it would be interesting to show the preservation of non-interference through a linearisation, spilling and register allocation pass.
Another option would be to prove the preservation of equal execution time or non-malleable information flow through some compiler passes.
Your own ideas within this general topic are also very welcome.
To do the formal proofs the student should have a solid Coq background (e.g. by having passed ICL).
Contact: Julian Rosemann.
Exploring Program Equivalence Techniques: From Invariance Mining to Fuzzing
This thesis will investigate approximate program equivalence techniques, including invariance mining, automatic verification, functional equivalence using fuzzing and runtime checks, and language-independent program equivalence. The goal is to compare and contrast these techniques, identify their strengths and limitations, and evaluate their effectiveness in detecting program equivalence. An additional focus lies on identifying a minimal knowledge base to establish functional equivalence.
Research Questions:
- What are the different techniques for program equivalence, and how do they differ from one another?
- How effective are these techniques in detecting program equivalence, and what are their limitations?
- How can these techniques be combined to improve their effectiveness?
- How can program equivalence tools be adapted for different programming languages and platforms? Can equivalence be established independent of concrete languages?
Related: Fuzzing, Synthesis, Separation Logic, Automated Reasoning, Verification, Model Checking
Contact: Marcel Ullrich
Evaluation of SK Calculus as a Compiler Backend
This thesis aims to investigate the potential use of the SK calculus as a backend for compilers. The study will focus on analyzing the performance overhead of SK calculus's simplistic representation, determining the minimum set of instructions and environment required for optimal performance, comparing integer encoding versus native encoding trade-offs, and evaluating the advantages of using the SK calculus as a compiler backend.
Research questions:
- What is the performance overhead of the SK calculus's simplistic representation when used as a compiler backend?
- What is the minimum set of instructions and environment required for optimal performance when using the SK calculus as a compiler backend?
- How does integer encoding compare with native encoding when using the SK calculus as a compiler backend?
- How do optimizations relate and influence the representation?
- What are the advantages of using the SK calculus as a compiler backend?
Related concepts: BF, GRIN, Miranda, Bootstrapping, RISC vs CISC
Contact: Marcel Ullrich
Compiler Optimizations for Factorio: A DSL Approach to Factory Optimization
This research idea proposes to use compiler optimization techniques to improve the performance of Factorio factories by developing a domain-specific language (DSL) for designing and optimizing factory layouts. The study will compare the efficiency of different circuit layouting techniques, including FPGA synthesis and circuit languages such as Verilog, and explore the relationship between compiler optimizations and factory optimizations.
Research questions:
- What compiler optimization techniques can be applied to improve the performance of Factorio factories?
- How can a DSL be developed to facilitate the design and optimization of Factorio factory layouts?
- What are the comparative advantages and limitations of different circuit layouting techniques in the context of Factorio factory optimization?
- How can compiler optimizations be related to factory optimizations, and what insights can be gained from this relationship?
Related: Synthesis, Circuit design, Cross-language communication, modding, optimization (scheduling, layout, code optimization), DSL
Contact: Marcel Ullrich
SIMT Programming Models (CUDA/SYCL) on CPUs: Implementation Techniques for Sub-Group Algorithms
SIMT programming models like CUDA and SYCL are commonly used to program accelerators like GPUs and FPGAs. However, it is desirable that the SIMT kernels are efficiently executable on multi-core CPUs as well, to make the applications portable.
While we investigated in earlier work, how this can be achieved for most kernels, it is still unclear how to best map sub-group algorithms onto CPUs.
Sub-group algorithms are typically used on GPUs to perform register-level sub-group-internal communication to avoid communication via slower memory. SYCL e.g. provides shift_left
/shift_right
/all_of
/any_of
/none_of
/reduce
/… as sub-group algorithms.
There are two known approaches to implement sub-groups algorithms on CPUs:
- perform whole-function (or outer-loop) vectorization, e.g. using the Region Vectorizer, and use the vector front as sub-group; use intrinsics to implement the sub-group algorithms,
- split the kernel at the sub-group functions, using either deep-loop fission or continuation-based synchronization, to perform implicit synchronization at those points; use shared memory to implement the sub-group algorithms.
The topic of this thesis will be comparing those two approaches with regard to generality and achievable performance.
You will work with/on popular open-source projects such as LLVM and Open SYCL (formerly hipSYCL) with the potential to have your work upstreamed.
Contact: Joachim Meyer
Leveraging Abstraction to Generalize Pattern Rewrites
Production compilers, such as LLVM, use pattern rewriting to optimize code and bring it into a normalized form. These pattern rewrites match specific trees of operators and rewrite them into specific output patterns. This thesis explores to what extend the specific pattern rewrites in LLVM can be generalized to rewrite a wider range of operators. For example, arithmetic pattern rewrites can be lifted to work on complex or tensor maths or match across broadcasts of scalar values.
A reference implementation and more background info on the approach is available here: https://reviews.llvm.org/D92086
Theses in this topic could venture in various directions, from hands-on to formal:
- Finding sound abstractions for new classes of operators (for LLVM).
- Formally proving the generalized pattern matching approach (in an idealized formal IR that is not LLVM) and deriving preconditions for correctness on the abstractions.
- Finding approaches to combine different abstraction classes (either in LLVM or formally).
- Developing an efficient and useful implementation for generalized pattern matching based on the prototype (there are trade offs between code size, pattern relevance, …).
Contact: Simon Moll
Integrating the Region Vectorizer
The Region Vectorizer is a vectorization framework for LLVM developed at the Compiler Design Lab. RV has been very successful in vectorizing SIMT code with complex control flow and is the most versatile framework for LLVM in that area out there. Yet, wider popularity and adoption of RV has been held back by the lack of integration with a major programming language compiler. Your thesis in this topic would deal with the integration of RV in a popular LLVM-based compiler. This may include working the community surrounding that language. A successful integration would be highly visible in the LLVM and language communities.
Candidate languages:
- Numba (https://numba.pydata.org/).
- Julia (https://julialang.org/).
- Chapel (https://chapel-lang.org/).
- Rust (https://rust-lang.org).
- ispc (https://ispc.github.io/).
Contact: Simon Moll