In cooperation with the Computer Graphics Group, we are developing a unified shading system that is independent of the source language, the target architecture, and the rendering engine, without sacrificing runtime performance.
Our goal is to eventually provide a shading system that uses a portable shader format and can therefore be integrated into any kind of rendering engine (e.g. ray tracing, rasterization, global illumination). Additionally, integrating an existing shading language should require only minimal effort, while AnySL's compiler technology still delivers maximum performance.
Shaders are program fragments that extend the functionality of a rendering system for specific tasks such as computing emission, light-material interaction, or geometry processing, much like plug-ins used elsewhere. The key difference from plug-ins based on function calls and libraries is that shading code usually has to be transformed to meet the target application's requirements on performance or program structure, while remaining convenient for the programmer to write. However, to support a particular shading language, the renderer has to provide a compiler framework for it. Hence, the renderer's implementer ends up investing a large part of their time in building compilers, something they never intended to do in the first place.
AnySL is a novel approach that eases the integration of a shading language into a renderer. We compile shaders into a program representation that is independent of the shading language, the renderer, and the target hardware platform. The renderer has to provide implementations of the shading language's basic constructs. The renderer is augmented with a just-in-time compiler library that loads the shaders and "glues" them to the renderer's interface at runtime. Afterwards, the shader is mapped to the underlying hardware platform. With this approach, the performance overhead of common programming abstraction mechanisms is optimized away, yielding high performance while retaining maximum flexibility.
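The split between a renderer-independent shader and renderer-provided constructs can be sketched in C++. This is only an analogy: templates resolve the "glue" at compile time, whereas AnySL links shader and renderer at runtime through its JIT compiler. All names here (`diffuseShader`, `ToyRenderer`, `normal`, `lightDir`, `lightIntensity`) are hypothetical and are not AnySL's actual interface.

```cpp
#include <algorithm>

// Hypothetical minimal vector type, not part of AnySL.
struct Vec3 {
    float x, y, z;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A diffuse shader written only against the abstract constructs
// normal(), lightDir() and lightIntensity(). It knows nothing about
// the renderer that will eventually supply them.
template <typename Renderer>
float diffuseShader(const typename Renderer::Context& ctx) {
    float cosTheta = dot(Renderer::normal(ctx), Renderer::lightDir(ctx));
    return Renderer::lightIntensity(ctx) * std::max(cosTheta, 0.0f);
}

// One hypothetical renderer's implementation of those constructs.
struct ToyRenderer {
    struct Context {
        Vec3 hitNormal;
        Vec3 toLight;
        float intensity;
    };
    static Vec3 normal(const Context& c) { return c.hitNormal; }
    static Vec3 lightDir(const Context& c) { return c.toLight; }
    static float lightIntensity(const Context& c) { return c.intensity; }
};
```

Because the renderer's definitions are visible when the shader is instantiated (`diffuseShader<ToyRenderer>(ctx)`), a compiler can inline them completely. Removing this kind of abstraction overhead is what the just-in-time approach aims for.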
The AnySL Shading System uses an embedded just-in-time compiler, LLVM (the "Low-Level Virtual Machine"), to load, specialize, and optimize shaders at runtime. This allows us to recompile shaders on the fly, e.g. after shader parameters have been modified, without sacrificing performance.
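As a sketch of what specializing on a shader parameter buys, consider a hypothetical specular term raised to an exponent. In the generic version the exponent is a runtime value; in the specialized version it is a constant, so the sequence of multiplications is fixed ahead of time. C++17 templates (`if constexpr`) stand in for the JIT here, and the function names are made up for illustration.

```cpp
// Generic version: the exponent is an ordinary runtime parameter, so the
// loop bound is unknown when the shader is compiled.
inline float specularGeneric(float cosAlpha, int exponent) {
    float result = 1.0f;
    for (int i = 0; i < exponent; ++i) {
        result *= cosAlpha;
    }
    return result;
}

// Specialized version: the exponent N is a compile-time constant, so
// square-and-multiply expands into a fixed chain of multiplications.
template <unsigned N>
inline float specularSpecialized(float cosAlpha) {
    if constexpr (N == 0) {
        return 1.0f;
    } else if constexpr (N % 2 == 0) {
        float half = specularSpecialized<N / 2>(cosAlpha);
        return half * half;
    } else {
        return cosAlpha * specularSpecialized<N - 1>(cosAlpha);
    }
}
```

The limitation of the template version is that `N` must be known when the program is compiled. A JIT compiler lifts that restriction: it can generate specialized code for whatever value the parameter currently has and regenerate it when the value changes.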
For ray tracing engines that employ packet tracing, the scalar shader code is automatically transformed into packet code that operates on packets of data whose size depends on the SIMD width of the target architecture. This makes it possible to exploit the SIMD instruction sets of CPUs (e.g. SSE, AltiVec) without burdening the shader programmer with writing such complex and error-prone code. The only alternative is to shade the rays of a packet sequentially, which incurs considerable overhead if the ray tracer operates on SIMD data types: packets have to be split before shading, and the results have to be merged again afterwards.
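The difference between sequential and packet shading can be sketched for a shader with a single branch. Whole-function vectorization works on LLVM IR, not C++; the code below only mimics the shape of its output, assuming a packet width of 4 and made-up shading constants. The branch becomes a per-lane mask, both sides are computed for all lanes, and a blend selects each lane's result.

```cpp
#include <array>
#include <cstddef>

// Hypothetical packet width; a real system derives it from the target's
// SIMD width (e.g. 4 floats for SSE).
constexpr std::size_t kWidth = 4;
using FloatPacket = std::array<float, kWidth>;
using MaskPacket = std::array<bool, kWidth>;

// Scalar shader as the programmer writes it: one ray at a time, with a branch.
inline float shadeScalar(float cosTheta) {
    if (cosTheta > 0.0f) {
        return cosTheta * 0.8f;  // lit side: scaled diffuse term
    }
    return 0.1f;                 // back side: constant ambient term
}

// Sequential shading: split the packet, shade each lane, merge the results.
inline FloatPacket shadeSequential(const FloatPacket& cosTheta) {
    FloatPacket out{};
    for (std::size_t i = 0; i < kWidth; ++i) {
        out[i] = shadeScalar(cosTheta[i]);
    }
    return out;
}

// Packet version in the style of vectorized code: the condition becomes a
// mask, both branch sides are evaluated for every lane, and a blend keeps
// the value from the side each lane actually takes.
inline FloatPacket shadePacket(const FloatPacket& cosTheta) {
    MaskPacket lit{};
    FloatPacket thenVal{}, elseVal{}, out{};
    for (std::size_t i = 0; i < kWidth; ++i) lit[i] = cosTheta[i] > 0.0f;
    for (std::size_t i = 0; i < kWidth; ++i) thenVal[i] = cosTheta[i] * 0.8f;
    for (std::size_t i = 0; i < kWidth; ++i) elseVal[i] = 0.1f;
    for (std::size_t i = 0; i < kWidth; ++i) out[i] = lit[i] ? thenVal[i] : elseVal[i];
    return out;
}
```

In this simplified form, each loop in `shadePacket` would correspond to one SIMD operation (compare, multiply, broadcast, blend). By contrast, `shadeSequential` has to run the scalar code, including its branch, once per lane.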
Compared to sequential shading, we obtain an average speedup of 3.9x for the entire rendering process in RTfact. At the same time, we reach over 90% of the performance of hand-written native shaders.
See the project page for more details: Whole-Function Vectorization.
LLVM PTX Backend
As part of the AnySL system, we implemented an LLVM backend for NVIDIA's "Parallel Thread Execution" (PTX) assembly language. PTX is the low-level representation consumed by NVIDIA's GPU drivers for general-purpose GPU (GPGPU) computing and is usually generated by compilers for the "Compute Unified Device Architecture" (CUDA).
The backend is similar to LLVM's C backend and generates .ptx files directly from LLVM's intermediate representation (IR).
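To give an idea of what a C-backend-style printer does, here is a toy sketch that turns one simplified binary instruction into PTX text. The `BinaryInst` type and the `emitPTX` function are hypothetical and far simpler than LLVM's real instruction classes; only the emitted strings follow PTX syntax.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified stand-in for an IR binary instruction.
enum class Opcode { Add, Mul };
enum class Type { F32, S32 };

struct BinaryInst {
    Opcode op;
    Type type;
    std::string dst, lhs, rhs;  // virtual register names, e.g. "%f1"
};

// Print one instruction as PTX text.
inline std::string emitPTX(const BinaryInst& inst) {
    std::string mnemonic;
    switch (inst.op) {
        case Opcode::Add:
            mnemonic = "add";
            break;
        case Opcode::Mul:
            // Integer multiplies in PTX must state which half of the
            // double-width product to keep; ".lo" keeps the low half.
            mnemonic = inst.type == Type::S32 ? "mul.lo" : "mul";
            break;
        default:
            throw std::invalid_argument("unsupported opcode");
    }
    const char* suffix = inst.type == Type::F32 ? ".f32" : ".s32";
    return mnemonic + suffix + " " + inst.dst + ", " + inst.lhs + ", " + inst.rhs + ";";
}
```

A real backend additionally has to allocate and declare registers, lower control flow to labels and branches, and handle calls, which is where most of its complexity lies.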
The backend already supports most PTX features:
- simple arithmetic (add, mul, ...)
- control flow
- structs and arrays
- simple function calls (no recursion, no struct returns)
- global, shared, constant, and texture memory access
- mathematical functions (e.g. sin, cos, sqrt, pow, ...)
- special registers (e.g. thread_id)
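For memory access, PTX names the state space in the load or store instruction itself, so a backend has to decide for every access which space a pointer refers to. A minimal sketch, assuming a hypothetical `MemorySpace` classification that a real backend would derive from the pointer's LLVM address space; texture reads use separate `tex` instructions and are left out here.

```cpp
#include <string>

// Hypothetical classification of where a pointer points.
enum class MemorySpace { Global, Shared, Constant };

// PTX state-space qualifier used in ld/st mnemonics.
inline const char* stateSpace(MemorySpace space) {
    switch (space) {
        case MemorySpace::Global:   return ".global";
        case MemorySpace::Shared:   return ".shared";
        case MemorySpace::Constant: return ".const";
    }
    return "";  // unreachable for valid enum values
}

// Emit a 32-bit float load from the address held in addrReg.
inline std::string emitLoadF32(MemorySpace space, const std::string& dst,
                               const std::string& addrReg) {
    return std::string("ld") + stateSpace(space) + ".f32 " + dst + ", [" + addrReg + "];";
}
```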
Performance has not yet been optimized extensively. Optimizations that lower register pressure are needed to produce faster code.
The backend was written as part of Helge Rhodin's bachelor's thesis. The source code is released under the University of Illinois/NCSA Open Source License (BSD-style) and is hosted at SourceForge.
Code contributions to the backend are very welcome! :)
Publications
- Whole-Function Vectorization
Karrenberg, R. and Hack, S.
International Symposium on Code Generation and Optimization (CGO), 2011.
- AnySL: Efficient and Portable Shading for Ray Tracing
Karrenberg, R., Rubinstein, D., Slusallek, P. and Hack, S.
Proceedings of the Conference on High Performance Graphics (HPG), pages 97–105, Eurographics Association, 2010.
- Decompilation of LLVM IR
B.Sc. Thesis, Saarland University, 2011.
- A PTX Code Generator for LLVM
B.Sc. Thesis, Saarland University, 2010.
People
- Ralf Karrenberg
- Dmitri Rubinstein
- Roland Leißa
- Helge Rhodin
- Simon Moll
- Sebastian Hack
- Philipp Slusallek