NVCC: The NVIDIA CUDA Compiler
NVCC, which stands for NVIDIA CUDA Compiler, is a proprietary compiler by NVIDIA that compiles CUDA C/C++ code for execution on CUDA-enabled GPUs (Graphics Processing Units).
NVCC acts as a compiler driver, controlling the compilation flow and linking process, while delegating the actual code generation to other tools like the host compiler and the CUDA backend compiler.
CUDA Programming Model
CUDA follows a heterogeneous programming model in which the host code runs on the CPU and the device code, also known as kernels, runs on the GPU.
The host code is responsible for memory allocation on the device, data transfer between host and device, and launching kernels on the GPU. Kernels are C++ functions marked with the __global__ keyword, indicating that they are callable from the host and execute on the device.
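To make this split concrete, here is a minimal sketch (the file name vector_add.cu, the array size, and the launch configuration are illustrative choices, not prescribed by NVCC): the __global__ function is the device code, and main() performs the allocation, transfers, and kernel launch on the host.

    // vector_add.cu -- minimal illustration of host code vs. device code
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Device code: a kernel, marked __global__, executed in parallel by GPU threads.
    __global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // Host code: runs on the CPU; allocates device memory, copies data, launches the kernel.
    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        std::vector<float> h_a(n, 1.0f), h_b(n, 2.0f), h_c(n);

        float *d_a, *d_b, *d_c;
        cudaMalloc((void**)&d_a, bytes);                             // allocate on the device
        cudaMalloc((void**)&d_b, bytes);
        cudaMalloc((void**)&d_c, bytes);
        cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);  // host -> device transfer
        cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

        const int threadsPerBlock = 256;
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vectorAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);    // kernel launch
        cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost);  // device -> host transfer

        printf("c[0] = %f\n", h_c[0]);                               // expected: 3.000000
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        return 0;
    }

Building such a file is a single step, e.g. nvcc vector_add.cu -o vector_add; NVCC takes care of the host/device separation behind the scenes, as described next.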
NVCC Workflow
NVCC processes CUDA source files (typically with a .cu extension) and separates the device code from the host code.
It then compiles the device code using the CUDA backend compiler, which generates a PTX (Parallel Thread Execution) assembly file or a cubin (CUDA binary) object file.
The host code is modified to include the necessary CUDA runtime function calls and is then passed to a standard C++ compiler for compilation.
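Assuming the vector_add.cu sketch above, the following illustrative invocations use standard nvcc options to make the intermediate forms visible:

    nvcc -ptx vector_add.cu -o vector_add.ptx   # stop after generating PTX assembly
    nvcc -cubin -arch=sm_80 vector_add.cu       # stop after generating a cubin (sm_80 chosen as an example target)
    nvcc -keep vector_add.cu -o vector_add      # full build, but keep all intermediate files for inspection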
Supported Host Compilers
NVCC relies on a host compiler for preprocessing, parsing, and code generation of the host code.
It supports various host compilers such as GCC, Clang, and Microsoft Visual C++ (MSVC) on different platforms. The specific host compiler used can be specified using the -ccbin option followed by the path to the compiler executable.
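For example (the g++-12 path here is an arbitrary illustration of selecting a non-default host compiler):

    nvcc -ccbin /usr/bin/g++-12 -c vector_add.cu -o vector_add.o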
CUDA Compilation Trajectory
The CUDA compilation trajectory involves several stages:
Preprocessing: The CUDA source files are preprocessed to handle includes, macros, and conditional compilation.
Compilation:
Device code is compiled to PTX assembly or cubin object files.
Host code is modified and compiled using the host compiler.
Linking:
Device object files are linked together using nvlink.
The resulting device code is embedded into the host object files.
Host object files are linked using the host linker to create an executable.
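The individual commands nvcc would run for a given build can be listed without executing them using the --dryrun option (shown here with the illustrative vector_add.cu file):

    nvcc --dryrun vector_add.cu -o vector_add

On a typical Linux installation, the printed command list covers preprocessing, the device compiler, ptxas, fatbinary, the host compiler, and the final link step.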
NVCC Compiler Options
NVCC provides a wide range of compiler options to control the compilation process.
Some key options include:
--gpu-architecture (-arch): Specifies the target virtual GPU architecture (e.g., compute_80 for NVIDIA Ampere).
--gpu-code (-code): Specifies the real GPU architecture(s) to generate code for (e.g., sm_80 for NVIDIA Ampere).
-rdc=true (--relocatable-device-code=true): Enables relocatable device code, allowing separate compilation and linking of device code.
-dc: Compiles each input file into an object file containing relocatable device code; equivalent to -rdc=true -c.
-Xcompiler: Passes options directly to the host compiler.
-Xlinker: Passes options directly to the host linker.
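A single illustrative invocation combining several of these options might look like the following; the file names and pass-through flags are placeholders:

    nvcc --gpu-architecture=compute_80 --gpu-code=sm_80 \
         -Xcompiler -Wall,-O2 -Xlinker -lpthread \
         vector_add.cu -o vector_add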
Separate Compilation and Linking
NVCC supports separate compilation and linking of device code.
This allows device code to be split across multiple files and linked together using nvlink.
To enable separate compilation, the -rdc=true option is used to generate relocatable device code.
The compiled objects can then be linked using nvlink, and the resulting device code is embedded into the host executable.
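A minimal sketch of this workflow, where a.cu and b.cu are hypothetical files and a kernel in one calls a __device__ function defined in the other:

    nvcc -dc a.cu -o a.o    # -dc = -rdc=true -c: produce relocatable device code
    nvcc -dc b.cu -o b.o
    nvcc a.o b.o -o app     # nvcc device-links (nvlink) and then host-links

The device link step can also be performed explicitly with nvcc -dlink when the final executable is linked with the host compiler instead of nvcc.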
Optimisations
NVCC provides various optimisation options to improve the performance of CUDA code. Some notable optimisations include:
-O<level> (e.g., -O3): Sets the optimisation level for the host code; device-side optimisation is controlled separately (e.g., via -Xptxas -O<level>).
-ftz=true: Flushes denormal single-precision values to zero.
-prec-div=true|false: Controls whether single-precision division and reciprocals use IEEE-compliant rounding or a faster approximation.
-use_fast_math: Enables fast math optimisations; implies -ftz=true, -prec-div=false, -prec-sqrt=false, and -fmad=true.
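Two illustrative builds of the earlier vector_add.cu, one favouring speed and one spelling out the stricter defaults explicitly:

    nvcc -O3 -use_fast_math vector_add.cu -o vector_add_fast                # trades accuracy for speed
    nvcc -O3 -ftz=false -prec-div=true vector_add.cu -o vector_add_precise  # keep denormals and IEEE-compliant division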
Code Generation
NVCC generates device code in two forms: PTX assembly and cubin object files.
PTX is a low-level virtual machine and instruction set architecture that provides a stable interface for CUDA code across different GPU architectures.
PTX code is just-in-time compiled to binary code by the CUDA driver when the application runs, allowing for portability and forward compatibility.
Cubin, on the other hand, is a pre-compiled binary format specific to a particular GPU architecture.
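Which of the two forms ends up embedded in a binary can be checked with the cuobjdump tool that ships with the CUDA toolkit (the build line below is illustrative; with recent toolkits, -arch=sm_80 embeds an sm_80 cubin plus compute_80 PTX):

    nvcc -arch=sm_80 vector_add.cu -o vector_add
    cuobjdump -ptx vector_add     # dump the embedded PTX
    cuobjdump -sass vector_add    # disassemble the embedded cubin to SASS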
Virtual Architectures and Just-in-Time Compilation
To enable forward compatibility and optimisation for specific GPU architectures, NVCC introduces the concept of virtual architectures.
Virtual architectures (compute_) define a set of features and capabilities that are common across a range of physical architectures (sm_).
NVCC compiles device code against a virtual architecture, and the result is compiled to binary code for the specific physical architecture at run time through Just-in-Time (JIT) compilation if no matching precompiled binary is embedded.
This allows CUDA applications to run on newer GPU architectures without recompilation.
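For example, a fat binary can be built that carries precompiled cubins for known architectures plus PTX for JIT compilation on future ones (the architectures here are chosen purely for illustration):

    nvcc -gencode arch=compute_70,code=sm_70 \
         -gencode arch=compute_80,code=sm_80 \
         -gencode arch=compute_80,code=compute_80 \
         vector_add.cu -o vector_add

Here code=sm_70 and code=sm_80 produce cubins, while code=compute_80 embeds PTX that the driver can JIT-compile for GPUs newer than Ampere.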
Debugging and Profiling
NVCC provides options for debugging and profiling CUDA code.
The -g option generates debug information for the host code, and the -G option generates debug information for the device code (disabling most device optimisations), allowing source-level debugging using tools like cuda-gdb.
The -lineinfo option generates line number information for device code, enabling profiling and performance analysis using tools like NVIDIA Visual Profiler.
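Typical (illustrative) invocations:

    nvcc -g -G vector_add.cu -o vector_add_dbg      # -g: host debug info, -G: device debug info
    cuda-gdb ./vector_add_dbg
    nvcc -O3 -lineinfo vector_add.cu -o vector_add  # keep optimisations but record source line info for profilers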
Conclusion
NVCC is a powerful compiler that simplifies the process of compiling and linking CUDA C/C++ code for execution on NVIDIA GPUs.
It handles the intricate details of separating device code from host code, compiling device code to PTX or cubin, and linking everything together into a final executable.
With its wide range of compiler options, optimizations, and support for separate compilation and linking, NVCC provides developers with the tools necessary to write efficient and high-performance CUDA applications.