Experimentation with CUDA Graphs
This example sets up a basic CUDA graph that performs a vector addition followed by a vector multiplication, showing how a graph is built from kernel launches and the dependency between them. Here the graph is built by capturing the launches from a stream rather than by adding nodes one at a time.
Example: Vector Addition and Multiplication using CUDA Graphs
First, let's define two simple CUDA kernels, one for vector addition and one for vector multiplication:
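A minimal sketch of the two kernels follows. The names VecAdd and VecMul come from the text; their signatures, the choice that VecMul multiplies the sum already stored in C by B in place (so the second kernel genuinely depends on the first), the CHECK macro and the helper runDirect are assumptions for illustration. The host code launches both kernels directly on the default stream, without a graph, as a baseline.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call and return 1 from the enclosing function.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(err_));                      \
            return 1;                                                    \
        }                                                                \
    } while (0)

// C[i] = A[i] + B[i]
__global__ void VecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

// Hypothetical: C[i] = C[i] * B[i]. It reads the sums written by VecAdd,
// so it must run after VecAdd has finished.
__global__ void VecMul(const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = C[i] * B[i];
}

// Hypothetical helper: runs both kernels without a graph and checks the result.
int runDirect() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *A, *B, *C;
    CHECK(cudaMallocManaged(&A, n * sizeof(float)));
    CHECK(cudaMallocManaged(&B, n * sizeof(float)));
    CHECK(cudaMallocManaged(&C, n * sizeof(float)));
    for (int i = 0; i < n; ++i) {
        A[i] = static_cast<float>(i);
        B[i] = 2.0f;
    }

    // Launches on the same (default) stream execute in issue order.
    VecAdd<<<blocks, threads>>>(A, B, C, n);
    VecMul<<<blocks, threads>>>(B, C, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Every element should equal (A[i] + B[i]) * B[i].
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (C[i] != (A[i] + B[i]) * B[i]) ++mismatches;

    CHECK(cudaFree(A));
    CHECK(cudaFree(B));
    CHECK(cudaFree(C));
    return mismatches == 0 ? 0 : 1;
}

int main() { return runDirect(); }
```

Each launch here is issued separately by the CPU; the graph version replaces these two launch calls with a single graph launch.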
Now, let's construct and execute the graph:
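A minimal sketch of the capture, instantiate and launch sequence is shown below, assuming the runtime API of CUDA 11.4 or later. It uses cudaGraphInstantiateWithFlags, which sidesteps the change in the cudaGraphInstantiate signature between CUDA 11 and CUDA 12. The block includes hypothetical definitions of VecAdd and VecMul (C = A + B, then C = C * B in place) so that it compiles on its own; runCapturedGraph and CHECK are illustrative names, not part of any library.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call and return 1 from the enclosing function.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(err_));                      \
            return 1;                                                    \
        }                                                                \
    } while (0)

__global__ void VecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

__global__ void VecMul(const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = C[i] * B[i];
}

// Hypothetical helper: capture, instantiate and launch the graph, then check C.
int runCapturedGraph() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Allocate and initialise outside the capture region, so that only the
    // kernel launches are recorded into the graph.
    float *A, *B, *C;
    CHECK(cudaMallocManaged(&A, n * sizeof(float)));
    CHECK(cudaMallocManaged(&B, n * sizeof(float)));
    CHECK(cudaMallocManaged(&C, n * sizeof(float)));
    for (int i = 0; i < n; ++i) {
        A[i] = static_cast<float>(i);
        B[i] = 2.0f;
    }

    // Capture cannot be started on the legacy default stream, so create one.
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Between Begin/EndCapture, work issued to `stream` is recorded, not run.
    // Stream order makes VecMul a dependent of VecAdd in the captured graph.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    VecAdd<<<blocks, threads, 0, stream>>>(A, B, C, n);
    VecMul<<<blocks, threads, 0, stream>>>(B, C, n);
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once; the executable graph can then be launched repeatedly.
    cudaGraphExec_t graphExec;
    CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));
    CHECK(cudaGraphLaunch(graphExec, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Print the first ten elements of C.
    for (int i = 0; i < 10; ++i)
        std::printf("C[%d] = %.1f\n", i, C[i]);

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (C[i] != (A[i] + B[i]) * B[i]) ++mismatches;

    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(A));
    CHECK(cudaFree(B));
    CHECK(cudaFree(C));
    return mismatches == 0 ? 0 : 1;
}

int main() { return runCapturedGraph(); }
```

With the assumed initialisation, C[i] = (i + 2) * 2, so on a working CUDA setup the printed values should run from C[0] = 4.0 to C[9] = 22.0.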
Explanation
Memory Allocation and Initialization: We allocate unified memory for vectors A, B, and C, and initialize vectors A and B.
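Graph Capture: We begin capturing on a CUDA stream so that the work issued to it is recorded into a graph instead of being executed immediately. The VecAdd kernel is recorded first, followed by the VecMul kernel; because both are issued to the same stream, the captured graph records VecMul as depending on VecAdd.
Instantiate and Execute: After capturing, we instantiate the graph and then launch it. This separates setup from execution: the instantiated graph can be launched again without repeating the capture and instantiation work.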
Results: The program prints the first ten elements of vector C to verify the computations.
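Stream capture infers the dependency from stream order. For comparison, the same two-node graph can be built explicitly with the runtime's graph API (cudaGraphCreate, cudaKernelNodeParams, cudaGraphAddKernelNode), where the edge is stated directly by passing the VecAdd node as a dependency of the VecMul node. This is a sketch under hypothetical kernel definitions (C = A + B, then C = C * B in place); runExplicitGraph and CHECK are illustrative names, and cudaGraphInstantiateWithFlags assumes CUDA 11.4 or later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call and return 1 from the enclosing function.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(err_));                      \
            return 1;                                                    \
        }                                                                \
    } while (0)

__global__ void VecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

__global__ void VecMul(const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = C[i] * B[i];
}

// Hypothetical helper: build the graph node by node, launch it, check C.
int runExplicitGraph() {
    int n = 1 << 20;  // non-const: its address is passed as a kernel argument
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *A, *B, *C;
    CHECK(cudaMallocManaged(&A, n * sizeof(float)));
    CHECK(cudaMallocManaged(&B, n * sizeof(float)));
    CHECK(cudaMallocManaged(&C, n * sizeof(float)));
    for (int i = 0; i < n; ++i) { A[i] = static_cast<float>(i); B[i] = 2.0f; }

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    // Node 1: VecAdd with no dependencies. kernelParams holds one pointer
    // per kernel argument; the values are copied when the node is added.
    void *addArgs[] = {&A, &B, &C, &n};
    cudaKernelNodeParams addParams = {};
    addParams.func = (void *)VecAdd;
    addParams.gridDim = dim3(blocks);
    addParams.blockDim = dim3(threads);
    addParams.kernelParams = addArgs;
    cudaGraphNode_t addNode;
    CHECK(cudaGraphAddKernelNode(&addNode, graph, nullptr, 0, &addParams));

    // Node 2: VecMul, with an explicit edge from addNode.
    void *mulArgs[] = {&B, &C, &n};
    cudaKernelNodeParams mulParams = {};
    mulParams.func = (void *)VecMul;
    mulParams.gridDim = dim3(blocks);
    mulParams.blockDim = dim3(threads);
    mulParams.kernelParams = mulArgs;
    cudaGraphNode_t mulNode;
    CHECK(cudaGraphAddKernelNode(&mulNode, graph, &addNode, 1, &mulParams));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    cudaGraphExec_t graphExec;
    CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));
    CHECK(cudaGraphLaunch(graphExec, stream));
    CHECK(cudaStreamSynchronize(stream));

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (C[i] != (A[i] + B[i]) * B[i]) ++mismatches;

    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(A));
    CHECK(cudaFree(B));
    CHECK(cudaFree(C));
    return mismatches == 0 ? 0 : 1;
}

int main() { return runExplicitGraph(); }
```

The explicit API is more verbose, but it lets you express dependencies that do not follow a single stream order, for example two independent nodes that both feed a third.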
Compilation and Execution
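Compile this program using nvcc, for example with nvcc -o vec_graph vec_graph.cu (the file name vec_graph.cu is only a placeholder), and then run ./vec_graph.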
This example gives a practical look at how CUDA graphs can be used to optimise workflows that involve multiple dependent kernel executions.
Setting up the graph once and launching it as many times as needed amortises the capture and instantiation cost and reduces the CPU overhead of issuing each kernel individually, which is particularly useful in iterative algorithms and in repeated computations such as simulation steps and machine learning inference.
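To see the effect of reuse on your own hardware, the sketch below captures the two launches once and then times many launches of the instantiated graph against the same number of direct kernel launches, using CUDA events. The vector size (1 << 16, chosen so launch overhead is a larger share of the run time), the iteration count and the helper runTiming are hypothetical choices; the kernels again assume C = A + B followed by C = C * B. No particular speedup is implied: the numbers depend on the GPU, driver and kernel size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failing CUDA runtime call and return 1 from the enclosing function.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(err_));                      \
            return 1;                                                    \
        }                                                                \
    } while (0)

__global__ void VecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

__global__ void VecMul(const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = C[i] * B[i];
}

// Hypothetical helper: time `iters` direct launches vs. `iters` graph launches.
int runTiming(int iters) {
    const int n = 1 << 16;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *A, *B, *C;
    CHECK(cudaMallocManaged(&A, n * sizeof(float)));
    CHECK(cudaMallocManaged(&B, n * sizeof(float)));
    CHECK(cudaMallocManaged(&C, n * sizeof(float)));
    for (int i = 0; i < n; ++i) { A[i] = static_cast<float>(i); B[i] = 2.0f; }

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Capture and instantiate once.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    VecAdd<<<blocks, threads, 0, stream>>>(A, B, C, n);
    VecMul<<<blocks, threads, 0, stream>>>(B, C, n);
    CHECK(cudaStreamEndCapture(stream, &graph));
    CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));

    // Warm-up so one-time costs (e.g. managed-memory migration) are not timed.
    VecAdd<<<blocks, threads, 0, stream>>>(A, B, C, n);
    VecMul<<<blocks, threads, 0, stream>>>(B, C, n);
    CHECK(cudaGraphLaunch(graphExec, stream));
    CHECK(cudaStreamSynchronize(stream));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    float directMs = 0.0f, graphMs = 0.0f;

    // Direct: two kernel launches issued by the CPU per iteration.
    CHECK(cudaEventRecord(start, stream));
    for (int it = 0; it < iters; ++it) {
        VecAdd<<<blocks, threads, 0, stream>>>(A, B, C, n);
        VecMul<<<blocks, threads, 0, stream>>>(B, C, n);
    }
    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&directMs, start, stop));

    // Graph: one launch call per iteration, reusing the same graphExec.
    CHECK(cudaEventRecord(start, stream));
    for (int it = 0; it < iters; ++it)
        CHECK(cudaGraphLaunch(graphExec, stream));
    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&graphMs, start, stop));

    std::printf("%d iterations: direct %.3f ms, graph %.3f ms\n",
                iters, directMs, graphMs);

    // Each iteration recomputes C = (A + B) * B, so the result is unchanged.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (C[i] != (A[i] + B[i]) * B[i]) ++mismatches;

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(A));
    CHECK(cudaFree(B));
    CHECK(cudaFree(C));
    return mismatches == 0 ? 0 : 1;
}

int main() { return runTiming(1000); }
```

Because VecAdd overwrites C before VecMul scales it, repeated launches are idempotent, which makes the graph safe to relaunch in a loop without resetting any data.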