Graph Rewriting (GW) Module
The Graph Rewriting (GW) module in TensorRT-LLM is a powerful tool for manipulating and optimizing the underlying graph of a neural network.
It allows you to modify the network structure at the ILayer/INetworkDefinition level, which is a lower-level representation compared to the high-level Module abstraction.
Let's dive into the details of the Graph Rewriting module and explore its usage and best practices.
When to Use Graph Rewriting
Graph Rewriting is particularly useful in the following scenarios:
When you only have access to the ILayer/INetworkDefinition representation of the network and want to perform optimizations at that level.
When modifying the network using Module Rewriting would lead to complex control flow or scattered functionality across multiple Module instances.
Graph Rewriting APIs
TensorRT-LLM provides several core APIs for Graph Rewriting:
Tensor-Related Methods
Tensor.get_parent: Retrieves the ILayer that produces a given tensor.
Tensor.get_users: Retrieves the consumer ILayers of a given tensor.
replace_all_uses_with: Replaces a tensor with another tensor in all of its consumer ILayers.
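For example, a single rewiring step might combine these methods as in the minimal sketch below; network, tensor, and new_tensor are placeholder names, and only get_parent, get_users, and replace_all_uses_with are the APIs listed above.

```python
# Sketch only: `tensor` and `new_tensor` are placeholder tensors from a
# TensorRT-LLM network; only the three documented methods are used here.

# Walk upward: which ILayer produced this tensor?
producer = tensor.get_parent()

# Walk downward: which ILayers consume this tensor?
consumers = tensor.get_users()

# Rewire the graph: every consumer of `tensor` now reads `new_tensor`
# instead, leaving the original producer dangling for later removal.
tensor.replace_all_uses_with(new_tensor)
```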
FLayerInfo
FLayerInfo is a high-level signature that holds the original input information for layers defined in functional.py. It provides a mapping between ILayers and their corresponding high-level information.
FLayerInfo.replace_input_with: Replaces an input tensor with another tensor.
FLayerInfo.replace_output_uses_with: Redirects the usage of the original output tensors to a set of new tensors.
FLayerInfoMemo.instance(): Retrieves the singleton instance of FLayerInfoMemo.
FLayerInfoMemo.get: Retrieves the corresponding FLayerInfo for an ILayer.
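The snippet below is a minimal sketch of how these pieces fit together for one layer. Here layer, old_input, new_input, new_outputs, and network are placeholders, and the exact argument list of replace_output_uses_with (as well as whether FLayerInfoMemo.get takes the layer or its name) should be verified against graph_rewriting.py.

```python
# Sketch only: `layer`, `old_input`, `new_input`, `new_outputs`, and `network`
# are placeholders supplied by the surrounding rewriting pass.
flayer = FLayerInfoMemo.instance().get(layer.name)  # keyed by layer name here; verify against the source
if flayer is not None:
    # Swap one of the recorded high-level inputs for a new tensor.
    flayer.replace_input_with(old_input, new_input)

    # Redirect consumers of the original outputs to the new tensors
    # (argument order is an assumption; check graph_rewriting.py).
    flayer.replace_output_uses_with(network, new_outputs)
```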
Pattern and Pattern Manager
TensorRT-LLM defines two types of patterns: PatternRewriter and PatternAnalyzer.
PatternRewriter is used for defining rewriting patterns that actually alter the network structure. It provides the methods match, rewrite, and match_and_rewrite.
PatternAnalyzer is used for defining analysis patterns that collect information from the network. It provides the methods match and analyze.
RewritePatternManager and AnalysisPatternManager manage multiple PatternRewriter or PatternAnalyzer instances, respectively.
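A skeletal rewriting pattern might look like the sketch below. It assumes PatternRewriter and RewritePatternManager can be imported from tensorrt_llm.graph_rewriting, and the constructor arguments and the manager's add()/rewrite() signatures are assumptions to check against the source; the class name and matching logic are purely illustrative.

```python
from tensorrt_llm.graph_rewriting import PatternRewriter, RewritePatternManager  # assumed import path


class ReplaceFooWithBar(PatternRewriter):
    """Illustrative pattern; the name and matching logic are placeholders."""

    def __init__(self):
        super().__init__('replace_foo_with_bar')

    def match(self, layer) -> bool:
        # Return True only for layers this pattern knows how to rewrite.
        ...

    def rewrite(self, layer) -> None:
        # Build the replacement subgraph and redirect uses of the old
        # outputs (see the four-stage process described below).
        ...


# Register the pattern and apply it to a built network `net`
# (the manager's exact API is an assumption).
manager = RewritePatternManager()
manager.add(label='replace_foo_with_bar', pattern=ReplaceFooWithBar())
manager.rewrite(net)
```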
Best Practices for Using Graph Rewriting
Understand the Network Structure
Before applying Graph Rewriting, familiarize yourself with the structure of the network and the layers you want to manipulate.
Identify the specific subgraphs or patterns you want to optimize or modify.
Follow the Four-Stage Rewriting Process
When rewriting a layer or subgraph, follow the four-stage process (sketched in the example below):
1. Retrieve the input and output tensors of the subgraph to be replaced.
2. Create a new subgraph that takes the old subgraph's inputs.
3. Redirect the layers that depend on the outputs of the old subgraph to the new subgraph's outputs.
4. Mark the layers in the old subgraph as removed.
Avoid directly rewriting layers; instead, create new layers and redirect the usage of the original outputs to the new layers.
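Below is a hedged sketch of those four stages inside a match_and_rewrite body, replacing an elementwise addition with a subtraction. The helpers it leans on beyond the APIs listed earlier (net_guard, Layer.as_layer, Layer.get_inputs, Layer.get_outputs, Layer.mark_as_removed) mirror the graph-rewriting utilities shipped with TensorRT-LLM, but treat their exact names, locations, and signatures as assumptions to verify against graph_rewriting.py.

```python
import tensorrt as trt

from tensorrt_llm.network import net_guard  # assumed import path for the network context manager


# Body of a PatternRewriter.match_and_rewrite (see the skeleton above).
def match_and_rewrite(self, layer) -> bool:
    # Only handle elementwise SUM layers; leave everything else untouched.
    trt_layer = layer.as_layer()
    if trt_layer.type != trt.LayerType.ELEMENTWISE or \
            trt_layer.op != trt.ElementWiseOperation.SUM:
        return False

    with net_guard(layer.network):
        # Stage 1: retrieve the inputs and outputs of the subgraph to replace.
        a, b = layer.get_inputs(0, 1)
        old_out = layer.get_outputs(0)[0]

        # Stage 2: build the replacement subgraph from the same inputs
        # (the overloaded operator inserts an elementwise SUB layer).
        new_out = a - b

        # Stage 3: redirect every consumer of the old output to the new one;
        # the original SUM layer becomes dangling.
        old_out.replace_all_uses_with(new_out)

        # Stage 4: mark the old subgraph as removed so later passes skip it.
        layer.mark_as_removed()

    return True
```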
Leverage FLayerInfo for Plugin Layers
When working with TensorRT plugin layers, use FLayerInfo to access the original input information.
FLayerInfo provides a high-level abstraction for plugin layers, allowing you to retrieve and modify their inputs and outputs.
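A brief sketch of looking up a plugin layer's recorded information follows. FLayerInfoMemo and @record_signature are the documented pieces; the layer-iteration helper and the as_layer() accessor used to filter plugin layers are assumptions.

```python
import tensorrt as trt

# Sketch only: `net.get_layers()` and `layer.as_layer()` are assumed helpers
# on the TensorRT-LLM wrappers; verify them against graph_rewriting.py.
for layer in net.get_layers():
    if layer.as_layer().type != trt.LayerType.PLUGIN_V2:
        continue  # only plugin layers are of interest here
    flayer = FLayerInfoMemo.instance().get(layer.name)
    if flayer is None:
        continue  # the functional was not decorated with @record_signature
    # `flayer` now exposes the original functional-level inputs, which are much
    # easier to reason about than the plugin's flattened ILayer inputs.
```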
Use the @record_signature Decorator
If you are adding new Graph Rewriting patterns that involve functionals, ensure that the functionals are decorated with the @record_signature decorator. This decorator records the FLayerInfo for a functional, making it available for analysis and rewriting.
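For illustration, decorating a functional looks like the sketch below; my_custom_op and its body are hypothetical, and the import path for record_signature is an assumption.

```python
from tensorrt_llm.graph_rewriting import record_signature  # assumed import path


# Hypothetical functional: decorating it records an FLayerInfo entry for every
# call, so later analysis and rewriting passes can see the original arguments.
@record_signature
def my_custom_op(x, scale):
    return x * scale
```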
Test and Validate Rewritten Networks
After applying Graph Rewriting, thoroughly test and validate the rewritten network to ensure its correctness and performance.
Compare the results of the original and rewritten networks to verify that the desired optimizations or modifications have been achieved.
Consider the Impact on Performance
While Graph Rewriting can lead to optimizations and improved performance, be mindful of the potential impact on inference speed and memory usage.
Profile and benchmark the rewritten network to assess its performance characteristics and ensure that the optimizations are beneficial for your specific use case.
Use Graph Rewriting Judiciously
Graph Rewriting is a powerful tool, but it should be used judiciously and only when necessary.
Overusing Graph Rewriting or applying complex rewriting patterns may lead to reduced readability and maintainability of the network definition.
By following these best practices and leveraging the Graph Rewriting APIs provided by TensorRT-LLM, you can effectively optimize and manipulate the underlying graph of your neural network.
Graph Rewriting allows you to fine-tune the network structure, fuse layers, and apply custom optimizations to improve performance and efficiency.
Remember to test and validate the rewritten network thoroughly to ensure that the desired optimizations are achieved without introducing any unintended side effects.
Additionally, keep in mind that Graph Rewriting operates at a lower level compared to Module Rewriting, so it may require a deeper understanding of the network structure and the TensorRT APIs.
Overall, the Graph Rewriting module in TensorRT-LLM provides a flexible and powerful way to optimize and customize your neural network graphs, enabling you to achieve better performance and efficiency in your TensorRT-based applications.