Graph Rewriting (GW) module

The Graph Rewriting (GW) module in TensorRT-LLM is a powerful tool for manipulating and optimizing the underlying graph of a neural network.

It allows you to modify the network structure at the ILayer/INetworkDefinition level, which is a lower-level representation compared to the high-level Module abstraction.

Let's dive into the details of the Graph Rewriting module and explore its usage and best practices.

When to Use Graph Rewriting

Graph Rewriting is particularly useful in the following scenarios:

  1. When you only have access to the ILayer/INetworkDefinition representation of the network and want to perform optimizations at that level.

  2. When modifying the network using Module Rewriting would lead to complex control flow or scattered functionality across multiple Module instances.

Graph Rewriting APIs

TensorRT-LLM provides several core APIs for Graph Rewriting:

Tensor-Related Methods

  • Tensor.get_parent: Retrieves the ILayer that produces a given tensor.

  • Tensor.get_users: Retrieves the consumer ILayers of a given tensor.

  • Tensor.replace_all_uses_with: Replaces a tensor with another tensor in all its consumer ILayers.
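
A minimal sketch of these three calls in use. Here t and new_t are placeholder tensors inside an already-built network, and exact method availability may vary between TensorRT-LLM versions:

```python
# `t` is a placeholder tensorrt_llm Tensor; `new_t` is a replacement tensor
# produced elsewhere in the network.
producer = t.get_parent()        # the ILayer that produces `t`
readers = t.get_users()          # every consumer ILayer that reads `t`

# After this call, every reader of `t` consumes `new_t` instead.
t.replace_all_uses_with(new_t)
```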

FLayerInfo

  • FLayerInfo is a high-level signature that holds original input information for layers defined in functional.py. It provides a mapping between ILayers and their corresponding high-level information.

  • FLayerInfo.replace_input_with: Replaces an input tensor with another tensor.

  • FLayerInfo.replace_output_uses_with: Redirects the usage of original output tensors to a set of new tensors.

  • FLayerInfoMemo.instance(): Retrieves the singleton instance of FLayerInfoMemo.

  • FLayerInfoMemo.get: Retrieves the corresponding FLayerInfo for an ILayer.
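
A minimal sketch of looking up the FLayerInfo for a plugin layer during a pass. The lookup-by-name pattern follows the style of TensorRT-LLM's own rewriting passes; the 'qkv' input name is specific to the GPTAttention plugin and is used here purely as an illustration, and the import path is an assumption:

```python
from tensorrt_llm.graph_rewriting import FLayerInfoMemo  # path is an assumption

# `layer` is a plugin ILayer wrapper encountered while walking the network.
flayer = FLayerInfoMemo.instance().get(layer.name)
if flayer is not None:
    # Recover the original high-level input tensor by name, rather than
    # digging through the plugin's low-level ILayer inputs.
    qkv = flayer.get_input('qkv')
```

From here, FLayerInfo.replace_input_with and FLayerInfo.replace_output_uses_with can redirect the plugin's inputs and outputs to new tensors.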

Pattern and Pattern Manager

  • TensorRT-LLM defines two types of patterns: PatternRewriter and PatternAnalyzer.

  • PatternRewriter is used for defining rewriting patterns that actually alter the network structure. It provides methods like match, rewrite, and match_and_rewrite.

  • PatternAnalyzer is used for defining analysis patterns that collect information from the network. It provides methods like match and analyze.

  • RewritePatternManager and AnalysisPatternManager are used to manage multiple PatternRewriter or PatternAnalyzer instances, respectively.
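
As a small end-to-end sketch, closely following the style of the example in the TensorRT-LLM documentation, here is a PatternRewriter that replaces every elementwise addition with a subtraction. Import paths are assumptions and may differ across versions:

```python
import tensorrt as trt

from tensorrt_llm.functional import sub
from tensorrt_llm.graph_rewriting import PatternRewriter
from tensorrt_llm.network import net_guard


class ReplaceAddWithSub(PatternRewriter):
    def __init__(self):
        super().__init__('replace_add_with_sub')

    def match(self, layer):
        # Fire only on elementwise-SUM layers.
        return layer.as_layer().type == trt.LayerType.ELEMENTWISE and \
               layer.as_layer().op == trt.ElementWiseOperation.SUM

    def rewrite(self, layer):
        with net_guard(layer.network):
            # Stage 1: retrieve the old subgraph's inputs and output.
            a, b = layer.get_inputs(0, 1)
            o = layer.get_outputs(0)[0]
            # Stage 2: build the replacement subgraph from the same inputs.
            c = sub(a, b)
            # Stage 3: redirect all consumers of the old output.
            o.replace_all_uses_with(c)
            # Stage 4: mark the old layer as removed.
            layer.mark_as_removed()
```

The stage comments in the rewrite body correspond to the four-stage rewriting process described under the best practices below.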

Best Practices for Using Graph Rewriting

Understand the Network Structure

  • Before applying Graph Rewriting, familiarize yourself with the structure of the network and the layers you want to manipulate.

  • Identify the specific subgraphs or patterns you want to optimize or modify.

Follow the Four-Stage Rewriting Process

  • When rewriting a layer or subgraph, follow the four-stage process:

    1. Retrieve the input and output tensors of the subgraph to be replaced.

    2. Create a new subgraph that takes the old subgraph's inputs.

    3. Redirect the layers depending on the outputs of the old subgraph to the new subgraph.

    4. Mark the layers in the old subgraph as removed.

  • Avoid directly rewriting layers; instead, create new layers and redirect the usage of the original outputs to the new layers.
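
The rewrite method in the earlier sketch follows exactly these four stages. Applying a rewriter to a network then looks roughly like this (a sketch; RewritePatternManager's exact signature may differ across versions):

```python
from tensorrt_llm.graph_rewriting import RewritePatternManager  # path is an assumption

patterns = RewritePatternManager()
patterns.add(label='replace_add_with_sub', pattern=ReplaceAddWithSub())
patterns.rewrite(net)  # `net` is the tensorrt_llm.Network to transform
```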

Leverage FLayerInfo for Plugin Layers

  • When working with TensorRT plugin layers, use FLayerInfo to access the original input information.

  • FLayerInfo provides a high-level abstraction for plugin layers, allowing you to retrieve and modify their inputs and outputs.

Use the @record_signature Decorator

  • If you are adding new Graph Rewriting patterns that involve functionals, ensure that the functionals are decorated with the @record_signature decorator.

  • This decorator records the FLayerInfo for a functional, making it available for analysis and rewriting.
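
A sketch of what this looks like for a hypothetical functional. my_fused_norm is invented for illustration, and the import path for the decorator is an assumption:

```python
from tensorrt_llm import graph_rewriting as gw  # import path is an assumption
from tensorrt_llm.functional import Tensor


# Each call during network construction records an FLayerInfo entry, so later
# analysis and rewriting passes can look up this functional's high-level
# inputs by name instead of inspecting raw ILayers.
@gw.record_signature
def my_fused_norm(x: Tensor, eps: float = 1e-5) -> Tensor:
    ...  # build TensorRT layers here and return the output Tensor
```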

Test and Validate Rewritten Networks

  • After applying Graph Rewriting, thoroughly test and validate the rewritten network to ensure its correctness and performance.

  • Compare the results of the original and rewritten networks to verify that the desired optimizations or modifications have been achieved.
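
A placeholder validation harness might look like the following; run_engine, the two engines, and sample_inputs are all assumptions standing in for your actual inference setup:

```python
import numpy as np

# `run_engine` is assumed to execute a built engine on a batch of sample
# inputs and return a numpy array of outputs (e.g. logits).
ref = run_engine(original_engine, sample_inputs)
out = run_engine(rewritten_engine, sample_inputs)

# Loose tolerances, since fused or restructured kernels rarely match
# the original bit-for-bit.
assert np.allclose(ref, out, rtol=1e-3, atol=1e-3), "rewrite changed numerics"
```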

Consider the Impact on Performance

  • While Graph Rewriting can lead to optimizations and improved performance, be mindful of the potential impact on inference speed and memory usage.

  • Profile and benchmark the rewritten network to assess its performance characteristics and ensure that the optimizations are beneficial for your specific use case.

Use Graph Rewriting Judiciously

  • Graph Rewriting is a powerful tool, but it should be used judiciously and only when necessary.

  • Overusing Graph Rewriting or applying complex rewriting patterns may lead to reduced readability and maintainability of the network definition.

By following these best practices and leveraging the Graph Rewriting APIs provided by TensorRT-LLM, you can effectively optimize and manipulate the underlying graph of your neural network.

Graph Rewriting allows you to fine-tune the network structure, fuse layers, and apply custom optimizations to improve performance and efficiency.

Remember to test and validate the rewritten network thoroughly to ensure that the desired optimizations are achieved without introducing any unintended side effects.

Additionally, keep in mind that Graph Rewriting operates at a lower level compared to Module Rewriting, so it may require a deeper understanding of the network structure and the TensorRT APIs.

Overall, the Graph Rewriting module in TensorRT-LLM provides a flexible and powerful way to optimize and customize your neural network graphs, enabling you to achieve better performance and efficiency in your TensorRT-based applications.
