functional.py
tensorrt_llm/functional.py
The functionals in TensorRT-LLM are the computational building blocks for constructing neural network models, particularly large language models.
They provide high-level abstractions over common tensor operations, allowing for more efficient and optimized model construction. Let's break down some of these functionals:
AllReduceStrategy
Purpose: Determines the strategy for distributed computation reduction (aggregation).
Types:
AUTO: Automatically selects the best strategy.
ONESHOT: Performs the reduction in a single step.
RING: Uses a ring-based approach for reduction.
TWOSHOT: Splits the reduction into two steps.
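The difference between these patterns is easiest to see in a simulation. Below is a minimal NumPy sketch, run on a single process with hypothetical per-rank buffers, of what ONESHOT and the reduce-scatter phase of RING compute; the real implementation runs fused CUDA kernels across GPUs.

```python
import numpy as np

world = 4
# Hypothetical per-rank buffers; the real kernels operate across GPUs.
data = [np.arange(8, dtype=np.float32) * (r + 1) for r in range(world)]

# ONESHOT: every rank reads all peers and reduces in a single step.
oneshot = sum(data)

# RING, reduce-scatter phase: each step, every rank forwards one chunk to
# its neighbour, which accumulates it. After world - 1 steps, rank r owns
# the fully reduced chunk (r + 1) % world; an all-gather phase (omitted)
# would then circulate those chunks so all ranks hold the complete sum.
chunks = [np.array_split(d.copy(), world) for d in data]
for step in range(world - 1):
    sends = [(r, (r - step) % world, chunks[r][(r - step) % world].copy())
             for r in range(world)]
    for src, idx, payload in sends:
        chunks[(src + 1) % world][idx] += payload

assert np.allclose(chunks[0][1], np.array_split(oneshot, world)[1])
```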
AttentionMaskType
Purpose: Specifies the type of mask used in attention mechanisms.
Types:
bidirectional: Attention is not restricted by token position.
bidirectionalglm: A bidirectional variant used by GLM-family models.
causal: Masks future tokens, commonly used in language generation.
padding: Masks padding tokens to handle variable sequence lengths.
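As an illustration of what these masks mean, here is a small NumPy sketch; the shapes and boolean convention are illustrative, not TensorRT-LLM's internal layout.

```python
import numpy as np

seq = 4
scores = np.random.randn(seq, seq).astype(np.float32)  # attention logits

# causal: position i may only attend to positions j <= i.
causal = np.tril(np.ones((seq, seq), dtype=bool))

# padding: mask out positions that are padding (here, the last token).
valid = np.array([True, True, True, False])
padding = np.broadcast_to(valid, (seq, seq))

# bidirectional: no positional restriction; every position sees every other.
bidirectional = np.ones((seq, seq), dtype=bool)

# Masked-out scores are driven to -inf so softmax assigns them zero weight.
masked = np.where(causal & padding, scores, -np.inf)
```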
DimRange
Purpose: Represents the range of dimensions for a tensor, useful in defining dynamic shapes for model inputs or outputs.
Attributes: Includes minimum, optimum, and maximum values for each dimension.
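For intuition, a hypothetical sketch of the idea (the names below are illustrative, not the class's actual attributes): each dynamic dimension carries a (min, opt, max) triple, which is what a TensorRT optimization profile consumes.

```python
# Hypothetical illustration of a dynamic input shape. Each dimension is
# described by (min, opt, max): the engine must accept anything in
# [min, max] and tunes its kernels for the opt point.
batch_range = (1, 8, 64)        # e.g. dynamic batch size
seq_range = (1, 512, 2048)      # e.g. dynamic sequence length
dynamic_shape = [batch_range, seq_range]
```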
LayerNormPositionType
Purpose: Indicates the position of layer normalization in a model architecture.
Types:
post_layernorm: Applied after the main operation (such as attention or the MLP).
pre_layernorm: Applied before the main operation.
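A NumPy sketch of the two placements, with a toy sublayer standing in for attention or the MLP:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def sublayer(x):                    # stand-in for attention or the MLP
    return 0.5 * x

x = np.random.randn(2, 8).astype(np.float32)

# pre_layernorm: normalize the sublayer's input (GPT-style).
pre = x + sublayer(layer_norm(x))

# post_layernorm: normalize after the residual add (original Transformer).
post = layer_norm(x + sublayer(x))
```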
LayerNormType
Purpose: Specifies the type of normalization layer.
Types:
GroupNorm: Normalizes across groups of channels.
LayerNorm: Standard layer normalization.
RmsNorm: Root-mean-square layer normalization.
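The difference between LayerNorm and RmsNorm is easy to state in NumPy (GroupNorm additionally partitions channels into groups before normalizing; omitted here for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Centers and scales using the mean and variance of the last axis.
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def rms_norm(x, eps=1e-5):
    # No centering: divides by the root mean square only, which is cheaper.
    return x / np.sqrt((x * x).mean(-1, keepdims=True) + eps)

x = np.random.randn(2, 8).astype(np.float32)
print(layer_norm(x).mean(-1))   # ~0: LayerNorm removes the mean
print(rms_norm(x).mean(-1))     # generally non-zero: RmsNorm does not
```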
MLPType
Purpose: Defines the type of Multilayer Perceptron (MLP) layer.
Types:
FusedGatedMLP: A gated MLP whose projections are fused for efficiency.
GatedMLP: An MLP with a gating mechanism.
MLP: A standard multilayer perceptron.
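A NumPy sketch of the three variants. The SiLU activation and the weight names are illustrative assumptions (gated MLPs in LLaMA-style models typically use SwiGLU):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

hidden, inner = 8, 16
x = np.random.randn(2, hidden).astype(np.float32)
w_gate = np.random.randn(hidden, inner).astype(np.float32)
w_up = np.random.randn(hidden, inner).astype(np.float32)
w_down = np.random.randn(inner, hidden).astype(np.float32)

# MLP: one up-projection, activation, one down-projection.
mlp = np.maximum(x @ w_up, 0.0) @ w_down

# GatedMLP: the activated gate branch multiplies the up branch
# elementwise before the down-projection.
gated = (silu(x @ w_gate) * (x @ w_up)) @ w_down

# FusedGatedMLP: same math, but the gate and up projections are fused
# into a single GEMM for speed.
w_fused = np.concatenate([w_gate, w_up], axis=1)
g, u = np.split(x @ w_fused, 2, axis=1)
fused = (silu(g) * u) @ w_down
assert np.allclose(gated, fused, atol=1e-4)
```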
PositionEmbeddingType
Purpose: Specifies the type of position embeddings used in a model.
Types: Various types, including learned_absolute, relative, rope_gpt_neox, etc., each suited to a different model architecture.
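As one concrete example, here is a NumPy sketch of a rotary position embedding. This is the pair-interleaved GPT-J-style rotation; rope_gpt_neox instead rotates the two halves of the head dimension.

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotate consecutive (even, odd) feature pairs by a position-dependent
    # angle. x has shape [seq_len, head_dim].
    seq, dim = x.shape
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    angles = np.outer(np.arange(seq), inv_freq)        # [seq, dim // 2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(6, 8).astype(np.float32)
q_rot = rope(q)
```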
RotaryScalingType
Purpose: Determines the scaling approach for rotary position embeddings.
Types:
dynamic: Adjusts scaling dynamically at runtime.
linear: Uses a linear scaling approach.
none: No additional scaling is applied.
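A NumPy sketch of how the scaling types act on the rotary angles. The linear rule follows common practice; dynamic scaling is only described in a comment, since its exact rule is implementation-specific.

```python
import numpy as np

def angles(positions, dim=8, base=10000.0):
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)

factor, seq_len = 4.0, 8192
pos = np.arange(seq_len)

a_none = angles(pos)             # none: positions used as-is
a_linear = angles(pos / factor)  # linear: 8192 positions squeezed into the
                                 # angle range of 2048 (the trained length)

# dynamic: the scale (or frequency base) is derived from the runtime
# sequence length rather than fixed ahead of time, e.g. NTK-aware base
# adjustment; the exact rule is implementation-specific.
```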
Tensor
Purpose: Represents a dense tensor in the model, containing typed elements with a defined shape.
Methods: Includes functions like abs, cast, permute, transpose, etc., for manipulating tensor data.
Functional Operations
abs, sqrt, exp, sin, cos: Apply the corresponding unary operation elementwise.
add, sub, mul, div: Perform basic arithmetic operations on tensors.
allgather: Gathers tensors from different GPUs in a distributed setting.
allreduce: Aggregates tensors across different GPUs, often used in parallel training.
arange: Creates a tensor with a range of values.
argmax: Returns the indices of maximum values along a specified dimension.
softmax: Applies the softmax function, useful in classification tasks.
softplus: Provides a smooth approximation to the ReLU function.
split: Splits a tensor into multiple sub-tensors.
transpose: Permutes the dimensions of a tensor.
unsqueeze: Adds a singleton dimension to a tensor.
view: Reshapes a tensor without changing its data.
where: Selects elements from two tensors based on a condition.
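These operations mirror familiar NumPy/PyTorch semantics. A NumPy sketch of a few of them:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

x = np.arange(12, dtype=np.float32).reshape(3, 4)

probs = softmax(x)                      # rows now sum to 1
top = np.argmax(probs, axis=-1)         # index of the max per row
parts = np.split(x, 2, axis=1)          # two [3, 2] sub-tensors
swapped = np.transpose(x, (1, 0))       # permute dims 0 and 1
col = np.expand_dims(x[:, 0], axis=-1)  # unsqueeze: [3] -> [3, 1]
clipped = np.where(x > 5.0, x, 0.0)     # conditional select
```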
tensorrt_llm.functional.activation(input, act_type) → Tensor
Purpose: Applies an activation function to an input tensor.
Usage:
input: The tensor to which the activation function will be applied.
act_type: The type of activation function (e.g., RELU, TANH, SIGMOID).
Example: To apply a RELU activation to a tensor x, use activation(x, ActivationType.RELU).
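Below is a minimal sketch of how activation might appear during graph construction. The Builder/net_guard/mark_output pattern follows TensorRT-LLM's own test suite, but exact names can vary between versions, so treat it as an assumption rather than canonical usage:

```python
import tensorrt as trt
import tensorrt_llm
from tensorrt_llm import Tensor
from tensorrt_llm.functional import activation

builder = tensorrt_llm.Builder()
network = builder.create_network()
with tensorrt_llm.net_guard(network):
    # Declare a network input, then apply a ReLU activation to it.
    x = Tensor(name='x', shape=(2, 4), dtype=trt.float32)
    out = activation(x, trt.ActivationType.RELU)
    out.mark_output('out', trt.float32)
```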
tensorrt_llm.functional.add(left, right, op) → Tensor
Purpose: Performs an element-wise operation on two tensors or a tensor and a scalar.
Usage:
left, right: The tensors (or a tensor and a scalar) on which the operation is performed.
op: The operation type (e.g., SUM, SUB, MUL).
Example: To add two tensors a and b, use add(a, b, ElementWiseOperation.SUM).
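To make the role of the op parameter concrete, here is a NumPy stand-in where each operation value simply selects a broadcasted elementwise function:

```python
import numpy as np

# Illustrative stand-in: each ElementWiseOperation value picks one
# broadcasted NumPy function.
ops = {'SUM': np.add, 'SUB': np.subtract, 'MUL': np.multiply, 'DIV': np.divide}

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = 2.0                           # scalar operands broadcast over the tensor
print(ops['SUM'](a, b))           # elementwise a + b
print(ops['MUL'](a, b))           # elementwise a * b
```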
tensorrt_llm.functional.allgather(tensor, group, gather_dim) → Tensor
Purpose: Gathers tensors from multiple GPUs in a distributed setting.
Usage:
tensor: The tensor to be gathered.
group: The list of GPU ranks involved in the operation.
gather_dim: The dimension along which tensors are gathered.
Example: If you have tensors distributed across 4 GPUs and want to gather them, use allgather(tensor, [0, 1, 2, 3]).
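Semantically, allgather concatenates the per-rank shards along gather_dim. A single-process NumPy simulation of the result:

```python
import numpy as np

# Each of 4 ranks contributes one [2, 3] shard; after the allgather,
# every rank holds the concatenation of all shards along gather_dim.
shards = [np.full((2, 3), rank, dtype=np.float32) for rank in range(4)]

gathered_dim0 = np.concatenate(shards, axis=0)   # gather_dim=0 -> [8, 3]
gathered_dim1 = np.concatenate(shards, axis=1)   # gather_dim=1 -> [2, 12]
```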
tensorrt_llm.functional.allreduce(tensor, group, workspace, instance_id, strategy) → Tensor
Purpose: Reduces tensors across multiple GPUs by computing their sum and replicating the result on each GPU.
Usage:
tensor: The tensor to be reduced.
group: The list of GPU ranks participating in the operation.
workspace: An optional tensor holding memory pointers visible to all GPUs.
instance_id: An identifier used for synchronization.
strategy: The reduction strategy (AUTO, ONESHOT, RING, TWOSHOT).
Example: For an all-reduce operation across 4 GPUs with a tensor x, use allreduce(x, [0, 1, 2, 3]).
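Semantically, allreduce is an elementwise sum whose result is replicated on every rank. A single-process NumPy simulation:

```python
import numpy as np

# Each rank holds a partial result (e.g. one shard of a tensor-parallel
# matmul); after the allreduce, every rank holds the elementwise sum.
partials = [np.full((2, 3), rank + 1, dtype=np.float32) for rank in range(4)]
reduced = np.sum(partials, axis=0)   # every element is 1+2+3+4 = 10
```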
In summary, these functionals are integral components for building and optimizing neural networks in TensorRT-LLM.
They cover everything from activation functions and arithmetic to distributed communication across multiple GPUs, making them essential for efficient large-scale model training and inference.