functional.py
tensorrt_llm/functional.py
The functionals in TensorRT-LLM are the computational building blocks for constructing neural network models, particularly large language models.
They provide high-level abstractions over common tensor operations, allowing for more efficient and optimized model construction. Let's break down some of these functionals:
AllReduceStrategy
Purpose: Determines the strategy for distributed computation reduction (aggregation).
Types:
AUTO: Automatically selects the best strategy.
ONESHOT: Performs the reduction in a single step.
RING: Uses a ring-based approach for reduction.
TWOSHOT: Splits the reduction into two steps.
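The difference between these patterns is easiest to see in a simulation. Below is a minimal NumPy sketch, run on a single process with hypothetical per-rank buffers, of what ONESHOT and the reduce-scatter phase of RING compute; the real implementation runs fused CUDA kernels across GPUs.

```python
import numpy as np

world = 4
# Hypothetical per-rank buffers; the real kernels operate across GPUs.
data = [np.arange(8, dtype=np.float32) * (r + 1) for r in range(world)]

# ONESHOT: every rank reads all peers and reduces in a single step.
oneshot = sum(data)

# RING, reduce-scatter phase: each step, every rank forwards one chunk to
# its neighbour, which accumulates it. After world - 1 steps, rank r owns
# the fully reduced chunk (r + 1) % world; an all-gather phase (omitted)
# would then circulate those chunks so all ranks hold the complete sum.
chunks = [np.array_split(d.copy(), world) for d in data]
for step in range(world - 1):
    sends = [(r, (r - step) % world, chunks[r][(r - step) % world].copy())
             for r in range(world)]
    for src, idx, payload in sends:
        chunks[(src + 1) % world][idx] += payload

assert np.allclose(chunks[0][1], np.array_split(oneshot, world)[1])
```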
AttentionMaskType
Purpose: Specifies the type of mask used in attention mechanisms.
Types:
bidirectional: Attention is not restricted by token position.
bidirectionalglm: A bidirectional variant used by GLM-family models.
causal: Masks future tokens, commonly used in language generation.
padding: Masks padding tokens to handle variable sequence lengths.
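As an illustration of what these masks mean, here is a small NumPy sketch; the shapes and boolean convention are illustrative, not TensorRT-LLM's internal layout.

```python
import numpy as np

seq = 4
scores = np.random.randn(seq, seq).astype(np.float32)  # attention logits

# causal: position i may only attend to positions j <= i.
causal = np.tril(np.ones((seq, seq), dtype=bool))

# padding: mask out positions that are padding (here, the last token).
valid = np.array([True, True, True, False])
padding = np.broadcast_to(valid, (seq, seq))

# bidirectional: no positional restriction; every position sees every other.
bidirectional = np.ones((seq, seq), dtype=bool)

# Masked-out scores are driven to -inf so softmax assigns them zero weight.
masked = np.where(causal & padding, scores, -np.inf)
```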
DimRange
Purpose: Represents the range of dimensions for a tensor, useful in defining dynamic shapes for model inputs or outputs.
Attributes: Includes minimum, optimum, and maximum values for each dimension.
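For intuition, a hypothetical sketch of the idea (the names below are illustrative, not the class's actual attributes): each dynamic dimension carries a (min, opt, max) triple, which is what a TensorRT optimization profile consumes.

```python
# Hypothetical illustration of a dynamic input shape. Each dimension is
# described by (min, opt, max): the engine must accept anything in
# [min, max] and tunes its kernels for the opt point.
batch_range = (1, 8, 64)        # e.g. dynamic batch size
seq_range = (1, 512, 2048)      # e.g. dynamic sequence length
dynamic_shape = [batch_range, seq_range]
```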
LayerNormPositionType
Purpose: Indicates the position of layer normalization in a model architecture.
Types:
post_layernorm: Applied after the main operation (such as attention or the MLP).
pre_layernorm: Applied before the main operation.
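A NumPy sketch of the two placements, with a toy sublayer standing in for attention or the MLP:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def sublayer(x):                    # stand-in for attention or the MLP
    return 0.5 * x

x = np.random.randn(2, 8).astype(np.float32)

# pre_layernorm: normalize the sublayer's input (GPT-style).
pre = x + sublayer(layer_norm(x))

# post_layernorm: normalize after the residual add (original Transformer).
post = layer_norm(x + sublayer(x))
```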
LayerNormType
Purpose: Specifies the type of normalization layer.
Types:
GroupNorm: Normalizes across groups of channels.
LayerNorm: Standard layer normalization.
RmsNorm: Root-mean-square layer normalization.
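The difference between LayerNorm and RmsNorm is easy to state in NumPy (GroupNorm additionally partitions channels into groups before normalizing; omitted here for brevity):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Centers and scales using the mean and variance of the last axis.
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def rms_norm(x, eps=1e-5):
    # No centering: divides by the root mean square only, which is cheaper.
    return x / np.sqrt((x * x).mean(-1, keepdims=True) + eps)

x = np.random.randn(2, 8).astype(np.float32)
print(layer_norm(x).mean(-1))   # ~0: LayerNorm removes the mean
print(rms_norm(x).mean(-1))     # generally non-zero: RmsNorm does not
```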
MLPType
Purpose: Defines the type of Multilayer Perceptron (MLP) layer.
Types:
FusedGatedMLP: A gated MLP whose projections are fused for efficiency.
GatedMLP: An MLP with a gating mechanism.
MLP: A standard multilayer perceptron.
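A NumPy sketch of the three variants. The SiLU activation and the weight names are illustrative assumptions (gated MLPs in LLaMA-style models typically use SwiGLU):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

hidden, inner = 8, 16
x = np.random.randn(2, hidden).astype(np.float32)
w_gate = np.random.randn(hidden, inner).astype(np.float32)
w_up = np.random.randn(hidden, inner).astype(np.float32)
w_down = np.random.randn(inner, hidden).astype(np.float32)

# MLP: one up-projection, activation, one down-projection.
mlp = np.maximum(x @ w_up, 0.0) @ w_down

# GatedMLP: the activated gate branch multiplies the up branch
# elementwise before the down-projection.
gated = (silu(x @ w_gate) * (x @ w_up)) @ w_down

# FusedGatedMLP: same math, but the gate and up projections are fused
# into a single GEMM for speed.
w_fused = np.concatenate([w_gate, w_up], axis=1)
g, u = np.split(x @ w_fused, 2, axis=1)
fused = (silu(g) * u) @ w_down
assert np.allclose(gated, fused, atol=1e-4)
```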
PositionEmbeddingType
Purpose: Specifies the type of position embeddings used in a model.
Types: Various types, including learned_absolute, relative, rope_gpt_neox, etc., each suited to a different model architecture.
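As one concrete example, here is a NumPy sketch of a rotary position embedding. This is the pair-interleaved GPT-J-style rotation; rope_gpt_neox instead rotates the two halves of the head dimension.

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotate consecutive (even, odd) feature pairs by a position-dependent
    # angle. x has shape [seq_len, head_dim].
    seq, dim = x.shape
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    angles = np.outer(np.arange(seq), inv_freq)        # [seq, dim // 2]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(6, 8).astype(np.float32)
q_rot = rope(q)
```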
RotaryScalingType
Purpose: Determines the scaling approach for rotary position embeddings.
Types:
dynamic: Adjusts scaling dynamically at runtime.
linear: Uses a linear scaling approach.
none: No additional scaling is applied.
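A NumPy sketch of how the scaling types act on the rotary angles. The linear rule follows common practice; dynamic scaling is only described in a comment, since its exact rule is implementation-specific.

```python
import numpy as np

def angles(positions, dim=8, base=10000.0):
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)

factor, seq_len = 4.0, 8192
pos = np.arange(seq_len)

a_none = angles(pos)             # none: positions used as-is
a_linear = angles(pos / factor)  # linear: 8192 positions squeezed into the
                                 # angle range of 2048 (the trained length)

# dynamic: the scale (or frequency base) is derived from the runtime
# sequence length rather than fixed ahead of time, e.g. NTK-aware base
# adjustment; the exact rule is implementation-specific.
```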
Tensor
Purpose: Represents a dense tensor in the model, containing typed elements with a defined shape.
Methods: Includes functions like abs, cast, permute, transpose, etc., for manipulating tensor data.
Functional Operations
abs, sqrt, exp, sin, cos: Apply the corresponding unary operation elementwise.
add, sub, mul, div: Perform basic arithmetic operations on tensors.
allgather: Gathers tensors from different GPUs in a distributed setting.
allreduce: Aggregates tensors across different GPUs, often used in parallel training.
arange: Creates a tensor with a range of values.
argmax: Returns the indices of maximum values along a specified dimension.
softmax: Applies the softmax function, useful in classification tasks.
softplus: Provides a smooth approximation to the ReLU function.
split: Splits a tensor into multiple sub-tensors.
transpose: Permutes the dimensions of a tensor.
unsqueeze: Adds a singleton dimension to a tensor.
view: Reshapes a tensor without changing its data.
where: Selects elements from two tensors based on a condition.
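These operations mirror familiar NumPy/PyTorch semantics. A NumPy sketch of a few of them:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

x = np.arange(12, dtype=np.float32).reshape(3, 4)

probs = softmax(x)                      # rows now sum to 1
top = np.argmax(probs, axis=-1)         # index of the max per row
parts = np.split(x, 2, axis=1)          # two [3, 2] sub-tensors
swapped = np.transpose(x, (1, 0))       # permute dims 0 and 1
col = np.expand_dims(x[:, 0], axis=-1)  # unsqueeze: [3] -> [3, 1]
clipped = np.where(x > 5.0, x, 0.0)     # conditional select
```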
tensorrt_llm.functional.activation(input, act_type) → Tensor
Purpose: Applies an activation function to an input tensor.
Usage:
input: The tensor to which the activation function will be applied.
act_type: The type of activation function (e.g., RELU, TANH, SIGMOID).
Example: To apply a RELU activation to a tensor x, use activation(x, ActivationType.RELU).
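Below is a minimal sketch of how activation might appear during graph construction. The Builder/net_guard/mark_output pattern follows TensorRT-LLM's own test suite, but exact names can vary between versions, so treat it as an assumption rather than canonical usage:

```python
import tensorrt as trt
import tensorrt_llm
from tensorrt_llm import Tensor
from tensorrt_llm.functional import activation

builder = tensorrt_llm.Builder()
network = builder.create_network()
with tensorrt_llm.net_guard(network):
    # Declare a network input, then apply a ReLU activation to it.
    x = Tensor(name='x', shape=(2, 4), dtype=trt.float32)
    out = activation(x, trt.ActivationType.RELU)
    out.mark_output('out', trt.float32)
```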
tensorrt_llm.functional.add(left, right, op) → Tensor
Purpose: Performs an element-wise operation on two tensors or a tensor and a scalar.
Usage:
left, right: The tensors (or a tensor and a scalar) on which the operation is performed.
op: The operation type (e.g., SUM, SUB, MUL).
Example: To add two tensors a and b, use add(a, b, ElementWiseOperation.SUM).
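To make the role of the op parameter concrete, here is a NumPy stand-in where each operation value simply selects a broadcasted elementwise function:

```python
import numpy as np

# Illustrative stand-in: each ElementWiseOperation value picks one
# broadcasted NumPy function.
ops = {'SUM': np.add, 'SUB': np.subtract, 'MUL': np.multiply, 'DIV': np.divide}

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = 2.0                           # scalar operands broadcast over the tensor
print(ops['SUM'](a, b))           # elementwise a + b
print(ops['MUL'](a, b))           # elementwise a * b
```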
tensorrt_llm.functional.allgather(tensor, group, gather_dim) → Tensor
Purpose: Gathers tensors from multiple GPUs in a distributed setting.
Usage:
tensor: The tensor to be gathered.
group: The list of GPU ranks involved in the operation.
gather_dim: The dimension along which tensors are gathered.
Example: If you have tensors distributed across 4 GPUs and want to gather them, use allgather(tensor, [0, 1, 2, 3]).
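Semantically, allgather concatenates the per-rank shards along gather_dim. A single-process NumPy simulation of the result:

```python
import numpy as np

# Each of 4 ranks contributes one [2, 3] shard; after the allgather,
# every rank holds the concatenation of all shards along gather_dim.
shards = [np.full((2, 3), rank, dtype=np.float32) for rank in range(4)]

gathered_dim0 = np.concatenate(shards, axis=0)   # gather_dim=0 -> [8, 3]
gathered_dim1 = np.concatenate(shards, axis=1)   # gather_dim=1 -> [2, 12]
```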
tensorrt_llm.functional.allreduce(tensor, group, workspace, instance_id, strategy) → Tensor
Purpose: Reduces tensors across multiple GPUs by computing their sum and replicating the result on each GPU.
Usage:
tensor: The tensor to be reduced.
group: The list of GPU ranks participating in the operation.
workspace: An optional tensor holding memory pointers visible to all GPUs.
instance_id: An identifier used for synchronization.
strategy: The reduction strategy (AUTO, ONESHOT, RING, TWOSHOT).
Example: For an all-reduce operation across 4 GPUs with a tensor x, use allreduce(x, [0, 1, 2, 3]).
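Semantically, allreduce is an elementwise sum whose result is replicated on every rank. A single-process NumPy simulation:

```python
import numpy as np

# Each rank holds a partial result (e.g. one shard of a tensor-parallel
# matmul); after the allreduce, every rank holds the elementwise sum.
partials = [np.full((2, 3), rank + 1, dtype=np.float32) for rank in range(4)]
reduced = np.sum(partials, axis=0)   # every element is 1+2+3+4 = 10
```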
In summary, these functionals are integral components for building and optimizing neural networks in TensorRT-LLM.
They cover everything from activation functions and arithmetic to distributed communication across multiple GPUs, making them essential for efficient large-scale model training and inference.