# tensorrt\_llm.functional.embedding

The <mark style="color:yellow;">**`tensorrt_llm.functional.embedding`**</mark> function in TensorRT-LLM performs an embedding lookup, a common operation in neural network models, particularly in natural language processing.

An embedding maps discrete objects, such as words in text, to dense vectors of real numbers. Let's break down how this function works and explain its parameters:

### <mark style="color:blue;">Function Purpose</mark>

* **Embedding Lookup**: It performs the embedding lookup operation, where the `input` tensor contains integer identifiers (such as token indices) and the `weight` tensor is the embedding table in which each row is an embedding vector.
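Conceptually, the lookup is just row indexing into the weight table. A minimal NumPy sketch (NumPy stands in for TensorRT-LLM tensors here; the sizes are made up for illustration):

```python
import numpy as np

# Hypothetical sizes for illustration.
vocab_size, embedding_dim = 8, 4

# Embedding table: one row per vocabulary item.
weight = np.arange(vocab_size * embedding_dim, dtype=np.float32).reshape(
    vocab_size, embedding_dim
)

# Token indices to look up.
input_ids = np.array([1, 5, 1])

# The lookup gathers the rows named by the indices.
output = weight[input_ids]  # shape: [3, embedding_dim]
```

Each position in `input_ids` selects one row of `weight`, so repeated indices (like the two `1`s above) yield identical rows in the output.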

### <mark style="color:blue;">Parameters</mark>

**input (Tensor)**:

* Contains the indices for which embeddings are to be fetched.
* For instance, in a language model, this could be a tensor of word indices.

**weight (Tensor)**:

* The embedding table where each row represents an embedding vector.
* Its shape is typically `[vocab_size, embedding_dim]`, where `vocab_size` is the total number of unique items (e.g., words) and `embedding_dim` is the dimensionality of the embeddings.

**tp\_size (int)**:

* Indicates the number of GPUs used for distributed computing (tensor parallelism).
* If greater than 1, it implies the embedding operation is distributed across multiple GPUs.

**tp\_group (Optional\[List\[int]])**:

* The group of ranks (GPUs) participating in the operation, relevant in the case of distributed computing.

**sharding\_dim (int)**:

* Dictates how the embedding table is split among different GPUs.
* `sharding_dim = 0` means sharding by rows (vocab dimension).
* `sharding_dim = 1` means sharding by columns (embedding dimension).

**tp\_rank (int)**:

* The specific rank of the GPU in the tensor parallelism setup.
* Used to calculate the offset in the embedding table.
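To see how `sharding_dim` and `tp_rank` interact, here is a hedged NumPy sketch of row sharding (`sharding_dim = 0`): each rank holds a `vocab_size / tp_size` slice of the table, derives its offset from `tp_rank`, zeroes lookups that fall outside its slice, and an all-reduce (simulated here as a plain sum over ranks) combines the partial results. All names are illustrative; this is not the TensorRT-LLM implementation:

```python
import numpy as np

vocab_size, embedding_dim, tp_size = 8, 4, 2
weight = np.arange(vocab_size * embedding_dim, dtype=np.float32).reshape(
    vocab_size, embedding_dim
)
input_ids = np.array([1, 5, 6])

def rank_lookup(tp_rank):
    # Each rank owns rows [offset, offset + shard) of the full table.
    shard = vocab_size // tp_size
    offset = tp_rank * shard                      # offset derived from tp_rank
    local_weight = weight[offset:offset + shard]  # this rank's table slice
    # Shift indices into the local shard; mask indices owned by other ranks.
    local_ids = input_ids - offset
    in_range = (local_ids >= 0) & (local_ids < shard)
    partial = local_weight[np.clip(local_ids, 0, shard - 1)]
    partial[~in_range] = 0.0                      # other ranks supply these rows
    return partial

# The all-reduce across ranks sums the partial results.
output = sum(rank_lookup(r) for r in range(tp_size))
```

Because every out-of-range row is zeroed before the sum, exactly one rank contributes each output row, and the combined result matches an unsharded lookup.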

**workspace (Optional\[Tensor])**:

* Used for memory allocation required during the operation, especially in the distributed context.

**instance\_id (int)**:

* An identifier used for synchronization purposes in distributed setups.

### How Parameters are Chosen

* **Choosing `input` and `weight`**: Based on your model's architecture and the specific task (like word embeddings in an NLP task).
* **Distributed Settings (`tp_size`, `tp_group`, `tp_rank`)**:
  * Decided based on the computational resources (number of GPUs) and how you want to distribute the computation.
  * In a single GPU setup, `tp_size` would be 1.
* **`sharding_dim`**:
  * Based on whether you want to shard the embedding table by rows or columns across multiple GPUs. This is typically a design choice depending on the model architecture and memory constraints.
* **`workspace` and `instance_id`**:
  * These are more technical and are often determined by the system architecture and memory management requirements.

### Returns

* **Tensor**: The output tensor after performing the embedding lookup.

### Use Case

In a typical scenario, you would use this function to convert indices (like word indices) into their corresponding embedding vectors using a pre-trained or dynamically trained embedding table.

This is crucial in models where you need to convert categorical data into a form that can be processed by neural networks.

The distributed computing parameters come into play in large-scale models where the computation is spread across multiple GPUs.
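The column-sharded case (`sharding_dim = 1`) can also be illustrated with a hedged NumPy sketch: each rank holds an `embedding_dim / tp_size` slice of every row, performs a full-vocabulary lookup on its slice, and the partial vectors are concatenated (an all-gather in a real distributed setting). Again, all names are illustrative, not the library's implementation:

```python
import numpy as np

vocab_size, embedding_dim, tp_size = 8, 4, 2
weight = np.arange(vocab_size * embedding_dim, dtype=np.float32).reshape(
    vocab_size, embedding_dim
)
input_ids = np.array([1, 5, 6])

def rank_lookup(tp_rank):
    # Each rank owns columns [start, start + width) of every embedding vector.
    width = embedding_dim // tp_size
    start = tp_rank * width
    local_weight = weight[:, start:start + width]
    return local_weight[input_ids]  # partial vectors, full vocabulary

# The all-gather concatenates the column slices back together.
output = np.concatenate([rank_lookup(r) for r in range(tp_size)], axis=1)
```

Note the trade-off against row sharding: here every rank can resolve any index locally (no masking or reduction), but each rank produces only a slice of every output vector, so the communication step is a gather rather than a sum.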
