General Notes on Model Architecture

Dimensions

The embedding dimension (often written d_model) is the length of the vector used to represent each word or token in the model.

A larger embedding dimension gives each token representation more room to encode semantic distinctions in the input language, which may lead to better model performance.

The cost is more parameters, computation, and memory. The token embedding table grows linearly with the embedding dimension, while the weight matrices inside each layer typically grow with its square.
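To make the parameter cost concrete, here is a minimal sketch that counts the weights in a token embedding table of shape (vocab_size, d_model). The function names, the 32,000-token vocabulary, and the 2 bytes per parameter (16-bit weights) are illustrative assumptions, not values taken from any particular model.

```python
def embedding_params(vocab_size: int, d_model: int) -> int:
    """Weights in a token embedding table of shape (vocab_size, d_model)."""
    return vocab_size * d_model


def embedding_bytes(vocab_size: int, d_model: int, bytes_per_param: int = 2) -> int:
    """Memory for the table; 2 bytes per weight assumes 16-bit storage."""
    return embedding_params(vocab_size, d_model) * bytes_per_param


# Hypothetical vocabulary size, chosen only for illustration.
VOCAB_SIZE = 32_000

for d_model in (1024, 4096, 8192):
    params = embedding_params(VOCAB_SIZE, d_model)
    mib = embedding_bytes(VOCAB_SIZE, d_model) / 2**20
    print(f"d_model={d_model:5d}  params={params:,}  ~{mib:.0f} MiB")
```

Doubling the embedding dimension doubles the size of this table. The per-layer weight matrices are not counted here.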

n Heads

The number of heads in a neural language model refers to the number of parallel self-attention mechanisms used in each attention layer.

Each head applies its own learned query, key, and value projections to the full input sequence, so each head attends within a different subspace of the representation rather than to a different part of the input. The outputs of all heads are concatenated and projected to produce the output of the layer.

The number of heads affects how many distinct relationships between positions the model can track at once, which can improve its ability to capture complex patterns. In the common design where each head has dimension d_model / n_heads, adding heads does not add projection parameters, but it shrinks the dimension available to each head and increases the number of attention-score matrices that must be computed and stored.
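The trade-off can be sketched with a little arithmetic. The example below assumes the common multi-head layout in which d_model is split evenly across heads and the query, key, value, and output projections are each d_model x d_model with no biases. The function name and the example sizes are hypothetical.

```python
def attention_footprint(d_model: int, n_heads: int, seq_len: int) -> dict:
    """Per-layer attention sizes under an even split of d_model across heads."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads
    return {
        # Width of the subspace each head works in.
        "d_head": d_head,
        # Q, K, V and output projections; independent of n_heads here.
        "projection_params": 4 * d_model * d_model,
        # One (seq_len x seq_len) score matrix per head, per sequence.
        "score_entries": n_heads * seq_len * seq_len,
    }


for n_heads in (8, 16, 32):
    print(n_heads, attention_footprint(d_model=4096, n_heads=n_heads, seq_len=2048))
```

Under these assumptions the projection weights stay constant as heads are added, while the per-head width falls and the number of attention-score entries rises in proportion to the head count.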

n Layers

The number of layers in a neural language model refers to the depth of the model, or the number of layers of computation that are applied to the input before producing the final output.

Increasing the number of layers can improve the model's ability to capture complex patterns and long-range dependencies in the input language, because each layer builds on the representations produced by the layer before it.

Deeper models also cost more: parameters, computation, and memory grow roughly linearly with the number of layers, and deeper models need more data to train effectively and are at greater risk of overfitting when the training data is limited.
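As a back-of-envelope illustration of how depth scales the parameter count, the sketch below assumes a standard transformer block: four d_model x d_model attention projections plus a feed-forward block with two matrices and a 4x hidden expansion. Biases, normalization weights, and embeddings are ignored, and the function names and example sizes are hypothetical.

```python
def block_params(d_model: int, ffn_multiplier: int = 4) -> int:
    """Approximate weights in one transformer block (assumed layout, no biases)."""
    attention = 4 * d_model * d_model                    # Q, K, V, output projections
    feed_forward = 2 * d_model * (ffn_multiplier * d_model)  # up- and down-projection
    return attention + feed_forward


def stack_params(n_layers: int, d_model: int) -> int:
    """Weights in n_layers identical blocks; grows linearly with depth."""
    return n_layers * block_params(d_model)


for n_layers in (12, 24, 32):
    print(f"n_layers={n_layers:2d}  params={stack_params(n_layers, 4096):,}")
```

With the default 4x expansion this reduces to 12 * d_model^2 weights per block, so depth multiplies a cost that is itself quadratic in the embedding dimension.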

n Tokens

The number of tokens refers to the size of the vocabulary used by the model.

This is typically the number of unique words or subword units that the tokenizer can produce, learned from the training data. A larger vocabulary allows the model to handle more diverse language and to represent rare words with fewer tokens, which reduces how often a word is out-of-vocabulary (OOV).

The cost is more computation and memory: the token embedding table, and the output layer that scores every vocabulary entry, both grow linearly with the vocabulary size.
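The effect of subword units on OOV handling can be illustrated with a toy tokenizer. This is a simplified greedy longest-match scheme written for illustration only; it is not the algorithm of any specific tokenizer library, and the vocabulary and the `<unk>` symbol are made-up examples.

```python
def greedy_tokenize(word: str, vocab: set, unk: str = "<unk>") -> list:
    """Split a word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No piece starting at position i is in the vocabulary.
            tokens.append(unk)
            i += 1
    return tokens


# Hypothetical subword vocabulary.
vocab = {"token", "ization", "s", "un", "believ", "able"}

print(greedy_tokenize("tokenization", vocab))
print(greedy_tokenize("unbelievable", vocab))
print(greedy_tokenize("xyz", vocab))
```

Words that are not in the vocabulary as a whole can still be covered by known pieces, and only characters with no matching piece fall back to the unknown symbol.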
