LLaMA-3 configurations
After comparing the configuration of the LLaMA-2 7B model built with TensorRT-LLM against the LLaMA-3 8B model's configuration files, I noticed several differences and inconsistencies.
Here are my findings and recommendations:
Vocabulary Size
The LLaMA-2 7B model built with TensorRT-LLM has a vocabulary size of 32,000, while the LLaMA-3 8B model has a vocabulary size of 128,256.
Recommendation: Update the vocab_size parameter in the convert_checkpoint.py and build.py scripts to match the LLaMA-3 8B model's vocabulary size of 128,256.
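For concreteness, the corresponding entry in the model's config file would look like the following. This is a minimal sketch assuming HF-style field names (the TensorRT-LLM checkpoint schema may differ); it also shows the value unformatted, as it must appear in JSON:

```json
{
  "vocab_size": 128256
}
```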
Hidden Size and Intermediate Size
The LLaMA-2 7B model has a hidden size of 4,096 and an intermediate size of 11,008, whereas the LLaMA-3 8B model keeps the hidden size at 4,096 but uses a larger intermediate size of 14,336.
Recommendation: Adjust the hidden_size and intermediate_size parameters in the configuration files for the convert_checkpoint.py and build.py scripts to match the LLaMA-3 8B model's values.
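A minimal config fragment with the LLaMA-3 8B values, under the same HF-style naming assumption as above:

```json
{
  "hidden_size": 4096,
  "intermediate_size": 14336
}
```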
Max Position Embeddings
The LLaMA-2 7B model has a maximum position-embedding length (i.e., context length) of 4,096, while the LLaMA-3 8B model supports 8,192.
Recommendation: Update the max_position_embeddings parameter in the configuration files to match the LLaMA-3 8B model's value of 8,192.
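The corresponding fragment, again assuming HF-style field names:

```json
{
  "max_position_embeddings": 8192
}
```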
Number of Key-Value Heads
The LLaMA-2 7B model configuration specifies 32 key-value heads (standard multi-head attention), while the LLaMA-3 8B model configuration specifies 8 key-value heads (grouped-query attention).
Recommendation: Modify the num_key_value_heads parameter in the configuration files to match the LLaMA-3 8B model's value of 8.
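With grouped-query attention, the key-value head count sits alongside the attention head count, which stays at 32 for both models; a hedged config sketch:

```json
{
  "num_attention_heads": 32,
  "num_key_value_heads": 8
}
```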
RoPE (Rotary Position Embedding) Parameters
The LLaMA-2 7B model configuration uses rotary_base with a value of 10,000.0, while the LLaMA-3 8B model configuration uses rope_theta with a value of 500,000.0.
Recommendation: Update the RoPE-related parameters in the configuration files to match the LLaMA-3 8B model's values: replace rotary_base with rope_theta and set its value to 500,000.0.
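A sketch of the updated entry (LLaMA-3 configs use the HF-style rope_theta key in place of rotary_base):

```json
{
  "rope_theta": 500000.0
}
```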
Data Type
The LLaMA-2 7B model uses float16 as its data type, while the LLaMA-3 8B model uses bfloat16.
Recommendation: Consider updating the data type in the configuration files to match the LLaMA-3 8B model's bfloat16. Modify the dtype parameter in the convert_checkpoint.py and build.py scripts accordingly.
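If the data type is also recorded in the checkpoint's config file, the entry might look like this (the field name is an assumption: TensorRT-LLM checkpoint configs use dtype, while HF-style configs record it as torch_dtype):

```json
{
  "dtype": "bfloat16"
}
```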
Token IDs
The LLaMA-3 8B model configuration specifies bos_token_id as 128,000 and eos_token_id as 128,001, while the LLaMA-2 7B model configuration doesn't set these token IDs.
Recommendation: Add the bos_token_id and eos_token_id parameters to the configuration files for the convert_checkpoint.py and build.py scripts, setting their values to match the LLaMA-3 8B model's.
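A sketch of the added entries:

```json
{
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
```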
By making these adjustments to the configuration files for the convert_checkpoint.py and build.py scripts, you can align the LLaMA-2 7B model configuration with the LLaMA-3 8B model configuration. This ensures consistency and compatibility between the models when building and running them with TensorRT-LLM.
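If you prefer to apply all of the above in one step, a short Python sketch along these lines can patch an existing config file. The path and the exact field names are assumptions (HF-style keys are used here); adapt them to your checkpoint's actual schema:

```python
import json

CONFIG_PATH = "config.json"  # hypothetical path to the checkpoint config

# Target values from the LLaMA-3 8B configuration discussed above
# (HF-style field names assumed; adjust keys to your schema).
overrides = {
    "vocab_size": 128256,
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "max_position_embeddings": 8192,
    "num_key_value_heads": 8,
    "rope_theta": 500000.0,
    "torch_dtype": "bfloat16",
    "bos_token_id": 128000,
    "eos_token_id": 128001,
}

with open(CONFIG_PATH) as f:
    config = json.load(f)

config.pop("rotary_base", None)  # superseded by rope_theta in LLaMA-3 configs
config.update(overrides)

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```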
Please note that some of these changes may have an impact on the model's performance and resource requirements, so it's important to consider the available hardware resources and adjust the parameters accordingly.