LLaMA-3 configurations
After comparing the configuration of the LLaMA-2 7B model built using TensorRT-LLM with the LLaMA-3 8B model's configuration files, I noticed a few differences and inconsistencies.
Here are my findings and recommendations:
Vocabulary Size
The LLaMA-2 7B model built with TensorRT-LLM has a vocabulary size of 32,000, while the LLaMA-3 8B model has a vocabulary size of 128,256.
Recommendation: Update the vocab_size parameter in the convert_checkpoint.py and build.py scripts to match the LLaMA-3 8B model's vocabulary size of 128,256.
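As a sanity check before converting, the vocabulary size can be read straight from the checkpoint's config.json. This is a minimal sketch; the checkpoint path is an assumption and should point at wherever the LLaMA-3 8B weights were actually downloaded.

```python
import json

# Hypothetical checkpoint location; adjust to your local download path.
CONFIG_PATH = "./Meta-Llama-3-8B/config.json"

with open(CONFIG_PATH) as f:
    hf_config = json.load(f)

# LLaMA-3 8B ships a much larger tokenizer than LLaMA-2 7B (128,256 vs 32,000).
assert hf_config["vocab_size"] == 128256, (
    f"expected a LLaMA-3 vocabulary of 128,256, got {hf_config['vocab_size']}"
)
```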
Hidden Size and Intermediate Size
The LLaMA-2 7B model has a hidden size of 4,096 and an intermediate size of 11,008, whereas the LLaMA-3 8B model has a hidden size of 4,096 and an intermediate size of 14,336.
Recommendation: Adjust the hidden_size and intermediate_size parameters in the configuration files for the convert_checkpoint.py and build.py scripts to match the LLaMA-3 8B model's values.
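To see why the intermediate size matters, here is a rough per-layer parameter count for LLaMA's SwiGLU feed-forward block (gate, up, and down projections, which LLaMA uses without biases). The figures assume the config values quoted above.

```python
# Per-layer feed-forward parameters for a SwiGLU MLP: three projection
# matrices (gate, up, down), each of shape hidden_size x intermediate_size.
def ffn_params(hidden_size: int, intermediate_size: int) -> int:
    return 3 * hidden_size * intermediate_size

print(f"LLaMA-2 7B FFN/layer: {ffn_params(4096, 11008):,}")  # 135,266,304
print(f"LLaMA-3 8B FFN/layer: {ffn_params(4096, 14336):,}")  # 176,160,768
```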
Max Position Embeddings
The LLaMA-2 7B model has a maximum position embedding size of 4,096, while the LLaMA-3 8B model has a maximum position embedding size of 8,192.
Recommendation: Update the max_position_embeddings parameter in the configuration files to match the LLaMA-3 8B model's value of 8,192.
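If a checkpoint has already been converted, the field can also be patched directly in the config.json that convert_checkpoint.py writes alongside the weights. A minimal sketch, assuming the checkpoint directory is ./tllm_checkpoint:

```python
import json

CKPT_CONFIG = "./tllm_checkpoint/config.json"  # assumed output directory

with open(CKPT_CONFIG) as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 8192  # LLaMA-2 7B defaults to 4096

with open(CKPT_CONFIG, "w") as f:
    json.dump(cfg, f, indent=2)
```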
Number of Key-Value Heads
The LLaMA-2 7B model configuration specifies 32 key-value heads, while the LLaMA-3 8B model configuration specifies 8 key-value heads.
Recommendation: Modify the num_key_value_heads parameter in the configuration files to match the LLaMA-3 8B model's value of 8.
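The drop from 32 to 8 key-value heads is grouped-query attention: four query heads share each KV head. The back-of-the-envelope estimate below shows why this matters for the KV cache; the layer count (32) and head dimension (128) are assumptions taken from the two 7B/8B configs discussed here.

```python
# KV cache bytes for one sequence: keys + values, per layer, per KV head,
# assuming 2-byte (16-bit) elements.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

llama2 = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
llama3 = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=8192)
print(f"LLaMA-2 7B @ 4k context: {llama2 / 2**20:.0f} MiB")  # 2048 MiB
print(f"LLaMA-3 8B @ 8k context: {llama3 / 2**20:.0f} MiB")  # 1024 MiB
```

Even at double the context length, grouped-query attention halves the per-sequence cache.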
RoPE (Rotary Position Embedding) Parameters
The LLaMA-2 7B model configuration uses rotary_base with a value of 10,000.0, while the LLaMA-3 8B model configuration uses rope_theta with a value of 500,000.0.
Recommendation: Update the RoPE-related parameters in the configuration files to match the LLaMA-3 8B model's values. Replace rotary_base with rope_theta and set its value to 500,000.0.
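The larger base slows the rotation of the low-frequency RoPE dimensions, which is part of how LLaMA-3 handles its longer context. A small sketch of the standard inverse-frequency schedule; the head dimension of 128 is an assumption derived from the configs (4,096 hidden size over 32 attention heads).

```python
# RoPE inverse frequencies: theta ** (-2i / head_dim) for each rotary pair.
def rope_inv_freq(theta: float, head_dim: int = 128) -> list[float]:
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

print(f"slowest LLaMA-2 7B frequency: {rope_inv_freq(10_000.0)[-1]:.2e}")
print(f"slowest LLaMA-3 8B frequency: {rope_inv_freq(500_000.0)[-1]:.2e}")
```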
Data Type
The LLaMA-2 7B model uses float16 as the data type, while the LLaMA-3 8B model uses bfloat16.
Recommendation: Consider updating the data type in the configuration files to match the LLaMA-3 8B model's data type of bfloat16. Modify the dtype parameter in the convert_checkpoint.py and build.py scripts accordingly.
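The two 16-bit formats differ in where they spend their bits: bfloat16 keeps float32's exponent range at the cost of mantissa precision, which is why a checkpoint trained in bfloat16 (as LLaMA-3 was) is safer kept in bfloat16 than cast down to float16. The PyTorch snippet below illustrates the difference:

```python
import torch

for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    # float16 overflows past ~65,504; bfloat16 reaches float32's ~3.4e38
    # but with a much coarser machine epsilon.
    print(f"{dtype}: max={info.max:.3e}, eps={info.eps:.3e}")
```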
Token IDs
The LLaMA-3 8B model configuration specifies bos_token_id as 128,000 and eos_token_id as 128,001, while the LLaMA-2 7B model configuration doesn't mention these token IDs.
Recommendation: Add the bos_token_id and eos_token_id parameters to the configuration files for the convert_checkpoint.py and build.py scripts, and set their values to match the LLaMA-3 8B model's values.
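Rather than hard-coding 128,000 and 128,001, the IDs can be read from the checkpoint itself; a minimal sketch, again assuming the weights were downloaded to ./Meta-Llama-3-8B:

```python
import json

with open("./Meta-Llama-3-8B/config.json") as f:
    hf_config = json.load(f)

bos_token_id = hf_config["bos_token_id"]  # 128000 for the LLaMA-3 8B base model
eos_token_id = hf_config["eos_token_id"]  # 128001 for the LLaMA-3 8B base model
print(bos_token_id, eos_token_id)
```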
By making these adjustments to the configuration files for the convert_checkpoint.py and build.py scripts, you can align the LLaMA-2 7B build configuration with the LLaMA-3 8B model configuration.
This ensures consistency and compatibility when building and running the model with TensorRT-LLM.
Please note that some of these changes may affect the model's performance and resource requirements, so it's important to consider the available hardware and adjust the parameters accordingly.