Llama 3 configurations

After comparing the configuration of the Llama 2 7B model built with TensorRT-LLM against the configuration files of the Llama 3 8B model, I noticed several differences.

Here are my findings and recommendations:

Vocabulary Size

  • The Llama 2 7B model built with TensorRT-LLM has a vocabulary size of 32,000, while the Llama 3 8B model has a vocabulary size of 128,256.

  • Recommendation: Update the vocab_size parameter in the convert_checkpoint.py and build.py scripts to 128,256 to match the Llama 3 8B model.
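To give a sense of what the vocabulary change means in practice, here is a minimal sketch that counts the weights in a single token-embedding matrix for the two vocabulary sizes quoted above. It assumes the hidden size of 4,096 reported in the next section; the function name is hypothetical and not part of TensorRT-LLM.

```python
HIDDEN_SIZE = 4096  # hidden size reported for both models in this comparison


def embedding_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Number of weights in one vocab_size x hidden_size embedding matrix."""
    return vocab_size * hidden_size


old_params = embedding_params(32_000)
new_params = embedding_params(128_256)

print(f"vocab 32,000:  {old_params:,} weights")
print(f"vocab 128,256: {new_params:,} weights")
print(f"growth factor: {new_params / old_params:.2f}x")
```

The same growth applies to the output projection if it has its own vocab_size x hidden_size matrix, so the larger vocabulary is worth accounting for when sizing GPU memory.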

Hidden Size and Intermediate Size

  • Both models have a hidden size of 4,096, but the intermediate size differs: 11,008 for the Llama 2 7B model versus 14,336 for the Llama 3 8B model.

  • Recommendation: Keep hidden_size at 4,096 and set intermediate_size to 14,336 in the configuration files for the convert_checkpoint.py and build.py scripts to match the Llama 3 8B model.
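As a rough illustration of what the intermediate size controls, the sketch below estimates the number of feed-forward weights per transformer layer. It assumes a gated MLP with three bias-free projection matrices (gate, up, and down); that layout is an assumption for illustration rather than something stated in the configuration files above.

```python
def mlp_params(hidden_size: int, intermediate_size: int) -> int:
    """Weights in one gated MLP block.

    Assumption: three bias-free projections, two mapping
    hidden -> intermediate (gate, up) and one mapping
    intermediate -> hidden (down).
    """
    return 3 * hidden_size * intermediate_size


old_mlp = mlp_params(4096, 11_008)
new_mlp = mlp_params(4096, 14_336)

print(f"intermediate 11,008: {old_mlp:,} weights per layer")
print(f"intermediate 14,336: {new_mlp:,} weights per layer")
```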

Max Position Embeddings

  • The Llama 2 7B model has a maximum position embedding size of 4,096, while the Llama 3 8B model has a maximum position embedding size of 8,192.

  • Recommendation: Update the max_position_embeddings parameter in the configuration files to match the Llama 3 8B model's value of 8,192.

Number of Key-Value Heads

  • The Llama 2 7B model configuration specifies 32 key-value heads, while the Llama 3 8B model configuration specifies 8 key-value heads.

  • Recommendation: Set the num_key_value_heads parameter in the configuration files to 8 to match the Llama 3 8B model.
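The number of key-value heads and the maximum position count together determine how large the KV cache can grow. The following sketch estimates that size for both configurations. The layer count (32), the attention head count (32), and 2 bytes per element are assumptions that do not appear in the comparison above; adjust them to the values in your own config files.

```python
NUM_LAYERS = 32          # assumed, not listed in the comparison above
HEAD_DIM = 4096 // 32    # hidden_size / num_attention_heads (32 heads assumed)


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache one token: a key and a value vector
    for every KV head in every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


old_per_token = kv_cache_bytes_per_token(NUM_LAYERS, 32, HEAD_DIM)
new_per_token = kv_cache_bytes_per_token(NUM_LAYERS, 8, HEAD_DIM)

# Cache size for one sequence at each configuration's maximum length.
old_full = old_per_token * 4096
new_full = new_per_token * 8192

print(f"32 KV heads: {old_per_token:,} B/token, {old_full / 2**30:.1f} GiB at 4,096 tokens")
print(f" 8 KV heads: {new_per_token:,} B/token, {new_full / 2**30:.1f} GiB at 8,192 tokens")
```

Under these assumptions, cutting the KV heads from 32 to 8 more than offsets the doubled context length for a single sequence.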

RoPE (Rotary Position Embedding) Parameters

  • The Llama 2 7B model configuration uses rotary_base with a value of 10,000.0, while the Llama 3 8B model configuration uses rope_theta with a value of 500,000.0. Both names refer to the same quantity, the base of the rotary frequencies.

  • Recommendation: Update the RoPE-related parameters in the configuration files to match the Llama 3 8B model's values. Replace rotary_base with rope_theta and set its value to 500,000.0.
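To show why the base value matters, here is a small sketch of the standard rotary inverse-frequency formula, base^(-2i/d), evaluated for both bases. The head dimension of 128 is an assumption (hidden size 4,096 split over 32 attention heads), and the helper names are hypothetical.

```python
import math


def rope_inv_freq(base: float, head_dim: int = 128) -> list[float]:
    """Inverse frequencies base^(-2i/d) for i = 0 .. d/2 - 1."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(base: float, head_dim: int = 128) -> float:
    """Wavelength, in token positions, of the slowest-rotating pair."""
    return 2 * math.pi / rope_inv_freq(base, head_dim)[-1]


for base in (10_000.0, 500_000.0):
    print(f"base {base:>9,.0f}: longest wavelength ~ {longest_wavelength(base):,.0f} positions")
```

A larger base stretches the slowest rotations over more positions, which is one reason the base and the maximum position count should be changed together rather than independently.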

Data Type

  • The Llama 2 7B model uses float16 as the data type, while the Llama 3 8B model uses bfloat16.

  • Recommendation: Consider updating the data type in the configuration files to match the Llama 3 8B model's data type of bfloat16. Modify the dtype parameter in the convert_checkpoint.py and build.py scripts accordingly.
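The two 16-bit formats trade range for precision differently, which the following standard-library sketch illustrates. The bfloat16 conversion here simply truncates a float32 to its top 16 bits; real hardware usually rounds instead, so treat this as a simplification. Both helper names are hypothetical.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of the float32
    encoding (truncation, not round-to-nearest)."""
    top_two_bytes = struct.pack(">f", x)[:2]
    return struct.unpack(">f", top_two_bytes + b"\x00\x00")[0]


def fits_float16(x: float) -> bool:
    """True if x can be encoded as an IEEE half-precision value."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


value = 70_000.0
print("fits in float16:", fits_float16(value))
print("as bfloat16:    ", to_bfloat16(value))
```

A value such as 70,000 overflows float16 but survives in bfloat16 with reduced precision, which is why a checkpoint stored in bfloat16 is generally safer to build with the same dtype.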

Token IDs

  • The Llama 3 8B model configuration specifies bos_token_id as 128,000 and eos_token_id as 128,001, while the Llama 2 7B model configuration doesn't mention these token IDs.

  • Recommendation: Add the bos_token_id and eos_token_id parameters to the configuration files for the convert_checkpoint.py and build.py scripts, and set them to 128,000 and 128,001 respectively.
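Because these token IDs only make sense with the larger vocabulary, a quick sanity check can catch a half-updated configuration. The sketch below assumes the configuration is available as a flat dictionary using the parameter names from this page; the function is hypothetical and not part of TensorRT-LLM.

```python
def check_token_ids(config: dict) -> list[str]:
    """Return a list of problems with the BOS/EOS token IDs in config."""
    problems = []
    for key in ("bos_token_id", "eos_token_id"):
        token_id = config.get(key)
        if token_id is None:
            problems.append(f"{key} is missing")
        elif not 0 <= token_id < config["vocab_size"]:
            problems.append(
                f"{key}={token_id} is outside vocab_size={config['vocab_size']}"
            )
    return problems


updated = {"vocab_size": 128_256, "bos_token_id": 128_000, "eos_token_id": 128_001}
half_updated = {"vocab_size": 32_000, "bos_token_id": 128_000, "eos_token_id": 128_001}

print(check_token_ids(updated))
print(check_token_ids(half_updated))
```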

By making these adjustments to the configuration files for the convert_checkpoint.py and build.py scripts, you can align the Llama 2 7B build configuration with the Llama 3 8B model configuration.

This keeps the two configurations consistent and compatible when building and running the model with TensorRT-LLM.

Note that some of these changes affect the model's performance and resource requirements (for example, the larger vocabulary and the longer context window increase memory use), so check the available hardware resources and adjust the build parameters accordingly.
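One way to double-check the adjustments listed above is to diff the two configurations programmatically. This is a minimal sketch that assumes both configurations have been loaded into flat dictionaries using the parameter names and values from this page; real config files may nest or name these fields differently, and diff_configs is a hypothetical helper.

```python
current_config = {
    "vocab_size": 32_000,
    "hidden_size": 4096,
    "intermediate_size": 11_008,
    "max_position_embeddings": 4096,
    "num_key_value_heads": 32,
    "rotary_base": 10_000.0,
    "dtype": "float16",
}

target_config = {
    "vocab_size": 128_256,
    "hidden_size": 4096,
    "intermediate_size": 14_336,
    "max_position_embeddings": 8192,
    "num_key_value_heads": 8,
    "rope_theta": 500_000.0,
    "dtype": "bfloat16",
    "bos_token_id": 128_000,
    "eos_token_id": 128_001,
}


def diff_configs(current: dict, target: dict) -> dict:
    """Map each key whose value differs to a (current, target) pair.
    Keys present on only one side show None for the other."""
    keys = sorted(set(current) | set(target))
    return {
        key: (current.get(key), target.get(key))
        for key in keys
        if current.get(key) != target.get(key)
    }


for key, (old, new) in diff_configs(current_config, target_config).items():
    print(f"{key}: {old} -> {new}")
```

Keys that match, such as hidden_size, are left out of the result, so an empty dictionary means the two configurations agree.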
