Here's the updated configuration file for convert_checkpoint.py based on the LLaMA-3B model:
```yaml
model:
  model_dir: ./llama-3b-hf
  output_dir: ../llama-3b-hf-output
  dtype: bfloat16  # Choices: float32, bfloat16, float16
  # Suggestion:
  # - Use bfloat16 for a balance between performance and accuracy, as used in the LLaMA-3B model

checkpoint:
  tp_size: 1  # Tensor parallelism size
  pp_size: 1  # Pipeline parallelism size
  # Suggestions:
  # - Increase tp_size and pp_size for distributed training across multiple GPUs
  # - Keep tp_size and pp_size as 1 for single GPU training
  vocab_size: 128256
  # Suggestion:
  # - Update vocab_size to match the LLaMA-3B model's vocabulary size
  n_positions: 8192
  # Suggestion:
  # - Update n_positions to match the LLaMA-3B model's max position embeddings
  n_layer: 32
  # Suggestions:
  # - Adjust n_layer based on the desired model depth
  # - Keep n_layer as 32 to match the LLaMA-3B model's configuration
  n_head: 32
  # Suggestions:
  # - Adjust n_head based on the desired number of attention heads
  # - Keep n_head as 32 to match the LLaMA-3B model's configuration
  n_embd: 4096
  # Suggestions:
  # - Adjust n_embd based on the desired hidden size
  # - Keep n_embd as 4096 to match the LLaMA-3B model's configuration
  inter_size: 14336
  # Suggestion:
  # - Update inter_size to match the LLaMA-3B model's intermediate size

  # Additional checkpoint arguments
  meta_ckpt_dir: null  # ./path/to/meta/checkpoint
  n_kv_head: 8
  # Suggestion:
  # - Update n_kv_head to match the LLaMA-3B model's number of key-value heads
  rms_norm_eps: 1e-5
  # Suggestion:
  # - Update rms_norm_eps to match the LLaMA-3B model's configuration
  use_weight_only: false
  disable_weight_only_quant_plugin: false
  weight_only_precision: int8  # Choices: int8, int4, int4_gptq
  smoothquant: null  # 0.5
  per_channel: false
  per_token: false
  int8_kv_cache: false
  ammo_quant_ckpt_path: null  # ./path/to/ammo/quant/checkpoint
  per_group: false
  load_by_shard: false
  hidden_act: silu
  rope_theta: 500000.0
  # Suggestion:
  # - Update rotary_base to rope_theta and set its value to 500000.0 to match the LLaMA-3B model's configuration
  group_size: 128
  dataset_cache_dir: null  # ./path/to/dataset/cache
  load_model_on_cpu: false
  use_parallel_embedding: false
  embedding_sharding_dim: 0  # Choices: 0, 1
  use_embedding_sharing: false
  workers: 1
  moe_num_experts: 0
  moe_top_k: 0
  moe_tp_mode: 0
  moe_renorm_mode: 1
  save_config_only: false

  # Additional configurations to match LLaMA-3B
  bos_token_id: 128000
  eos_token_id: 128001
  tie_word_embeddings: false
  use_cache: true
  torch_dtype: bfloat16
```
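If you want to drive convert_checkpoint.py from this file rather than typing flags by hand, a minimal sketch of the glue code is shown below. It assumes PyYAML is installed, that the YAML uses the nested `model`/`checkpoint` layout above, that the file is saved as `convert_checkpoint.yaml`, and that the script accepts flags named after these keys; all of those are assumptions for illustration, not a documented interface.

```python
# Sketch: flatten the YAML above into CLI-style flags for convert_checkpoint.py.
# Assumptions: PyYAML is available, the config file is named convert_checkpoint.yaml,
# and the script's argument names match the YAML keys (verify with --help).
import yaml


def config_to_args(path: str) -> list[str]:
    with open(path) as f:
        cfg = yaml.safe_load(f)

    args = []
    # Flatten both sections; every key becomes a --<key> flag.
    for section in ("model", "checkpoint"):
        for key, value in (cfg.get(section) or {}).items():
            if value is None:
                continue  # null entries are left at the script's defaults
            if isinstance(value, bool):
                if value:  # booleans are treated as store_true switches
                    args.append(f"--{key}")
            else:
                args.extend([f"--{key}", str(value)])
    return args


if __name__ == "__main__":
    print(" ".join(config_to_args("convert_checkpoint.yaml")))
```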
I have made the following changes and additions to align the configuration with the LLaMA-3B model:
- Updated `vocab_size` to 128256.
- Updated `n_positions` to 8192.
- Updated `inter_size` to 14336.
- Updated `n_kv_head` to 8.
- Updated `rms_norm_eps` to 1e-5.
- Replaced `rotary_base` with `rope_theta` and set its value to 500000.0.
- Added `bos_token_id` and set it to 128000.
- Added `eos_token_id` and set it to 128001.
- Added `tie_word_embeddings` and set it to false.
- Added `use_cache` and set it to true.
- Added `torch_dtype` and set it to bfloat16.
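To double-check that these values really match your checkpoint, you can compare them against the Hugging Face `config.json` shipped alongside the model weights. The sketch below assumes the checkpoint lives at `./llama-3b-hf` (the path used in the YAML) and maps each YAML field to its standard Hugging Face key name; adjust the path if your checkpoint lives elsewhere.

```python
# Cross-check the values listed above against the checkpoint's own config.json.
# The left-hand keys are the Hugging Face config.json names that correspond to
# the YAML fields (e.g. n_positions -> max_position_embeddings).
import json

with open("./llama-3b-hf/config.json") as f:
    hf = json.load(f)

expected = {
    "vocab_size": 128256,
    "max_position_embeddings": 8192,  # n_positions
    "intermediate_size": 14336,       # inter_size
    "num_key_value_heads": 8,         # n_kv_head
    "rms_norm_eps": 1e-5,
    "rope_theta": 500000.0,
    "bos_token_id": 128000,
    "eos_token_id": 128001,
    "tie_word_embeddings": False,
}

for key, want in expected.items():
    got = hf.get(key)
    status = "OK" if got == want else "MISMATCH"
    print(f"{status:8s} {key}: config.json={got!r}, yaml={want!r}")
```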
The convert_checkpoint.py script itself already covers the necessary arguments, so the updated configuration file should work with it as-is to convert the LLaMA-3B model checkpoint.
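For completeness, here is a hedged example of kicking off the conversion with just the core settings from the YAML. The flag names (`--model_dir`, `--output_dir`, `--dtype`, `--tp_size`, `--pp_size`) mirror the config keys above; confirm them against `python convert_checkpoint.py --help` for your version of the script before relying on this.

```python
# Sketch: run the conversion with the core arguments from the YAML above.
# Flag names are assumed to match the config keys; verify with --help first.
import subprocess

subprocess.run(
    [
        "python", "convert_checkpoint.py",
        "--model_dir", "./llama-3b-hf",
        "--output_dir", "../llama-3b-hf-output",
        "--dtype", "bfloat16",
        "--tp_size", "1",
        "--pp_size", "1",
    ],
    check=True,  # raise if the conversion exits with an error
)
```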