# Checkpoint List - Arguments

Here's a table summarising the arguments you can pass to the <mark style="color:yellow;">**`convert_checkpoint.py`**</mark> script, along with their default values:

<table><thead><tr><th width="253" align="center">Argument</th><th width="134" align="center">Default Value</th><th>Description</th></tr></thead><tbody><tr><td align="center">--model_dir</td><td align="center">None</td><td>Path to the Hugging Face model directory</td></tr><tr><td align="center">--meta_ckpt_dir</td><td align="center">None</td><td>Path to the meta checkpoint directory</td></tr><tr><td align="center">--tp_size</td><td align="center">1</td><td>N-way tensor parallelism size</td></tr><tr><td align="center">--pp_size</td><td align="center">1</td><td>N-way pipeline parallelism size</td></tr><tr><td align="center">--dtype</td><td align="center">'float16'</td><td>Data type ('float32', 'bfloat16', 'float16')</td></tr><tr><td align="center">--vocab_size</td><td align="center">32000</td><td>Vocabulary size</td></tr><tr><td align="center">--n_positions</td><td align="center">2048</td><td>Number of positions</td></tr><tr><td align="center">--n_layer</td><td align="center">32</td><td>Number of layers</td></tr><tr><td align="center">--n_head</td><td align="center">32</td><td>Number of attention heads</td></tr><tr><td align="center">--n_kv_head</td><td align="center">None</td><td>Number of key-value heads (defaults to n_head if not specified)</td></tr><tr><td align="center">--n_embd</td><td align="center">4096</td><td>Hidden size</td></tr><tr><td align="center">--inter_size</td><td align="center">11008</td><td>Intermediate size</td></tr><tr><td align="center">--rms_norm_eps</td><td align="center">1e-06</td><td>RMS normalization epsilon</td></tr><tr><td align="center">--use_weight_only</td><td align="center">False</td><td>Quantize weights for the various GEMMs to INT4/INT8</td></tr><tr><td align="center">--disable_weight_only_quant_plugin</td><td align="center">False</td><td>Use the OOTB implementation instead of the plugin for weight quantization</td></tr><tr><td align="center">--weight_only_precision</td><td align="center">'int8'</td><td>Precision for weight-only quantization ('int8', 'int4', 'int4_gptq')</td></tr><tr><td align="center">--smoothquant</td><td align="center">None</td><td>Set the α parameter for SmoothQuant quantization (float value)</td></tr><tr><td align="center">--per_channel</td><td align="center">False</td><td>Use a per-channel static scaling factor for the GEMM result</td></tr><tr><td align="center">--per_token</td><td align="center">False</td><td>Use a per-token dynamic scaling factor for activations</td></tr><tr><td align="center">--int8_kv_cache</td><td align="center">False</td><td>Use INT8 quantization for the KV cache</td></tr><tr><td align="center">--ammo_quant_ckpt_path</td><td align="center">None</td><td>Path to a quantized model checkpoint in .npz format</td></tr><tr><td align="center">--per_group</td><td align="center">False</td><td>Use a per-group dynamic scaling factor for weights in the INT4 range (for GPTQ/AWQ quantization)</td></tr><tr><td align="center">--load_by_shard</td><td align="center">False</td><td>Load a pretrained model shard-by-shard</td></tr><tr><td align="center">--hidden_act</td><td align="center">'silu'</td><td>Hidden activation function</td></tr><tr><td align="center">--rotary_base</td><td align="center">10000</td><td>Rotary base value</td></tr><tr><td align="center">--group_size</td><td align="center">128</td><td>Group size used in GPTQ quantization</td></tr><tr><td align="center">--dataset-cache-dir</td><td align="center">None</td><td>Cache directory for loading the Hugging Face dataset</td></tr><tr><td align="center">--load_model_on_cpu</td><td align="center">False</td><td>Load the model on CPU</td></tr><tr><td align="center">--use_parallel_embedding</td><td align="center">False</td><td>Enable embedding parallelism</td></tr><tr><td align="center">--embedding_sharding_dim</td><td align="center">0</td><td>Dimension for sharding the embedding lookup table (0: vocab dimension, 1: hidden dimension)</td></tr><tr><td align="center">--use_embedding_sharing</td><td align="center">False</td><td>Try to reduce the engine size by sharing the embedding lookup table between two layers</td></tr><tr><td align="center">--output_dir</td><td align="center">'tllm_checkpoint'</td><td>Path to save the TensorRT-LLM checkpoint</td></tr><tr><td align="center">--workers</td><td align="center">1</td><td>Number of workers for converting the checkpoint in parallel</td></tr><tr><td align="center">--moe_num_experts</td><td align="center">0</td><td>Number of experts to use for MoE layers</td></tr><tr><td align="center">--moe_top_k</td><td align="center">0</td><td>Top-k value to use for MoE layers (defaults to 1 if --moe_num_experts is set)</td></tr><tr><td align="center">--moe_tp_mode</td><td align="center">MoeConfig.ParallelismMode.TENSOR_PARALLEL</td><td>Controls how experts are distributed in TP (see layers/moe.py for accepted values)</td></tr><tr><td align="center">--moe_renorm_mode</td><td align="center">MoeConfig.ExpertScaleNormalizationMode.RENORM</td><td>Controls renormalization after gate logits (see layers/moe.py for accepted values)</td></tr><tr><td align="center">--save_config_only</td><td align="center">False</td><td>Only save the model config without reading and converting weights (for debugging)</td></tr></tbody></table>

These arguments let you customise the behaviour of the <mark style="color:yellow;">**`convert_checkpoint.py`**</mark> script to match your specific requirements; provide the desired values on the command line when running the script, as in the sketch below.
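For instance, a typical FP16 conversion with 2-way tensor parallelism, and a second run adding INT8 weight-only quantization, might look like the following (a minimal sketch; the model and output paths are placeholders for your own directories, and all flags used are from the table above):

```bash
# Minimal sketch: the paths below are placeholders for your own directories.

# Convert a Hugging Face checkpoint to a TensorRT-LLM checkpoint,
# using FP16 weights and 2-way tensor parallelism.
python convert_checkpoint.py \
    --model_dir ./llama-7b-hf \
    --output_dir ./tllm_checkpoint_2gpu_fp16 \
    --dtype float16 \
    --tp_size 2

# The same conversion with INT8 weight-only quantization applied to the GEMMs.
python convert_checkpoint.py \
    --model_dir ./llama-7b-hf \
    --output_dir ./tllm_checkpoint_1gpu_int8 \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8
```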
