Llama2 installation
Launching the TensorRT-LLM Container
I have downloaded the Llama2 model into the host TensorRT-LLM repository at the following location:

TensorRT-LLM/examples/llama

After the TensorRT-LLM image has been created, you can launch the Docker container with a single command:

make -C docker release_run

This command launches a Docker container from the TensorRT-LLM image, giving you everything you need to run TensorRT-LLM, simplifying the setup process and ensuring consistency across environments.
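Under the hood, the make target wraps a docker run invocation. The sketch below is illustrative only; the image tag, mount paths, and flags are assumptions and may differ between TensorRT-LLM releases:

# Illustrative approximation of "make -C docker release_run";
# the image tag and mounts here are assumptions, not the exact Makefile contents.
docker run --rm -it --gpus all \
  --volume "$(pwd)":/code/tensorrt_llm \
  --workdir /code/tensorrt_llm \
  tensorrt_llm/release:latest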
Creating the TensorRT-LLM engine
To create a TensorRT engine for an existing model, there are 3 steps:
Download pre-trained weights,
Build a fully-optimised engine of the model,
Deploy the engine, in other words, run the fully-optimised model.
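As a preview, steps two and three map onto commands like the following. This is a sketch only: script names and flags vary between TensorRT-LLM releases, and the paths assume the Llama example directory used later in this guide.

# Sketch, assuming a recent TensorRT-LLM release; names and flags vary by version.
# Step 2: convert the Hugging Face checkpoint, then build an optimised engine
python3 convert_checkpoint.py --model_dir ./llama-2-7b-chat-hf \
  --output_dir ./tllm_checkpoint --dtype float16
trtllm-build --checkpoint_dir ./tllm_checkpoint --output_dir ./llama-engine

# Step 3: run the fully-optimised model
python3 ../run.py --engine_dir ./llama-engine \
  --tokenizer_dir ./llama-2-7b-chat-hf \
  --input_text "Hello, how are you?"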
The following sections show how to use TensorRT-LLM to run the Llama-2-7b-chat model (Hugging Face weights).
Connect to the Hugging Face Hub
To download the Llama model, we first have to connect to the Hugging Face Hub.
We access the Hugging Face Hub through a command-line interface. The huggingface_hub Python package comes with a built-in CLI called huggingface-cli, which lets you interact with the Hugging Face Hub directly from a terminal.
First, install the package with the [cli] extras for an improved user experience:
pip3 install -U "huggingface_hub[cli]"

Verify that the CLI is correctly set up by running the following command:
huggingface-cli --help

You should see a list of available options and commands. If you encounter an error like "command not found: huggingface-cli", please refer to the installation guide for troubleshooting.
Now try this command:
huggingface-cli whoami

When logged into the Hugging Face Hub, this command prints your username and the organisations you are a part of on the Hub.
At this stage, we have not yet logged into the HuggingFace Hub, so the response will be:
Not logged in

To connect to the Hugging Face Hub, prepare your machine to store the Hugging Face token:
git config --global credential.helper store

This command tells Git to use a simple storage method to save your credentials. This can be a security risk, as credentials are stored in plain text, but it is fine during development.
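For context, the store helper writes credentials to a plain-text file in your home directory. If that is a concern, Git's built-in cache helper keeps them in memory for a limited time instead; a quick sketch:

# The "store" helper saves credentials in plain text here:
cat ~/.git-credentials

# Alternative: keep credentials in memory for an hour instead of on disk
git config --global credential.helper 'cache --timeout=3600'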
Enter the command to log in to the Hugging Face Hub:
huggingface-cli login

You will be asked to enter your Hugging Face token:
Continuum's Hugging Face token is:

jhfhgfhgfhhf (not a real token)

You will then be asked whether you want to add the Hugging Face token as a git credential. Answer yes (y) to this question. The output should be as below:
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/jack/.cache/huggingface/token
Login successful
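If you need to log in non-interactively (for example, in a provisioning script), recent versions of huggingface_hub also accept the token directly; treat the flag and variable names below as version-dependent:

# Non-interactive login; the token value is a placeholder
huggingface-cli login --token hf_xxxxxxxxxxxx --add-to-git-credential

# Or export the token so tools pick it up without an explicit login
export HF_TOKEN=hf_xxxxxxxxxxxx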
Now check to ensure you are logged in:
huggingface-cli whoami

Now that we have logged into the Hugging Face Hub, the output should be:
thannon
orgs: ContinuumLabs

You can clearly see your username and the name of the organisation.
In the Llama folder
Inside the Docker container, move to the Llama folder:
cd /app/tensorrt_llm/examples/llama

Once in the directory, you have to install the requirements:
pip install -r requirements.txt

Now install Git LFS (Large File Storage):
git lfs install
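Before pulling multi-gigabyte weight files, you can confirm that Git LFS is installed and on your PATH:

# Confirm Git LFS is available
git lfs version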
Download the model weights from Hugging Face

From the Llama example folder, download the weights of the model. You should be in the Llama directory:

/app/tensorrt_llm/examples/llama

First, remove any existing copy of the target directory:

rm -rf ./llama-2-7b-chat-hf

Then create a fresh, empty directory:

mkdir -p ./llama-2-7b-chat-hf

Download the model:
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-hf

./llama-2-7b-chat-hf: This specifies the directory on your local filesystem where you want the cloned repository to be saved. The ./ indicates that the directory will be created in the current directory where the command is being executed.
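Once the clone finishes, it is worth checking that the weight shards downloaded fully rather than remaining as small Git LFS pointer files; a quick sanity check:

# Model shards should be multi-gigabyte files; tiny files of a few
# hundred bytes indicate unresolved LFS pointers (run "git lfs pull" in that case)
ls -lh ./llama-2-7b-chat-hf
du -sh ./llama-2-7b-chat-hf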
The folder will be named llama-2-7b-chat-hf.
If this directory is not specified, Git uses the last part of the repository URL (in this case, it would default to Llama-2-7b-chat-hf) as the directory name.
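To make the difference concrete (the second directory name below is purely illustrative):

# Default: Git derives the directory name from the URL -> ./Llama-2-7b-chat-hf
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf

# Explicit: you choose the directory name yourself
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./my-llama-weights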
Specifying the directory when cloning a Git repository, such as ./llama-2-7b-chat-hf above, serves several practical purposes and gives you more control over how files are organised on your local machine.
Custom Naming: Users might prefer a different name for the local directory than the default name (which is the repository name).
Clear Intent: Specifying the directory explicitly makes scripts or command sequences clearer to others who may read the code, indicating exactly where the repository will be located on the filesystem.
Multiple Versions: If someone needs to work with multiple branches or forks of the same repository and wants to keep them as separate projects on the filesystem, they can clone them into differently named directories.
Existing Directories: If a directory with the default name already exists and contains data, specifying a different directory name can avoid unintentionally mixing or overwriting files.
Automated Setups: In automated scripts that set up environments (like provisioning scripts for virtual machines or containers), specifying the directory ensures that each component is placed exactly where needed without manual intervention.
Predictability: When automating, having predictable paths based on explicit directory names can make other automated tasks (like backups, deployments, or integrations) simpler and less prone to errors due to unexpected directory structures.
Working in the Current Context
Relative Paths: Using ./ to indicate the current directory emphasizes that the action is intended to interact closely with the current working context, possibly integrating with or complementing other components or repositories within the same directory.
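Finally, if you would rather avoid git and Git LFS entirely, newer versions of huggingface_hub ship a download subcommand that also lets you choose the target directory; a sketch, assuming a release that provides it:

# Alternative to git clone; requires a recent huggingface_hub release
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir ./llama-2-7b-chat-hf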