Llama2 Installation
Launching the TensorRT-LLM Container
I have downloaded the Llama2 model into the host TensorRT-LLM repository at the following location:

TensorRT-LLM/examples/llama

After the TensorRT-LLM image has been created, you can launch the Docker container with a single command:

make -C docker release_run

This command launches a container from that image with everything you need to run TensorRT-LLM, simplifying the setup process and ensuring consistency across environments.
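For reference, the Makefile target wraps a plain docker run invocation. A minimal hand-rolled equivalent might look like the sketch below; the image tag tensorrt_llm/release:latest and the mount path are assumptions based on a typical build, so inspect docker/Makefile in your checkout for the exact flags:

# Rough equivalent of `make -C docker release_run` (sketch only)
docker run --gpus all --ipc=host --rm -it \
    -v "$(pwd)":/code/tensorrt_llm \
    tensorrt_llm/release:latest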
Creating the TensorRT-LLM Engine
To create a TensorRT engine for an existing model, there are three steps:
1. Download the pre-trained weights.
2. Build a fully-optimised engine of the model.
3. Deploy the engine; in other words, run the fully-optimised model.
The following sections show how to use TensorRT-LLM to run the Llama-2-7b-chat model using its Hugging Face weights.
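As a preview, those three steps map onto commands roughly like the sketch below. The script names and flags here assume a recent TensorRT-LLM layout, where the llama example ships a convert_checkpoint.py script and engines are built with trtllm-build; older releases use a single build.py instead, so check the README in the example folder for your version. The directory names are placeholders:

# Step 1 - download the weights (covered in the sections below)
#          into ./llama-2-7b-chat-hf

# Step 2 - convert the checkpoint and build a fully-optimised engine
python3 convert_checkpoint.py --model_dir ./llama-2-7b-chat-hf \
    --output_dir ./llama-2-7b-chat-ckpt --dtype float16
trtllm-build --checkpoint_dir ./llama-2-7b-chat-ckpt \
    --output_dir ./llama-2-7b-chat-engine --gemm_plugin float16

# Step 3 - deploy: run the fully-optimised model against the engine
python3 ../run.py --engine_dir ./llama-2-7b-chat-engine \
    --tokenizer_dir ./llama-2-7b-chat-hf \
    --max_output_len 64 --input_text "What is TensorRT-LLM?"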
Connect to the Hugging Face Hub
To download the Llama model, we first have to connect to the Hugging Face Hub. We access the Hub through a command-line interface.
The huggingface_hub Python package comes with a built-in CLI called huggingface-cli.
This tool allows you to interact with the Hugging Face Hub directly from a terminal.
First, install huggingface_hub with the [cli] extra dependencies for an improved user experience:
pip3 install -U "huggingface_hub[cli]"
Verify that the CLI is correctly set up by running the following command:
huggingface-cli --help
You should see a list of available options and commands. If you encounter an error like "command not found: huggingface-cli," please refer to the installation guide for troubleshooting.
Now try this command:
huggingface-cli whoami
When you are logged into the Hugging Face Hub, this command prints your username and the organisations you belong to on the Hub.
At this stage, we have not yet logged into the Hugging Face Hub, so the response will be:
Not logged in
To connect to the Hugging Face Hub, prepare your machine to store the Hugging Face token:

git config --global credential.helper store

This command tells Git to use a simple storage method to save your credentials. This can be a security risk, as credentials are stored in plain text, but it is fine during development.
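You can confirm that the helper is active, and see where the plain-text credentials will live, with standard Git commands:

git config --global credential.helper   # should print: store
# After login, the token is saved in plain text here:
cat ~/.git-credentials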
Enter the following command to log in to the Hugging Face Hub:

huggingface-cli login

You will be asked to enter your Hugging Face token. Continuum's Hugging Face token is:

jhfhgfhgfhhf (not a real token)
You will then be asked whether you want to add the Hugging Face token as a git credential. Answer yes (y) to this question. The output should be as below:
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/jack/.cache/huggingface/token
Login successful
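If you are scripting this step (for example in CI), recent releases of huggingface-cli also accept the token as a flag instead of an interactive prompt. The flags below exist in current huggingface_hub versions, but check huggingface-cli login --help for your installed release:

export HF_TOKEN=jhfhgfhgfhhf   # not a real token; substitute your own
huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential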
Now check to ensure you are logged in:
huggingface-cli whoami
Now that we have logged into the Hugging Face Hub, the output should be:
thannon
orgs: ContinuumLabs
You can see your username and the name of your organisation.
In the Llama folder
Inside the Docker container, move to the Llama example folder:
cd /app/tensorrt_llm/examples/llama
Once in the directory, install the requirements:
pip install -r requirements.txt
Now set up Git LFS (Large File Storage), which is needed to download the model's large weight files:
git lfs install
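Git LFS is what actually fetches the multi-gigabyte weight files during the clone, so it is worth confirming the tool is available first:

git lfs version   # prints something like git-lfs/3.4.0
git lfs env       # shows the LFS configuration Git will use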
Download the model weights from Hugging Face
From the Llama example folder, download the weights of the model. You should be in the Llama directory:

/app/tensorrt_llm/examples/llama
First, remove any previous copy of the model directory:

rm -rf ./llama-2-7b-chat-hf

Then create a fresh, empty directory for the weights:

mkdir -p ./llama-2-7b-chat-hf

Now download the model:
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-hf
./llama-2-7b-chat-hf: this argument specifies the directory on your local filesystem where the cloned repository will be saved. The ./ indicates that the directory will be created in the current directory where the command is being executed, and the folder will be named llama-2-7b-chat-hf. If this directory is not specified, Git uses the last part of the repository URL (in this case, it would default to Llama-2-7b-chat-hf) as the directory name.
Specifying the directory when cloning a Git repository, as with ./llama-2-7b-chat-hf in this example, serves several practical purposes and provides more control over the organisation of files on your local machine.
Custom Naming: Users might prefer a different name for the local directory than the default name (which is the repository name).
Clear Intent: Specifying the directory explicitly makes scripts or command sequences clearer to others who may read the code, indicating exactly where the repository will be located on the filesystem.
Multiple Versions: If someone needs to work with multiple branches or forks of the same repository and wants to keep them as separate projects on the filesystem, they can clone them into differently named directories, as shown in the sketch after this list.
Existing Directories: If a directory with the default name already exists and contains data, specifying a different directory name can avoid unintentionally mixing or overwriting files.
Automated Setups: In automated scripts that set up environments (like provisioning scripts for virtual machines or containers), specifying the directory ensures that each component is placed exactly where needed without manual intervention.
Predictability: When automating, having predictable paths based on explicit directory names can make other automated tasks (like backups, deployments, or integrations) simpler and less prone to errors due to unexpected directory structures.
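As a concrete illustration of the multiple-versions point above, the same repository can be cloned twice under different names; the branch name experimental here is purely hypothetical:

git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-main
git clone --branch experimental https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-experimental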
Working in a Current Context
Relative Paths: Using ./ to indicate the current directory emphasises that the action is intended to interact closely with the current working context, possibly integrating with or complementing other components or repositories within the same directory.
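Finally, with the clone complete, it is worth checking that the weight files were fully materialised rather than left as small Git LFS pointer files. The file names below reflect the current layout of the Llama-2-7b-chat-hf repository and may change:

ls -lh ./llama-2-7b-chat-hf
# Expect config.json, tokenizer files, and multi-gigabyte weight shards
# (e.g. *.safetensors or pytorch_model-*.bin). If the weight files are
# only a few hundred bytes, they are LFS pointers: run `git lfs pull`
# inside the directory to fetch the real data.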