Llama2 installation
Launching the TensorRT-LLM Container
After the TensorRT-LLM image has been created, you can launch the Docker container with a single command:
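A minimal sketch, assuming the docker/Makefile that ships with the TensorRT-LLM repository (the exact target name may differ between versions):

```bash
# Launch the previously built TensorRT-LLM container (assumed Makefile target).
make -C docker release_run
```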
I have downloaded the Llama2 model into the host TensorRT-LLM repository at the following location:
TensorRT-LLM/examples/llama

The earlier image build step produces a Docker image that contains everything you need to run TensorRT-LLM, simplifying the setup process and ensuring consistency across environments.
Makefile Arguments
Based on the provided Makefile, there are several ways you can configure the Dockerfile and customise the Docker container for your TensorRT-LLM setup. Here are some ideas:
Persistent Storage with Volumes
You can create volumes or mount host directories to store data that needs to persist across container restarts.
For example, you can add a volume for storing trained models, datasets, or configuration files.
To achieve this, you can modify the DOCKER_RUN_OPTS variable in the Makefile to include volume mounting options, such as:
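```makefile
# Hypothetical addition: persist data by mounting a host directory into the container.
DOCKER_RUN_OPTS += -v /path/on/host:/path/in/container
```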
Replace /path/on/host with the desired host directory and /path/in/container with the corresponding path inside the container.
Monitoring Container
You can create a separate Dockerfile that sets up a monitoring container to monitor the TensorRT-LLM container.
This monitoring container can include tools like Prometheus, Grafana, or custom monitoring scripts.
In the monitoring Dockerfile, you can specify the necessary dependencies and configurations for the monitoring tools.
You can then build and run the monitoring container alongside the TensorRT-LLM container using the DOCKER_RUN_OPTS and DOCKER_RUN_ARGS variables in the Makefile. For example, you can add the following options to run the containers together:
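```bash
# Hypothetical sketch: give the TensorRT-LLM container a fixed name via DOCKER_RUN_OPTS
# (shown here as a command-line override; "release_run" and the container names are assumptions),
# then attach a Prometheus container to it.
make -C docker release_run DOCKER_RUN_OPTS="--name tensorrt_llm"

# In a second terminal: run the monitoring container, which reaches the
# TensorRT-LLM container under the alias "trt-llm".
docker run -d --name trt-llm-monitor --link tensorrt_llm:trt-llm prom/prometheus
```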
The --link option allows the monitoring container to communicate with the TensorRT-LLM container using a specific alias.
Data Preprocessing Container
You can create a separate Dockerfile for a data preprocessing container that handles data preparation tasks before feeding the data into the TensorRT-LLM container.
This preprocessing container can include scripts or tools for data cleaning, normalization, augmentation, or feature extraction.
You can mount the necessary data volumes or directories to the preprocessing container using the DOCKER_RUN_OPTS variable. After preprocessing, you can mount the processed data to the TensorRT-LLM container for inference.
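A hedged sketch of such a layout (the host paths and the preprocessing image name are placeholders, and the make target is an assumption about your Makefile):

```bash
# Preprocess raw data in a dedicated container, writing results to a shared host directory.
docker run --rm \
  -v /data/raw:/data/raw \
  -v /data/processed:/data/processed \
  my-preprocessing-image

# Then expose only the processed data to the TensorRT-LLM container.
make -C docker release_run DOCKER_RUN_OPTS="-v /data/processed:/data/processed"
```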
Multi-stage Builds
The provided Makefile already supports multi-stage builds using the STAGE variable. You can extend this concept to create additional stages in the Dockerfile for different purposes, such as development, testing, or production.
Each stage can have its own set of dependencies, configurations, and optimizations.
You can use the STAGE variable in the Makefile to control which stage to build and run.
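For example (a hedged sketch; the actual target names and stage values depend on the Makefile and Dockerfile in your checkout):

```bash
# Build and run the development stage instead of the default one.
make -C docker build STAGE=devel
make -C docker run STAGE=devel
```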
GPU Configuration
The Makefile includes the GPU_OPTS variable to specify GPU-related options for the Docker container. You can customise this variable to allocate specific GPUs, set GPU memory limits, or enable GPU sharing among multiple containers.
For example, you can modify the GPU_OPTS variable like this:
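```makefile
# Hypothetical value: expose only GPUs 0 and 1 to the container
# (NVIDIA Container Toolkit syntax; the inner quotes are needed because the
# device list passed to docker's --gpus flag contains a comma).
GPU_OPTS = --gpus '"device=0,1"'
```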
This allocates GPUs 0 and 1 to the container. Note that Docker itself does not enforce per-GPU memory limits; a cap such as 8192 MB per GPU would typically be applied by the inference framework or application configuration rather than by a docker run flag.
These are just a few examples of how you can configure the Dockerfile and customise the Docker container for your TensorRT-LLM setup.
You can explore additional options based on your specific requirements, such as networking, security, resource limits, or integrating with other tools and frameworks.
Remember to modify the Makefile and Dockerfile accordingly to incorporate these configurations and ensure they align with your project's needs.
After creation of the TensorRT-LLM-Engine
To create a TensorRT engine for an existing model, there are 3 steps:
Download pre-trained weights,
Build a fully-optimised engine of the model,
Deploy the engine, in other words, run the fully-optimised model.
The following sections show how to use TensorRT-LLM to run the Llama2-7b-chat model (Hugging Face weights).
Connect to Huggingface Hub
To download the Llama model we first have to connect to the Huggingface Hub:
We access the HuggingFace hub through a command line interface.
The huggingface_hub Python package comes with a built-in CLI called huggingface-cli.
This tool allows you to interact with the Hugging Face Hub directly from a terminal.
Reference: Details on the HuggingFace Command Line
The huggingface_hub Python package comes with a built-in CLI called huggingface-cli. This tool allows you to interact with the Hugging Face Hub directly from a terminal.
For example, you can login to your account, create a repository, upload and download files, etc. It also comes with handy features to configure your machine or manage your cache.
In this guide, we will have a look at the main features of the CLI and how to use them.
Core Functionalities
User Authentication: Allows login/logout and displays the current user information (login, logout, whoami).
Repository Management: Enables creation and interaction with repositories on Hugging Face (repo).
File Operations: Supports uploading, downloading files, and managing large files on the Hub (upload, download, lfs-enable-largefiles, lfs-multipart-upload).
Cache Management: Provides commands to scan and delete cached files (scan-cache, delete-cache).
Usage
The CLI supports various commands and options, which can be explored using the --help flag. To interact with the Hub, such as downloading private repos or uploading files, users need to authenticate using a User Access Token.
The CLI also supports downloading specific files or entire repositories, filtering files with patterns, and specifying revisions or local directories for downloads.
First, install the CLI and its extra dependencies, including the [cli] extras, for an improved user experience:
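```bash
pip install -U "huggingface_hub[cli]"
```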
Reference: Libraries installed with the Huggingface Hub
The command you executed, pip install -U "huggingface_hub[cli]", installs several Python libraries and their dependencies related to the Hugging Face Hub. Here's an explanation of the libraries downloaded:
huggingface_hub
This is the main library that provides access to the Hugging Face Hub, allowing you to interact with models, datasets, and repositories.
It provides functionalities for uploading, downloading, and managing resources on the Hub.
fsspec
fsspec is a Python library for managing filesystem-like abstractions. It is often used for handling remote and cloud-based file systems.
In the context of the Hugging Face Hub, it likely helps in managing the storage and retrieval of model and dataset files.
tqdm
tqdm is a popular library for adding progress bars to loops and other iterables.
It's used in the CLI to display progress when uploading or downloading large files from the Hugging Face Hub.
prompt-toolkit
prompt-toolkit is a library for building command-line interfaces (CLIs) with interactive features.
It's used by the Hugging Face CLI for handling interactive prompts and user interactions.
pfzy
pfzy is a Python library for fuzzy string matching and searching.
It may be used in the CLI for fuzzy matching of commands or resources.
InquirerPy
InquirerPy is a Python library for creating interactive command-line prompts with customizable menus and questions.
It enhances the user experience when using the Hugging Face CLI by providing interactive prompts.
wcwidth
wcwidth is a library for determining the printable width of characters when rendering text in a terminal.
It helps ensure proper formatting and alignment of text in the CLI.
Verify that the CLI is correctly set up by running the following command:
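```bash
huggingface-cli --help
```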
You should see a list of available options and commands. If you encounter an error like "command not found: huggingface-cli," please refer to the installation guide for troubleshooting.
Now try this command:
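```bash
huggingface-cli whoami
```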
When logged into the HuggingFace Hub this command prints your username and the organisations you are a part of on the Hub.
At this stage, we have not yet logged into the HuggingFace Hub, so the response will be:
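```
Not logged in
```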
Reference: Huggingface CLI Commands
Use the --help option to get detailed information about a specific command. For example, to learn more about how to upload files using the CLI, run:
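```bash
huggingface-cli upload --help
```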
Examples
Login to Your Hugging Face Account
Use the following command to log in with your token obtained from huggingface.co/settings/tokens:
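```bash
huggingface-cli login
```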
Upload Files to a Repository
To upload a file or folder to a repository on the Hub, use the upload command. Replace <repository_name> with the name of your repository and <path_to_file_or_folder> with the path to the file or folder you want to upload:
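```bash
huggingface-cli upload <repository_name> <path_to_file_or_folder>
```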
Download Files from the Hub
Download files from the Hub using the download command. Specify the file or folder you want to download and the destination path:
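For example (the repository, file, and destination directory below are placeholders):

```bash
huggingface-cli download <repository_name> <file_or_folder> --local-dir <destination_path>
```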
Managing Repositories
You can interact with your repositories using the repo command. For example, to create a new repository, use:
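```bash
huggingface-cli repo create <repository_name>
```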
Environment Information
To view information about your environment, use the env command:
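```bash
huggingface-cli env
```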
These tutorials will help you get started with the Hugging Face CLI for managing models, datasets, and repositories on the Hugging Face Hub.
We will now connect to the Huggingface Hub so that models and datasets can be downloaded into the TensorRT-LLM repository.
To connect to the Huggingface Hub prepare your machine to allow storing of the Huggingface token:
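```bash
git config --global credential.helper store
```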
This command tells Git to use a simple storage method to save your credentials, which can be a security risk as credentials are stored in plain text - but is fine during development.
Explanation of Git Credentials
A "git credential" refers to the authentication details used by Git to access repositories, especially those requiring user verification, such as private repositories or when pushing changes to remote repositories.
These credentials can be usernames and passwords, personal access tokens, SSH keys, or other forms of identity verification.
Types of Git Credentials
Username and Password: The simplest form, but not recommended for security reasons, especially for repositories over HTTPS.
Personal Access Tokens (PATs): More secure than passwords, these tokens are used especially when two-factor authentication (2FA) is enabled. GitHub, for instance, requires PATs for authenticating over HTTPS.
SSH Keys: Secure and commonly used, SSH keys pair a private key (kept on your machine) with a public key (added to your Git server). They are a popular choice for authentication.
Storage of Git Credentials
Credential Cache: Temporarily stores credentials in memory for a short period. This is more secure than storing them on disk but requires re-entry after the cache timeout.
Credential Store: Saves credentials in plain text in a file on your computer. It's convenient but less secure since the file can be read by anyone with access to your system.
SSH Agent: For SSH keys, an SSH agent can store your private keys and handle authentication.
System Keychain: Some Git clients can store credentials in the system's keychain or credential manager, offering a balance of convenience and security.
Environment Variables: Sometimes used for automation, credentials can be set as environment variables, but this method has significant security downsides.
Best Practices
Prefer Token/Key-based Authentication: Use personal access tokens or SSH keys over passwords for better security.
Keep Software Updated: Ensure your Git client and any related credential management tools are up-to-date to benefit from security patches.
Use SSH Keys Wisely: Protect your SSH private keys with strong passphrases and use ssh-agent for managing them.
Limit Token Scopes and Lifetimes: When creating personal access tokens, grant only the necessary permissions and set a reasonable expiration.
Securely Store Sensitive Information: Avoid storing credentials in plaintext files. Use system keychains or encrypted storage whenever possible.
Be Cautious with Environment Variables: Be mindful of security risks when using environment variables for credentials, especially in shared or public environments.
Regularly Review and Rotate Credentials: Regularly review your access tokens and SSH keys, revoking and replacing them as necessary.
Use Two-Factor Authentication (2FA): Wherever possible, enable 2FA for your Git hosting service accounts for an additional layer of security.
Enter the command to log in to the Huggingface Hub:
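```bash
huggingface-cli login
```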
You will be asked to enter your Huggingface token:
Continum's Huggingface token is:
You will then be asked whether you want the Huggingface token to be added as a git credential. Answer yes (y) to this question. The output should be as below:
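```
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful
```

(The exact wording and paths vary with the huggingface_hub version and your environment.)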
Now check to ensure you are logged in:
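```bash
huggingface-cli whoami
```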
Now that we have logged into the HuggingFace Hub, the output should be:
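```
your-username
orgs:  your-organisation
```

(Placeholder values; you will see your own username and organisation names.)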
You can clearly see your name, and the name of the organisation.
In the Llama folder
Inside the Docker container, move to the Llama folder:
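```bash
cd examples/llama
```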
Once in the directory, you have to install the requirements:
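```bash
# Install the Python dependencies for the llama example
# (the example folder ships a requirements.txt).
pip install -r requirements.txt
```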
Now install git lfs for large file storage
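One way to do this inside a Debian/Ubuntu-based container image (an assumption about your base image):

```bash
apt-get update && apt-get -y install git-lfs
git lfs install
```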
Download the model weights from HuggingFace
From the Llama example folder, you must download the weights of the model.
You should be in the Llama directory.
First, remove any existing copy of the model directory:
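```bash
rm -rf ./llama-2-7b-chat-hf
```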
Then recreate it, empty:
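```bash
mkdir -p ./llama-2-7b-chat-hf
```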
Download the model:
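```bash
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-hf
```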
Explanation of file creation and model download
cd examples/llama
cd stands for "change directory". This command changes the current working directory of the shell (or command line interface) to examples/llama. examples/llama specifies the target directory relative to the current directory. This means you're moving into the llama directory, which is a subdirectory of examples.
rm -rf ./llama-2-7b-chat-hf
rm is the command used to remove files or directories. The -r (or -R) option tells rm to recursively delete directories and their contents; it's required when deleting directories that contain files or other directories. -f stands for "force", instructing rm to ignore nonexistent files and directories and never prompt for confirmation. This makes the command non-interactive. ./llama-2-7b-chat-hf specifies the path of the directory to be deleted, relative to the current directory. The ./ is often optional but signifies that the path is relative.
This command deletes the llama-2-7b-chat-hf directory and all of its contents without asking for confirmation, ensuring that it doesn't exist before proceeding with the next steps.
mkdir -p ./llama-2-7b-chat-hf
mkdir is used to create a directory. -p enables the creation of parent directories as necessary. If the llama-2-7b-chat-hf directory already exists, mkdir won't return an error. This option also allows for the creation of nested directories in one command. ./llama-2-7b-chat-hf is the path of the directory to be created, relative to the current directory.
This command ensures that the llama-2-7b-chat-hf directory exists and is empty by first deleting it (if it exists) and then recreating it.
git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-hf
git clone is a Git command used to clone a repository into a new directory. https://huggingface.co/meta-llama/Llama-2-7b-chat-hf is the URL of the Git repository to be cloned. This particular URL points to a repository on Hugging Face's model hosting platform, indicating that the repository likely contains a model or related files. ./llama-2-7b-chat-hf specifies the local directory into which the repository should be cloned. This path is relative to the current working directory.
By executing this command, you're cloning the contents of the Llama-2-7b-chat-hf repository from Hugging Face into the newly created llama-2-7b-chat-hf directory in your local filesystem.
This setup is typical for preparing a working environment with specific code or data dependencies, ensuring that the working directory is clean and contains only the desired repository's content.
./llama-2-7b-chat-hf: This specifies the directory on your local filesystem where you want the cloned repository to be saved. The ./ indicates that the directory will be created in the current directory where the command is being executed.
The folder will be named llama-2-7b-chat-hf.
If this directory is not specified, Git uses the last part of the repository URL (in this case, it would default to Llama-2-7b-chat-hf) as the directory name.
Specifying the directory when cloning a Git repository, such as ./llama-2-7b-chat-hf in your example, serves several practical purposes and provides more control over the organisation of files on your local machine.
Custom Naming: Users might prefer a different name for the local directory than the default name (which is the repository name).
Clear Intent: Specifying the directory explicitly makes scripts or command sequences clearer to others who may read the code, indicating exactly where the repository will be located on the filesystem.
Multiple Versions: If someone needs to work with multiple branches or forks of the same repository and wants to keep them as separate projects on the filesystem, they can clone them into differently named directories.
Existing Directories: If a directory with the default name already exists and contains data, specifying a different directory name can avoid unintentionally mixing or overwriting files.
Automated Setups: In automated scripts that set up environments (like provisioning scripts for virtual machines or containers), specifying the directory ensures that each component is placed exactly where needed without manual intervention.
Predictability: When automating, having predictable paths based on explicit directory names can make other automated tasks (like backups, deployments, or integrations) simpler and less prone to errors due to unexpected directory structures.
Working in a Current Context
Relative Paths: Using ./ to indicate the current directory emphasizes that the action is intended to interact closely with the current working context, possibly integrating with or complementing other components or repositories within the same directory.