After the TensorRT-LLM image has been created, you can launch the Docker container with a single command:
make -C docker release_run
This command starts a container with everything you need to run TensorRT-LLM already installed, simplifying the setup process and ensuring consistency across environments.
I have downloaded the Llama2 model into the host TensorRT-LLM repository at the following location:
TensorRT-LLM/examples/llama
Makefile Arguments
Based on the provided Makefile, there are several ways you can configure the Dockerfile and customise the Docker container for your TensorRT-LLM setup. Here are some ideas:
Persistent Storage with Volumes
You can create volumes or mount host directories to store data that needs to persist across container restarts.
For example, you can add a volume for storing trained models, datasets, or configuration files.
To achieve this, you can modify the DOCKER_RUN_OPTS variable in the Makefile to include volume mounting options, such as:
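For example (assuming the Makefile forwards DOCKER_RUN_OPTS directly to docker run):
DOCKER_RUN_OPTS ?= --rm -it -v /path/on/host:/path/in/container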
Replace /path/on/host with the desired host directory and /path/in/container with the corresponding path inside the container.
Monitoring Container
You can create a separate Dockerfile that sets up a monitoring container to monitor the TensorRT-LLM container.
This monitoring container can include tools like Prometheus, Grafana, or custom monitoring scripts.
In the monitoring Dockerfile, you can specify the necessary dependencies and configurations for the monitoring tools.
You can then build and run the monitoring container alongside the TensorRT-LLM container using the DOCKER_RUN_OPTS and DOCKER_RUN_ARGS variables in the Makefile.
For example, you can add the following options to run the containers together:
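For example (the tensorrt_llm container name, the trt-llm alias, and the prom/prometheus image are illustrative assumptions):
docker run -d --name trt-llm-monitor --link tensorrt_llm:trt-llm prom/prometheus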
The --link option allows the monitoring container to communicate with the TensorRT-LLM container using a specific alias.
Data Preprocessing Container
You can create a separate Dockerfile for a data preprocessing container that handles data preparation tasks before feeding the data into the TensorRT-LLM container.
This preprocessing container can include scripts or tools for data cleaning, normalization, augmentation, or feature extraction.
You can mount the necessary data volumes or directories to the preprocessing container using the DOCKER_RUN_OPTS variable, as sketched below.
After preprocessing, you can mount the processed data to the TensorRT-LLM container for inference.
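As a sketch (all paths here are illustrative), the preprocessing container could read raw data from one mount and write results to a shared directory that is later mounted into the TensorRT-LLM container:
DOCKER_RUN_OPTS ?= -v /data/raw:/workspace/raw -v /data/processed:/workspace/processed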
Multi-stage Builds
The provided Makefile already supports multi-stage builds using the STAGE variable.
You can extend this concept to create additional stages in the Dockerfile for different purposes, such as development, testing, or production.
Each stage can have its own set of dependencies, configurations, and optimizations.
You can use the STAGE variable in the Makefile to control which stage to build and run.
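A minimal Dockerfile sketch of this idea (the base image and stage names here are hypothetical):
FROM nvcr.io/nvidia/pytorch:24.02-py3 AS devel
# Install build dependencies and compile TensorRT-LLM in this stage.
FROM devel AS release
# Copy only the artefacts needed at runtime into a slimmer image.
The Makefile can then pass the STAGE value to docker build via --target to select which stage is produced.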
GPU Configuration
The Makefile includes the GPU_OPTS variable to specify GPU-related options for the Docker container.
You can customise this variable to allocate specific GPUs, set GPU memory limits, or enable GPU sharing among multiple containers.
For example, you can modify the GPU_OPTS variable like this:
GPU_OPTS ?= --gpus '"device=0,1"'
This allocates GPUs 0 and 1 to the container. Note that docker run does not offer a flag for capping GPU memory, so any per-container GPU memory limit has to be enforced by the framework running inside the container.
These are just a few examples of how you can configure the Dockerfile and customise the Docker container for your TensorRT-LLM setup.
You can explore additional options based on your specific requirements, such as networking, security, resource limits, or integrating with other tools and frameworks.
Remember to modify the Makefile and Dockerfile accordingly to incorporate these configurations and ensure they align with your project's needs.
Creating the TensorRT-LLM Engine
To create a TensorRT engine for an existing model, there are 3 steps:
Download pre-trained weights,
Build a fully-optimised engine of the model,
Deploy the engine; in other words, run the fully-optimised model.
The following sections show how to use TensorRT-LLM to run the Llama-2-7b-chat model (Hugging Face weights).
Connect to Huggingface Hub
To download the Llama model, we first have to connect to the Huggingface Hub:
We access the HuggingFace hub through a command line interface.
Reference: Details on the HuggingFace Command Line
The huggingface_hub Python package comes with a built-in CLI called huggingface-cli. This tool allows you to interact with the Hugging Face Hub directly from a terminal.
For example, you can login to your account, create a repository, upload and download files, etc. It also comes with handy features to configure your machine or manage your cache.
In this guide, we will have a look at the main features of the CLI and how to use them.
Core Functionalities
User Authentication: Allows login/logout and displays the current user information (login, logout, whoami).
Repository Management: Enables creation and interaction with repositories on Hugging Face (repo).
File Operations: Supports uploading, downloading files, and managing large files on the Hub (upload, download, lfs-enable-largefiles, lfs-multipart-upload).
Cache Management: Provides commands to scan and delete cached files (scan-cache, delete-cache).
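For instance, to inspect what is stored in your local cache:
huggingface-cli scan-cache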
Usage
The CLI supports various commands and options, which can be explored using the --help flag.
To interact with the Hub, such as downloading private repos or uploading files, users need to authenticate using a User Access Token.
The CLI also supports downloading specific files or entire repositories, filtering files with patterns, and specifying revisions or local directories for downloads.
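For example, a hedged illustration (the repo ID, filter, and revision are chosen for this guide):
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --include "*.safetensors" --revision main --local-dir ./llama-2-7b-chat-hf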
First, install the CLI and its extra dependencies by specifying the [cli] extras, which give an improved user experience:
pip3 install -U "huggingface_hub[cli]"
Reference: Libraries installed with the Huggingface Hub
The command you executed, pip install -U "huggingface_hub[cli]", installs several Python libraries and their dependencies related to the Hugging Face Hub. Here's an explanation of the libraries downloaded:
huggingface_hub
This is the main library that provides access to the Hugging Face Hub, allowing you to interact with models, datasets, and repositories.
It provides functionalities for uploading, downloading, and managing resources on the Hub.
fsspec
fsspec is a Python library for managing filesystem-like abstractions. It is often used for handling remote and cloud-based file systems.
In the context of the Hugging Face Hub, it likely helps in managing the storage and retrieval of model and dataset files.
tqdm
tqdm is a popular library for adding progress bars to loops and other iterables.
It's used in the CLI to display progress when uploading or downloading large files from the Hugging Face Hub.
prompt-toolkit
prompt-toolkit is a library for building command-line interfaces (CLIs) with interactive features.
It's used by the Hugging Face CLI for handling interactive prompts and user interactions.
pfzy
pfzy is a Python library for fuzzy string matching and searching.
It may be used in the CLI for fuzzy matching of commands or resources.
InquirerPy
InquirerPy is a Python library for creating interactive command-line prompts with customizable menus and questions.
It enhances the user experience when using the Hugging Face CLI by providing interactive prompts.
wcwidth
wcwidth is a library for determining the printable width of characters when rendering text in a terminal.
It helps ensure proper formatting and alignment of text in the CLI.
Verify that the CLI is correctly set up by running the following command:
huggingface-cli --help
You should see a list of available options and commands. If you encounter an error like "command not found: huggingface-cli," please refer to the installation guide for troubleshooting.
Now try this command:
huggingface-cli whoami
When logged into the HuggingFace Hub, this command prints your username and the organisations you are a part of on the Hub.
At this stage, we have not yet logged into the HuggingFace Hub, so the response will be:
Not logged in
Reference: Huggingface CLI Commands
Use the --help option to get detailed information about a specific command. For example, to learn more about how to upload files using the CLI, run:
huggingface-cli upload --help
Examples
Login to Your Hugging Face Account
Use the following command to log in with your token obtained from huggingface.co/settings/tokens:
huggingface-cli login
Upload Files to a Repository
To upload a file or folder to a repository on the Hub, use the upload command. Replace <repository_name> with the name of your repository and <path_to_file_or_folder> with the path to the file or folder you want to upload:
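huggingface-cli upload <repository_name> <path_to_file_or_folder>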
You can interact with your repositories using the repo command. For example, to create a new repository, use:
huggingface-cli repo create <repository_name>
Environment Information
To view information about your environment, use the env command:
huggingface-cli env
These tutorials will help you get started with the Hugging Face CLI for managing models, datasets, and repositories on the Hugging Face Hub.
We will now connect to the Huggingface Hub to allow models and datasets to be downloaded to the TensorRT-LLM directory.
To connect to the Huggingface Hub prepare your machine to allow storing of the Huggingface token:
git config --global credential.helper store
This command tells Git to use a simple storage method to save your credentials. This can be a security risk, as credentials are stored in plain text, but it is fine during development.
Explanation of Git Credentials
A "git credential" refers to the authentication details used by Git to access repositories,especially those requiring user verification, such as private repositories or when pushing changes to remote repositories.
These credentials can be usernames and passwords, personal access tokens, SSH keys, or other forms of identity verification.
Types of Git Credentials
Username and Password: The simplest form, but not recommended for security reasons, especially for repositories over HTTPS.
Personal Access Tokens (PATs): More secure than passwords, these tokens are used especially when two-factor authentication (2FA) is enabled. GitHub, for instance, requires PATs for authenticating over HTTPS.
SSH Keys: Secure and commonly used, SSH keys pair a private key (kept on your machine) with a public key (added to your Git server). They are a popular choice for authentication.
Storage of Git Credentials
Credential Cache: Temporarily stores credentials in memory for a short period (see the example after this list). This is more secure than storing them on disk but requires re-entry after the cache timeout.
Credential Store: Saves credentials in plain text in a file on your computer. It's convenient but less secure since the file can be read by anyone with access to your system.
SSH Agent: For SSH keys, an SSH agent can store your private keys and handle authentication.
System Keychain: Some Git clients can store credentials in the system's keychain or credential manager, offering a balance of convenience and security.
Environment Variables: Sometimes used for automation, credentials can be set as environment variables, but this method has significant security downsides.
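For example, to switch from the plain-text store to the in-memory cache with a 15-minute timeout:
git config --global credential.helper 'cache --timeout=900'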
Best Practices
Prefer Token/Key-based Authentication: Use personal access tokens or SSH keys over passwords for better security.
Keep Software Updated: Ensure your Git client and any related credential management tools are up-to-date to benefit from security patches.
Use SSH Keys Wisely: Protect your SSH private keys with strong passphrases and use ssh-agent for managing them.
Limit Token Scopes and Lifetimes: When creating personal access tokens, grant only the necessary permissions and set a reasonable expiration.
Securely Store Sensitive Information: Avoid storing credentials in plaintext files. Use system keychains or encrypted storage whenever possible.
Be Cautious with Environment Variables: Be mindful of security risks when using environment variables for credentials, especially in shared or public environments.
Regularly Review and Rotate Credentials: Regularly review your access tokens and SSH keys, revoking and replacing them as necessary.
Use Two-Factor Authentication (2FA): Wherever possible, enable 2FA for your Git hosting service accounts for an additional layer of security.
Enter the command to log in to the Huggingface Hub:
huggingface-cli login
You will be asked to enter your Huggingface token:
Continuum's Huggingface token is:
jhfhgfhgfhhf (not a real token)
You will then be asked whether you want the Huggingface token to be added as a git credential. Answer yes (y) to this question. The output should be as below:
Add token as git credential? (Y/n) Y
Token is valid (permission: read).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/jack/.cache/huggingface/token
Login successful
Now check to ensure you are logged in:
huggingface-cli whoami
Now that we have logged into the HuggingFace hub the output should be:
thannon
orgs: ContinuumLabs
You can clearly see your username and the name of your organisation.
In the Llama folder
Inside the Docker container, move to the Llama folder:
cd /app/tensorrt_llm/examples/llama
Once in the directory, you have to install the requirements:
pip install -r requirements.txt
Now set up Git LFS (Large File Storage):
git lfs install
Download the model weights from HuggingFace
From the Llama example folder, you must download the weights of the model.
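cd examples/llama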
cd stands for "change directory". This command changes the current working directory of the shell (or command line interface) to examples/llama.
examples/llama specifies the target directory relative to the current directory. This means you're moving into the llama directory, which is a subdirectory of examples.
rm -rf ./llama-2-7b-chat-hf
rm is the command used to remove files or directories.
-r or -R option tells rm to recursively delete directories and their contents. It's required when deleting directories that contain files or other directories.
-f stands for "force", instructing rm to ignore nonexistent files and directories and never prompt for confirmation. This makes the command non-interactive.
./llama-2-7b-chat-hf specifies the path of the directory to be deleted, relative to the current directory. The ./ is often optional but signifies that the path is relative.
This command deletes the llama-2-7b-chat-hf directory and all of its contents without asking for confirmation, ensuring that it doesn't exist before proceeding with the next steps.
mkdir -p ./llama-2-7b-chat-hf
mkdir is used to create a directory.
-p enables the creation of parent directories as necessary. If the llama-2-7b-chat-hf directory already exists, mkdir won't return an error. This option also allows for the creation of nested directories in one command.
./llama-2-7b-chat-hf is the path of the directory to be created, relative to the current directory.
Together with the preceding rm command, this ensures that the llama-2-7b-chat-hf directory exists and is empty: it is deleted first (if present) and then recreated.
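git clone https://huggingface.co/meta-llama/Llama-2-7b-chat-hf ./llama-2-7b-chat-hf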
git clone is a Git command used to clone a repository into a new directory.
https://huggingface.co/meta-llama/Llama-2-7b-chat-hf is the URL of the Git repository to be cloned. This particular URL points to a repository on Hugging Face's model hosting platform, indicating that the repository likely contains a model or related files.
./llama-2-7b-chat-hf specifies the local directory into which the repository should be cloned. This path is relative to the current working directory.
By executing this command, you're cloning the contents of the Llama-2-7b-chat-hf repository from Hugging Face into the newly created llama-2-7b-chat-hf directory in your local filesystem.
This setup is typical for preparing a working environment with specific code or data dependencies, ensuring that the working directory is clean and contains only the desired repository's content.
./llama-2-7b-chat-hf: This specifies the directory on your local filesystem where you want the cloned repository to be saved. The ./ indicates that the directory will be created in the current directory where the command is being executed.
The folder will be named llama-2-7b-chat-hf.
If this directory is not specified, Git uses the last part of the repository URL (in this case, it would default to Llama-2-7b-chat-hf) as the directory name.
Specifying the directory when cloning a Git repository, as with ./llama-2-7b-chat-hf above, serves several practical purposes and provides more control over the organisation of files on your local machine.
Custom Naming: Users might prefer a different name for the local directory than the default name (which is the repository name).
Clear Intent: Specifying the directory explicitly makes scripts or command sequences clearer to others who may read the code, indicating exactly where the repository will be located on the filesystem.
Multiple Versions: If someone needs to work with multiple branches or forks of the same repository and wants to keep them as separate projects on the filesystem, they can clone them into differently named directories.
Existing Directories: If a directory with the default name already exists and contains data, specifying a different directory name can avoid unintentionally mixing or overwriting files.
Automated Setups: In automated scripts that set up environments (like provisioning scripts for virtual machines or containers), specifying the directory ensures that each component is placed exactly where needed without manual intervention.
Predictability: When automating, having predictable paths based on explicit directory names can make other automated tasks (like backups, deployments, or integrations) simpler and less prone to errors due to unexpected directory structures.
Working in the Current Context
Relative Paths: Using ./ to indicate the current directory emphasizes that the action is intended to interact closely with the current working context, possibly integrating with or complementing other components or repositories within the same directory.