Deployment

This page provides an overview of deployment options and considerations for the Templar decentralized training framework. It covers Docker-based and Ansible-based deployment approaches, environment configuration, and resource requirements for running miner and validator nodes.

For Docker-specific deployment details, see Docker Deployment. For Ansible-specific deployment instructions, see Ansible Deployment.

The Templar framework offers two primary deployment methods:

  1. Docker-based deployment - Containerized approach with Docker and Docker Compose
  2. Ansible-based deployment - Infrastructure-as-code approach for configuring hosts

The choice between these depends on your operational requirements, infrastructure management approach, and team preferences.

flowchart TD
    subgraph "Deployment Options"
        D["Docker Deployment"] --> DC["Docker Compose"]
        D --> DT["Docker Test Environment"]
        A["Ansible Deployment"] --> AP["Ansible Playbooks"]
        A --> AR["Ansible Roles"]
    end
    
    subgraph "Operational Components"
        N["Node Types"]
        N --> M["Miner"]
        N --> V["Validator"]
        RM["Resource Management"]
        RM --> GPU["GPU Assignment"]
        RM --> MEM["Memory Allocation"]
        EV["Environment Variables"]
        WT["Watchtower Updates"]
    end
    
    D --> N
    D --> RM
    D --> EV
    D --> WT
    A --> N
    A --> RM
    A --> EV

Sources: docker/compose.yml, docker/Dockerfile, ansible/playbook.yml, ansible/README.md

Docker is the most streamlined deployment method for Templar, using NVIDIA GPU-enabled containers.

flowchart TD
    subgraph "Docker Host"
        subgraph "Templar Node Container"
            E["Entrypoint.sh"] --> N["Node Process"]
            N --> M["Miner.py"] 
            N --> V["Validator.py"]
            ENV["Environment Variables"]
            VOL["Volumes"]
            VOL --> W["Wallets"]
            VOL --> L["Logs"]
        end
        
        subgraph "Watchtower Container"
            WT["Watchtower Service"]
            WT --> IM["Image Updates"]
            WT --> CR["Container Restart"]
        end
        
        GPUs["NVIDIA GPUs"]
        GPUs --> N
    end
    
    GH["GitHub Container Registry"] --> WT

Sources: docker/compose.yml, docker/Dockerfile, scripts/entrypoint.sh

The Templar Docker image is based on NVIDIA’s CUDA runtime image, with Python and essential dependencies installed:

  • Base image: nvidia/cuda:12.6.0-runtime-ubuntu22.04
  • Python with dependencies installed via uv
  • Entrypoint script for node startup

The official image is published to GitHub Container Registry as ghcr.io/tplr-ai/templar.
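
For example, the published image can be pulled directly from the registry (the latest tag shown here is one of the tags produced by the release workflow described later on this page):

docker pull ghcr.io/tplr-ai/templar:latest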

Sources: docker/Dockerfile, .github/workflows/docker.yml

The Compose file (docker/compose.yml) defines the services required to run Templar nodes:

  1. node service - Configures the Templar node (miner or validator)
  2. watchtower service - Provides automatic updates of container images

Key configuration aspects include:

  • Volume mounts for wallet and log persistence
  • Environment variable configuration
  • GPU device assignment
  • Automatic updates via Watchtower

For testing environments, a docker-compose-test.yml is provided that configures a multi-node setup with miners and validators.
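
Putting these pieces together, a single-node compose file looks roughly like the sketch below. Service names, the image tag, mount paths, and the Watchtower options are illustrative assumptions, not the exact contents of docker/compose.yml:

services:
  node:
    image: ghcr.io/tplr-ai/templar:latest
    env_file: .env                                      # NODE_TYPE, wallet, network, and R2 settings
    volumes:
      - ~/.bittensor/wallets:/root/.bittensor/wallets   # wallet persistence (path is an assumption)
      - ./logs:/app/logs                                # log persistence (path is an assumption)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: [ '0' ]
              capabilities: [ gpu ]
  watchtower:
    image: containrrr/watchtower
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    command: --interval 300                             # poll the registry for new images every 5 minutes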

Sources: docker/compose.yml, docker/docker-compose-test.yml

Environment Variables for Docker Deployment

Docker deployments require a number of environment variables to configure node behavior, access storage resources, and connect to the Bittensor network.

| Category | Variable Name | Description | Required |
| --- | --- | --- | --- |
| Node Configuration | NODE_TYPE | Either “miner” or “validator” | Yes |
| | WALLET_NAME | Bittensor wallet name | Yes |
| | WALLET_HOTKEY | Bittensor wallet hotkey | Yes |
| | CUDA_DEVICE | CUDA device to use (e.g., “cuda:0”) | Yes |
| | NETWORK | Bittensor network (e.g., “finney”, “test”) | Yes |
| | NETUID | Bittensor subnet UID | Yes |
| | DEBUG | Enable debug mode (true/false) | No |
| API Keys | WANDB_API_KEY | Weights & Biases API key | Yes |
| R2 Storage | R2_GRADIENTS_ACCOUNT_ID | Cloudflare R2 account ID | Yes |
| | R2_GRADIENTS_BUCKET_NAME | Bucket name for gradients | Yes |
| | R2_GRADIENTS_READ_ACCESS_KEY_ID | Read access key ID | Yes |
| | R2_GRADIENTS_READ_SECRET_ACCESS_KEY | Read secret access key | Yes |
| | R2_GRADIENTS_WRITE_ACCESS_KEY_ID | Write access key ID | Yes |
| | R2_GRADIENTS_WRITE_SECRET_ACCESS_KEY | Write secret access key | Yes |
| | R2_DATASET_* | Similar set of variables for the dataset bucket | Yes |
| | R2_AGGREGATOR_* | Similar set of variables for the aggregator bucket | Yes |
| GitHub Integration | GITHUB_USER | GitHub username for Watchtower | No |
| | GITHUB_TOKEN | GitHub token for Watchtower | No |

Sources: docker/compose.yml, scripts/entrypoint.sh

Ansible provides a more infrastructure-focused approach to deployment, suitable for managing multiple machines or complex deployments.

flowchart TD
    subgraph "Control Machine"
        A["Ansible Playbook"]
        IT["Inventory"]
        VT["Vault"]
    end
    
    subgraph "Target Host"
        subgraph "Per GPU"
            TR["Templar Repository"]
            EV["Environment Configuration"]
            VE["Python venv"]
            SD["Systemd Service"]
            G["GPU Assignment"]
        end
        APT["System Packages"]
        PIP["Global Pip Packages"]
    end
    
    A --> IT
    A --> VT
    A --> APT
    A --> PIP
    A --> TR
    A --> EV
    A --> VE
    A --> SD
    EV --> G

Sources: ansible/playbook.yml, ansible/roles/templar/defaults/main.yml, ansible/README.md

The Ansible deployment requires:

  1. Inventory file - Defines target hosts and their GPU configurations
  2. Vault file - Securely stores environment variables and secrets
  3. Playbook - Orchestrates the deployment process

The Ansible setup supports multi-GPU deployments, creating separate instances for each GPU with dedicated directories and services.
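
As a purely illustrative sketch, an inventory entry for a two-GPU host might be shaped like the YAML below; the actual variable names are defined by the role (see ansible/roles/templar/defaults/main.yml and ansible/README.md) and may differ:

all:
  hosts:
    gpu-host-1:
      ansible_host: 203.0.113.10
      ansible_user: ubuntu
      # one entry per GPU; these variable names are hypothetical
      gpu_instances:
        - cuda_device: "cuda:0"
          wallet_hotkey: "hotkey-0"
        - cuda_device: "cuda:1"
          wallet_hotkey: "hotkey-1"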

Key files:

  • ansible/playbook.yml - Main playbook
  • ansible/roles/templar/* - Role definitions
  • group_vars/all/vault.yml - Encrypted variables

Sources: ansible/playbook.yml, ansible/roles/templar/defaults/main.yml, ansible/README.md, ansible/group_vars/all/vault.yml.example

For persistent operation, the Ansible deployment can configure systemd services to manage Templar processes:

[Unit]
Description=Templar Miner Service
After=network.target
[Service]
WorkingDirectory=/path/to/templar
EnvironmentFile=/path/to/templar/.env
ExecStart=/path/to/templar/.venv/bin/python neurons/miner.py [options]
Restart=always
RestartSec=5s

This ensures the process is restarted automatically on failure and that it starts only after the network is available.
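
Once the unit file is installed (the unit name below matches the templar-miner example used in the troubleshooting section of this page), the service is driven with standard systemd commands:

sudo systemctl daemon-reload
sudo systemctl start templar-miner
sudo systemctl restart templar-miner   # after configuration changes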

Sources: ansible/roles/templar/templates/miner.service.j2

Templar requires:

  • CUDA-capable NVIDIA GPU(s)
  • Sufficient RAM for model operations
  • Storage for checkpoints and logs
  • Network connectivity for Bittensor communication and R2 storage access

The docker-compose configuration allows assigning specific GPUs to containers using the device_ids property:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: [ '0', '1', '2' ]
          capabilities: [ gpu ]
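
Note that device_ids selects which host GPUs are exposed to the container; inside the container those devices are renumbered starting from 0, so the CUDA_DEVICE variable (for example “cuda:0”) refers to an index within the exposed set rather than the host-wide index.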

Sources: docker/compose.yml, docker/docker-compose-test.yml

Templar includes a GitHub Actions workflow for building and publishing Docker images:

flowchart TD
    subgraph "GitHub Actions Workflow"
        T["Trigger"] --> C["Checkout Code"]
        C --> S["Setup Docker Buildx"]
        S --> L["Login to Registry"]
        L --> M["Extract Metadata"]
        M --> B["Build Docker Image"]
        B --> P["Push to Registry"]
    end
    
    subgraph "Triggers"
        REL["Release Published"]
        WD["Workflow Dispatch"]
    end
    
    subgraph "Tags Generated"
        SV["Semantic Version Tags"]
        LT["Latest Tag"]
        SHA["SHA Tags"]
    end
    
    REL --> T
    WD --> T
    M --> SV
    M --> LT
    M --> SHA

The workflow automatically builds images when releases are published and tags them appropriately based on semantic versioning.
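
A workflow of this shape, built from the standard Docker actions, might look roughly like the following sketch; trigger names and tag rules are simplified assumptions, so refer to .github/workflows/docker.yml for the actual definition:

name: docker
on:
  release:
    types: [published]
  workflow_dispatch:
permissions:
  contents: read
  packages: write
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/tplr-ai/templar
          tags: |
            type=semver,pattern={{version}}
            type=raw,value=latest
            type=sha
      - uses: docker/build-push-action@v5
        with:
          context: .
          file: docker/Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}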

Sources: .github/workflows/docker.yml

When a Templar container starts, the entrypoint script (scripts/entrypoint.sh) performs several initialization steps:

  1. Validates required environment variables
  2. Activates Python virtual environment
  3. Checks CUDA availability
  4. Logs in to Weights & Biases
  5. Starts the appropriate node type (miner or validator)

The script constructs the correct command-line arguments based on environment variables.
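  
For illustration only, the dispatch step follows a pattern like the sketch below; this is not the actual script, and the exact flag names are defined by the miner and validator argument parsers:

# Illustrative only: dispatch on NODE_TYPE and pass configuration through
case "$NODE_TYPE" in
  miner)     SCRIPT=neurons/miner.py ;;
  validator) SCRIPT=neurons/validator.py ;;
  *) echo "Unknown NODE_TYPE: $NODE_TYPE" >&2; exit 1 ;;
esac
exec python "$SCRIPT" \
  --wallet.name "$WALLET_NAME" \
  --wallet.hotkey "$WALLET_HOTKEY" \
  --netuid "$NETUID" \
  --subtensor.network "$NETWORK" \
  --device "$CUDA_DEVICE"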

Sources: scripts/entrypoint.sh

Common deployment issues include:

  1. CUDA availability - Ensure NVIDIA drivers are installed and compatible with the container’s CUDA version (a quick check follows this list)
  2. Environment variables - Check that all required variables are set correctly
  3. GPU assignment - Verify that GPUs are correctly assigned to containers
  4. Network connectivity - Ensure access to Bittensor network and R2 storage
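
For the CUDA check in particular, running nvidia-smi in a bare CUDA container that matches the Templar base image confirms that the NVIDIA drivers and container runtime are working before you debug Templar itself:

docker run --rm --gpus all nvidia/cuda:12.6.0-runtime-ubuntu22.04 nvidia-smi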

For Docker deployments, you can check logs with:

docker logs templar-{NODE_TYPE}-{WALLET_HOTKEY}

For Ansible deployments with systemd, check:

systemctl status templar-miner
journalctl -u templar-miner

Sources: scripts/entrypoint.sh, docker/compose.yml, ansible/roles/templar/templates/miner.service.j2