Miners
This document provides a detailed explanation of miners in the Templar decentralized training framework. Miners are computational nodes responsible for training language models on assigned data and sharing their gradients with the network. For information about how validators evaluate miners’ contributions, see Validators.
Miner Architecture and Positioning
Miners are a core component of Templar’s distributed training infrastructure. They work alongside validators and the aggregation server to collaboratively train large language models across distributed nodes.
Miner Position in Templar Architecture
```mermaid
graph TD
    subgraph "Bittensor Network"
        BT["bt.subtensor"]
        MG["metagraph"]
    end
    subgraph "Miner Node"
        ML["LlamaForCausalLM Model"]
        OP["SGD Optimizer"]
        TR["TransformDCT"]
        CP["CompressDCT"]
        CO["Comms Module"]
        SC["LR Scheduler"]
        WL["Wallet"]
    end
    subgraph "External Components"
        R2["R2 Storage"]
        DS["Dataset Loader"]
        CK["Checkpoints"]
        PE["Peer Miners"]
        VL["Validators"]
    end
    BT <--"Block events"--> CO
    WL --"Authentication"--> BT
    ML --"Forward/Backward pass"--> OP
    OP --"Gradients"--> TR
    TR --"Encoded gradients"--> CP
    CP --"Compressed gradients"--> CO
    CO --"Upload gradients"--> R2
    R2 --"Peer gradients"--> CO
    DS --"Training data"--> ML
    CO --"Load/Save model state"--> CK
    SC --"Update learning rate"--> OP
    R2 --"Evaluate gradients"--> VL
    VL --"Set weights"--> BT
    BT --"Rewards"--> WL
```
Sources: neurons/miner.py:65-226
Miner Implementation
The miner implementation is structured around the `Miner` class, which coordinates training, gradient processing, and communication.
Key Components
- Model: `LlamaForCausalLM` - the foundational language model being trained
- Optimizer: `SGD` with momentum for updating model parameters
- Transformers: `TransformDCT` and `CompressDCT` for gradient compression
- Communications: `Comms` module for gradient sharing via R2 storage
- Scheduler: Learning rate scheduler combining warm-up and cosine annealing
Sources: neurons/miner.py:107-226
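To show how these pieces typically fit together, here is a simplified initialization sketch using standard Hugging Face and PyTorch APIs. The config and hyperparameter values are placeholders rather than Templar’s actual settings, and the Templar-specific components (`TransformDCT`, `CompressDCT`, `Comms`) are omitted because their constructors are project-specific.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Illustrative hyperparameters; real values come from hparams.json
learning_rate, warmup_steps, total_steps = 4e-4, 250, 10_000

# Model: a Llama-style causal LM (config values are placeholders)
model = LlamaForCausalLM(LlamaConfig(hidden_size=512, num_hidden_layers=8))

# Optimizer: SGD with momentum
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

# Scheduler: linear warm-up followed by cosine annealing
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_steps
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps
)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps]
)
```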
Miner Lifecycle
The following sequence diagram illustrates the complete lifecycle of a miner during operation:
```mermaid
sequenceDiagram
    participant M as Miner
    participant R2 as R2 Storage
    participant P as Peers
    participant B as Blockchain
    participant AS as Aggregation Server

    Note over M: Initialization
    M->>B: Register with metagraph
    M->>AS: Get start_window

    Note over M: Synchronization
    M->>R2: Load latest checkpoint
    alt checkpoint found
        R2->>M: Return model, optimizer, momentum
    else no checkpoint
        Note over M: Initialize model from scratch
    end
    M->>AS: Catch up with aggregator (if needed)

    loop For each window
        B->>M: Block listener detects new window
        M->>B: Update peers (tplr.neurons.update_peers)
        M->>R2: Load dataset pages (R2DatasetLoader.next_pages)

        Note over M: Training
        loop For each batch
            Note over M: Forward pass (compute loss)
            Note over M: Backward pass (compute gradients)
        end

        Note over M: Gradient Processing
        Note over M: Apply momentum update
        Note over M: Compress gradients with DCT and top-k
        M->>R2: Upload compressed gradients (comms.put)

        Note over M: Peer Gradient Exchange
        M->>R2: Request peer gradients (comms.gather)
        R2->>M: Return compressed peer gradients

        Note over M: Model Update
        Note over M: Decompress peer gradients
        Note over M: Apply aggregated gradients
        Note over M: Step optimizer and scheduler

        alt global_step % checkpoint_frequency == 0
            M->>R2: Save checkpoint
        end

        Note over M: Metrics & Logging
        M->>WandB: Log metrics
        M->>InfluxDB: Log system metrics
    end
```
Sources: neurons/miner.py:229-755
Training Process
The miner’s primary responsibility is to train the language model on assigned data and share the resulting gradients. Here’s how the training process works:
Dataset Assignment
For each window, miners receive specific dataset pages:
```python
pages = await tplr.r2_dataset.R2DatasetLoader.next_pages(
    offset=step_window * self.hparams.pages_per_window,
    n_pages=self.hparams.pages_per_window,
    seed=self.uid,  # Each miner gets unique data based on UID
)
```
The data assignment is deterministic: for a given window number, a miner’s UID always maps to the same pages, so any node can reproduce which data a miner was assigned.
Sources: neurons/miner.py:339-355
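The guarantee comes from seeding the page sampler purely with values derived from the window and the miner’s UID. Below is a minimal, self-contained sketch of that idea; it is not the actual `R2DatasetLoader` logic, and `pick_pages` and its arguments are illustrative.

```python
import random

def pick_pages(window: int, uid: int, pages_per_window: int, total_pages: int) -> list[int]:
    """Deterministically choose pages for (window, uid); a stand-in for R2DatasetLoader.next_pages."""
    offset = window * pages_per_window
    rng = random.Random(offset * 1_000_003 + uid)  # integer seed derived from offset and UID
    return rng.sample(range(total_pages), pages_per_window)

# Same window and UID always yield the same pages, on any machine.
assert pick_pages(42, uid=7, pages_per_window=6, total_pages=10_000) == \
       pick_pages(42, uid=7, pages_per_window=6, total_pages=10_000)
```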
Gradient Computation and Accumulation
Miners process batches of data, compute loss, and accumulate gradients:
```python
for i, batch in enumerate(loader):
    input_ids = torch.tensor(batch, dtype=torch.long).to(self.model.device)
    tokens_this_batch = input_ids.numel()
    window_tokens += tokens_this_batch
    labels = input_ids.clone()
    labels = torch.where(
        labels == self.tokenizer.pad_token_id, -100, labels
    )

    with autocast(device_type=self.model.device.type, dtype=torch.bfloat16):
        outputs = self.model(input_ids=input_ids, labels=labels)

    total_loss += outputs.loss.item()
    outputs.loss.backward()
    n_batches += 1
```
Sources: neurons/miner.py:360-384
Gradient Processing and Sharing
After computing gradients, miners process and share them with the network:
Gradient Processing Flow
```mermaid
flowchart TD
    GR["Raw Gradients"] --> MU["momentum = γ*momentum + η*gradient"]
    MU --> EN["transformer.encode() - DCT Transform"]
    EN --> CP["compressor.compress() - Top-K Selection"]
    CP --> UP["comms.put() - Upload to R2"]
    UP --> PR["comms.gather() - Get Peer Gradients"]
    PR --> DC["compressor.batch_decompress() - Reconstruct"]
    DC --> AG["transformer.decode() - Inverse DCT"]
    AG --> MO["p.grad.copy_(new_grad) - Apply Gradients"]
    MO --> SG["p.grad.sign_() - Use Only Direction"]
    SG --> OP["optimizer.step() - Update Model"]
    OP --> SC["scheduler.step() - Update LR"]
```
Sources: neurons/miner.py:399-402, neurons/miner.py:560-601
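To make the flow concrete, here is a condensed sketch of the local update path: momentum accumulation before compression, then sign-based application of the gathered gradients. It is not the verbatim neurons/miner.py code; `momentum` is assumed to be a dict of per-parameter tensors, and `new_grads` stands in for whatever `compressor.batch_decompress()` and `transformer.decode()` reconstruct from the gathered peer data.

```python
import torch

def apply_window_update(model, optimizer, scheduler, momentum, new_grads,
                        momentum_decay=0.999, lr=4e-4):
    """Condensed sketch of the per-window update path shown in the flow above."""
    # 1. Fold the freshly computed local gradients into the momentum buffers
    #    (momentum = gamma * momentum + eta * gradient); this happens before the
    #    momentum tensors are DCT-encoded, top-k compressed, and uploaded.
    for name, p in model.named_parameters():
        if p.grad is not None:
            momentum[name].mul_(momentum_decay).add_(p.grad, alpha=lr)

    # 2. After gathering, decompressing, and decoding peer gradients (not shown),
    #    copy the aggregated gradient in and keep only its sign.
    for name, p in model.named_parameters():
        if name in new_grads and p.grad is not None:
            p.grad.copy_(new_grads[name].to(p.grad.device))
            p.grad.sign_()

    # 3. Apply the signed update and advance the learning-rate schedule.
    optimizer.step()
    scheduler.step()
```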
Compression Techniques
Miners use two key techniques to compress gradients efficiently:
- DCT Transformation: Converts gradients to the frequency domain using the Discrete Cosine Transform
- Top-K Selection: Keeps only the K most significant coefficients, drastically reducing data size
This compression is essential for efficient sharing over the internet, allowing miners to exchange gradient information without prohibitive bandwidth requirements.
Sources: neurons/miner.py:131-147
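As a self-contained illustration (not Templar’s `TransformDCT`/`CompressDCT` classes), the sketch below builds an orthonormal DCT-II matrix, keeps the top-k coefficients of a gradient chunk, and reconstructs an approximation. The chunk size and k are chosen to mirror the `target_chunk` and `topk_compression` hyperparameters listed later.

```python
import math
import torch

def dct_basis(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix; its transpose is the inverse transform."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    basis = torch.cos(math.pi * (2 * i + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)
    return basis

def compress(chunk: torch.Tensor, basis: torch.Tensor, topk: int):
    """DCT-transform a gradient chunk and keep only the top-k coefficients."""
    coeffs = basis @ chunk
    idx = torch.topk(coeffs.abs(), topk).indices
    return idx, coeffs[idx]                      # what actually gets transmitted

def decompress(idx: torch.Tensor, vals: torch.Tensor, basis: torch.Tensor, n: int):
    """Rebuild a dense chunk from the sparse coefficients (inverse DCT)."""
    coeffs = torch.zeros(n)
    coeffs[idx] = vals
    return basis.T @ coeffs

n, topk = 64, 32                                 # e.g. target_chunk and topk_compression
basis = dct_basis(n)
grad_chunk = torch.randn(n)
approx = decompress(*compress(grad_chunk, basis, topk), basis, n)
```

In this toy setting, transmitting 32 indices and values instead of 64 dense floats halves the payload; with realistic chunk sizes and aggressive top-k ratios the savings are far larger.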
Communication System
The communication system enables miners to interact with R2 storage, validators, and other miners:
Gradient Exchange via R2
Section titled “Gradient Exchange via R2”# Upload own gradientsput_completion_time = await self.comms.put( state_dict=processed_state_dict, uid=str(self.uid), window=step_window, key="gradient", global_step=self.global_step, local=False, stale_retention=100,)
# Gather gradients from peersgather_result = await self.comms.gather( my_uid=self.uid, uids=self.comms.peers, window=step_window, key="gradient", timeout=35, device="cpu", local=False, stale_retention=100, totalks=self.totalks, time_min=time_min, time_max=time_max,)
Sources: neurons/miner.py:417-427, neurons/miner.py:489-501
Model Synchronization
Miners synchronize their model with the network using checkpoints:
- Initial Synchronization: When starting, miners load the latest checkpoint
- Catch-up Procedure: If behind, miners catch up with the aggregation server
- Periodic Checkpoints: Save model state every `checkpoint_frequency` windows
```python
if self.global_step % self.hparams.checkpoint_frequency == 0:
    asyncio.create_task(
        self.comms.save_checkpoint(
            model=self.model,
            optimizer=self.optimizer,
            scheduler=self.scheduler,
            momentum=self.momentum,
            global_step=self.global_step,
            current_window=self.current_window,
            start_window=self.start_window,
        )
    )
```
Sources: neurons/miner.py:732-747
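For local experimentation, the same bundle of state can be captured with plain `torch.save`/`torch.load`. The sketch below mirrors the arguments passed to `comms.save_checkpoint` above, but it is not Templar’s R2-backed implementation; the helper names and signatures are illustrative.

```python
import torch

def save_local_checkpoint(path, model, optimizer, scheduler, momentum,
                          global_step, current_window, start_window):
    """Bundle the same state the miner ships to R2 into a local torch checkpoint."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "momentum": momentum,
            "global_step": global_step,
            "current_window": current_window,
            "start_window": start_window,
        },
        path,
    )

def load_local_checkpoint(path, model, optimizer, scheduler):
    """Restore model/optimizer/scheduler state and return the bookkeeping counters."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    scheduler.load_state_dict(ckpt["scheduler"])
    return ckpt["momentum"], ckpt["global_step"], ckpt["current_window"], ckpt["start_window"]
```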
Configuration Parameters
Miners are configured through both command-line parameters and hyperparameter settings:
Command-Line Parameters
Section titled “Command-Line Parameters”Parameter | Description | Default |
---|---|---|
--netuid | Bittensor network UID | 268 |
--device | Computing device | ”cuda” |
--debug | Enable debug logging | False |
--trace | Enable trace logging | False |
--test | Test mode (use all peers) | False |
--local | Use toy model for local testing | False |
Sources: neurons/miner.py:67-106
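As an illustration of how these flags map to code, here is a minimal argparse sketch using the same names and defaults as the table. It is not a verbatim copy of the miner’s argument parsing, which presumably also wires in Bittensor wallet and subtensor arguments not shown here.

```python
import argparse

def parse_miner_args() -> argparse.Namespace:
    """Illustrative CLI parsing matching the parameter table above."""
    parser = argparse.ArgumentParser(description="Templar miner")
    parser.add_argument("--netuid", type=int, default=268, help="Bittensor network UID")
    parser.add_argument("--device", type=str, default="cuda", help="Computing device")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    parser.add_argument("--trace", action="store_true", help="Enable trace logging")
    parser.add_argument("--test", action="store_true", help="Test mode (use all peers)")
    parser.add_argument("--local", action="store_true", help="Use toy model for local testing")
    return parser.parse_args()
```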
Hyperparameters
Key hyperparameters from `hparams.json`:
| Parameter | Value | Description |
|---|---|---|
| sequence_length | 2048 | Maximum sequence length for training |
| pages_per_window | 6 | Number of data pages per window |
| batch_size | 6 | Batch size for training |
| learning_rate | 4e-4 | Initial learning rate |
| blocks_per_window | 7 | Number of blockchain blocks per window |
| momentum_decay | 0.999 | Decay rate for momentum |
| topk_compression | 32 | Top-K value for gradient compression |
| target_chunk | 64 | Chunk size for DCT transform |
| checkpoint_frequency | 100 | Windows between checkpoint saves |
Sources: hparams.json:1-53
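These values are accessed as attributes in the code (for example `self.hparams.pages_per_window` in the dataset snippet above). A minimal sketch of that access pattern, shown for illustration and not necessarily how tplr loads its hyperparameters:

```python
import json
from types import SimpleNamespace

def load_hparams(path: str = "hparams.json") -> SimpleNamespace:
    """Expose the JSON keys as attributes: hparams.batch_size, hparams.learning_rate, ..."""
    with open(path) as f:
        return SimpleNamespace(**json.load(f))

hparams = load_hparams()
print(hparams.pages_per_window, hparams.topk_compression, hparams.checkpoint_frequency)
```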
Performance Monitoring
Miners track various metrics to monitor performance:
```python
self.wandb.log(
    {
        # Training metrics
        "miner/loss": total_loss / n_batches if n_batches > 0 else 0,
        "miner/tokens_per_sec": tokens_per_sec,
        "miner/batch_duration": duration,
        "miner/total_tokens": self.total_tokens_processed,
        "miner/batch_tokens": window_tokens,
        "miner/global_step": self.global_step,
        # Resource metrics
        "miner/gpu_memory_allocated": torch.cuda.memory_allocated() / 1024**2,
        "miner/gpu_memory_cached": torch.cuda.memory_reserved() / 1024**2,
        # Network metrics
        "miner/gather_peers": len(self.comms.peers),
        "miner/effective_batch_size": len(self.comms.peers) * self.hparams.batch_size,
        # Optimization metrics
        "miner/learning_rate": self.scheduler.get_last_lr()[0],
        # Gradient statistics
        "miner/mean_grad_norm": sum(grad_norms) / len(grad_norms) if grad_norms else 0,
        "miner/max_grad_norm": max(grad_norms) if grad_norms else 0,
        "miner/min_grad_norm": min(grad_norms) if grad_norms else 0,
        "miner/grad_norm_std": torch.tensor(grad_norms).std().item() if grad_norms else 0,
        "miner/mean_weight_norm": sum(weight_norms) / len(weight_norms),
        "miner/mean_momentum_norm": sum(momentum_norms) / len(momentum_norms),
    },
    step=self.global_step,
)
```
Sources: neurons/miner.py:518-552
Hardware Requirements
To run a miner effectively, you need:
- GPU: NVIDIA H100 with 80GB VRAM recommended
- Storage: 100GB+ for model and data
- Network: Stable internet connection with good bandwidth
Sources: docs/miner.md:369-373
Integration with Validators
Miners work in tandem with validators, who:
- Gather and evaluate miners’ gradients
- Compute scores based on improvement in loss
- Set weights on the blockchain
- Determine reward distribution
For more details on validators and the evaluation process, see Validators and Weight Setting.
Sources: neurons/validator.py:489-516
Running a Miner
For detailed setup and running instructions, refer to the documentation in docs/miner.md. This includes:
- Installing dependencies
- Setting up R2 bucket credentials
- Configuring Bittensor wallet
- Running via Docker or directly with Python
- Monitoring performance and troubleshooting
Sources: docs/miner.md:32-302