Miners

This document provides a detailed explanation of miners in the Templar decentralized training framework. Miners are computational nodes responsible for training language models on assigned data and sharing their gradients with the network. For information about how validators evaluate miners’ contributions, see Validators.

Miners are a core component of Templar’s distributed training infrastructure. They work alongside validators and the aggregation server to collaboratively train large language models across distributed nodes.

graph TD
    subgraph "Bittensor Network"
        BT["bt.subtensor"]
        MG["metagraph"]
    end
    
    subgraph "Miner Node"
        ML["LlamaForCausalLM Model"]
        OP["SGD Optimizer"]
        TR["TransformDCT"]
        CP["CompressDCT"]
        CO["Comms Module"]
        SC["LR Scheduler"]
        WL["Wallet"]
    end
    
    subgraph "External Components"
        R2["R2 Storage"]
        DS["Dataset Loader"]
        CK["Checkpoints"]
        PE["Peer Miners"]
        VL["Validators"]
    end
    
    BT <--"Block events"--> CO
    WL --"Authentication"--> BT
    ML --"Forward/Backward pass"--> OP
    OP --"Gradients"--> TR
    TR --"Encoded gradients"--> CP
    CP --"Compressed gradients"--> CO
    CO --"Upload gradients"--> R2
    R2 --"Peer gradients"--> CO
    DS --"Training data"--> ML
    CO --"Load/Save model state"--> CK
    SC --"Update learning rate"--> OP
    R2 --"Evaluate gradients"--> VL
    VL --"Set weights"--> BT
    BT --"Rewards"--> WL

Sources: neurons/miner.py:65-226

The miner implementation is structured around the Miner class, which coordinates training, gradient processing, and communication. Its key components are:

  1. Model: LlamaForCausalLM - The foundational language model being trained
  2. Optimizer: SGD with momentum for updating model parameters
  3. Transformers: TransformDCT and CompressDCT for gradient compression
  4. Communications: Comms module for gradient sharing via R2 storage
  5. Scheduler: Learning rate scheduler combining warm-up and cosine annealing

Sources: neurons/miner.py:107-226
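The following sketch shows how these five components might be wired together at initialization. It is illustrative only: the module paths (tplr.compress, tplr.comms), constructor arguments, and attribute names are assumptions rather than the actual code at neurons/miner.py:107-226.

# Illustrative sketch of Miner initialization -- module paths, constructor arguments,
# and hyperparameter names are assumed, not copied from neurons/miner.py.
import torch
import tplr
from transformers import LlamaForCausalLM

class MinerSketch:
    def __init__(self, hparams, config, wallet):
        # 1. Model: the Llama language model being trained
        self.model = LlamaForCausalLM(hparams.model_config).to(config.device)

        # 2. Optimizer: SGD; per-parameter momentum buffers are tracked separately
        self.optimizer = torch.optim.SGD(self.model.parameters(), lr=hparams.learning_rate)
        self.momentum = {n: torch.zeros_like(p) for n, p in self.model.named_parameters()}

        # 3. Gradient compression: DCT transform followed by top-k selection
        self.transformer = tplr.compress.TransformDCT(self.model, target_chunk=hparams.target_chunk)
        self.compressor = tplr.compress.CompressDCT()

        # 4. Communications: gradient sharing and checkpoints via R2 storage
        self.comms = tplr.comms.Comms(wallet=wallet, config=config, hparams=hparams)

        # 5. Scheduler: linear warm-up followed by cosine annealing
        #    (a standalone sketch appears after the hyperparameter table below)
        self.scheduler = None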

The following sequence diagram illustrates the complete lifecycle of a miner during operation:

sequenceDiagram
    participant M as Miner
    participant R2 as R2 Storage
    participant P as Peers
    participant B as Blockchain
    participant AS as Aggregation Server
    
    Note over M: Initialization
    M->>B: Register with metagraph
    M->>AS: Get start_window
    
    Note over M: Synchronization
    M->>R2: Load latest checkpoint
    alt checkpoint found
        R2->>M: Return model, optimizer, momentum
    else no checkpoint
        Note over M: Initialize model from scratch
    end
    M->>AS: Catch up with aggregator (if needed)
    
    loop For each window
        B->>M: Block listener detects new window
        M->>B: Update peers (tplr.neurons.update_peers)
        M->>R2: Load dataset pages (R2DatasetLoader.next_pages)
        
        Note over M: Training
        loop For each batch
            Note over M: Forward pass (compute loss)
            Note over M: Backward pass (compute gradients)
        end
        
        Note over M: Gradient Processing
        Note over M: Apply momentum update
        Note over M: Compress gradients with DCT and top-k
        M->>R2: Upload compressed gradients (comms.put)
        
        Note over M: Peer Gradient Exchange
        M->>R2: Request peer gradients (comms.gather)
        R2->>M: Return compressed peer gradients
        
        Note over M: Model Update
        Note over M: Decompress peer gradients
        Note over M: Apply aggregated gradients
        Note over M: Step optimizer and scheduler
        
        alt global_step % checkpoint_frequency == 0
            M->>R2: Save checkpoint
        end
        
        Note over M: Metrics & Logging
        M->>WandB: Log metrics
        M->>InfluxDB: Log system metrics
    end

Sources: neurons/miner.py:229-755

The miner’s primary responsibility is to train the language model on assigned data and share the resulting gradients. Here’s how the training process works:

For each window, miners receive specific dataset pages:

pages = await tplr.r2_dataset.R2DatasetLoader.next_pages(
    offset=step_window * self.hparams.pages_per_window,
    n_pages=self.hparams.pages_per_window,
    seed=self.uid,  # Each miner gets unique data based on UID
)

The data assignment is deterministic: miners with the same UID always receive the same pages for a given window number, as the sketch below illustrates.

Sources: neurons/miner.py:339-355
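The determinism follows directly from the arguments above: both the page offset and the sampling seed are pure functions of the window number and the miner's UID, so the same inputs always select the same pages. The helper below is a hypothetical illustration of that property, not code from the repository.

# Hypothetical illustration of deterministic page assignment (not from neurons/miner.py).
def page_assignment(step_window: int, uid: int, pages_per_window: int = 6):
    offset = step_window * pages_per_window  # same window -> same offset
    seed = uid                               # same miner -> same sampling seed
    return offset, seed

# Identical (window, uid) pairs always yield identical (offset, seed), and therefore
# identical pages from R2DatasetLoader.next_pages.
assert page_assignment(42, uid=7) == page_assignment(42, uid=7)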

Miners process batches of data, compute loss, and accumulate gradients:

for i, batch in enumerate(loader):
    input_ids = torch.tensor(batch, dtype=torch.long).to(self.model.device)
    tokens_this_batch = input_ids.numel()
    window_tokens += tokens_this_batch
    labels = input_ids.clone()
    labels = torch.where(
        labels == self.tokenizer.pad_token_id, -100, labels
    )
    with autocast(device_type=self.model.device.type, dtype=torch.bfloat16):
        outputs = self.model(input_ids=input_ids, labels=labels)
    total_loss += outputs.loss.item()
    outputs.loss.backward()
    n_batches += 1

Sources: neurons/miner.py:360-384

After computing gradients, miners process and share them with the network:

flowchart TD
    GR["Raw Gradients"] --> MU["momentum = γ*momentum + η*gradient"]
    MU --> EN["transformer.encode() - DCT Transform"]
    EN --> CP["compressor.compress() - Top-K Selection"]
    CP --> UP["comms.put() - Upload to R2"]
    UP --> PR["comms.gather() - Get Peer Gradients"]
    PR --> DC["compressor.batch_decompress() - Reconstruct"]
    DC --> AG["transformer.decode() - Inverse DCT"]
    AG --> MO["p.grad.copy_(new_grad) - Apply Gradients"]
    MO --> SG["p.grad.sign_() - Use Only Direction"]
    SG --> OP["optimizer.step() - Update Model"]
    OP --> SC["scheduler.step() - Update LR"]

Sources: neurons/miner.py:399-402, neurons/miner.py:560-601
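A condensed sketch of the final apply-and-step stage of this pipeline is shown below. It assumes the gathered peer gradients have already been decompressed and decoded into dense tensors keyed by parameter name; it is not a literal excerpt of neurons/miner.py.

# Sketch of the apply-and-step stage (assumed variable names, not actual source).
import torch

def apply_aggregated_gradients(model, optimizer, scheduler, aggregated_grads):
    """Copy aggregated peer gradients into the model and take a sign-SGD step."""
    for name, param in model.named_parameters():
        if param.grad is None:
            param.grad = torch.zeros_like(param)
        param.grad.copy_(aggregated_grads[name])  # apply the aggregated gradient
        param.grad.sign_()                        # keep only the direction of each element
    optimizer.step()                              # SGD update driven by gradient signs
    scheduler.step()                              # advance the learning-rate schedule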

Miners use two key techniques to compress gradients efficiently:

  1. DCT Transformation: Converts gradients to frequency domain using Discrete Cosine Transform
  2. Top-K Selection: Only keeps the K most significant coefficients, drastically reducing data size

This compression is essential for efficient sharing over the internet, allowing miners to exchange gradient information without prohibitive bandwidth requirements.

Sources: neurons/miner.py:131-147
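To make the two steps concrete, the sketch below compresses a single gradient tensor with a DCT followed by top-k selection and then reconstructs a lossy approximation. It is a simplified stand-in for TransformDCT and CompressDCT (a flat DCT over the whole tensor via scipy, rather than the chunked implementation), intended only to show the idea.

# Simplified DCT + top-k sketch; not the TransformDCT/CompressDCT implementation.
import numpy as np
import torch
from scipy.fft import dct, idct

def compress_topk(grad: torch.Tensor, k: int):
    """Keep only the k largest-magnitude DCT coefficients of a gradient tensor."""
    coeffs = dct(grad.flatten().cpu().numpy(), norm="ortho")  # 1. DCT: move to frequency domain
    idx = np.argsort(np.abs(coeffs))[-k:]                     # 2. Top-K: most significant coefficients
    return idx, coeffs[idx]

def decompress_topk(idx, vals, numel: int, shape):
    """Scatter the kept coefficients back and invert the DCT (lossy reconstruction)."""
    coeffs = np.zeros(numel)
    coeffs[idx] = vals
    return torch.from_numpy(idct(coeffs, norm="ortho")).reshape(shape)

grad = torch.randn(64, 64)
idx, vals = compress_topk(grad, k=32)   # ship 32 coefficients instead of 4,096 values
approx = decompress_topk(idx, vals, grad.numel(), grad.shape)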

The communication system enables miners to interact with R2 storage, validators, and other miners:

# Upload own gradients
put_completion_time = await self.comms.put(
    state_dict=processed_state_dict,
    uid=str(self.uid),
    window=step_window,
    key="gradient",
    global_step=self.global_step,
    local=False,
    stale_retention=100,
)

# Gather gradients from peers
gather_result = await self.comms.gather(
    my_uid=self.uid,
    uids=self.comms.peers,
    window=step_window,
    key="gradient",
    timeout=35,
    device="cpu",
    local=False,
    stale_retention=100,
    totalks=self.totalks,
    time_min=time_min,
    time_max=time_max,
)

Sources: neurons/miner.py:417-427, neurons/miner.py:489-501

Miners synchronize their model with the network using checkpoints:

  1. Initial Synchronization: When starting, miners load the latest checkpoint
  2. Catch-up Procedure: If behind, miners catch up with the aggregation server
  3. Periodic Checkpoints: Save model state every checkpoint_frequency windows

if self.global_step % self.hparams.checkpoint_frequency == 0:
    asyncio.create_task(
        self.comms.save_checkpoint(
            model=self.model,
            optimizer=self.optimizer,
            scheduler=self.scheduler,
            momentum=self.momentum,
            global_step=self.global_step,
            current_window=self.current_window,
            start_window=self.start_window,
        )
    )

Sources: neurons/miner.py:732-747
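For the startup side (steps 1 and 2 above), the control flow can be sketched as follows. The get_latest_checkpoint helper and the checkpoint dictionary keys are hypothetical placeholders used only to illustrate the flow; the real calls live in neurons/miner.py and the comms module.

# Hypothetical sketch of initial synchronization; helper name and checkpoint keys are
# placeholders, not the actual comms API.
async def synchronize(miner):
    checkpoint = await miner.comms.get_latest_checkpoint()  # hypothetical accessor
    if checkpoint is not None:
        # Checkpoint found: restore model weights, optimizer state, and momentum buffers.
        miner.model.load_state_dict(checkpoint["model"])
        miner.optimizer.load_state_dict(checkpoint["optimizer"])
        miner.momentum = checkpoint["momentum"]
    # If the restored state is behind the current window, catch up with the
    # aggregation server before resuming training.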

Miners are configured through both command-line parameters and hyperparameter settings:

| Parameter | Description | Default |
|-----------|-------------|---------|
| --netuid | Bittensor network UID | 268 |
| --device | Computing device | "cuda" |
| --debug | Enable debug logging | False |
| --trace | Enable trace logging | False |
| --test | Test mode (use all peers) | False |
| --local | Use toy model for local testing | False |

Sources: neurons/miner.py:67-106
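As a sketch of how these flags could be declared with argparse, using the defaults from the table above; this is illustrative and not the actual config code at neurons/miner.py:67-106.

# Illustrative argparse sketch of the command-line flags above (not the actual config code).
import argparse

parser = argparse.ArgumentParser(description="Templar miner")
parser.add_argument("--netuid", type=int, default=268, help="Bittensor network UID")
parser.add_argument("--device", type=str, default="cuda", help="Computing device")
parser.add_argument("--debug", action="store_true", help="Enable debug logging")
parser.add_argument("--trace", action="store_true", help="Enable trace logging")
parser.add_argument("--test", action="store_true", help="Test mode (use all peers)")
parser.add_argument("--local", action="store_true", help="Use toy model for local testing")
config = parser.parse_args()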

Key hyperparameters from hparams.json:

| Parameter | Value | Description |
|-----------|-------|-------------|
| sequence_length | 2048 | Maximum sequence length for training |
| pages_per_window | 6 | Number of data pages per window |
| batch_size | 6 | Batch size for training |
| learning_rate | 4e-4 | Initial learning rate |
| blocks_per_window | 7 | Number of blockchain blocks per window |
| momentum_decay | 0.999 | Decay rate for momentum |
| topk_compression | 32 | Top-K value for gradient compression |
| target_chunk | 64 | Chunk size for DCT transform |
| checkpoint_frequency | 100 | Windows between checkpoint saves |

Sources: hparams.json:1-53
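The learning-rate schedule mentioned earlier (warm-up followed by cosine annealing) can be sketched with standard PyTorch schedulers as below. The learning rate comes from the table above; the warm-up length and total step count are assumed values, not taken from hparams.json.

# Sketch of a warm-up + cosine-annealing schedule; warm-up and horizon lengths are assumed.
import torch

model = torch.nn.Linear(8, 8)                              # stand-in for the Llama model
optimizer = torch.optim.SGD(model.parameters(), lr=4e-4)   # learning_rate from hparams.json

warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=250)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[250]
)
# scheduler.step() is called once per window, after optimizer.step().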

Miners track various metrics to monitor performance:

self.wandb.log(
    {
        # Training metrics
        "miner/loss": total_loss / n_batches if n_batches > 0 else 0,
        "miner/tokens_per_sec": tokens_per_sec,
        "miner/batch_duration": duration,
        "miner/total_tokens": self.total_tokens_processed,
        "miner/batch_tokens": window_tokens,
        "miner/global_step": self.global_step,
        # Resource metrics
        "miner/gpu_memory_allocated": torch.cuda.memory_allocated() / 1024**2,
        "miner/gpu_memory_cached": torch.cuda.memory_reserved() / 1024**2,
        # Network metrics
        "miner/gather_peers": len(self.comms.peers),
        "miner/effective_batch_size": len(self.comms.peers) * self.hparams.batch_size,
        # Optimization metrics
        "miner/learning_rate": self.scheduler.get_last_lr()[0],
        # Gradient statistics
        "miner/mean_grad_norm": sum(grad_norms) / len(grad_norms) if grad_norms else 0,
        "miner/max_grad_norm": max(grad_norms) if grad_norms else 0,
        "miner/min_grad_norm": min(grad_norms) if grad_norms else 0,
        "miner/grad_norm_std": torch.tensor(grad_norms).std().item() if grad_norms else 0,
        "miner/mean_weight_norm": sum(weight_norms) / len(weight_norms),
        "miner/mean_momentum_norm": sum(momentum_norms) / len(momentum_norms),
    },
    step=self.global_step,
)

Sources: neurons/miner.py:518-552

To run a miner effectively, you need:

  • GPU: NVIDIA H100 with 80GB VRAM recommended
  • Storage: 100GB+ for model and data
  • Network: Stable internet connection with good bandwidth

Sources: docs/miner.md:369-373

Miners work in tandem with validators, who:

  1. Gather and evaluate miners’ gradients
  2. Compute scores based on improvement in loss
  3. Set weights on the blockchain
  4. Determine reward distribution

For more details on validators and the evaluation process, see Validators and Weight Setting.

Sources: neurons/validator.py:489-516

For detailed setup and running instructions, refer to the documentation in docs/miner.md. This includes:

  1. Installing dependencies
  2. Setting up R2 bucket credentials
  3. Configuring Bittensor wallet
  4. Running via Docker or directly with Python
  5. Monitoring performance and troubleshooting

Sources: docs/miner.md:32-302