Incentive Design
This document explains the incentive mechanisms that drive the distributed training process in the Templar framework. It covers how miners are motivated to contribute quality gradients and how validators evaluate and reward these contributions. For information about the overall system architecture, see System Architecture.
Overview of Incentive Mechanism
Templar’s incentive mechanism aligns the interests of individual miners with the collective goal of improving model performance. The system uses an evaluation-based approach where validators assess the quality of miners’ gradient contributions by measuring their impact on model loss.
```mermaid
flowchart TD
    subgraph "Incentive Flow"
        M["Miners"] -->|"Submit Gradients"| V["Validators"]
        V -->|"Evaluate Quality"| SC["Score Calculation"]
        SC -->|"OpenSkill Rating"| WA["Weight Assignment"]
        WA -->|"Set on Blockchain"| BT["Bittensor Network"]
        BT -->|"TAO Rewards"| M
    end
```
Sources: neurons/validator.py:373-437, neurons/miner.py:228-329
The incentive flow creates a reinforcing feedback loop: miners that produce higher-quality gradients receive better scores, which lead to higher weights on the blockchain and greater TAO rewards, encouraging continued quality contributions.
Miner Incentive Structure
Miners are incentivized to generate high-quality gradients through the following mechanisms:
- Direct performance evaluation: Miners’ contributions are scored based on their ability to improve model performance
- Window-based training cycles: Each miner works on assigned data for a specific window
- Gradient sharing system: Miners benefit from both their contributions and the collective improvement
```mermaid
flowchart LR
    subgraph "Miner Operations"
        TD["Training Data"] -->|"Train Model"| GC["Gradient Computation"]
        GC -->|"Transform & Compress"| GS["Gradient Sharing"]
        GS -->|"R2 Storage Upload"| VS["Validator Scoring"]
        VS -->|"Evaluate Improvement"| SR["Score & Rank"]
        SR -->|"Set Weights"| RW["TAO Rewards"]
    end
```
Sources: neurons/miner.py:228-329, neurons/validator.py:486-515
How Miners Maximize Rewards
Miners aim to maximize their rewards by:
- Computing accurate gradients that lead to model improvement
- Maintaining model synchronization with the network
- Consistently contributing across training windows
The optimal strategy for miners is therefore the one that genuinely improves model performance as measured by the validators.
Validator Evaluation System
Validators employ a sophisticated evaluation system to measure the quality of miners’ contributions and assign appropriate weights.
Loss Improvement Calculation
The core of the evaluation is measuring how a miner’s gradient improves model performance:
1. Compute the loss before applying the gradient: `L_before`
2. Apply the miner’s gradient to the model
3. Compute the loss after applying the gradient: `L_after`
4. Calculate the improvement score: `s_i = L_before - L_after`
```mermaid
sequenceDiagram
    participant M as "Miner"
    participant V as "Validator"
    participant B as "Blockchain"
    M->>V: Submit Compressed Gradient
    V->>V: Load Assigned Data
    V->>V: Compute Loss Before (L_before)
    V->>V: Apply Miner's Gradient
    V->>V: Compute Loss After (L_after)
    V->>V: Score = L_before - L_after
    V->>V: Update OpenSkill Rating
    V->>V: Calculate Final Weight
    V->>B: Set Weight on Blockchain
    B->>M: Distribute TAO Rewards
```
Sources: neurons/validator.py:486-515, neurons/validator.py:356-445
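A minimal sketch of this scoring step, assuming a standard PyTorch setup; `compute_loss` and `apply_gradient` are hypothetical helpers standing in for Templar’s data-loading and gradient-application logic, not the actual validator code:

```python
import copy

import torch


def score_gradient(model, miner_gradient, eval_batches, compute_loss, apply_gradient):
    """Score a miner's gradient by how much it reduces evaluation loss."""
    # Evaluate on a copy so the validator's own model state is untouched
    candidate = copy.deepcopy(model)

    loss_before = compute_loss(candidate, eval_batches)  # L_before

    with torch.no_grad():
        apply_gradient(candidate, miner_gradient)  # apply the miner's update

    loss_after = compute_loss(candidate, eval_batches)  # L_after

    # Positive score: the gradient reduced the loss on the assigned data
    return loss_before - loss_after
```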
OpenSkill Rating System
Validators use the PlackettLuce model from the OpenSkill library to maintain a probabilistic skill rating for each miner:
```mermaid
flowchart TD
    subgraph "OpenSkill Rating System"
        MS["Miner Scores"] -->|"Collected for Window"| GC["Group & Compare"]
        GC -->|"Update Ratings"| PL["PlackettLuce Model"]
        PL -->|"Update Mu & Sigma"| MR["Miner Ratings"]
        MR -->|"Calculate Ordinal"| OS["Ordinal Score"]
        OS -->|"Multiply with Binary Score"| FS["Final Score"]
        BS["Binary Moving Average"] -->|"Filter Non-negative"| FS
        SS["Sync Score"] -->|"Model Synchronization Quality"| FS
    end
```
Sources: neurons/validator.py:356-437
The OpenSkill system has these key properties:
- Mu (μ): Represents the estimated skill level of a miner
- Sigma (σ): Represents the uncertainty in the skill estimate
- Ordinal: A conservative estimate of skill (μ - k·σ) that accounts for uncertainty
- Parameters:
  - Beta: 20 (controls the dynamics of rating updates)
  - Tau: 0.1 (dynamic factor that prevents ratings from stagnating)
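As a minimal, self-contained sketch (assuming the `openskill` Python package), the rating update can be reproduced directly; the UIDs and window scores here are made up for illustration:

```python
from openskill.models import PlackettLuce

# Same parameters the validator configures: beta=20, tau=0.1
model = PlackettLuce(beta=20.0, tau=0.1)

# One single-member "team" per miner, keyed by (hypothetical) UID
ratings = {uid: model.rating(name=str(uid)) for uid in (1, 2, 3)}

# Higher window score = better gradient contribution
teams = [[ratings[uid]] for uid in (1, 2, 3)]
rated = model.rate(teams, scores=[0.8, 0.5, 0.2])

for (rating,) in rated:
    print(f"uid={rating.name} mu={rating.mu:.2f} "
          f"sigma={rating.sigma:.2f} ordinal={rating.ordinal():.2f}")
```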
Binary Scores and Sync Quality
In addition to gradient quality, validators track:
- Binary Indicator Scores: Whether miner updates improve or harm the model
- Sync Scores: How well miners’ models stay synchronized with the network
These factors are combined with the OpenSkill ordinal to produce the final score.
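The combination is multiplicative, matching the `update_openskill_ratings` implementation shown later on this page:

```python
def combine_final_score(ordinal: float, binary_ma: float, sync_score: float) -> float:
    # The binary moving average is clipped at zero, so an update judged
    # harmful zeroes the final score regardless of the skill rating.
    return ordinal * max(0.0, binary_ma) * sync_score
```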
Weight Normalization and Assignment
Validator weights are calculated using a power normalization approach to ensure a fair distribution:
```mermaid
flowchart LR
    subgraph "Weight Assignment Process"
        FS["Final Scores"] -->|"Filter Positive"| PS["Positive Scores"]
        PS -->|"Power Normalization"| NW["Normalized Weights"]
        NW -->|"Set on Blockchain"| BW["Blockchain Weights"]
        BW -->|"Determine"| RD["Reward Distribution"]
    end
```
Sources: neurons/validator.py:446-488
The weight normalization process:
1. Creates a mask for peers that have been evaluated
2. Creates a mask for evaluated peers with positive scores
3. Applies power normalization to only the positive scores
4. Verifies that weights sum to approximately 1.0
This approach ensures that:
- Only positive contributions receive rewards
- Higher-quality contributions receive proportionally larger rewards
- The distribution of rewards is balanced across contributors
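A minimal sketch of the normalization itself, assuming `min_power_normalization` raises each positive score to the configured exponent and rescales the results to sum to 1:

```python
import torch


def power_normalize(scores: torch.Tensor, power: float = 2.0) -> torch.Tensor:
    """Sketch of power normalization: emphasize larger scores, sum to ~1."""
    powered = scores.pow(power)
    total = powered.sum()
    return powered / total if total > 0 else torch.zeros_like(scores)


# With power=2.0, a peer scoring twice as high gets four times the weight
print(power_normalize(torch.tensor([0.3, 0.6])))  # tensor([0.2000, 0.8000])
```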
Reward Allocation and Penalties
Reward Allocation
Miners who contribute to model improvement receive weights proportional to their contribution quality. The moving average smooths temporary fluctuations, creating a stable reward mechanism.
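A sketch of that smoothing, assuming a standard exponential moving average driven by the `final_score_ma_alpha` parameter listed under Key Parameters below:

```python
def update_moving_average(previous: float, new_score: float, alpha: float = 0.75) -> float:
    """Exponential moving average: alpha weights the newest observation."""
    return alpha * new_score + (1.0 - alpha) * previous


# A single zero-score window pulls a long-running average of 0.9 down to 0.225
print(update_moving_average(previous=0.9, new_score=0.0))
```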
Penalties for Inactivity and Poor Performance
Validators implement several penalty mechanisms:
| Penalty Type | Condition | Reduction Rate |
| --- | --- | --- |
| Inactivity | Peer inactive for a window | 25% per window |
| Missing Gradient | Failed to submit gradient | 75% |
| Poor Sync | Model out of sync with network | 75% |
| Long-term Inactivity | Inactive > 25 windows | Complete reset |
Sources: neurons/validator.py:702-770, neurons/validator.py:877-912
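For illustration only (this is not the validator’s exact code path), the table’s multiplicative penalties compound like this:

```python
def apply_penalties(
    score: float,
    inactive_windows: int = 0,
    missed_gradient: bool = False,
    out_of_sync: bool = False,
) -> float:
    """Hypothetical sketch of the penalty rates from the table above."""
    if inactive_windows > 25:
        return 0.0  # long-term inactivity: complete reset
    score *= 0.75 ** inactive_windows  # 25% reduction per inactive window
    if missed_gradient:
        score *= 0.25  # 75% reduction for a missing gradient
    if out_of_sync:
        score *= 0.25  # 75% reduction for poor synchronization
    return score


# Two inactive windows shrink a score of 1.0 to 0.75**2 = 0.5625
print(apply_penalties(1.0, inactive_windows=2))
```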
Security Considerations
The incentive design addresses several potential security concerns:
- Sybil Resistance: Creating multiple identities offers no advantage as rewards are based on contribution quality, not peer count
- Free-Riding Prevention: Miners only receive rewards for genuine, measurable contributions
- Nash Equilibrium: The optimal strategy is honest participation and genuine improvement
- Collusion Resistance: Evaluation is based on objective model improvement metrics
Implementation Details
Key Parameters
The incentive system is configured with these parameters from hparams.json:
| Parameter | Value | Function |
| --- | --- | --- |
| gradient_score_ma_alpha | 0.6 | Weight for gradient score moving average |
| binary_score_ma_alpha | 0.05 | Weight for binary indicator score moving average |
| final_score_ma_alpha | 0.75 | Weight for final score moving average |
| power_normalisation | 2.0 | Exponent for power normalization of weights |
| openskill_beta | 20 | Controls dynamics of rating updates |
| openskill_tau | 0.1 | Prevents ratings from stagnating |
| reset_inactivity_windows | 25 | Windows before peer score is fully reset |
Sources: hparams.json:14-17, hparams.json:40, hparams.json:47, hparams.json:50-51
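As a rough illustration, the relevant entries in hparams.json would look something like the following excerpt (the field order and surrounding keys are assumptions, not the actual file layout):

```json
{
  "gradient_score_ma_alpha": 0.6,
  "binary_score_ma_alpha": 0.05,
  "final_score_ma_alpha": 0.75,
  "power_normalisation": 2.0,
  "openskill_beta": 20,
  "openskill_tau": 0.1,
  "reset_inactivity_windows": 25
}
```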
Code Implementation
The core validation and scoring system is implemented in validator.py, with key functions:
Validator Core Functions
```python
def update_weights(self) -> None:
    """
    Update the weights for all evaluated peers using min power normalization.

    This method:
    1. Creates a mask for peers that have been evaluated.
    2. Creates a mask for evaluated peers with positive scores.
    3. Applies power normalization to only the positive scores.
    4. Verifies that weights sum to approximately 1.0.

    This approach only assigns weights to peers with positive scores.
    """
    self.weights = torch.zeros_like(self.final_scores)
    evaluated_mask = torch.zeros_like(self.final_scores, dtype=torch.bool)
    evaluated_mask[list(self.evaluated_uids)] = True

    # Create a mask for positive scores among evaluated peers
    positive_mask = evaluated_mask.clone()
    positive_mask[evaluated_mask] = self.final_scores[evaluated_mask] > 0

    # Only consider peers with positive scores
    positive_scores = self.final_scores[positive_mask]

    if len(positive_scores) > 0:
        # Apply power normalization to only the positive scores
        normalized_weights = min_power_normalization(
            positive_scores,
            power=self.hparams.power_normalisation,
        )

        # Assign weights only to peers with positive scores
        self.weights[positive_mask] = normalized_weights

        weight_sum = self.weights.sum().item()
        tplr.logger.debug(f"Weight sum: {weight_sum}")

        if abs(weight_sum - 1.0) > 1e-6:
            tplr.logger.warning(
                f"Weights sum to {weight_sum}, expected close to 1.0"
            )
    else:
        tplr.logger.warning(
            "No positive scores found among evaluated peers. All weights set to zero."
        )
```
```python
def update_openskill_ratings(self):
    """
    Update OpenSkill ratings based on gradient scores and recalculate final scores.

    This method:
    1. Processes all peers evaluated in the current window.
    2. Updates their OpenSkill ratings based on gradient performance.
    3. Recalculates final scores using the OpenSkill ordinal combined with
       binary and sync scores.
    4. Logs the updated ratings to monitoring systems (WandB, InfluxDB).

    The OpenSkill rating system provides a probabilistic skill rating that
    accounts for uncertainty and relative performance between peers.
    Ratings are updated using the PlackettLuce model where higher gradient
    scores indicate better performance.

    The final score calculation combines:
    - OpenSkill ordinal (derived from mu and sigma)
    - Binary moving average (filtered to non-negative values)
    - Sync score (model synchronization quality)
    """
    if (
        hasattr(self, "current_window_scores")
        and len(self.current_window_scores) > 1
    ):
        # Get UIDs and scores
        window_uids = list(self.current_window_scores.keys())

        # Scores for OpenSkill (higher is better)
        scores = [self.current_window_scores[uid] for uid in window_uids]

        # Create teams list for OpenSkill
        teams = [[self.openskill_ratings[uid]] for uid in window_uids]

        # Rate the teams using scores
        rated_teams = self.openskill_model.rate(teams, scores=scores)

        # Store updated ratings and recalculate final scores
        for i, uid in enumerate(window_uids):
            self.openskill_ratings[uid] = rated_teams[i][0]

            openskill_mu = float(self.openskill_ratings[uid].mu)
            openskill_sigma = float(self.openskill_ratings[uid].sigma)
            openskill_ordinal = float(self.openskill_ratings[uid].ordinal())

            sync_score = float(
                self.sync_scores[uid].item() if uid in self.evaluated_uids else 0.0
            )
            binary_moving_avg = max(0, self.binary_moving_averages[uid].item())

            self.final_scores[uid] = (
                openskill_ordinal * binary_moving_avg * sync_score
            )
            tplr.logger.info(
                f"Computed Final Score for UID {uid}: {self.final_scores[uid]}"
            )

            # Log to WandB
            if hasattr(self, "wandb") and self.wandb is not None:
                self.wandb.log(
                    {
                        f"validator/openskill/mu/{uid}": openskill_mu,
                        f"validator/openskill/sigma/{uid}": openskill_sigma,
                        f"validator/openskill/ordinal/{uid}": openskill_ordinal,
                    },
                    step=self.global_step,
                )

            # Log to InfluxDB
            if hasattr(self, "metrics_logger") and self.metrics_logger is not None:
                self.metrics_logger.log(
                    measurement="validator_openskill",
                    tags={
                        "eval_uid": str(uid),
                        "window": int(self.sync_window),
                        "global_step": int(self.global_step),
                    },
                    fields={
                        "mu": openskill_mu,
                        "sigma": openskill_sigma,
                        "ordinal": openskill_ordinal,
                    },
                )
```
```python
def evaluate_model_on_batches(
    self,
    model: torch.nn.Module,
    batches: list[list[int]],
    sampled_indices: list[int],
) -> tuple[float, int]:
    """
    Evaluates a given model's performance (loss) on specified batches of data.

    This method:
    1. Sets the model to evaluation mode.
    2. Iterates through a list of pre-tokenized input batches.
    3. For each batch indicated by `sampled_indices`:
       a. Converts the batch to tensors and moves to the model's device.
       b. Creates labels, masking padding tokens.
       c. Performs a forward pass to get the loss.
       d. Accumulates total loss and counts the number of batches processed.
    4. Uses automatic mixed precision (autocast) for potential performance gains.
    5. Clears CUDA cache after processing each batch to manage memory.

    Args:
        model (torch.nn.Module): The model to be evaluated.
        batches (list[list[int]]): A list of tokenized input batches.
        sampled_indices (list[int]): A list of indices indicating which
            batches to evaluate.

    Returns:
        tuple[float, int]: A tuple containing the total loss accumulated over
            the evaluated batches and the number of batches processed.
    """
    total_loss = 0.0
    n_batches = 0
    with torch.no_grad():  # Ensure no gradients are computed during evaluation
        model.eval()  # Set the model to evaluation mode
        # Use autocast for mixed precision inference if supported
        with autocast(device_type=model.device.type, dtype=torch.bfloat16):
            for i, batch in enumerate(batches):
                if i not in sampled_indices:
                    continue  # Skip batches not in the sampled list

                input_ids = torch.tensor(batch, dtype=torch.long).to(model.device)
                labels = input_ids.clone()
                # Mask padding tokens for loss calculation
                labels = torch.where(
                    labels == self.tokenizer.pad_token_id, -100, labels
                )

                outputs = model(input_ids=input_ids, labels=labels)
                total_loss += outputs.loss.item()
                n_batches += 1

                # Clean up to free GPU memory
                del input_ids, labels, outputs
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()

    return total_loss, n_batches
```
Sources: neurons/validator.py:356-437, neurons/validator.py:446-488, neurons/validator.py:489-515
Alignment with Templar’s Goals
The incentive design aligns with Templar’s core goals by:
- Encouraging Quality: Rewards are proportional to model improvement
- Promoting Collaboration: Miners benefit from the collective improvement of the model
- Ensuring Robustness: Multiple validation metrics create a more reliable evaluation
- Supporting Decentralization: Independent validators assess contributions fairly
- Enabling Heterogeneity: Miners with varying hardware can contribute meaningfully
In summary, Templar’s incentive design creates a self-regulating ecosystem where honest participation and genuine model improvement are the most rewarding strategies.
Sources: neurons/validator.py:356-488, neurons/miner.py:228-329