Chain Integration
Purpose and Scope
This document explains how Templar integrates with the Bittensor blockchain to enable decentralized model training. It covers the core components responsible for blockchain interaction, commitment management, block processing, window tracking, and peer management. For information about model checkpoint management, see Checkpoint Management.
Chain Architecture Overview
The following diagram illustrates how Templar interacts with the Bittensor blockchain:
graph TD
    subgraph "Bittensor Network"
        BT["Bittensor Chain"]
        MG["Metagraph"]
    end
    subgraph "Chain Management"
        CM["ChainManager"]
        CB["Commitment Storage"]
        BP["Block Processing"]
        WM["Window Management"]
        PM["Peer Management"]
    end
    subgraph "Neuron Integration"
        MI["Miner Integration"]
        VI["Validator Integration"]
        WS["Weight Setting"]
        PC["Peer Coordination"]
    end
    BT <--> CM
    MG <--> CM
    CM --> CB
    CM --> BP
    CM --> WM
    CM --> PM
    CB --> MI
    CB --> VI
    BP --> WM
    WM --> MI
    WM --> VI
    PM --> PC
    VI --> WS
    WS --> BT
Sources: src/tplr/chain.py:37-487 , src/tplr/comms.py:64-102
ChainManager Class
The foundation of Templar’s blockchain integration is the ChainManager class. It provides methods for committing and retrieving data from the chain, monitoring blocks, and managing peer relationships.
classDiagram
    class ChainManager {
        +wallet: bt.wallet
        +netuid: int
        +metagraph: bt.metagraph
        +hparams: dict
        +current_block: int
        +current_window: int
        +commitments: dict
        +peers: array
        +eval_peers: dict
        +block_event: asyncio.Event
        +block_listener(loop)
        +commit(wallet, bucket)
        +try_commit(wallet, bucket)
        +get_commitment(uid)
        +get_commitments()
        +update_peers_with_buckets()
        +start_commitment_fetcher()
    }
    ChainManager <|-- Comms
    class Comms {
        +wallet: bt.wallet
        +bucket: Bucket
        +uid: int
        +temp_dir: str
        +get_own_bucket(bucket_type, access_type)
        +put(state_dict, uid, window, key)
        +gather(my_uid, uids, window, key)
        +load_checkpoint(model, optimizer, scheduler)
        +save_checkpoint(model, optimizer, scheduler)
    }
Sources: src/tplr/chain.py:37-49 , src/tplr/comms.py:64-102
Commitment System
Templar uses chain commitments to securely store and share access information for R2 storage buckets. Each neuron commits its bucket details to the blockchain, allowing other nodes to retrieve this information for data exchange.
Commitment Format
Commitments follow a fixed format of concatenated strings:
- account_id: 32 characters
- access_key_id: 32 characters
- secret_access_key: 64 characters
Total length: 128 characters
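The fixed-width layout above can be illustrated with a short sketch. The field widths come from the list above; the BucketCredentials container and the two helper functions are hypothetical names introduced for illustration and are not taken from the Templar source.

```python
from dataclasses import dataclass

# Field widths as listed in the commitment format above.
ACCOUNT_ID_LEN = 32
ACCESS_KEY_ID_LEN = 32
SECRET_ACCESS_KEY_LEN = 64
COMMITMENT_LEN = ACCOUNT_ID_LEN + ACCESS_KEY_ID_LEN + SECRET_ACCESS_KEY_LEN  # 128


@dataclass(frozen=True)
class BucketCredentials:
    """Hypothetical container; the project's own Bucket type may differ."""

    account_id: str
    access_key_id: str
    secret_access_key: str


def encode_commitment(creds: BucketCredentials) -> str:
    """Concatenate the three fields after checking each has its fixed width."""
    parts = (
        (creds.account_id, ACCOUNT_ID_LEN),
        (creds.access_key_id, ACCESS_KEY_ID_LEN),
        (creds.secret_access_key, SECRET_ACCESS_KEY_LEN),
    )
    for value, width in parts:
        if len(value) != width:
            raise ValueError(f"expected {width} characters, got {len(value)}")
    return "".join(value for value, _ in parts)


def decode_commitment(raw: str) -> BucketCredentials:
    """Split a 128-character commitment back into its three fields."""
    if len(raw) != COMMITMENT_LEN:
        raise ValueError(f"expected {COMMITMENT_LEN} characters, got {len(raw)}")
    a = ACCOUNT_ID_LEN
    b = a + ACCESS_KEY_ID_LEN
    return BucketCredentials(raw[:a], raw[a:b], raw[b:])
```

Because every field has a fixed width, no delimiter is needed and a length check is enough to reject malformed commitments before slicing.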
Commitment Flow
sequenceDiagram
    participant Neuron
    participant ChainManager
    participant Subtensor
    Neuron->>ChainManager: try_commit(wallet, bucket)
    ChainManager->>Subtensor: get_commitment(netuid, uid)
    Subtensor-->>ChainManager: Return current commitment
    alt Commitment exists and matches
        ChainManager->>Neuron: Use existing commitment
    else Commitment missing or different
        ChainManager->>Subtensor: commit(wallet, netuid, concatenated)
        Subtensor-->>ChainManager: Confirm commitment
        ChainManager->>Neuron: Log successful commitment
    end
    Neuron->>ChainManager: start_commitment_fetcher()
    loop Periodic updates
        ChainManager->>Subtensor: query_map("Commitments", "CommitmentOf")
        Subtensor-->>ChainManager: Return all commitments
        ChainManager->>ChainManager: Parse into Bucket objects
        ChainManager->>ChainManager: Update commitments dict
    end
Sources: src/tplr/chain.py:174-233 , src/tplr/chain.py:304-397
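The compare-then-write branch of the flow above can be sketched without a live chain. FakeChain is a hypothetical in-memory stand-in for the commitment storage, and this try_commit is a simplified illustration of the decision logic, not the ChainManager method itself.

```python
from typing import Dict, Optional


class FakeChain:
    """Hypothetical in-memory stand-in for on-chain commitment storage."""

    def __init__(self) -> None:
        self._commitments: Dict[int, str] = {}
        self.writes = 0  # counts how many commit transactions were issued

    def get_commitment(self, uid: int) -> Optional[str]:
        return self._commitments.get(uid)

    def commit(self, uid: int, data: str) -> None:
        self._commitments[uid] = data
        self.writes += 1


def try_commit(chain: FakeChain, uid: int, desired: str) -> bool:
    """Write `desired` only when the chain holds nothing or a different value.

    Returns True if a write was issued, False if the existing commitment
    already matched.
    """
    if chain.get_commitment(uid) == desired:
        return False
    chain.commit(uid, desired)
    return True
```

Checking first avoids issuing a chain transaction on every neuron restart when the bucket details have not changed.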
Block and Window Management
Templar uses blockchain blocks to synchronize the training process across the network. Blocks are grouped into windows based on the blocks_per_window parameter, with each window driving a training iteration.
Block Listener
Each neuron runs a background thread that subscribes to block headers from the Bittensor network. When a new block arrives, it updates the current block number and recalculates the current window if needed.
flowchart TD
    A["block_listener(loop)"] -->|"Subscribe to"| B["subtensor.substrate.subscribe_block_headers"]
    B -->|"New block event"| C["handler(event)"]
    C -->|"Update"| D["current_block = event.header.number"]
    D -->|"Calculate"| E["new_window = current_block / blocks_per_window"]
    E -->|"If changed"| F["current_window = new_window"]
    F -->|"Update"| G["comms.current_window"]
    G -->|"Drives"| H["Training/Validation Loop"]
Sources: src/tplr/chain.py:143-172 , neurons/miner.py:757-777 , neurons/validator.py:522-525
Window-Based Training
The window concept is central to Templar’s training process:
- Global Step: Calculated as current_window - start_window, tracking overall training progress
- Window Synchronization: Miners and validators wait for window transitions to coordinate actions
- Learning Rate Schedule: Tied to the global step for coordinated optimization
- Start Window Coordination: Ensures all neurons begin training from the same point
graph TD
    A["Block Events"] -->|"Trigger window transitions"| B["current_window"]
    subgraph "Miner"
        B -->|"Train for window"| C["Compute gradients"]
        C -->|"Upload to R2"| D["Wait for next window"]
        E["start_window"] -->|"Calculate"| F["global_step = current_window - start_window"]
        F -->|"Update"| G["Optimizer state"]
    end
    subgraph "Validator"
        B -->|"Evaluate for window"| H["Gather and assess gradients"]
        H -->|"Set weights"| I["Wait for next window"]
        E -->|"Calculate"| J["global_step"]
        J -->|"Update"| K["OpenSkill ratings"]
    end
Sources: neurons/miner.py:229-325 , neurons/validator.py:516-567
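The window and global-step arithmetic described in this section can be worked through with concrete numbers. The helper names below are illustrative only; the block heights are made up, and blocks_per_window uses the default of 7 listed in the configuration table of this document.

```python
def window_for_block(block: int, blocks_per_window: int) -> int:
    """Map a block number to its window index using floor division."""
    return block // blocks_per_window


def global_step(current_window: int, start_window: int) -> int:
    """Training progress measured in windows since the shared start window."""
    return current_window - start_window


blocks_per_window = 7  # default from the configuration table
start_window = window_for_block(700, blocks_per_window)    # 700 // 7 = 100
current_window = window_for_block(735, blocks_per_window)  # 735 // 7 = 105
step = global_step(current_window, start_window)           # 105 - 100 = 5
```

Every neuron that observes the same block height and the same start window derives the same global step, which is what lets a learning-rate schedule stay aligned across the network.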
Peer Management
The blockchain integration enables coordinated peer management for training and evaluation.
Commitment-Based Peer Discovery
Templar discovers and filters peers by retrieving and processing commitments from the blockchain:
flowchart TD
    A["fetch_commitments()"] -->|"Query chain"| B["get_commitments()"]
    B -->|"Parse raw data"| C["commitments dict"]
    C -->|"Process"| D["update_peers_with_buckets()"]
    D -->|"Map UIDs to buckets"| E["Evaluate peer eligibility"]
    E -->|"Filter by activity"| F["Active peers set"]
    F -->|"Filter by stake"| G["Eval peers dict"]
    H["Inactive detection"] -->|"Track inactive peers"| I["Apply penalties"]
Sources: src/tplr/chain.py:418-427 , src/tplr/chain.py:448-487
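The two filtering stages in the diagram above (activity, then stake) can be sketched as a pure function. This is a rough illustration under stated assumptions: the function name, the use of an upper stake bound, the threshold value, and the choice to map each UID to its bucket string are all hypothetical and are not confirmed by the source.

```python
from typing import Dict, Set


def build_eval_peers(
    buckets: Dict[int, str],
    active_uids: Set[int],
    stakes: Dict[int, float],
    max_stake: float,
) -> Dict[int, str]:
    """Keep UIDs that have a committed bucket, are active, and pass the stake filter.

    Assumption: the stake filter is an upper bound (stake <= max_stake).
    """
    eligible: Dict[int, str] = {}
    for uid in sorted(buckets):
        if uid not in active_uids:
            continue  # committed a bucket but has not been seen active
        if stakes.get(uid, 0.0) > max_stake:
            continue  # excluded by the (assumed) stake threshold
        eligible[uid] = buckets[uid]
    return eligible


peers = build_eval_peers(
    buckets={1: "bucket-1", 2: "bucket-2", 3: "bucket-3"},
    active_uids={1, 3, 4},
    stakes={1: 5.0, 3: 5000.0},
    max_stake=1000.0,
)
```

In this example UID 2 is dropped for inactivity, UID 3 for stake, and UID 4 because it never committed a bucket, leaving only UID 1.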
Peer Selection and Distribution
Validators select and distribute peer lists for coordinated training:
sequenceDiagram
    participant Validator
    participant R2Storage
    participant Miner
    Validator->>Validator: select_next_peers()
    Validator->>R2Storage: post_peer_list(peers, window)
    Miner->>R2Storage: get_peer_list()
    R2Storage-->>Miner: peers, update_window
    Note over Miner,Validator: Peer list effective after window margin
    Miner->>Miner: Update comms.peers when window reached
Sources: neurons/validator.py:674-686 , src/tplr/neurons.py:127-197
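The delayed activation of a peer list ("effective after window margin") can be sketched as follows. The function names and the tuple representation are hypothetical; the margin of 2 matches the peer_list_window_margin default listed in the configuration table of this document, and the exact comparison used by the real code is an assumption.

```python
from typing import List, Optional, Tuple

# (window at which the list takes effect, peer UIDs)
PeerUpdate = Tuple[int, List[int]]


def schedule_peer_list(posted_window: int, peers: List[int], margin: int) -> PeerUpdate:
    """A list posted at `posted_window` becomes effective `margin` windows later."""
    return (posted_window + margin, peers)


def apply_if_due(
    current_window: int,
    current_peers: List[int],
    pending: Optional[PeerUpdate],
) -> Tuple[List[int], Optional[PeerUpdate]]:
    """Swap in the pending list once its effective window has been reached."""
    if pending is not None and current_window >= pending[0]:
        return pending[1], None
    return current_peers, pending


pending = schedule_peer_list(posted_window=10, peers=[4, 5, 6], margin=2)
peers_at_11, still_pending = apply_if_due(11, [1, 2], pending)
peers_at_12, cleared = apply_if_due(12, [1, 2], pending)
```

The margin gives every neuron time to fetch the new list before it is used, so all participants switch peers on the same window.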
Code Implementation
ChainManager Initialization
Both miners and validators initialize the chain components as part of their setup:
    # Module-level imports used by this excerpt
    import sys
    import bittensor as bt
    import tplr

    # In both Miner.__init__ and Validator.__init__
    self.wallet = bt.wallet(config=self.config)
    self.subtensor = bt.subtensor(config=self.config)
    self.metagraph = self.subtensor.metagraph(self.config.netuid)
    # Refuse to start if the hotkey is not registered on the subnet
    if self.wallet.hotkey.ss58_address not in self.metagraph.hotkeys:
        tplr.logger.error(f"The wallet {self.wallet} is not registered on subnet: {self.metagraph.netuid}")
        sys.exit()
    # The UID is the position of the hotkey in the metagraph
    self.uid = self.metagraph.hotkeys.index(self.wallet.hotkey.ss58_address)

    # Initialize Comms with chain components
    self.comms = tplr.comms.Comms(
        wallet=self.wallet,
        save_location="/tmp",
        key_prefix="model",
        config=self.config,
        netuid=self.config.netuid,
        metagraph=self.metagraph,
        hparams=self.hparams,
        uid=self.uid,
    )
Sources: neurons/miner.py:107-143 , neurons/validator.py:134-174
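The registration check and UID lookup in the excerpt above can be isolated into a small helper that is testable without a wallet or a chain connection. This is a hypothetical variant that raises an exception instead of calling sys.exit(); the function and exception names are not part of the Templar source.

```python
from typing import Sequence


class NotRegisteredError(RuntimeError):
    """Raised when a hotkey does not appear in the subnet's hotkey list."""


def resolve_uid(hotkeys: Sequence[str], hotkey_address: str) -> int:
    """Return the UID (list position) of `hotkey_address`, or raise if absent."""
    try:
        return list(hotkeys).index(hotkey_address)
    except ValueError:
        raise NotRegisteredError(
            f"hotkey {hotkey_address} is not registered on this subnet"
        ) from None


# Example with made-up addresses.
uid = resolve_uid(["5Abc", "5Def", "5Ghi"], "5Def")
```

Raising lets the caller decide whether to exit, retry, or prompt for registration, which is convenient in tests.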
Commitment Management
The commitment system securely stores and retrieves bucket information:
    # Checking and updating commitments
    self.bucket = self.comms.get_own_bucket("gradients", "read")
    self.comms.try_commit(self.wallet, self.bucket)

    # Retrieving and parsing commitments
    self.comms.commitments = await self.comms.get_commitments()
The try_commit method checks if the current bucket configuration matches what’s on the chain and updates it if needed:
    def try_commit(self, wallet: Wallet, bucket: Bucket) -> None:
        # Get existing commitment
        commitment = self.get_commitment(self.metagraph.hotkeys.index(wallet.hotkey.ss58_address))

        # Compare with current bucket details
        # (abridged excerpt: the lines that build bucket_details_from_env
        # and commitment_str are omitted here)
        if bucket_details_from_env != commitment_str:
            self.commit(wallet, bucket)
Sources: src/tplr/chain.py:174-233 , neurons/miner.py:246-247
Block Listener Implementation
The block listener thread monitors blockchain events:
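    # Starting the listener thread
    # Note: Thread.start() returns None, so self.listener is None after this
    # statement; keep the Thread object in a separate variable if a reference
    # to it is needed later.
    self.listener = threading.Thread(
        target=self.block_listener,
        args=(self.loop,),
        daemon=True,
    ).start()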
The handler updates state based on new blocks:
    def handler(event):
        # Record the latest block number from the header event
        self.current_block = int(event["header"]["number"])
        # Update the window only when the block crosses a window boundary
        if int(self.current_block / self.hparams.blocks_per_window) != self.current_window:
            self.current_window = int(self.current_block / self.hparams.blocks_per_window)
            self.comms.current_window = self.current_window
Sources: neurons/miner.py:235-240 , src/tplr/chain.py:155-166
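The handler logic above can be exercised offline by feeding it synthetic header events. WindowTracker is a hypothetical standalone class that mirrors the handler's state updates; it uses integer floor division and records transitions purely for illustration.

```python
from typing import Any, Dict, List


class WindowTracker:
    """Hypothetical standalone version of the block handler's state."""

    def __init__(self, blocks_per_window: int) -> None:
        self.blocks_per_window = blocks_per_window
        self.current_block = 0
        self.current_window = 0
        self.transitions: List[int] = []  # windows entered, in order

    def handler(self, event: Dict[str, Any]) -> None:
        """Update block and window from an event shaped like a block header."""
        self.current_block = int(event["header"]["number"])
        new_window = self.current_block // self.blocks_per_window
        if new_window != self.current_window:
            self.current_window = new_window
            self.transitions.append(new_window)


tracker = WindowTracker(blocks_per_window=7)
for number in range(1, 22):  # simulate blocks 1 through 21
    tracker.handler({"header": {"number": number}})
```

With seven blocks per window, blocks 7, 14, and 21 are the ones that trigger a transition; every other block only updates current_block.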
Integration in Neurons
Miner Chain Integration
Miners use chain integration for:
- Block-driven training: The training loop proceeds based on window transitions
- Start window coordination: Fetching the global start window from validators
- Peer discovery: Retrieving and using validator-selected peers
- Bucket commitment: Sharing storage access information
flowchart TD
    A["Miner.run()"] -->|"Initialize"| B["Block listener thread"]
    B -->|"Monitor blocks"| C["Update current_window"]
    D["Get start_window"] -->|"Coordinate with validators"| E["Calculate global_step"]
    F["Current window"] -->|"Drive"| G["Training loop"]
    G -->|"Process data"| H["Upload gradients"]
    H -->|"Wait for window transition"| G
    I["Update peers"] -->|"From R2"| J["Gather from peers"]
    J -->|"Apply updates"| K["Next window"]
Sources: neurons/miner.py:229-325 , neurons/miner.py:757-777
Validator Chain Integration
Validators use chain integration for:
- Window coordination: The validation process syncs with block-derived windows
- Start window publishing: Setting the global training starting point
- Weight setting: Evaluating miners and setting chain weights
- Peer management: Selecting and distributing peer lists
flowchart TD
    A["Validator.run()"] -->|"Initialize"| B["Block listener thread"]
    B -->|"Monitor blocks"| C["Update current_window"]
    D["Highest stake validator"] -->|"Post start_window"| E["Set global training start"]
    F["Current window"] -->|"Drive"| G["Validation loop"]
    G -->|"Gather gradients"| H["Evaluate quality"]
    H -->|"Update scores"| I["Set weights on chain"]
    J["Peer management"] -->|"Select peers"| K["post_peer_list"]
    K -->|"For next window+margin"| L["Update peers"]
Sources: neurons/validator.py:516-579 , neurons/validator.py:522-525
Weight Setting Process
Validators evaluate miners and set weights on the blockchain:
flowchart TD
    A["Evaluate gradients"] -->|"Calculate scores"| B["update_openskill_ratings()"]
    B -->|"Combine with other metrics"| C["final_scores"]
    C -->|"Normalize"| D["update_weights()"]
    D -->|"Apply power normalization"| E["weights tensor"]
    F["Block processing"] -->|"Periodic weight setting"| G["subtensor.set_weights()"]
    G -->|"Submit to chain"| H["Update metagraph"]
Sources: neurons/validator.py:374-437 , neurons/validator.py:446-487
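The "power normalization" step in the diagram above is not spelled out in this document. One common reading is to raise each score to a fixed exponent and rescale so the weights sum to 1, which the sketch below illustrates. The exponent of 2.0, the clipping of negative scores to zero, and the function name are all assumptions, not the validator's confirmed behavior.

```python
from typing import List, Sequence


def power_normalize(scores: Sequence[float], power: float = 2.0) -> List[float]:
    """Raise non-negative scores to `power` and rescale them to sum to 1.

    Assumptions: negative scores are clipped to zero, and an all-zero input
    yields all-zero weights rather than a division error.
    """
    powered = [max(score, 0.0) ** power for score in scores]
    total = sum(powered)
    if total == 0:
        return [0.0 for _ in powered]
    return [value / total for value in powered]


weights = power_normalize([1.0, 1.0, 2.0], power=2.0)
# Powered scores are 1, 1, 4 (total 6), so the weights are 1/6, 1/6, 4/6.
```

An exponent above 1 concentrates weight on the highest-scoring miners, while an exponent of 1 reduces to plain proportional normalization.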
Window-Based Synchronization
Start Window Coordination
To ensure all nodes start training from the same point, Templar coordinates a global start window:
sequenceDiagram
    participant HV as "Highest-Stake Validator"
    participant R2 as "R2 Storage"
    participant OV as "Other Validators"
    participant M as "Miners"
    HV->>R2: post_start_window(window)
    OV->>R2: get_start_window()
    R2-->>OV: start_window
    M->>R2: get_start_window()
    R2-->>M: start_window
    Note over HV,M: All nodes calculate global_step = current_window - start_window
Sources: neurons/validator.py:534-563 , neurons/miner.py:250-259
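The publish-then-read pattern in the diagram above can be sketched with an in-memory store. SharedStore is a hypothetical stand-in for R2 storage; the tie-breaking rule (lowest UID wins) and the choice to never overwrite an already posted start window are assumptions made for the example.

```python
from typing import Dict, Optional


class SharedStore:
    """Hypothetical in-memory stand-in for the shared R2 storage."""

    def __init__(self) -> None:
        self.start_window: Optional[int] = None


def highest_stake_validator(validator_stakes: Dict[int, float]) -> int:
    """Return the UID with the largest stake; ties go to the lowest UID (assumption)."""
    return max(sorted(validator_stakes), key=lambda uid: validator_stakes[uid])


def resolve_start_window(
    my_uid: int,
    validator_stakes: Dict[int, float],
    current_window: int,
    store: SharedStore,
) -> Optional[int]:
    """Post the start window if this node is the leader, then read it back.

    Returns None when the leader has not posted yet.
    """
    is_leader = my_uid == highest_stake_validator(validator_stakes)
    if is_leader and store.start_window is None:
        store.start_window = current_window
    return store.start_window


stakes = {1: 10.0, 2: 50.0, 3: 50.0}
store = SharedStore()
before_post = resolve_start_window(1, stakes, 100, store)  # follower, nothing posted yet
leader_view = resolve_start_window(2, stakes, 100, store)  # leader posts window 100
later_view = resolve_start_window(1, stakes, 105, store)   # follower reads it later
```

A follower that reads the store at window 105 still gets 100, so its global step is 105 - 100 = 5, the same value the leader computes.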
Window-Driven Training Loop
Both miners and validators use window transitions to drive their main loops:
sequenceDiagram
    participant BC as "Bittensor Chain"
    participant M as "Miner"
    participant V as "Validator"
    BC->>M: New block event
    BC->>V: New block event
    M->>M: Update current_window
    V->>V: Update current_window
    M->>M: Train for window
    V->>V: Evaluate for window
    M->>M: Wait for next window
    V->>V: Wait for next window
    Note over M,V: Both wait for: while current_window == step_window: await asyncio.sleep(0.1)
Sources: neurons/miner.py:751-754 , neurons/validator.py:627-636
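The polling wait shown in the note above can be tried out with a simulated chain. In this sketch simulate_chain is a hypothetical stand-in for the block listener, and the short intervals are chosen only to keep the demo fast; the loop quoted above polls every 0.1 seconds.

```python
import asyncio
from typing import List


class WindowState:
    """Holds the window number that a block listener would normally update."""

    def __init__(self) -> None:
        self.current_window = 0


async def simulate_chain(state: WindowState, windows: int, interval: float) -> None:
    """Stand-in for the block listener: advance the window every `interval` seconds."""
    for _ in range(windows):
        await asyncio.sleep(interval)
        state.current_window += 1


async def run_windows(state: WindowState, steps: int, poll: float = 0.001) -> List[int]:
    """Handle `steps` windows, sleeping between them until the window changes."""
    handled: List[int] = []
    for _ in range(steps):
        step_window = state.current_window
        handled.append(step_window)  # per-window training or evaluation goes here
        while state.current_window == step_window:
            await asyncio.sleep(poll)
    return handled


async def demo() -> List[int]:
    state = WindowState()
    chain = asyncio.create_task(simulate_chain(state, windows=50, interval=0.01))
    handled = await run_windows(state, steps=3)
    chain.cancel()  # stop the simulated chain once the demo is done
    await asyncio.gather(chain, return_exceptions=True)
    return handled


handled_windows = asyncio.run(demo())
```

Each window is handled at most once because the loop captures step_window before doing its work and only continues after current_window has moved past it.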
Configuration Parameters
Key configuration parameters for chain integration:
Parameter | Description | Default | Source |
---|---|---|---|
netuid | Bittensor network UID | 268 | neurons/miner.py:71 |
blocks_per_window | Number of blocks per training window | 7 | hparams.json:8 |
validator_offset | Windows validators lag behind miners | 2 | hparams.json:30 |
peer_replacement_frequency | Windows between peer list updates | 5 | hparams.json:36 |
peer_list_window_margin | Windows before peer list takes effect | 2 | hparams.json:37 |
reset_inactivity_windows | Windows before inactive peer reset | 25 | hparams.json:46 |
Sources: hparams.json:8-47 , neurons/miner.py:71 , neurons/validator.py:90
Using in Development
For local development, the ecosystem.config.js file shows how to configure neurons to interact with a local Bittensor chain:
    // Example from ecosystem.config.js
    {
        name: "TM1",
        script: "neurons/miner.py",
        interpreter: "python3",
        args: `--wallet.name templar_test --wallet.hotkey M1 --device cuda:1 --subtensor.network local --netuid 2 --use_wandb --project "${PROJECT_NAME}"`
    }
The --subtensor.network local flag directs the neurons to use a local Subtensor chain for development and testing.
Sources: ecosystem.config.js:8-16