Running AI Inference

OmniTensor's decentralized inference network runs AI inference tasks across a distributed network of community-operated GPU nodes. This guide walks you through using that infrastructure for efficient, cost-effective inference.

Prerequisites

  • OmniTensor SDK installed

  • Valid API key for OmniTensor services (see the snippet after this list for loading it from an environment variable)

  • Familiarity with AI models and inference concepts
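
Before going further, you can confirm your setup with a minimal sketch. It assumes the SDK is importable as the omnitensor package (as in the examples below) and uses an illustrative OMNITENSOR_API_KEY environment variable so the key is not hard-coded.

import os

from omnitensor import Client

# Illustrative variable name - export OMNITENSOR_API_KEY in your shell first
api_key = os.environ.get("OMNITENSOR_API_KEY")
if not api_key:
    raise RuntimeError("Set the OMNITENSOR_API_KEY environment variable before running inference.")

# Construct the client exactly as in the examples later in this guide
client = Client(api_key=api_key)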

Overview

OmniTensor's decentralized inference network utilizes a distributed compute paradigm, allowing developers to run inference tasks on a network of GPU nodes contributed by the community. This approach offers several advantages:

  • Scalability - Dynamically scale inference capacity based on demand

  • Cost-efficiency - Pay only for the compute resources used

  • Redundancy - Increased fault tolerance through distributed processing

  • Low latency - Geographically distributed nodes reduce network latency

Supported Models

OmniTensor supports a wide range of pre-trained models, including the following (a short selection sketch follows this list):

  • Large Language Models (LLMs): GPT-3, BERT, T5

  • Computer Vision: YOLO, ResNet, EfficientNet

  • Speech Recognition: DeepSpeech, Wav2Vec

  • Custom models: Deploy your own fine-tuned or proprietary models
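
As a minimal sketch of how these categories map onto the SDK, the snippet below selects one model per category with Model.from_catalog. Only "gpt-3-small" appears elsewhere in this guide; the other catalog identifiers, and the namespaced identifier for a custom model, are illustrative assumptions. Consult the AI Model Marketplace for the exact names available, and see AI Model Training & Deployment for deploying your own models.

from omnitensor import Model

# Catalog identifiers below (other than "gpt-3-small") are assumptions for
# illustration - check the AI Model Marketplace for the actual names.
llm = Model.from_catalog("gpt-3-small")       # large language model
vision = Model.from_catalog("resnet-50")      # computer vision
speech = Model.from_catalog("wav2vec2-base")  # speech recognition

# Hypothetical identifier for a custom model you have already deployed
custom = Model.from_catalog("my-org/finetuned-summarizer")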

Basic Inference Workflow

  1. Initialize OmniTensor client

  2. Select or upload AI model

  3. Prepare input data

  4. Submit inference request

  5. Retrieve and process results

Code Example

Here's a basic example of running inference using the OmniTensor SDK:

from omnitensor import Client, Model, InferenceRequest

# Initialize client
client = Client(api_key="YOUR_API_KEY")

# Select pre-trained model
model = Model.from_catalog("gpt-3-small")

# Prepare input
input_text = "Translate the following English text to French: 'Hello, world!'"

# Create inference request
request = InferenceRequest(
    model=model,
    inputs={"text": input_text},
    output_keys=["translated_text"]
)

# Submit request and get results
result = client.run_inference(request)

print(result.outputs["translated_text"])

Advanced Configuration

Node Selection Strategy

OmniTensor allows you to specify node selection criteria for your inference tasks:

request = InferenceRequest(
    # ... other parameters ...
    node_preferences={
        "min_gpu_memory": 8,  # GB
        "max_latency": 50,    # ms
        "geographical_region": "europe-west"
    }
)

Batching and Streaming

For improved performance, you can batch multiple inputs or use streaming for real-time inference:

# Batched inference - submitted the same way as a single request
batch_request = InferenceRequest(
    model=model,
    inputs={"texts": ["Hello", "World", "OmniTensor"]},
    batch_size=3
)
batch_result = client.run_inference(batch_request)
print(batch_result.outputs)

# Streaming inference
with client.stream_inference(model) as stream:
    while True:
        input_text = input("Enter text (or 'q' to quit): ")
        if input_text.lower() == 'q':
            break
        result = stream.process(input_text)
        print(result)

Monitoring and Optimization

OmniTensor provides real-time metrics for monitoring inference performance:

# request_id identifies a previously submitted inference request
metrics = client.get_inference_metrics(request_id)
print(f"Inference time: {metrics.latency_ms} ms")
print(f"GPU utilization: {metrics.gpu_utilization}%")

Use these metrics to optimize your inference pipeline and make informed decisions about resource allocation.
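
As a sketch of that kind of optimization, the snippet below resubmits the request from the basic example with stricter node preferences when the observed latency exceeds a budget. It reuses client, request_id, model, and input_text from earlier in this guide; the 100 ms budget and the preference values are illustrative assumptions, not SDK defaults.

from omnitensor import InferenceRequest

LATENCY_BUDGET_MS = 100  # illustrative threshold - tune it for your workload

metrics = client.get_inference_metrics(request_id)

if metrics.latency_ms > LATENCY_BUDGET_MS:
    # Resubmit with stricter node preferences (example values only)
    faster_request = InferenceRequest(
        model=model,
        inputs={"text": input_text},
        output_keys=["translated_text"],
        node_preferences={
            "min_gpu_memory": 16,  # GB
            "max_latency": 30      # ms
        }
    )
    result = client.run_inference(faster_request)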

Error Handling and Retries

OmniTensor's SDK includes built-in error handling and retry mechanisms:

from omnitensor.exceptions import NodeFailureError, TimeoutError

try:
    result = client.run_inference(request)
except NodeFailureError:
    # Automatic retry on a different node
    result = client.run_inference(request, retry_strategy="auto")
except TimeoutError:
    # Handle timeout scenario
    pass
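
If you need more control than the built-in retry_strategy, you can wrap the same call in a plain retry loop. This is a sketch under the assumption that NodeFailureError and TimeoutError are the transient failures worth retrying; the attempt count and backoff delays are arbitrary.

import time

from omnitensor.exceptions import NodeFailureError, TimeoutError

def run_with_retries(client, request, attempts=3, base_delay=1.0):
    # Simple exponential backoff around run_inference (illustrative policy)
    for attempt in range(attempts):
        try:
            return client.run_inference(request)
        except (NodeFailureError, TimeoutError):
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

result = run_with_retries(client, request)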
