Text-to-Speech

Natural voice synthesis and audio generation from text

Convert text to natural, high-quality speech using Cxmpute's distributed text-to-speech service.

Overview

Cxmpute's Text-to-Speech (TTS) service transforms written text into lifelike audio using advanced voice synthesis models. Our global network of providers ensures fast generation times and high availability.

Key Features

High-Quality Audio: Professional-grade voice synthesis
Multiple Voices: Various voice models and styles
Fast Generation: Optimized for speed and reliability
Global Network: Low-latency access worldwide
Simple API: Easy integration with any application

Quick Start

Basic Request

Shell

curl -X POST https://cxmpute.cloud/api/v1/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-User-Id: YOUR_USER_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Cxmpute! This is a demonstration of our text-to-speech service.",
    "voice": "af_bella"
  }' \
  --output welcome.wav

Python Example

Python

import requests

url = "https://cxmpute.cloud/api/v1/tts"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "X-User-Id": "YOUR_USER_ID",
    "Content-Type": "application/json"
}

data = {
    "text": "Hello from Cxmpute!",
    "voice": "af_bella"
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved as output.wav")
else:
    print(f"Error: {response.status_code}")

JavaScript Example

JavaScript

const fs = require('fs');

async function generateSpeech() {
  const response = await fetch('https://cxmpute.cloud/api/v1/tts', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'X-User-Id': 'YOUR_USER_ID',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      text: 'This is generated using Cxmpute TTS!',
      voice: 'af_bella'
    })
  });

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync('speech.wav', Buffer.from(buffer));
    console.log('Audio saved as speech.wav');
  } else {
    console.error('Error:', response.status);
  }
}

generateSpeech();

API Reference

Endpoint

HTTP

POST /api/v1/tts

Request Parameters

Parameter	Type	Required	Description
`text`	string	Yes	Text to convert to speech (max 10,000 characters)
`voice`	string	No	Voice model to use (default: "af_bella")
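
Because `text` is capped at 10,000 characters, longer documents have to be split into several requests. The sketch below shows one possible way to do that on the client. It is not part of the Cxmpute API: `split_text_for_tts` and `MAX_TTS_CHARS` are hypothetical names, and breaking at sentence-ending punctuation is an assumption about what will sound most natural.

```python
import re

MAX_TTS_CHARS = 10_000  # limit taken from the Request Parameters table


def split_text_for_tts(text, limit=MAX_TTS_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries where possible. A single sentence
    longer than `limit` is hard-split as a last resort.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-split any sentence that cannot fit in one request by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the `text` field of its own request.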

Available Voices

Voice	Description	Language	Style
`af`	Standard American female voice	English (US)	Natural, balanced
`af_bella`	Warm, professional female voice	English (US)	Clear, pleasant
`af_nicole`	Energetic female voice	English (US)	Upbeat, dynamic
`af_sarah`	Calm, soothing female voice	English (US)	Gentle, relaxed
`af_sky`	Bright, youthful female voice	English (US)	Light, cheerful
`am_adam`	Professional male voice	English (US)	Authoritative, clear
`am_michael`	Friendly male voice	English (US)	Warm, approachable
`bf_emma`	Sophisticated British female voice	English (UK)	Elegant, refined
`bf_isabella`	Classic British female voice	English (UK)	Traditional, articulate
`bm_george`	Distinguished British male voice	English (UK)	Formal, authoritative
`bm_lewis`	Modern British male voice	English (UK)	Contemporary, friendly

More voices are continuously being added to our network.
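
If an application lets users pick a voice, it can help to keep the table above as data. The following is a minimal sketch under that assumption: `VOICES`, `find_voices`, and `is_known_voice` are hypothetical names, the gender values are read off the descriptions in the table, and the hard-coded list will go stale as new voices are added.

```python
# Voice catalogue copied from the Available Voices table: id -> (language, gender)
VOICES = {
    "af": ("English (US)", "female"),
    "af_bella": ("English (US)", "female"),
    "af_nicole": ("English (US)", "female"),
    "af_sarah": ("English (US)", "female"),
    "af_sky": ("English (US)", "female"),
    "am_adam": ("English (US)", "male"),
    "am_michael": ("English (US)", "male"),
    "bf_emma": ("English (UK)", "female"),
    "bf_isabella": ("English (UK)", "female"),
    "bm_george": ("English (UK)", "male"),
    "bm_lewis": ("English (UK)", "male"),
}


def find_voices(language=None, gender=None):
    """Return voice ids matching the given language and/or gender."""
    return [
        voice_id
        for voice_id, (voice_language, voice_gender) in VOICES.items()
        if (language is None or voice_language == language)
        and (gender is None or voice_gender == gender)
    ]


def is_known_voice(voice_id):
    """Check a voice id against the local copy of the table."""
    return voice_id in VOICES
```

For example, `find_voices(language="English (UK)", gender="male")` returns the two British male voices listed in the table.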

Response

The endpoint returns raw audio data in WAV format:

Content-Type: audio/wav
Format: 16-bit PCM WAV
Sample Rate: 22,050 Hz
Channels: Mono
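
To confirm that a response matches this format, or to compute the clip length, the header can be read with Python's standard `wave` module. This is a small sketch rather than an official client utility: `describe_wav` and `make_silent_wav` are hypothetical helpers, and the silent clip exists only so the example runs without calling the API.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Read the header fields and duration from raw WAV bytes."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frame_count = wav.getnframes()
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate_hz": sample_rate,
            "duration_seconds": frame_count / sample_rate,
        }


def make_silent_wav(seconds=1.0, sample_rate=22050):
    """Build a silent 16-bit mono WAV in memory (stand-in for an API response)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


info = describe_wav(make_silent_wav(2.0))
print(info)
```

Passing `response.content` from a successful request to `describe_wav` should report one channel, 16 bits, and 22,050 Hz if the audio matches the format listed above.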

Error Responses

JSON

{
  "error": "Missing 'text' field."
}

Common error codes:

400: Bad request (missing text, text too long)
401: Unauthorized (invalid API key)
503: Service unavailable (no TTS providers online)
500: Internal server error
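
A client usually needs two things from a failed response: the message and whether a retry is worthwhile. The sketch below shows one way to extract both. `parse_tts_error` and `RETRYABLE_STATUS` are hypothetical names, and treating only 503 as retryable is an assumption based on its description above, not documented behavior.

```python
import json

# Assumption: only "no TTS providers online" is worth retrying.
RETRYABLE_STATUS = {503}


def parse_tts_error(status_code, body):
    """Return (message, retryable) for a failed TTS response.

    `body` is the raw response body (str or bytes). If it is not the JSON
    error object shown above, the message falls back to the status code.
    """
    message = ""
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            message = payload.get("error", "")
    except (ValueError, TypeError):
        # Body was empty, missing, or not valid JSON.
        pass
    if not message:
        message = f"HTTP {status_code}"
    return message, status_code in RETRYABLE_STATUS
```

With `requests`, this could be called as `parse_tts_error(response.status_code, response.content)` whenever `response.ok` is false.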

Use Cases

1. Content Creation

Generate voiceovers for videos, podcasts, and presentations:

Python

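def create_voiceover(script_segments):
    """Generate one WAV file per script segment and return the filenames."""
    # Assumes `requests` is imported and `headers` is defined as in the Quick Start
    audio_files = []

    for i, text in enumerate(script_segments):
        response = requests.post(
            "https://cxmpute.cloud/api/v1/tts",
            headers=headers,
            json={"text": text, "voice": "af_bella"}
        )
        # Stop on failure so an error body is never saved as a .wav file
        response.raise_for_status()

        filename = f"segment_{i}.wav"
        with open(filename, "wb") as f:
            f.write(response.content)
        audio_files.append(filename)

    return audio_files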
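
A voiceover is usually delivered as one file, so the per-segment files often need to be joined. The following is a minimal sketch using the standard `wave` module. `concatenate_wav_files` is a hypothetical helper, and it assumes every segment shares the same channel count, sample width, and sample rate, which should hold if all segments come from this endpoint.

```python
import wave


def concatenate_wav_files(input_paths, output_path):
    """Join WAV files that share one audio format into a single file."""
    if not input_paths:
        raise ValueError("input_paths must not be empty")

    audio_format = None
    frames = []
    for path in input_paths:
        with wave.open(path, "rb") as segment:
            segment_format = (
                segment.getnchannels(),
                segment.getsampwidth(),
                segment.getframerate(),
            )
            if audio_format is None:
                audio_format = segment_format
            elif segment_format != audio_format:
                raise ValueError(f"{path} has a different audio format")
            frames.append(segment.readframes(segment.getnframes()))

    # Write the output only after every input has been read successfully.
    with wave.open(output_path, "wb") as output:
        output.setnchannels(audio_format[0])
        output.setsampwidth(audio_format[1])
        output.setframerate(audio_format[2])
        for chunk in frames:
            output.writeframes(chunk)
    return output_path
```

Calling `concatenate_wav_files(create_voiceover(segments), "voiceover.wav")` would produce a single file in script order.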

2. Accessibility Features

Add screen reading capabilities to applications:

JavaScript

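// Assumes a browser-side generateSpeech(text) that resolves to the WAV bytes
// (for example, the result of `await response.arrayBuffer()`). The Node.js
// Quick Start version writes a file and returns nothing, so it cannot be reused here.
async function speakText(text) {
  const audio = await generateSpeech(text);
  const audioUrl = URL.createObjectURL(new Blob([audio], { type: 'audio/wav' }));
  const audioElement = new Audio(audioUrl);
  // Free the object URL once playback ends
  audioElement.addEventListener('ended', () => URL.revokeObjectURL(audioUrl));
  await audioElement.play();
}

// Usage: speak any element that carries a data-speak attribute
document.addEventListener('click', (e) => {
  if (e.target.dataset.speak !== undefined) {
    speakText(e.target.textContent).catch(console.error);
  }
});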

3. Language Learning

Create pronunciation examples for educational apps:

Python

def create_pronunciation_guide(phrases):
    # Each phrase is a dict with "id" and "text" keys
    for phrase in phrases:
        # Generate audio for the phrase
        audio_response = requests.post(
            "https://cxmpute.cloud/api/v1/tts",
            headers=headers,
            json={
                "text": phrase["text"],
                "voice": "af_sarah"  # Clear, educational voice
            }
        )
        # Stop on failure so an error body is never saved as a .wav file
        audio_response.raise_for_status()

        # Save under a filename derived from the phrase ID
        filename = f"pronunciation_{phrase['id']}.wav"
        with open(filename, "wb") as f:
            f.write(audio_response.content)

4. Interactive Applications

Add voice responses to chatbots and virtual assistants:

Python

def voice_assistant_response(user_message):
    # Get AI response
    chat_response = requests.post(
        "https://cxmpute.cloud/api/v1/chat/completions",
        headers=headers,
        json={
            "model": "llama3.1:8b",
            "messages": [{"role": "user", "content": user_message}]
        }
    )
    # Fail early instead of indexing into an error payload
    chat_response.raise_for_status()

    ai_text = chat_response.json()["choices"][0]["message"]["content"]

    # Convert to speech
    tts_response = requests.post(
        "https://cxmpute.cloud/api/v1/tts",
        headers=headers,
        json={"text": ai_text, "voice": "af_nicole"}
    )
    tts_response.raise_for_status()

    return tts_response.content

5. Notification Systems

Create audio alerts and announcements:

Python

def create_audio_notification(message, urgency="normal"):
    voice_map = {
        "normal": "af_bella",
        "urgent": "am_adam",
        "calm": "af_sarah"
    }
    
    response = requests.post(
        "https://cxmpute.cloud/api/v1/tts",
        headers=headers,
        json={
            "text": message,
            "voice": voice_map.get(urgency, "af_bella")
        }
    )
    
    return response.content

Advanced Usage

Batch Processing

Generate multiple audio files concurrently with a thread pool:

Python

import concurrent.futures
import requests

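def generate_single_audio(text_item):
    # Assumes `headers` is defined as in the Quick Start
    try:
        response = requests.post(
            "https://cxmpute.cloud/api/v1/tts",
            headers=headers,
            json={
                "text": text_item["text"],
                "voice": text_item.get("voice", "af_bella")
            },
            timeout=30
        )
    except requests.exceptions.RequestException as e:
        # Report network failures per item instead of aborting the whole batch
        return {"id": text_item["id"], "audio": None, "error": str(e)}

    return {
        "id": text_item["id"],
        "audio": response.content if response.ok else None,
        "error": None if response.ok else response.text
    }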

def batch_generate_audio(text_items, max_workers=5):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(generate_single_audio, text_items))
    
    return results

# Usage
texts = [
    {"id": "intro", "text": "Welcome to our service!", "voice": "af_bella"},
    {"id": "guide", "text": "Here's how to get started...", "voice": "af_sarah"},
    {"id": "thanks", "text": "Thank you for using our app!", "voice": "af_nicole"}
]

results = batch_generate_audio(texts)
for result in results:
    if result["audio"]:
        with open(f"{result['id']}.wav", "wb") as f:
            f.write(result["audio"])

Error Handling

Robust error handling for production applications:

Python

import logging
import time

import requests

# Placeholders: replace with your own credentials
API_KEY = "YOUR_API_KEY"
USER_ID = "YOUR_USER_ID"

def reliable_tts_request(text, voice="af_bella", max_retries=3):
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "X-User-Id": USER_ID,
        "Content-Type": "application/json"
    }

    for attempt in range(max_retries):
        last_attempt = attempt == max_retries - 1
        try:
            response = requests.post(
                "https://cxmpute.cloud/api/v1/tts",
                headers=headers,
                json={"text": text, "voice": voice},
                timeout=30
            )
        except requests.exceptions.RequestException as e:
            # Network-level failure (timeout, connection error): retry
            logging.error(f"TTS request failed: {e}")
            if last_attempt:
                raise
            time.sleep(1)
            continue

        if response.status_code == 200:
            return response.content
        if response.status_code == 503:
            # No providers available: back off exponentially, then retry
            logging.warning(f"No TTS providers available, attempt {attempt + 1}")
            if not last_attempt:
                time.sleep(2 ** attempt)
            continue

        # Other errors (such as 400 or 401) will not succeed on retry, so raise now
        response.raise_for_status()

    raise Exception("TTS request failed after all retries")

Streaming Integration

For real-time applications, combine with streaming text generation:

Python

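import json
import re

import requests

# Assumes `headers` is defined as in the Quick Start.
# extract_sentences and generate_tts are illustrative helpers, not part of the API.

def extract_sentences(text):
    """Return the complete sentences in text (those ending in '.', '!' or '?')."""
    return re.findall(r"[^.!?]*[.!?]+", text)

def generate_tts(sentence, voice="af_bella"):
    """Convert one sentence to WAV bytes."""
    response = requests.post(
        "https://cxmpute.cloud/api/v1/tts",
        headers=headers,
        json={"text": sentence, "voice": voice}
    )
    response.raise_for_status()
    return response.content

def streaming_tts_response(prompt):
    """Generate text and convert each sentence to speech as soon as it completes"""

    # Stream text response
    text_response = requests.post(
        "https://cxmpute.cloud/api/v1/chat/completions",
        headers=headers,
        json={
            "model": "llama3.1:8b",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True
        },
        stream=True
    )

    accumulated_text = ""
    spoken = 0  # Number of sentences already converted to audio

    for line in text_response.iter_lines():
        if not line.startswith(b"data: "):
            continue
        try:
            data = json.loads(line[6:])
        except json.JSONDecodeError:
            continue  # Skips non-JSON lines such as "data: [DONE]"

        if not isinstance(data, dict) or not data.get("choices"):
            continue
        # The content field may be missing or null in some chunks
        content = data["choices"][0].get("delta", {}).get("content") or ""
        accumulated_text += content

        # Check for sentence boundaries
        if any(punct in content for punct in ".!?"):
            sentences = extract_sentences(accumulated_text)
            for sentence in sentences[spoken:]:
                # Generate TTS for each newly completed sentence
                yield generate_tts(sentence.strip())
            spoken = len(sentences)

    # Speak any trailing text that never reached sentence-ending punctuation
    consumed = sum(len(s) for s in extract_sentences(accumulated_text))
    remainder = accumulated_text[consumed:].strip()
    if remainder:
        yield generate_tts(remainder)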

Best Practices

1. Text Optimization

Prepare text for optimal speech generation:

Python

import re

def optimize_text_for_tts(text):
    # Expand abbreviations
    abbreviations = {
        "Mr.": "Mister",
        "Dr.": "Doctor",
        "Inc.": "Incorporated",
        "Ltd.": "Limited",
        "etc.": "et cetera",
        "e.g.": "for example",
        "i.e.": "that is"
    }
    
    for abbr, expansion in abbreviations.items():
        text = text.replace(abbr, expansion)
    
    # Handle numbers
    text = re.sub(r'\b(\d+)\b', lambda m: num_to_words(int(m.group(1))), text)
    
    # Clean up extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

def num_to_words(n):
    # Simple number to words conversion
    ones = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
    teens = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
    tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
    
    if n == 0:
        return "zero"  # ones[0] is empty, so zero needs its own case
    elif n < 10:
        return ones[n]
    elif n < 20:
        return teens[n-10]
    elif n < 100:
        return tens[n//10] + ("" if n%10 == 0 else " " + ones[n%10])
    # Add more cases as needed
    else:
        return str(n)  # Fallback to digit representation

2. Voice Selection

Choose appropriate voices for different contexts:

Python

def select_voice_for_content(content_type, target_audience="general"):
    voice_map = {
        ("educational", "children"): "af_sarah",
        ("educational", "adults"): "af_bella",
        ("commercial", "general"): "af_nicole",
        ("technical", "general"): "am_adam",
        ("announcement", "general"): "am_michael",
        ("storytelling", "children"): "af_sarah",
        ("news", "general"): "am_adam"
    }
    
    return voice_map.get((content_type, target_audience), "af_bella")

# Usage
voice = select_voice_for_content("educational", "adults")

3. Caching Strategy

Cache generated audio on disk so that repeated text and voice combinations are not regenerated:

Python

import hashlib
import os

class TTSCache:
    def __init__(self, cache_dir="tts_cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
    
    def get_cache_key(self, text, voice):
        content = f"{text}:{voice}"
        return hashlib.md5(content.encode()).hexdigest()
    
    def get_cached_audio(self, text, voice):
        cache_key = self.get_cache_key(text, voice)
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.wav")
        
        if os.path.exists(cache_file):
            with open(cache_file, "rb") as f:
                return f.read()
        return None
    
    def cache_audio(self, text, voice, audio_data):
        cache_key = self.get_cache_key(text, voice)
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.wav")
        
        with open(cache_file, "wb") as f:
            f.write(audio_data)
    
    def generate_or_get_cached(self, text, voice="af_bella"):
        # Check cache first
        cached_audio = self.get_cached_audio(text, voice)
        if cached_audio:
            return cached_audio
        
        # Generate new audio
        response = requests.post(
            "https://cxmpute.cloud/api/v1/tts",
            headers=headers,
            json={"text": text, "voice": voice}
        )
        
        if response.ok:
            audio_data = response.content
            self.cache_audio(text, voice, audio_data)
            return audio_data
        
        raise Exception(f"TTS generation failed: {response.status_code}")

# Usage
tts_cache = TTSCache()
audio = tts_cache.generate_or_get_cached("Welcome to our service!", "af_bella")

Pricing

During our testnet phase, all services are free for all users. Pricing for the mainnet launch has not yet been determined.

Join our Discord community to stay updated on pricing announcements, give feedback, and connect with other developers building with Cxmpute.

Support & Community

Discord: Join our community for TTS tips and support
Examples: Find more examples in our GitHub repository

Ready to add voice to your applications? Start with our simple API and create engaging audio experiences for your users!