A modular Python library for voice interactions with AI systems, providing text-to-speech (TTS) and speech-to-text (STT) capabilities with interrupt handling.
While we provide CLI and web examples, VoiceLLM is designed to be integrated into other projects.
- Text-to-Speech: High-quality speech synthesis with adjustable speed
- Speech-to-Text: Accurate voice recognition using OpenAI's Whisper
- Voice Activity Detection: Efficient speech detection using WebRTC VAD
- Interrupt Handling: Stop TTS by speaking or using stop commands
- Modular Design: Easily integrate with any text generation system
- Python 3.8+ (3.11 recommended)
- PortAudio for audio input/output
```bash
# Install from PyPI
pip install voicellm

# Or clone the repository
git clone https://github.com/lpalbou/voicellm.git
cd voicellm
pip install -e .

# Install with development dependencies
pip install "voicellm[dev]"

# Install all dependencies including the package
pip install -r requirements.txt
```
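To confirm the installation worked, a quick import check of the main entry point:

```bash
python -c "from voicellm import VoiceManager; print('VoiceLLM import OK')"
```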
```python
from voicellm import VoiceManager
import time

# Initialize voice manager
voice_manager = VoiceManager(debug_mode=False)

# Text to speech
voice_manager.speak("Hello, I am an AI assistant. How can I help you today?")

# Wait for speech to complete
while voice_manager.is_speaking():
    time.sleep(0.1)

# Speech to text with callback
def on_transcription(text):
    print(f"User said: {text}")
    if text.lower() != "stop":
        # Process with your text generation system
        response = f"You said: {text}"
        voice_manager.speak(response)

# Start voice recognition
voice_manager.listen(on_transcription)

# Wait for user to say "stop" or press Ctrl+C
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    pass

# Clean up
voice_manager.cleanup()
```
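Playback can also be cut off programmatically via `stop_speaking()` (part of the VoiceManager API documented below), e.g. when new input should preempt the current utterance:

```python
import time
from voicellm import VoiceManager

vm = VoiceManager()
vm.speak("This is a deliberately long sentence that we are going to interrupt midway through.")
time.sleep(1.0)     # let playback start
vm.stop_speaking()  # cut the utterance short
vm.cleanup()
```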
The package includes several examples that demonstrate different ways to use VoiceLLM.
If installed globally, you can launch VoiceLLM directly in voice mode:
```bash
# Start VoiceLLM in voice mode (TTS ON, STT ON)
voicellm

# With options
voicellm --debug --whisper base --model gemma3:latest --api http://localhost:11434/api/chat
```
Command line options:

- `--debug`: Enable debug mode with detailed logging
- `--api`: URL of the Ollama API (default: http://localhost:11434/api/chat)
- `--model`: Ollama model to use (default: granite3.3:2b). Other examples: cogito:3b, phi4-mini:latest, qwen2.5:latest, cogito:latest, gemma3:latest, etc.
- `--whisper`: Whisper model to use (tiny, base, small, medium, large)
- `--no-voice`: Start in text mode instead of voice mode
- `--system`: Custom system prompt
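For example, combining the flags above to use a smaller Whisper model, a different Ollama model, and a custom system prompt:

```bash
voicellm --whisper small --model qwen2.5:latest --system "You are a concise assistant."
```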
```bash
# Run the CLI example (TTS ON, STT OFF)
voicellm-cli cli

# With debug mode
voicellm-cli cli --debug
```
```bash
# Run the web API example
voicellm-cli web

# With different host and port
voicellm-cli web --host 0.0.0.0 --port 8000
```
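Once the server is up, you can exercise it from Python. A minimal sketch, assuming `/api/test` is a GET health check and `/api/tts` accepts a JSON body with a `text` field; check the web example's source for the actual request/response contract:

```python
import requests

BASE = "http://127.0.0.1:5000"  # match your --host/--port

# Assumption: /api/test is a simple GET health-check endpoint
print(requests.get(f"{BASE}/api/test").status_code)

# Assumption: /api/tts synthesizes speech for the given text
resp = requests.post(f"{BASE}/api/tts", json={"text": "Hello from the web API"})
print(resp.status_code, resp.headers.get("Content-Type"))
```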
You can also run a simplified version that doesn't load the full models:
```bash
# Run the web API with simulation mode
voicellm-cli web --simulate
```
If you encounter issues with the web API:
- 404 Not Found: Make sure you're accessing the correct endpoints (e.g., `/api/test`, `/api/tts`)
- Connection Issues: Ensure no other service is using the port
- Model Loading Errors: Try running with the `--simulate` flag to test without loading models
- Dependencies: Ensure all required packages are installed: `pip install flask soundfile numpy requests`
- Test with a simple Flask script:

```python
from flask import Flask

app = Flask(__name__)

@app.route('/')
def home():
    return "Flask works!"

app.run(host='127.0.0.1', port=5000)
```
```bash
# Run the simple example
voicellm-cli simple
```
The main class that coordinates TTS and STT functionality:
```python
# Initialize
manager = VoiceManager(tts_model="tts_models/en/ljspeech/tacotron2-DDC",
                       whisper_model="tiny", debug_mode=False)

# TTS
manager.speak(text, speed=1.0, callback=None)
manager.stop_speaking()
manager.is_speaking()

# STT
manager.listen(on_transcription, on_stop=None)
manager.stop_listening()
manager.is_listening()

# Configuration
manager.change_whisper_model(model_name)
manager.change_vad_aggressiveness(aggressiveness)

# Cleanup
manager.cleanup()
```
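The `callback` and `on_stop` parameters are easy to miss. A minimal sketch, assuming `callback` is invoked when playback completes and `on_stop` when the stop command is recognized (verify the exact semantics in the source):

```python
from voicellm import VoiceManager

manager = VoiceManager()

# Assumption: callback fires once playback of this utterance finishes
manager.speak("Thinking about your question.", speed=1.1,
              callback=lambda: print("playback done"))

# Assumption: on_stop fires when the user says the stop command
manager.listen(lambda text: print(f"heard: {text}"),
               on_stop=lambda: print("listening stopped"))
```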
Handles text-to-speech synthesis:
```python
from voicellm.tts import TTSEngine

tts = TTSEngine(model_name="tts_models/en/ljspeech/tacotron2-DDC", debug_mode=False)
tts.speak(text, speed=1.0, callback=None)
tts.stop()
tts.is_active()
```
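Used standalone, the engine can be polled with `is_active()`, mirroring how the quick start polls `is_speaking()`:

```python
import time
from voicellm.tts import TTSEngine

tts = TTSEngine(model_name="tts_models/en/ljspeech/tacotron2-DDC")
tts.speak("Testing the TTS engine on its own.", speed=1.0)

# Block until synthesis and playback finish
while tts.is_active():
    time.sleep(0.1)
```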
Manages speech recognition with VAD:
```python
from voicellm.recognition import VoiceRecognizer

def on_transcription(text):
    print(f"Transcribed: {text}")

def on_stop():
    print("Stop command detected")

recognizer = VoiceRecognizer(transcription_callback=on_transcription,
                             stop_callback=on_stop,
                             whisper_model="tiny",
                             debug_mode=False)

recognizer.start(tts_interrupt_callback=None)
recognizer.stop()
recognizer.change_whisper_model("base")
recognizer.change_vad_aggressiveness(2)
```
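`tts_interrupt_callback` is the hook behind the barge-in feature. A sketch, under the assumption that the recognizer invokes this callback when it detects the user speaking during playback, so passing the engine's `stop` method lets speech interrupt TTS:

```python
import time
from voicellm.tts import TTSEngine
from voicellm.recognition import VoiceRecognizer

tts = TTSEngine(model_name="tts_models/en/ljspeech/tacotron2-DDC")
recognizer = VoiceRecognizer(transcription_callback=lambda text: print(f"Heard: {text}"),
                             stop_callback=lambda: print("Stop detected"),
                             whisper_model="tiny")

# Assumption: the recognizer calls this when speech is detected mid-playback
recognizer.start(tts_interrupt_callback=tts.stop)

tts.speak("This long sentence can be cut off simply by speaking over it.")
time.sleep(10)  # give yourself time to interrupt

recognizer.stop()
tts.stop()
```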
VoiceLLM is designed to be used with any text generation system:
```python
from voicellm import VoiceManager
import requests
import time

# Initialize voice manager
voice_manager = VoiceManager()

# Function to call text generation API
def generate_text(prompt):
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": "granite3.3:2b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False
    })
    return response.json()["message"]["content"]

# Callback for speech recognition
def on_transcription(text):
    if text.lower() == "stop":
        return

    print(f"User: {text}")

    # Generate response
    response = generate_text(text)
    print(f"AI: {response}")

    # Speak response
    voice_manager.speak(response)

# Start listening
voice_manager.listen(on_transcription)

# Keep the process alive until the user says "stop" or presses Ctrl+C
try:
    while voice_manager.is_listening():
        time.sleep(0.1)
except KeyboardInterrupt:
    pass

voice_manager.cleanup()
```
This is a test project that I designed with examples that work with Ollama, but I will adapt the examples and voicellm to work with any LLM provider (Anthropic, OpenAI, etc.).
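Since the voice layer only depends on a `generate_text(prompt) -> str` function, swapping providers is a one-function change. A sketch using the official `openai` Python SDK (v1+); the model name and API-key-from-environment setup are assumptions about your configuration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_text(prompt):
    # Same contract as the Ollama version above: prompt in, reply text out
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```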
VoiceLLM is licensed under the MIT License.
This project depends on several open-source libraries and models, each with their own licenses. Please see ACKNOWLEDGMENTS.md for a detailed list of dependencies and their respective licenses.
Some dependencies, particularly certain TTS models, may have non-commercial use restrictions. If you plan to use VoiceLLM in a commercial application, please ensure you are using models that permit commercial use or obtain appropriate licenses.