Text-to-Speech
Transformers
ONNX
GGUF
Chinese
English
voice-dialogue
speech-recognition
large-language-model
asr
tts
llm
chinese
english
real-time
conversational
Instructions to use MoYoYoTech/VoiceDialogue with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MoYoYoTech/VoiceDialogue with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-to-speech", model="MoYoYoTech/VoiceDialogue") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("MoYoYoTech/VoiceDialogue", dtype="auto") - llama-cpp-python
How to use MoYoYoTech/VoiceDialogue with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="MoYoYoTech/VoiceDialogue", filename="assets/models/llm/qwen/Qwen3-8B-Q6_K.gguf", )
llm.create_chat_completion( messages = "\"The answer to the universe is 42\"" )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use MoYoYoTech/VoiceDialogue with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K # Run inference directly in the terminal: llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K # Run inference directly in the terminal: llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K # Run inference directly in the terminal: ./llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K # Run inference directly in the terminal: ./build/bin/llama-cli -hf MoYoYoTech/VoiceDialogue:Q6_K
Use Docker
docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
- LM Studio
- Jan
- Ollama
How to use MoYoYoTech/VoiceDialogue with Ollama:
ollama run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
- Unsloth Studio
How to use MoYoYoTech/VoiceDialogue with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for MoYoYoTech/VoiceDialogue to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for MoYoYoTech/VoiceDialogue to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for MoYoYoTech/VoiceDialogue to start chatting
- Pi
How to use MoYoYoTech/VoiceDialogue with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "MoYoYoTech/VoiceDialogue:Q6_K" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use MoYoYoTech/VoiceDialogue with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf MoYoYoTech/VoiceDialogue:Q6_K
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default MoYoYoTech/VoiceDialogue:Q6_K
Run Hermes
hermes
- Docker Model Runner
How to use MoYoYoTech/VoiceDialogue with Docker Model Runner:
docker model run hf.co/MoYoYoTech/VoiceDialogue:Q6_K
- Lemonade
How to use MoYoYoTech/VoiceDialogue with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull MoYoYoTech/VoiceDialogue:Q6_K
Run and chat with the model
lemonade run user.VoiceDialogue-Q6_K
List all available models
lemonade list
File size: 2,649 Bytes
1858ba9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | # 功能特性
VoiceDialogue 系统集成了多项先进技术,提供端到端的语音交互体验。
## 🎵 音频处理
- **回声消除音频捕获** - 自动消除回声干扰,提升语音质量
- **语音活动检测(VAD)** - 智能检测用户说话状态,自动开始/停止录制
- **实时音频流处理** - 低延迟音频播放和处理
## 🗣️ 语音识别
- **智能语音识别引擎** - 中文使用FunASR高精度识别,其他语言使用Whisper模型
- **自动语言切换** - 根据启动参数自动选择最优识别引擎
- **实时转录处理** - 流式语音转文本处理,降低响应延迟
## 🧠 语言模型
- **Qwen2.5 (14B)** - 内置阿里巴巴开源的中文优化模型
- **LangChain 集成** - 方便扩展和支持更多语言模型
- **自定义系统提示词** - 可在代码中配置 AI 助手的行为风格
## 🎭 语音合成
项目集成了两种先进的语音合成技术,支持动态说话人管理:
#### GPT-SoVITs 技术(中文角色)
基于 GPT-SoVITs 的中文语音合成,支持以下角色:
- **罗翔** (Luo Xiang) - 法学教授风格,具有幽默风趣和深入浅出的讲解风格
- **马保国** (Ma Baoguo) - 太极大师风格,带有标志性的口音和语调特色
- **沈逸** (Shen Yi) - 学者风格,具有理性分析风格和富有磁性的嗓音
- **杨幂** (Yang Mi) - 明星风格,拥有清甜动人的声线和自然流畅的表达方式
- **周杰伦** (Zhou Jielun) - 歌手风格,具有标志性的说话风格和音乐气质
- **马云** (Ma Yun) - 企业家风格,富有激情的演讲风格和商业洞察表达方式
#### Kokoro TTS 技术(英文角色)
基于 Kokoro TTS 的英文语音合成,支持以下角色:
- **Heart** - 温暖亲切的英语女性语音,声音富有感情色彩
- **Bella** - 优质的英语女性语音,具有清晰自然的发音和良好的表现力
- **Nicole** - 高质量的英语女性语音,发音清晰准确,语调自然流畅
#### 技术特点
- **智能引擎选择** - 系统根据内容语言自动选择最适合的TTS引擎
- **动态说话人管理** - 支持运行时动态加载和切换说话人
- **高质量合成** - 采用先进的神经网络技术,生成自然流畅的语音
- **可扩展架构** - 模块化设计,方便添加更多语音角色和TTS引擎
## ⚙️ 服务模式
- **命令行模式 (CLI)** - 在终端中直接运行,提供实时语音交互体验
- **API 服务模式** - 启动一个 FastAPI Web 服务器,提供 HTTP 接口进行交互
- **桌面应用模式 (Electron)** - 提供图形界面的桌面应用程序。
|