How to Install and Use Fish-Speech for Text-to-Speech Synthesis
Saturday, Dec 14, 2024 | 7 minute read
The next-gen voice synthesis system is revolutionizing TTS with its multilingual capabilities, high accuracy, real-time processing, emotional nuance, and open-source community support. Perfect for developers and researchers alike!
“In today’s fast-paced technological world, voice synthesis technology is beginning a new chapter, so let’s explore the leaders in this field together!”
Discover Fish-Speech: The Next Voice Synthesis Superhero!
Fish-Speech is the latest generation of Text-to-Speech (TTS) systems, and its version V1.4 has captured attention with cutting-edge multilingual technology and high-fidelity voice synthesis. The system is trained on approximately 700,000 hours of multilingual audio, with particular emphasis on English and Chinese.
What Makes Fish-Speech Unique?
Fish-Speech excels at voice synthesis. Its zero-shot and few-shot TTS features let users generate natural, smooth speech from just a short voice sample. Notably, it achieves a low character error rate (CER) and word error rate (WER) of around 2%.
In terms of processing speed, Fish-Speech performs exceptionally well, enabling real-time inference on consumer hardware such as the NVIDIA RTX 4060 and RTX 4090, which greatly improves the user experience. Its phoneme-independent design lets it handle a wide range of scripts flexibly, strengthening its multilingual support.
Additionally, its emotion and tone controls ensure that generated speech is not monotonous but carries emotional nuance and rich variation, fitting the demands of many application scenarios.
Why Do Developers Prefer Fish-Speech?
As an open-source project, Fish-Speech cultivates an active community, encouraging developers to collaborate and contribute code that improves its features and output quality. The project is released under the CC-BY-NC-SA-4.0 license, which emphasizes responsible, non-commercial use and gives academic researchers and personal projects a clear framework for how it may be used.
Ongoing updates continue to add features and enhancements, showing a commitment to steady performance improvements and broader capabilities that keeps developers engaged.
Real-World Applications and Community Feedback
Users can try the Fish Agent demo on an online platform and find a comprehensive feature overview in the Fish Speech documentation, which helps them understand and use the system. Community feedback and contributions are already reflected in version updates, building user trust and driving wider adoption of Fish-Speech in voice synthesis.
With its open-source character and powerful feature set, Fish-Speech is becoming one of the most closely watched voice synthesis projects around. Whether you are an individual developer or an academic institution, the project offers plenty to build on.
Getting Started with Fish-Speech: Installation and Usage Guide
Installation Steps - For Windows Users
Let’s quickly go over how to install Fish-Speech on a Windows system! The process is straightforward, and we’ll start with installation for professional users.
Professional User Installation Steps
- Create a Python 3.10 Virtual Environment

  ```bash
  conda create -n fish-speech python=3.10
  conda activate fish-speech
  ```

  These commands create and activate a virtual environment named fish-speech. A virtual environment gives the project a clean workspace and prevents conflicts with other Python packages on your system.

- Install PyTorch

  ```bash
  pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
  ```

  This installs PyTorch and its companion libraries built for CUDA 12.x, so computation can run on the GPU. Choosing the right PyTorch build matters, because it directly affects training and inference speed. A quick way to verify the install is sketched just after this list.

- Install Fish Speech

  ```bash
  pip3 install -e .
  ```

  Run this in the project directory: it installs Fish-Speech in editable mode, meaning you can modify the code at any time without reinstalling.

- (Optional) Install Triton for Enhanced Performance

  ```bash
  pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl
  ```

  Triton accelerates deep learning inference; installing it is optional but recommended for best performance.
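Before moving on, it helps to confirm that PyTorch was installed with working CUDA support. The snippet below is a minimal sketch (it is not part of Fish-Speech itself) that only assumes the torch package installed in the step above; run it inside the activated fish-speech environment.

```python
# Quick sanity check: is the CUDA-enabled PyTorch build installed and can it see the GPU?
import torch

print("PyTorch version:", torch.__version__)         # expect something like 2.4.1+cu121
print("CUDA available:", torch.cuda.is_available())  # should print True on an RTX 4060/4090
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```

If CUDA is reported as unavailable, double-check that the cu121 index URL was used and that your NVIDIA driver supports CUDA 12.x.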
Non-Professional User Installation Steps

- Extract the Project Package

  Simply unzip the project package and locate the install_env.bat file. Double-click it to set up the environment automatically. This is very beginner-friendly, since you do not need to type any commands yourself.

- Download and Install the LLVM Compiler and the Microsoft Visual C++ Redistributable

  These two tools are dependencies that the project needs in order to run smoothly.

- Open the WebUI Management Interface

  ```bash
  start.bat
  ```

  Double-clicking this file launches the Fish-Speech Web UI, making subsequent operations convenient.
Linux User Installation Steps
Installing Fish-Speech on a Linux system is just as simple and straightforward! Here are the steps for Linux users.
- Create a Python 3.10 Virtual Environment

  ```bash
  conda create -n fish-speech python=3.10
  conda activate fish-speech
  ```

- Install PyTorch

  ```bash
  pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
  ```

  Here we omit the CUDA index URL: on Linux, the default PyPI wheels for these versions already bundle CUDA support, so the plain pip install works in most environments.

- Install Ubuntu/Debian Dependencies

  ```bash
  apt install libsox-dev ffmpeg
  apt install build-essential cmake libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0
  ```

  These packages cover audio processing (sox, ffmpeg, ALSA/PortAudio) and the build tools needed to compile project dependencies. A small check of the audio tooling is sketched after this list.

- Install Fish Speech

  ```bash
  pip3 install -e .[stable]
  ```

  This mirrors the Windows command, but the [stable] extra explicitly pulls in the stable set of dependencies.
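As a quick follow-up to the Linux steps, the sketch below checks that the command-line audio tooling and torchaudio are usable. It relies only on the Python standard library and the torchaudio package installed above; the exact backend names torchaudio reports may vary by system.

```python
# Minimal post-install check for the Linux audio tooling.
import shutil
import torchaudio

# ffmpeg comes from the apt packages installed above.
print("ffmpeg:", "found" if shutil.which("ffmpeg") else "missing")

# torchaudio should report at least one usable audio backend (e.g. ffmpeg or soundfile).
print("torchaudio backends:", torchaudio.list_audio_backends())
```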
macOS User Installation Steps
Installing Fish-Speech on macOS is also quite clear.
- Create a Python 3.10 Virtual Environment

  ```bash
  conda create -n fish-speech python=3.10
  conda activate fish-speech
  ```

- Install PyTorch

  ```bash
  pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
  ```

  On macOS there are no CUDA builds, so the plain pip command installs the CPU build of PyTorch (with Metal/MPS acceleration on Apple-silicon machines); a quick device check is sketched after this list.

- Install Fish Speech

  ```bash
  pip install -e .[stable]
  ```

  As on Linux, this installs the stable dependency set in editable mode, keeping development flexible.
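Since macOS has no CUDA, PyTorch falls back to the CPU or, on Apple-silicon machines, the Metal (MPS) backend. The short sketch below only assumes the torch package installed above and shows which device will be used.

```python
# Check which compute device PyTorch can use on macOS.
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"  # MPS = Apple-silicon GPU via Metal
print("PyTorch", torch.__version__, "will run on:", device)
```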
Docker User Installation Steps
Running Fish-Speech using Docker is an incredibly convenient method, especially for developers. Here’s the detailed breakdown.
- Install NVIDIA Container Toolkit for GPU Support

  ```bash
  curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  sudo apt-get update
  sudo apt-get install -y nvidia-container-toolkit
  sudo systemctl restart docker
  ```

  These commands register the NVIDIA repository, install the NVIDIA Container Toolkit, and restart Docker so containers can use GPU acceleration.

- Pull and Run the Fish Speech Image

  ```bash
  docker pull fishaudio/fish-speech:latest-dev
  docker run -it \
    --name fish-speech \
    --gpus all \
    -p 7860:7860 \
    fishaudio/fish-speech:latest-dev \
    zsh
  ```

  The first command pulls the latest development image of Fish-Speech; the second launches an interactive container (with all GPUs attached and port 7860 published) in which you can run and test the code.

- Download Model Dependencies Inside the Docker Container

  ```bash
  huggingface-cli download fishaudio/fish-speech-1.5 --local-dir checkpoints/fish-speech-1.5
  ```

  This downloads the model weights into the specified local directory so that the voice synthesis service has the data it needs.

- Set Environment Variables and Start the Web UI

  ```bash
  export GRADIO_SERVER_NAME="0.0.0.0"
  python tools/run_webui.py
  ```

  After running these commands, you can open the Web UI at http://localhost:7860 to interact with the model and confirm everything works. A small host-side reachability check is sketched after this list.
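Once the Web UI process is running inside the container, you can confirm from the host that the published port responds. The sketch below is a simple reachability check using only the Python standard library; the URL and container name come from the docker commands above.

```python
# Poll the Fish-Speech Web UI from the host to confirm port 7860 is reachable.
import time
import urllib.request

url = "http://localhost:7860"
for attempt in range(10):
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"Web UI is up (HTTP {resp.status})")
            break
    except OSError:
        print(f"Attempt {attempt + 1}: not ready yet, retrying...")
        time.sleep(3)
else:
    print("Web UI did not respond; check the container logs with 'docker logs fish-speech'.")
```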
Usage Example - Text-to-Speech Conversion
Now, let’s see how to use Fish-Speech to convert text to speech. Below is a simple code example to help you understand how to achieve this.
```python
from fish_speech import FishSpeech

# Instantiate the FishSpeech object
synthesizer = FishSpeech()

# Text to synthesize
text = "Hello, welcome to Fish-Speech for text-to-speech synthesis."

# Voice synthesis
audio = synthesizer.synthesize(text)

# Play the synthesized audio
audio.play()
```
In this example, we first import the FishSpeech class and create an instance of the synthesizer. We then define the text to be synthesized and call the synthesize method to convert it to speech. Finally, we play the result with audio.play(). This code illustrates how easy it is to use Fish-Speech.
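Building on the example above, and assuming the same FishSpeech interface it uses (check the project documentation for the actual class and method names), a batch of sentences could be synthesized in a simple loop:

```python
# Hedged sketch: synthesize several lines with the interface from the example above.
# The FishSpeech class and its synthesize()/play() methods are taken from that example
# and have not been verified against the current Fish-Speech API.
from fish_speech import FishSpeech

synthesizer = FishSpeech()
lines = [
    "Hello, welcome to Fish-Speech for text-to-speech synthesis.",
    "This is the second sentence in the batch.",
]

for line in lines:
    audio = synthesizer.synthesize(line)  # convert one line of text to speech
    audio.play()                          # play it back before moving on
```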
By following these steps, you’ll be able to smoothly install and experience Fish-Speech, turning text into vibrant speech and starting your voice synthesis journey!