Usage
- Download a voice and extract the `.onnx` and `.onnx.json` files
- Run the `piper` binary with text on standard input, `--model /path/to/your-voice.onnx`, and `--output_file output.wav`
For example:
```
echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model en_US-lessac-medium.onnx --output_file welcome.wav
```

For multi-speaker models, use `--speaker <number>` to change speakers (default: 0).
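For instance, to render the same text with speaker 3 of a multi-speaker voice (the model file name below is a placeholder):

```
echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model your-multi-speaker-voice.onnx --speaker 3 --output_file welcome.wav
```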
See `piper --help` for more options.
Streaming Audio
Piper can stream raw audio to stdout as it's produced:
```
echo 'This sentence is spoken first. This sentence is synthesized while the first sentence is spoken.' | \
  ./piper --model en_US-lessac-medium.onnx --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
```

This is raw audio and not a WAV file, so make sure your audio player is set to play 16-bit mono PCM samples at the correct sample rate for the voice.
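If `aplay` is not available (it's ALSA-specific), `ffplay` from FFmpeg can play the same stream; a sketch, using the 22,050 Hz mono rate from the example above:

```
echo 'Streaming raw audio to ffplay.' | \
  ./piper --model en_US-lessac-medium.onnx --output-raw | \
  ffplay -autoexit -nodisp -f s16le -ar 22050 -ac 1 -
```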
JSON Input
The `piper` executable can accept JSON input when using the `--json-input` flag. Each line of input must be a JSON object with a `text` field. For example:
{ "text": "First sentence to speak." }{ "text": "Second sentence to speak." }Optional fields include:
speaker- string- Name of the speaker to use from
speaker_id_mapin config (multi-speaker voices only)
- Name of the speaker to use from
speaker_id- number- Id of speaker to use from 0 to number of speakers - 1 (multi-speaker voices only, overrides “speaker”)
output_file- string- Path to output WAV file
The following example writes two sentences with different speakers to different files:
{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }If you’d like to use a GPU, install the onnxruntime-gpu package:
GPU

If you'd like to use a GPU, install the onnxruntime-gpu package:

```
.venv/bin/pip3 install onnxruntime-gpu
```

Then run `piper` with the `--cuda` argument. You will need a functioning CUDA environment, such as what's available in NVIDIA's PyTorch containers.
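With that in place, a GPU run looks the same as the earlier examples, just with `--cuda` added (a sketch, assuming the CUDA environment above):

```
echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model en_US-lessac-medium.onnx --cuda --output_file welcome.wav
```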