Usage

  1. Download a voice and extract the .onnx and .onnx.json files
  2. Run the piper binary with text on standard input, --model /path/to/your-voice.onnx, and --output_file output.wav

For example:

echo 'Welcome to the world of speech synthesis!' | \
./piper --model en_US-lessac-medium.onnx --output_file welcome.wav
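
Step 1 above is the download of this voice. Voices are published in the rhasspy/piper-voices repository on Hugging Face; the exact URLs below are an assumption about that repository's layout for en_US-lessac-medium:

wget 'https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx'
wget 'https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json'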

For multi-speaker models, use --speaker <number> to change speakers (default: 0).
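
As a minimal sketch, assuming a multi-speaker voice file named multi-speaker-voice.onnx (not one of the voices above):

echo 'The same text, spoken by speaker 3.' | \
./piper --model multi-speaker-voice.onnx --speaker 3 --output_file speaker_3.wav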

See piper --help for more options.

Streaming Audio

Piper can stream raw audio to stdout as it is produced:

echo 'This sentence is spoken first. This sentence is synthesized while the first sentence is spoken.' | \
./piper --model en_US-lessac-medium.onnx --output-raw | \
aplay -r 22050 -f S16_LE -t raw -

This is raw audio and not a WAV file, so make sure your audio player is set to play 16-bit mono PCM samples at the correct sample rate for the voice.
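
The sample rate is stored in the voice's .onnx.json config. The audio.sample_rate field used below is an assumption about that config's layout, but it avoids hard-coding the rate:

rate=$(jq .audio.sample_rate en_US-lessac-medium.onnx.json)
echo 'Streaming at the rate read from the config.' | \
./piper --model en_US-lessac-medium.onnx --output-raw | \
aplay -r "$rate" -f S16_LE -t raw -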

JSON Input

The piper executable can accept JSON input when using the --json-input flag. Each line of input must be a JSON object with a "text" field. For example:

{ "text": "First sentence to speak." }
{ "text": "Second sentence to speak." }
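
A sketch of feeding such lines to piper; combining --json-input with --output-raw here is an assumption that the two flags work together just as they do with plain text input:

printf '%s\n' \
  '{ "text": "First sentence to speak." }' \
  '{ "text": "Second sentence to speak." }' | \
./piper --model en_US-lessac-medium.onnx --json-input --output-raw | \
aplay -r 22050 -f S16_LE -t raw -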

Optional fields include:

  • speaker - string
    • Name of the speaker to use from speaker_id_map in config (multi-speaker voices only)
  • speaker_id - number
    • ID of the speaker to use, from 0 to the number of speakers minus 1 (multi-speaker voices only; overrides “speaker”)
  • output_file - string
    • Path to output WAV file

The following example writes two sentences with different speakers to different files:

{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }
{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }
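
To run this end to end, the lines can be piped in with --json-input. The voice name below is a placeholder for a multi-speaker model, and it is assumed that no top-level --output_file is needed when every line provides its own output_file:

printf '%s\n' \
  '{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }' \
  '{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }' | \
./piper --model multi-speaker-voice.onnx --json-input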

GPU

If you’d like to use a GPU, install the onnxruntime-gpu package:

.venv/bin/pip3 install onnxruntime-gpu

and then run piper with the --cuda argument. You will need to have a functioning CUDA environment, such as what’s available in NVIDIA’s PyTorch containers.
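
With CUDA available, the first example only needs the --cuda flag added:

echo 'Welcome to the world of speech synthesis!' | \
./piper --model en_US-lessac-medium.onnx --cuda --output_file welcome.wav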