Revision as of 08:03, 6 November 2012


Getting Started

Festival Speech Synthesis System - http://www.cstr.ed.ac.uk/projects/festival/
build on OS X with do_prompt capabilities - http://linguisticmystic.com/2011/07/15/using-festival-tts-on-os-x/
- http://permalink.gmane.org/gmane.science.tts.festvox/381
a class assignment on TTS (CMU 15-492) - http://www.speech.cs.cmu.edu/15-492/assignments/tts/index.html

Learning

Book - http://festvox.org/festvox/book1.html
short tutorial - http://festvox.org/festtut-2.0/
exercises and hints - http://festvox.org/festtut-2.0/exercises/

Slides on HTS Synthesis

http://www.sp.nitech.ac.jp/~tokuda/tokuda_iscslp2006.pdf

Training Voice Models

howto (Building a CLUSTERGEN Statistical Parametric Synthesizer) - http://festvox.org/festvox/c3170.html#AEN3172
training text input - http://www.festvox.org/cmu_arctic/cmuarctic.data
useful tips - http://festvox.org/index.html, including a pointer to the EMU speech database system - http://www.shlrc.mq.edu.au/emu/

Building a Unit Selection Cluster Voice

(from here http://festvox.org/festvox/x3082.html)

```
mkdir uw_us_rdt
cd uw_us_rdt
```

uniphone setup:

 $FESTVOXDIR/src/unitsel/setup_clunits uw us rdt uniphone

generate prompts and prompt files:

festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/uniphone.data")'

record the prompts using Audacity. save as 16 kHz, 16-bit mono.
make labels:
```
./bin/make_labs prompt-wav/*.wav
```

build utterance structure:

festival -b festvox/build_clunits.scm '(build_utts "etc/uniphone.data")'

do pitch marking:
```
./bin/make_pm_wave etc/uniphone.data
```
find Mel Frequency Cepstral Coefficients:
```
./bin/make_mcep etc/uniphone.data
```

build cluster unit selection synth:

festival -b festvox/build_clunits.scm '(build_clunits "etc/uniphone.data")'
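The post-recording steps above always run in the same order, so they can be scripted. The following is a minimal sketch, not part of festvox: the function names `clunits_build_steps` and `run_steps` are made up for illustration, and the commands are copied from the recipe above. It assumes it is run from the voice directory after the prompts have been recorded.

```python
import subprocess


def clunits_build_steps(data_file="etc/uniphone.data"):
    """Return the post-recording shell commands from the recipe, in order."""
    festival = "festival -b festvox/build_clunits.scm"
    return [
        "./bin/make_labs prompt-wav/*.wav",             # make labels
        f"{festival} '(build_utts \"{data_file}\")'",    # utterance structure
        f"./bin/make_pm_wave {data_file}",              # pitch marking
        f"./bin/make_mcep {data_file}",                 # MFCCs
        f"{festival} '(build_clunits \"{data_file}\")'",  # cluster synth
    ]


def run_steps(steps, cwd="."):
    """Run each command through the shell and stop at the first failure."""
    for step in steps:
        print("running:", step)
        subprocess.run(step, shell=True, check=True, cwd=cwd)


# Example (hypothetical voice directory):
# run_steps(clunits_build_steps("etc/uniphone.data"), cwd="uw_us_rdt")
```

The commands go through the shell so that the `prompt-wav/*.wav` glob is expanded exactly as it would be when typed by hand.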

Using a Unit Selection Cluster Voice Synth

from uw_us_rdt directory:
```
festival festvox/uw_us_rdt_clunits.scm
```
in Scheme:
```
(voice_uw_us_rdt_clunits)
(SayText "this is a little test.")
```
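The same test can be run without the interactive prompt. This sketch builds a `festival -b` command line that loads the voice file and passes the two Scheme expressions as arguments, the same way the build commands on this page pass expressions to `festival -b`. The helper name `say_text_command` is made up, and the backslash escaping of quotes inside the Scheme string is an assumption about Festival's reader.

```python
import subprocess


def say_text_command(text,
                     voice="voice_uw_us_rdt_clunits",
                     voice_file="festvox/uw_us_rdt_clunits.scm"):
    """Build an argv list that loads the voice, selects it, and speaks text."""
    # Assumption: backslash-escaped quotes are accepted inside Scheme strings.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return ["festival", "-b", voice_file,
            f"({voice})", f'(SayText "{escaped}")']


# Example, run from the uw_us_rdt directory:
# subprocess.run(say_text_command("this is a little test."), check=True)
```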

Building a CLUSTERGEN Statistical Parametric Synthesizer

adapted from http://festvox.org/festvox/c3170.html#AEN3172

mkdir uw_us_rdt_arctic

cd uw_us_rdt_arctic

$FESTVOXDIR/src/clustergen/setup_cg uw us rdt_arctic

copy text into etc/txt.done.data, using some of the lines from http://www.festvox.org/cmu_arctic/cmuarctic.data
copy audio files into wav/ with
```
bin/get_wavs
```
which power normalizes the files and converts them to the proper format.
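Picking "some of the lines" from cmuarctic.data by hand is error-prone, so here is a rough sketch that keeps only the prompts for which a recording exists. It assumes each line of cmuarctic.data looks like `( arctic_a0001 "some text" )` and that recordings are named after the utterance id (for example `wav/arctic_a0001.wav`). The function names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed line format: ( utterance_id "prompt text" )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def recorded_ids(wav_dir="wav"):
    """Collect utterance ids from the .wav file names in wav_dir."""
    return {p.stem for p in Path(wav_dir).glob("*.wav")}


def select_prompts(lines, ids):
    """Keep the prompt lines whose utterance id is in ids, in file order."""
    kept = []
    for line in lines:
        match = PROMPT_RE.match(line.strip())
        if match and match.group(1) in ids:
            kept.append(line.strip())
    return kept


# Example (paths are assumptions):
# lines = Path("cmuarctic.data").read_text().splitlines()
# Path("etc/txt.done.data").write_text(
#     "\n".join(select_prompts(lines, recorded_ids("wav"))) + "\n")
```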

Building a Unit Selection Cluster Voice from TIMIT data

(from here http://festvox.org/festvox/x3082.html)

```
mkdir uw_us_rdt_timit
cd uw_us_rdt_timit
```

timit setup:

 $FESTVOXDIR/src/unitsel/setup_clunits uw us rdt timit

generate prompts and prompt files:

festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/timit.data")'

record the prompts using Audacity. save as 16 kHz, 16-bit mono.
make labels:
```
./bin/make_labs prompt-wav/*.wav
```

build utterance structure:

festival -b festvox/build_clunits.scm '(build_utts "etc/timit.data")'

do pitch marking:
```
./bin/make_pm_wave etc/timit.data
```
find Mel Frequency Cepstral Coefficients:
```
./bin/make_mcep etc/timit.data
```

build cluster unit selection synth:

festival -b festvox/build_clunits.scm '(build_clunits "etc/timit.data")'
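Both unit selection recipes ask for 16 kHz, 16-bit mono recordings, and a file exported with the wrong settings is easy to miss. The sketch below uses Python's standard `wave` module to check a recording before the build steps are run. The function names are made up for illustration, and it only handles plain PCM WAV files.

```python
import wave


def check_format(rate, sample_bytes, channels):
    """Return a list of problems; an empty list means 16 kHz, 16-bit mono."""
    problems = []
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000 Hz")
    if sample_bytes != 2:
        problems.append(f"sample width is {sample_bytes * 8} bit, expected 16 bit")
    if channels != 1:
        problems.append(f"{channels} channels, expected 1 (mono)")
    return problems


def check_recording(path):
    """Open a WAV file and check its header against the expected format."""
    with wave.open(str(path), "rb") as wav_file:
        return check_format(wav_file.getframerate(),
                            wav_file.getsampwidth(),
                            wav_file.getnchannels())


# Example (hypothetical file name):
# for problem in check_recording("wav/timit_001.wav"):
#     print(problem)
```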

Improving Quality

fix phoneme labeling with WaveSurfer - http://sourceforge.net/projects/wavesurfer/
tuning a voice - http://www.cstr.ed.ac.uk/emasters/summer_school_2005/tutorial3/tutorial.html

Using Voices

using meghan voice

Run the Server

open terminal:

	cd /Users/murmur/Desktop/meghan
	festival_server -c meghans_special_sauce.scm

To kill the server:

Control-C

Run the Client

open a 2nd terminal window:

	cd /Users/murmur/Desktop/meghan 
	festival_client myfile.txt --ttw --output client_test.wav

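Calling the client from Python:

import subprocess
# text-to-wave through the running festival_server; check=True raises an error if the client fails
client = "/Applications/festival_2.1/festival/src/main/festival_client"
subprocess.run([client, "/Users/murmur/Desktop/meghan/myfile.txt", "--ttw", "--output", "/Users/murmur/Desktop/meghan/client_test78.wav"], check=True)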

Using A Newly Trained Voice

Modify the voice so festival knows it's there

append a proclaim_voice call to your newly trained voice file, uw_us_rdt_clunits.scm, in uw_us_rdt_clunits/festvox:

(proclaim_voice
 'uw_us_rdt_clunits
 '((language english)
   (gender male)
   (dialect american)
   (description
    "This is Robert Twomey trained on CLUNITS, TIMIT database.")))

(provide 'uw_us_rdt_clunits)

Install voice to festival directory

to install the example voice, download http://roberttwomey.com/downloads/uw_us_rdt_clunits.tar.gz and unpack it from the festival root directory; it should install to the correct directory
to install your own newly trained voice, copy it to festival/lib/voices/english/
the name of your new voice directory (ex: uw_us_rdt_clunits/) needs to match the voice file (ex: uw_us_rdt_clunits/festvox/uw_us_rdt_clunits.scm)
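The naming rule above can be checked before copying a voice into place. This is a small sketch with made-up helper names; it only tests that `<voice_dir>/festvox/<voice_dir name>.scm` exists and knows nothing else about how Festival locates voices.

```python
from pathlib import Path


def voice_file_for(voice_dir):
    """Return the .scm path whose name must match the voice directory."""
    voice_dir = Path(voice_dir)
    return voice_dir / "festvox" / (voice_dir.name + ".scm")


def has_matching_voice_file(voice_dir):
    """True if the voice directory contains festvox/<directory name>.scm."""
    return voice_file_for(voice_dir).is_file()


# Example (path is an assumption about the local install):
# has_matching_voice_file("festival/lib/voices/english/uw_us_rdt_clunits")
```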

Configure festival to use your voice by default

to set your voice as default (and add a special pause entry), add the following to festival/etc/siteinit.scm:

(set! voice_default 'voice_uw_us_rdt_clunits)

(lex.add.entry '("<break>" n (((pau pau) 0))))

the lex.add.entry line adds a new word, <break>, to the lexicon; it is spoken as a pause.
run festival_server and it will load your new voice by default
see http://www.cstr.ed.ac.uk/projects/festival/manual/festival_24.html
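Once the `<break>` entry is in the lexicon, pauses can be placed by inserting that word into the input text. A tiny sketch, with a made-up helper name; it assumes Festival tokenizes `<break>` as a single word when it is surrounded by spaces, which this page implies but which has not been verified here.

```python
def with_pauses(phrases, pause_word="<break>"):
    """Join phrases with the pause word between them, skipping empty ones."""
    cleaned = [phrase.strip() for phrase in phrases if phrase.strip()]
    return f" {pause_word} ".join(cleaned)


# Example:
# text = with_pauses(["Do you really want to see all of it?", "Are you sure?"])
```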

Run the Server

from anywhere: festival_server

Run the Client

run the client: echo "Do you really want to see all of it?" | festival_client --ttw --output test.wav
this generates the wave file test.wav
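The echo pipeline above can be wrapped in Python by sending the text to `festival_client` on standard input. This is a sketch with made-up function names; it uses only the `--ttw` and `--output` options shown on this page and assumes `festival_client` is on the PATH and `festival_server` is already running.

```python
import subprocess


def client_command(output_wav):
    """Build the festival_client argv used for text-to-wave."""
    return ["festival_client", "--ttw", "--output", output_wav]


def synthesize(text, output_wav="test.wav"):
    """Pipe text to festival_client; raises CalledProcessError on failure."""
    subprocess.run(client_command(output_wav),
                   input=text, text=True, check=True)


# Example:
# synthesize("Do you really want to see all of it?", "test.wav")
```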

Tuning phrasing, prosody, etc with SABLE

http://www.cstr.ed.ac.uk/projects/festival/manual/festival_10.html#SEC31
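As a starting point for experimenting with SABLE, the sketch below generates a small SABLE document with a `<BREAK/>` between phrases. The header and tag names follow the example in the Festival manual section linked above as best recalled, so treat them as assumptions and compare against the manual. The function name is made up for illustration.

```python
from xml.sax.saxutils import escape

# Assumed header, modeled on the SABLE example in the Festival manual.
SABLE_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"\n'
    '      "Sable.v0_2.dtd" []>\n'
)


def sable_document(phrases):
    """Wrap phrases in a SABLE element, separated by BREAK tags."""
    body = "\n<BREAK/>\n".join(escape(phrase) for phrase in phrases)
    return f"{SABLE_HEADER}<SABLE>\n{body}\n</SABLE>\n"


# Example (file name is hypothetical):
# with open("example.sable", "w") as sable_file:
#     sable_file.write(sable_document(["This is a test.", "Did it pause?"]))
```

The text is XML-escaped so that characters such as `&` do not break the markup.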

Difference between revisions of "Festival TTS"
