Latest revision as of 05:45, 10 March 2017


Getting Started

Festival Speech Synthesis System - http://www.cstr.ed.ac.uk/projects/festival/
build on os x with do_prompt capabilities - http://linguisticmystic.com/2011/07/15/using-festival-tts-on-os-x/
- http://permalink.gmane.org/gmane.science.tts.festvox/381
this is a class - http://www.speech.cs.cmu.edu/15-492/assignments/tts/index.html

Learning

Book - http://festvox.org/festvox/book1.html
short tutorial - http://festvox.org/festtut-2.0/
exercises and hints - http://festvox.org/festtut-2.0/exercises/

Slides on HTS Synthesis

http://www.sp.nitech.ac.jp/~tokuda/tokuda_iscslp2006.pdf

Training Voice Models

howto http://festvox.org/festvox/c3170.html#AEN3172
training text input - http://www.festvox.org/cmu_arctic/cmuarctic.data
useful tips - http://festvox.org/index.html, including a pointer to the EMU speech database system - http://www.shlrc.mq.edu.au/emu/
Building a CLUSTERGEN Statistical Parametric Synthesizer: http://festvox.org/festvox/c3170.html#AEN3172

Building a Unit Selection Cluster Voice

(from here http://festvox.org/festvox/x3086.html)

```
mkdir uw_us_rdt
cd uw_us_rdt
```

uniphone setup:

 $FESTVOXDIR/src/unitsel/setup_clunits uw us rdt uniphone

generate prompts and prompt files:

festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/uniphone.data")'

record the prompts using Audacity. save as 16 kHz, 16-bit mono WAV.
make labels:
```
./bin/make_labs prompt-wav/*.wav
```

build utterance structure:

festival -b festvox/build_clunits.scm '(build_utts "etc/uniphone.data")'

do pitch marking:
```
./bin/make_pm_wave etc/uniphone.data
```
find Mel Frequency Cepstral Coefficients:
```
./bin/make_mcep etc/uniphone.data
```

build cluster unit selection synth:

festival -b festvox/build_clunits.scm '(build_clunits "etc/uniphone.data")'
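
The steps that follow recording can be chained in a small script. This is only a sketch, assuming it is run from inside the voice directory and that `festival` is on the PATH; the function names `post_recording_steps` and `run_build` are made up for illustration, and the commands themselves are the ones listed above.

```python
import glob
import subprocess


def post_recording_steps(prompt_file="etc/uniphone.data"):
    """Return the commands that follow recording, in order, as argv lists."""
    def festival_step(function_name):
        # festival -b loads the build script, then evaluates one Scheme call.
        expr = '({} "{}")'.format(function_name, prompt_file)
        return ["festival", "-b", "festvox/build_clunits.scm", expr]

    return [
        # No shell is involved, so the wildcard is expanded here.
        ["./bin/make_labs"] + sorted(glob.glob("prompt-wav/*.wav")),
        festival_step("build_utts"),
        ["./bin/make_pm_wave", prompt_file],
        ["./bin/make_mcep", prompt_file],
        festival_step("build_clunits"),
    ]


def run_build(prompt_file="etc/uniphone.data"):
    """Run each step in turn and stop at the first one that fails."""
    for argv in post_recording_steps(prompt_file):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_build("etc/timit.data")` would run the same sequence for the TIMIT prompt list.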

Using a Unit Selection Cluster Voice Synth

from uw_us_rdt directory:
```
festival festvox/uw_us_rdt_clunits.scm
```
in Scheme:
```
(voice_uw_us_rdt_clunits) 
```
```
(SayText "this is a little test.")
```

Building a CLUSTERGEN Statistical Parametric Synthesizer

(adapted from http://festvox.org/festvox/c3174.html#AEN3176)

```
mkdir uw_us_rdt_arctic
cd uw_us_rdt_arctic
```

$FESTVOXDIR/src/clustergen/setup_cg uw us rdt_arctic

copy the prompt text into etc/txt.done.data, using some of the lines from http://www.festvox.org/cmu_arctic/cmuarctic.data

copy the audio files into wav/. use bin/get_wavs to do the copy, since it also power normalizes the files and converts them to the proper format.
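
Picking a subset of prompt lines can be done by hand or with a few lines of Python. The sketch below assumes each line of the prompt file has the form `( utterance_id "prompt text" )`; the helper names `select_prompts` and `utterance_ids` are hypothetical, not part of festvox.

```python
import re

# One prompt per line: ( utterance_id "prompt text" )
PROMPT_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def select_prompts(lines, count):
    """Return the first `count` well-formed prompt lines, unchanged."""
    selected = []
    for line in lines:
        if len(selected) >= count:
            break
        line = line.strip()
        if PROMPT_LINE.match(line):
            selected.append(line)
    return selected


def utterance_ids(lines):
    """Return the utterance id of each prompt line."""
    return [PROMPT_LINE.match(line).group(1) for line in lines]
```

The selected lines can then be written to etc/txt.done.data, and the ids give a quick way to check that a matching recording exists for each prompt.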

Building a Unit Selection Cluster Voice from TIMIT data

(from here http://festvox.org/festvox/c2645.html#AEN2716)

```
mkdir uw_us_rdt_timit
cd uw_us_rdt_timit
```

timit setup:

 $FESTVOXDIR/src/unitsel/setup_clunits uw us rdt timit

generate prompts and prompt files:

festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/timit.data")'

record the prompts using Audacity. save as 16 kHz, 16-bit mono WAV.
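
Before copying recordings into the voice directory, it may help to confirm they really are 16 kHz, 16-bit mono. A minimal sketch using Python's standard `wave` module; `check_recording` is a made-up helper name.

```python
import wave


def check_recording(path, rate=16000, sample_bytes=2, channels=1):
    """Return a list of problems; an empty list means the format matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append("sample rate is {} Hz".format(wav.getframerate()))
        if wav.getsampwidth() != sample_bytes:
            problems.append("sample width is {} bits".format(8 * wav.getsampwidth()))
        if wav.getnchannels() != channels:
            problems.append("{} channels".format(wav.getnchannels()))
    return problems
```

Running it over every file in the recording directory and printing any non-empty result shows which takes need to be re-exported.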

copy sound files from recording directory into voice directory.

 ./bin/get_wavs ~/Sounds/TIMIT_Training_Data/warehouse_omni/*.wav

make labels:
```
./bin/make_labs prompt-wav/*.wav
```

build utterance structure:

festival -b festvox/build_clunits.scm '(build_utts "etc/timit.data")'

do pitch marking:
```
./bin/make_pm_wave etc/timit.data
```
find Mel Frequency Cepstral Coefficients:
```
./bin/make_mcep etc/timit.data
```

build cluster unit selection synth:

festival -b festvox/build_clunits.scm '(build_clunits "etc/timit.data")'

Improving Quality

Fix phoneme labeling - http://sourceforge.net/projects/wavesurfer/
tuning a voice - http://www.cstr.ed.ac.uk/emasters/summer_school_2005/tutorial3/tutorial.html

Using Voices

using meghan voice

Run the Server

open terminal:

cd /Users/murmur/Desktop/meghan 
festival_server -c meghans_special_sauce.scm

To kill the server:

Control-C

Run the Client

open a 2nd terminal window:

cd /Users/murmur/Desktop/meghan 
festival_client myfile.txt --ttw --output client_test.wav

Other stuff (python):

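import subprocess
# Run festival_client and wait until the wave file has been written.
subprocess.run(["/Applications/festival_2.1/festival/src/main/festival_client", "/Users/murmur/Desktop/meghan/myfile.txt", "--ttw", "--output", "/Users/murmur/Desktop/meghan/client_test78.wav"], check=True)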

Using A Newly Trained Voice

Modify the voice so festival knows it's there

append proclaim message to your newly trained model uw_us_rdt_clunits.scm in uw_us_rdt_clunits/festvox:

(proclaim_voice
 'uw_us_rdt_clunits
 '((language english)
   (gender male)
   (dialect american)
   (description
    "This is Robert Twomey trained on CLUNITS, TIMIT database.")))

(provide 'uw_us_rdt_clunits)

Install voice to festival directory

http://roberttwomey.com/downloads/uw_us_rdt_clunits.tar.gz
unpack the archive from the festival root directory, it should install to the correct directory
copy your newly trained voice to festival/lib/voices/english/
the name of your new voice directory (ex: uw_us_rdt_clunits/) needs to match the voice file (ex: uw_us_rdt_clunits/festvox/uw_us_rdt_clunits.scm)
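
The naming rule above is easy to get wrong after renaming a directory, so a quick check can help. A sketch only; `voice_file_for` and `voice_layout_ok` are hypothetical helper names, and the check covers just the directory and file naming described above.

```python
import os


def voice_file_for(voice_dir):
    """Return the .scm path expected inside voice_dir, based on its name."""
    name = os.path.basename(os.path.normpath(voice_dir))
    return os.path.join(voice_dir, "festvox", name + ".scm")


def voice_layout_ok(voice_dir):
    """True when <dir>/festvox/<dir name>.scm exists."""
    return os.path.isfile(voice_file_for(voice_dir))
```

For example, `voice_layout_ok("festival/lib/voices/english/uw_us_rdt_clunits")` should be true once the voice is installed.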

Configure festival to use your voice by default

to set your voice as default (and add a special pause entry), add the following to festival/etc/siteinit.scm:

(set! voice_default 'voice_uw_us_rdt_clunits)

(lex.add.entry '("<break>" n (((pau pau) 0))))

the lex.add.entry line makes a new word in the lexicon <break> that adds a pause.
run festival_server and it will load your new voice by default
http://www.cstr.ed.ac.uk/projects/festival/manual/festival_24.html

Add a pause as a new lexical entry

add the following after (provide 'siteinit) in festival/etc/siteinit.scm:

(voice_uw_us_rdt_clunits)

(lex.add.entry '("<break>" n (((pau pau) 0))))

this will add a new word, "<break>" that is synthesized as a brief pause in speech.
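
Once the lexicon entry is in place, text can be pre-processed so that a pause follows each sentence. This is a rough sketch, assuming the "<break>" word has been added as described above; `add_breaks` is a made-up helper, and the sentence splitting is deliberately naive.

```python
import re


def add_breaks(text, marker="<break>"):
    """Insert the pause word between sentences (not after the last one)."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return " {} ".format(marker).join(sentences)
```

The result can be saved to a text file and sent through festival_client like any other input.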

Using Voice on Raspberry Pi

install with apt-get. alternatively, follow these instructions: http://elinux.org/RPi_Text_to_Speech_(Speech_Synthesis)
With festival 2.1
copy the voice data into /usr/share/festival/voices/english
edit /usr/share/festival/voices.scm and add the new voice uw_us_rdt_clunits at the beginning of the default-voice-priority-list. (end of the file)
now your new voice should be the default for festival.

rpi with external i2s dac

change aplay command within festival/scheme:

(Parameter.set 'Audio_Command "aplay -q -c 2 -t raw -f s16 -r 8000 $FILE")

or add to startup: https://wiki.archlinux.org/index.php/Festival#Usage_with_a_Sound_Server

Tuning phrasing, prosody, etc with SABLE

http://www.cstr.ed.ac.uk/projects/festival/manual/festival_10.html#SEC31

Using Festival

Run the Server

from anywhere: festival_server

Run the Client

run the client: echo "Do you really want to see all of it?" | festival_client --ttw --output test.wav
generates a wave file
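
The same client call can be made from Python by piping the text to standard input. A minimal sketch, assuming festival_server is already running and `festival_client` is on the PATH; `client_command` and `text_to_wave` are hypothetical names.

```python
import subprocess


def client_command(output_path, client="festival_client"):
    """Build the festival_client call used above; text is read from stdin."""
    return [client, "--ttw", "--output", output_path]


def text_to_wave(text, output_path, client="festival_client"):
    """Pipe text to festival_client and save the returned wave file."""
    subprocess.run(client_command(output_path, client),
                   input=text.encode("utf-8"), check=True)
```

`text_to_wave("Do you really want to see all of it?", "test.wav")` would be the equivalent of the echo pipeline above.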

Synthesize Speech to Audio Out

run festival: echo "test this" | festival --tts
plays through speakers.

Render a Text File In Speech

run the server
run the client: cat ~/Documents/speech\ performance/speech\ performance\ structure.txt | festival_client --ttw --output structure.wav

Phoneme tests

Switch voices:

(voice_kal_diphone)

Switch back:

(voice_uw_us_rdt_clunits)

Pronounce phonemes:

(SayPhones '(pau ch pau m ay n ey m ih z r ah b er t pau))

(SayPhones '(pau ch pau m ay n ey m ih z r ow b er t pau))

(SayPhones '(pau ch pau m ay n ey m ih z r ow b ah t pau))

(SayPhones '(pau ch pau m ay n ey m ih z r ah b ah t pau))
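
The four tests above differ only in two vowels, so the variants can be generated rather than typed. A sketch only; `say_phones` and `name_variants` are made-up helpers that just build the Scheme text to paste into the festival prompt.

```python
from itertools import product


def say_phones(phones):
    """Format a phone list as a SayPhones call for the festival prompt."""
    return "(SayPhones '({}))".format(" ".join(phones))


def name_variants(first_vowels=("ah", "ow"), second_vowels=("er", "ah")):
    """Yield one SayPhones call per combination of the two vowels."""
    prefix = "pau ch pau m ay n ey m ih z r".split()
    for v1, v2 in product(first_vowels, second_vowels):
        yield say_phones(prefix + [v1, "b", v2, "t", "pau"])
```

Adding another candidate vowel to either tuple extends the test set without retyping the whole phone string.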


Difference between revisions of "Festival TTS"
