Latest revision as of 05:45, 10 March 2017


Getting Started

Festival Speech Synthesis System - http://www.cstr.ed.ac.uk/projects/festival/
Building on OS X with do_prompt capabilities - http://linguisticmystic.com/2011/07/15/using-festival-tts-on-os-x/
 (see also http://permalink.gmane.org/gmane.science.tts.festvox/381)
CMU class assignment on TTS (15-492) - http://www.speech.cs.cmu.edu/15-492/assignments/tts/index.html

Learning

Book - http://festvox.org/festvox/book1.html
short tutorial - http://festvox.org/festtut-2.0/
exercises and hints - http://festvox.org/festtut-2.0/exercises/

Slides on HTS Synthesis

http://www.sp.nitech.ac.jp/~tokuda/tokuda_iscslp2006.pdf

Training Voice Models

How-to - http://festvox.org/festvox/c3170.html#AEN3172
Training text input - http://www.festvox.org/cmu_arctic/cmuarctic.data
Useful tips - http://festvox.org/index.html, including a pointer to the EMU speech database system - http://www.shlrc.mq.edu.au/emu/
Building a CLUSTERGEN Statistical Parametric Synthesizer - http://festvox.org/festvox/c3170.html#AEN3172

Building a Unit Selection Cluster Voice

(from here http://festvox.org/festvox/x3086.html)

```
mkdir uw_us_rdt
cd uw_us_rdt
```

uniphone setup:

```
$FESTVOXDIR/src/unitsel/setup_clunits uw us rdt uniphone
```

generate prompts and prompt files:

```
festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/uniphone.data")'
```

record sound using Audacity; save as 16 kHz, 16-bit mono WAV files.
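
Before labeling, it can help to confirm that every recording really is 16 kHz, 16-bit mono. The following is a minimal Python sketch using the standard-library wave module; it is not part of festvox, the helper names are hypothetical, and the default wav/*.wav pattern is an assumption about where your recordings live.

```python
import glob
import wave

# Target format from the recording step: 16 kHz, 16-bit (2-byte) samples, mono.
EXPECTED_RATE = 16000
EXPECTED_SAMPWIDTH = 2
EXPECTED_CHANNELS = 1


def wav_format_problems(path):
    """Return a list of ways the WAV file at `path` differs from 16 kHz / 16-bit / mono."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {w.getframerate()} Hz, expected {EXPECTED_RATE}")
        if w.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 16-bit")
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channels, expected mono")
    return problems


def check_recordings(pattern="wav/*.wav"):
    """Map each matching file to its list of format problems (an empty list means OK)."""
    return {path: wav_format_problems(path) for path in sorted(glob.glob(pattern))}
```

Files that are not plain PCM WAV (for example 32-bit float exports) make wave.open raise wave.Error, which is itself a sign the export settings need changing.
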
make labels:
```
./bin/make_labs prompt-wav/*.wav
```

build utterance structure:

```
festival -b festvox/build_clunits.scm '(build_utts "etc/uniphone.data")'
```

do pitch marking:
```
./bin/make_pm_wave etc/uniphone.data
```
find Mel Frequency Cepstral Coefficients:
```
./bin/make_mcep etc/uniphone.data
```

build cluster unit selection synth:

```
festival -b festvox/build_clunits.scm '(build_clunits "etc/uniphone.data")'
```
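
The post-recording steps above always run in the same order, so they can be chained from Python so that a failed step stops the build. This is a minimal sketch, run from the voice directory; the helper names are hypothetical and it only shells out to the commands listed above.

```python
import glob
import shlex
import subprocess

BUILD_SCM = "festvox/build_clunits.scm"


def clunits_build_steps(data_file, prompt_wavs):
    """Return the post-recording build commands, in the order given above."""
    return [
        ["./bin/make_labs", *prompt_wavs],
        ["festival", "-b", BUILD_SCM, f'(build_utts "{data_file}")'],
        ["./bin/make_pm_wave", data_file],
        ["./bin/make_mcep", data_file],
        ["festival", "-b", BUILD_SCM, f'(build_clunits "{data_file}")'],
    ]


def run_build(data_file, dry_run=True):
    """Print each step; unless dry_run is set, also run it and stop at the first failure."""
    # Expand prompt-wav/*.wav here, since no shell is involved.
    prompt_wavs = sorted(glob.glob("prompt-wav/*.wav"))
    for cmd in clunits_build_steps(data_file, prompt_wavs):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_build("etc/uniphone.data") only previews the commands; run_build("etc/timit.data", dry_run=False) would run the same sequence for the TIMIT prompts.
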

Using a Unit Selection Cluster Voice Synth

from the uw_us_rdt directory:
```
festival festvox/uw_us_rdt_clunits.scm
```
then, at the Festival (Scheme) prompt:
```
(voice_uw_us_rdt_clunits)
(SayText "this is a little test.")
```
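
To get a WAV file instead of speaker output, the same steps can be run in batch mode, the way build_clunits.scm is invoked above. The following is a minimal Python sketch; it assumes the standard Festival Scheme functions Utterance, utt.synth and utt.save.wave, and the helper names and output file name are hypothetical.

```python
import subprocess


def scheme_string(text):
    """Quote text as a Scheme string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def synth_to_wav_command(text, out_wav,
                         voice_file="festvox/uw_us_rdt_clunits.scm",
                         voice_fn="voice_uw_us_rdt_clunits"):
    """Batch-mode festival command: load the voice, select it, save `text` as a RIFF WAV."""
    utt = f"(utt.synth (Utterance Text {scheme_string(text)}))"
    return [
        "festival", "-b", voice_file,
        f"({voice_fn})",
        f"(utt.save.wave {utt} {scheme_string(out_wav)} 'riff)",
    ]


def synth_to_wav(text, out_wav, **kwargs):
    """Run the command from the voice directory; raises CalledProcessError on failure."""
    subprocess.run(synth_to_wav_command(text, out_wav, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text itself contains quotes.
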

Building a CLUSTERGEN Statistical Parametric Synthesizer

(adapted from http://festvox.org/festvox/c3174.html#AEN3176)

```
mkdir uw_us_rdt_arctic
cd uw_us_rdt_arctic
$FESTVOXDIR/src/clustergen/setup_cg uw us rdt_arctic
```

copy text into etc/txt.done.data; use some of the lines from http://www.festvox.org/cmu_arctic/cmuarctic.data

copy the audio files into wav/ with
```
bin/get_wavs
```
which also power-normalizes them and converts them to the proper format.
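
Selecting a subset of the CMU ARCTIC prompts for etc/txt.done.data can be scripted. This is a minimal sketch, assuming you have downloaded cmuarctic.data locally and that each line has the festvox prompt form ( utterance_id "text" ); the helper names are hypothetical. As we understand the festvox convention, the recordings in wav/ should then be named after the same ids (e.g. arctic_a0001.wav).

```python
import re

# One prompt per line, festvox style: ( utterance_id "prompt text" )
PROMPT_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(lines):
    """Yield (utt_id, text) pairs from festvox-style prompt lines, skipping anything else."""
    for line in lines:
        m = PROMPT_LINE.match(line.strip())
        if m:
            yield m.group(1), m.group(2)


def write_txt_done_data(prompts, path="etc/txt.done.data"):
    """Write (utt_id, text) pairs back out in the same one-prompt-per-line format."""
    with open(path, "w", encoding="utf-8") as f:
        for utt_id, text in prompts:
            f.write(f'( {utt_id} "{text}" )\n')
```

For example, opening a local copy of cmuarctic.data and passing list(parse_prompts(f))[:50] to write_txt_done_data keeps the first 50 prompts.
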

Building a Unit Selection Cluster Voice from TIMIT data

(from here http://festvox.org/festvox/c2645.html#AEN2716)

```
mkdir uw_us_rdt_timit
cd uw_us_rdt_timit
```

timit setup:

```
$FESTVOXDIR/src/unitsel/setup_clunits uw us rdt timit
```

generate prompts and prompt files:

```
festival -b festvox/build_clunits.scm '(build_prompts_waves "etc/timit.data")'
```

record sound using Audacity; save as 16 kHz, 16-bit mono WAV files.

copy the sound files from the recording directory into the voice directory:

```
./bin/get_wavs ~/Sounds/TIMIT_Training_Data/warehouse_omni/*.wav
```
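
After copying, a quick check that every prompt in etc/timit.data has a matching recording can save a failed labeling run. This is a minimal Python sketch; it assumes the festvox convention that the recording for utterance id X is wav/X.wav, and the helper name is hypothetical.

```python
import os
import re

# Start of a festvox prompt line: ( utterance_id "...
UTT_ID = re.compile(r'^\(\s*(\S+)\s+"')


def missing_recordings(data_file="etc/timit.data", wav_dir="wav"):
    """Return utterance ids listed in data_file that have no <wav_dir>/<id>.wav."""
    missing = []
    with open(data_file, encoding="utf-8") as f:
        for line in f:
            m = UTT_ID.match(line.strip())
            if m and not os.path.isfile(os.path.join(wav_dir, m.group(1) + ".wav")):
                missing.append(m.group(1))
    return missing
```

An empty list means every prompt has a recording; any ids returned still need to be recorded or renamed.
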

make labels:
```
./bin/make_labs prompt-wav/*.wav
```

build utterance structure:

```
festival -b festvox/build_clunits.scm '(build_utts "etc/timit.data")'
```

do pitch marking:
```
./bin/make_pm_wave etc/timit.data
```
find Mel Frequency Cepstral Coefficients:
```
./bin/make_mcep etc/timit.data
```

build cluster unit selection synth:

```
festival -b festvox/build_clunits.scm '(build_clunits "etc/timit.data")'
```

Improving Quality

Fix phoneme labeling with WaveSurfer - http://sourceforge.net/projects/wavesurfer/
Tuning a voice - http://www.cstr.ed.ac.uk/emasters/summer_school_2005/tutorial3/tutorial.html

Using Voices

using meghan voice

Run the Server

open a terminal:

```
cd /Users/murmur/Desktop/meghan
festival_server -c meghans_special_sauce.scm
```

To kill the server, press Control-C.

Run the Client

open a second terminal window:

```
cd /Users/murmur/Desktop/meghan
festival_client myfile.txt --ttw --output client_test.wav
```

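Calling the client from Python:

```
import subprocess

# Ask the running festival_server to render myfile.txt as a WAV file; wait for it and raise on failure.
subprocess.run(["/Applications/festival_2.1/festival/src/main/festival_client", "/Users/murmur/Desktop/meghan/myfile.txt", "--ttw", "--output", "/Users/murmur/Desktop/meghan/client_test78.wav"], check=True)
```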

Using A Newly Trained Voice

Modify the voice so festival knows it's there

append a proclaim_voice call to your newly trained voice file, uw_us_rdt_clunits.scm, in uw_us_rdt_clunits/festvox:

```
(proclaim_voice
 'uw_us_rdt_clunits
 '((language english)
   (gender male)
   (dialect american)
   (description
    "This is Robert Twomey trained on CLUNITS, TIMIT database.")))

(provide 'uw_us_rdt_clunits)
```

Install voice to festival directory

packaged voice: http://roberttwomey.com/downloads/uw_us_rdt_clunits.tar.gz
unpack it from the Festival root directory; it should install to the correct directory
to install your own newly trained voice, copy it to festival/lib/voices/english/
the name of your voice directory (ex: uw_us_rdt_clunits/) must match the voice file (ex: uw_us_rdt_clunits/festvox/uw_us_rdt_clunits.scm)
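
The naming rule above is easy to get wrong when copying directories around, so it can be checked before starting Festival. This is a minimal Python sketch; the helper names are hypothetical, and it only checks for the <name>/festvox/<name>.scm layout described above.

```python
from pathlib import Path


def voice_scm_path(voices_dir, voice_name):
    """Path where the layout above expects the voice file: <voices_dir>/<name>/festvox/<name>.scm."""
    return Path(voices_dir) / voice_name / "festvox" / f"{voice_name}.scm"


def check_voice_install(voices_dir, voice_name):
    """Return the voice file path, or raise FileNotFoundError if directory and file names do not line up."""
    scm = voice_scm_path(voices_dir, voice_name)
    if not scm.is_file():
        raise FileNotFoundError(f"expected voice file at {scm}")
    return scm
```

For example, check_voice_install("festival/lib/voices/english", "uw_us_rdt_clunits") returns the .scm path when the install is laid out correctly.
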

Configure festival to use your voice by default

to set your voice as default (and add a special pause entry), add the following to festival/etc/siteinit.scm:

```
(set! voice_default 'voice_uw_us_rdt_clunits)
(lex.add.entry '("<break>" n (((pau pau) 0))))
```

The lex.add.entry line adds a new word, <break>, to the lexicon; it is pronounced as a pause.
Run festival_server and it will load your new voice by default.
Reference: http://www.cstr.ed.ac.uk/projects/festival/manual/festival_24.html

Add a pause as a new lexical entry

add the following after (provide 'siteinit) in festival/etc/siteinit.scm:

```
(voice_uw_us_rdt_clunits)
(lex.add.entry '("<break>" n (((pau pau) 0))))
```

This selects your voice and then adds a new word, "<break>", to its lexicon; the word is synthesized as a brief pause in speech.
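
A lexical entry is a list of the word, its part of speech, and its syllables, where each syllable is a list of phones followed by a stress value. The following hypothetical Python helper (not part of Festival) makes that structure explicit by rendering lex.add.entry calls; any phones you use are assumed to exist in the voice's phone set.

```python
def lex_entry(word, pos, syllables):
    """Format a Festival lex.add.entry call.

    syllables is a list of (phones, stress) pairs, e.g. [(["pau", "pau"], 0)].
    """
    sylls = " ".join(f"(({' '.join(phones)}) {stress})" for phones, stress in syllables)
    return f"(lex.add.entry '(\"{word}\" {pos} ({sylls})))"
```

lex_entry("<break>", "n", [(["pau", "pau"], 0)]) reproduces the siteinit.scm line above exactly.
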

Using Voice on Raspberry Pi

Install Festival with apt-get, or follow these instructions: http://elinux.org/RPi_Text_to_Speech_(Speech_Synthesis)
With Festival 2.1:
copy the voice data into /usr/share/festival/voices/english
edit /usr/share/festival/voices.scm and add the new voice, uw_us_rdt_clunits, at the beginning of default-voice-priority-list (near the end of the file).
Your new voice should now be the default for Festival.
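
The voices.scm edit can also be scripted. This is a minimal Python sketch that assumes the list is written as (defvar default-voice-priority-list '( ...) followed by the voice names; the helper name is hypothetical, and it does not check whether the voice is already listed.

```python
import re

# Assumed shape of the file: (defvar default-voice-priority-list '(kal_diphone ...
LIST_START = re.compile(r"(default-voice-priority-list\s*'\()")


def prepend_default_voice(voices_scm_text, voice_name):
    """Return voices.scm text with voice_name inserted as the first list entry."""
    new_text, count = LIST_START.subn(
        lambda m: m.group(1) + voice_name + "\n    ", voices_scm_text, count=1
    )
    if count == 0:
        raise ValueError("default-voice-priority-list not found")
    return new_text
```

Keep a backup of /usr/share/festival/voices.scm before writing the result back (this needs root permissions), and run it only once, since a second run would add the voice twice.
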

RPi with external I2S DAC

change the audio command from within Festival (Scheme):

```
(Parameter.set 'Audio_Method 'Audio_Command)
(Parameter.set 'Audio_Command "aplay -q -c 2 -t raw -f s16 -r 8000 $FILE")
```

or add these lines to a startup file: https://wiki.archlinux.org/index.php/Festival#Usage_with_a_Sound_Server

Tuning phrasing, prosody, etc with SABLE

http://www.cstr.ed.ac.uk/projects/festival/manual/festival_10.html#SEC31
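
SABLE input is XML, so it can be generated from ordinary text. This is a minimal Python sketch that wraps phrases in a SABLE document separated by BREAK tags; the helper name is hypothetical, the DOCTYPE follows the form used in the Festival manual's SABLE example, and the file name is an assumption. Festival is expected to pick SABLE mode from a .sable extension (e.g. festival --tts test.sable), but check the manual section linked above.

```python
from xml.sax.saxutils import escape

SABLE_HEADER = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN" "Sable.v0_2.dtd" []>\n'
)


def sable_document(phrases):
    """Join phrases with <BREAK/> tags inside a minimal SABLE document, escaping XML characters."""
    body = "\n<BREAK/>\n".join(escape(p) for p in phrases)
    return f"{SABLE_HEADER}<SABLE>\n{body}\n</SABLE>\n"
```

Other SABLE tags (emphasis, rate, pitch) can be added to the body in the same way once the basic document plays.
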

Using Festival

Run the Server

from anywhere:

```
festival_server
```

Run the Client

run the client; it generates a WAV file:

```
echo "Do you really want to see all of it?" | festival_client --ttw --output test.wav
```

Synthesize Speech to Audio Out

run festival; the speech plays through the speakers:

```
echo "test this" | festival --tts
```

Render a Text File In Speech

run the server, then run the client:

```
cat ~/Documents/speech\ performance/speech\ performance\ structure.txt | festival_client --ttw --output structure.wav
```
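
The same render can be driven from Python, which avoids escaping the spaces in the path. This is a minimal sketch that pipes the file's text to festival_client on stdin, as the cat pipeline does; the helper names are hypothetical, and `run` is a parameter only so the subprocess call can be swapped out when testing.

```python
import subprocess
from pathlib import Path


def client_command(output_wav, client="festival_client"):
    """festival_client invocation that reads text on stdin and writes a WAV file."""
    return [client, "--ttw", "--output", str(output_wav)]


def render_text_file(text_path, output_wav, run=subprocess.run):
    """Send the contents of text_path to a running festival_server and save the speech as output_wav."""
    text = Path(text_path).read_text(encoding="utf-8")
    return run(client_command(output_wav), input=text, text=True, check=True)
```

With the server running, render_text_file(Path.home() / "Documents/speech performance/speech performance structure.txt", "structure.wav") mirrors the shell command above.
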

Phoneme tests

Switch voices:

```
(voice_kal_diphone)
```

Switch back:

```
(voice_uw_us_rdt_clunits)
```

Pronounce phonemes:

```
(SayPhones '(pau ch pau m ay n ey m ih z r ah b er t pau))
(SayPhones '(pau ch pau m ay n ey m ih z r ow b er t pau))
(SayPhones '(pau ch pau m ay n ey m ih z r ow b ah t pau))
(SayPhones '(pau ch pau m ay n ey m ih z r ah b ah t pau))
```
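
The four test calls differ only in the two vowels of "Robert" (ah or ow, then er or ah). This hypothetical Python helper generates every combination so new vowel candidates can be pasted into the Festival prompt; it reuses the phone names from the tests above, which are assumed to be in the voice's phone set.

```python
from itertools import product

# Phones shared by every test utterance, up to the first vowel of "Robert".
PREFIX = ["pau", "ch", "pau", "m", "ay", "n", "ey", "m", "ih", "z", "r"]


def robert_variants(first_vowels=("ah", "ow"), second_vowels=("er", "ah")):
    """Build a SayPhones call for every combination of the two vowels under test."""
    calls = []
    for v1, v2 in product(first_vowels, second_vowels):
        phones = PREFIX + [v1, "b", v2, "t", "pau"]
        calls.append(f"(SayPhones '({' '.join(phones)}))")
    return calls
```

With the defaults it returns the same four calls listed above, in a different order.
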


Robert-Depot