UNTREF Speech Workshop
Introduction
Conversing With Machines
A short 1-2 day workshop introducing speech recognition and speech synthesis techniques for the creation of interactive artwork. We use pre-compiled open-source tools (CMU Sphinx ASR, Festival TTS, Processing, Python) and focus on the demonstrable strengths and unexpected limitations of speech technologies as vehicles for creating meaning.
Saturday Sept 21, 2-6pm Centro Cultural de Borges UNTREF.
Background Reading:
- Natalie Jeremijenko. "If Things Can Talk, What Do They Say? If We Can Talk To Things, What Do We Say?" 2005-03-05. http://www.electronicbookreview.com/thread/firstperson/voicechip
- Also see the responses by Simon Penny, Lucy Suchman, and Natalie linked from that page.
- "Dialogue With A Monologue: Voice Chips and the Products of Abstract Speech". http://www.topologicalmedialab.net/xinwei/classes/readings/Jeremijenko/VoiceChips.pdf
- Mel Bochner. "Serial Art, Systems, Solipsism." (pdf)
Automatic Speech Recognition
Engines
- CMU Sphinx Open Source Toolkit For Speech Recognition Project by Carnegie Mellon University
- Pocketsphinx. A lightweight, portable implementation of Sphinx. Pocketsphinx on win32 - http://www.aiaioo.com/cms/index.php?id=28
- Google ASR.
- Google ASR wrapped for Processing - http://stt.getflourish.com/
Installing CMU Sphinx
- Download from sourceforge: http://cmusphinx.sourceforge.net/wiki/download/
- If using Windows, you need the sphinxbase-0.8-win32.zip and pocketsphinx-0.8-win32.zip files. I already downloaded these for you. They are in the untref_speech folder.
Using Sphinx
- Open a terminal. On Windows: Run -> cmd.
- Change to the pocketsphinx directory:
cd Desktop\untref_speech\pocketsphinx-0.8-win32\bin\Release
- Run the pocketsphinx command to recognize English:
pocketsphinx_continuous.exe -hmm ..\..\model\hmm\en_US\hub4wsj_sc_8k -dict ..\..\model\lm\en_US\cmu07a.dic -lm ..\..\model\lm\en_US\hub4.5000.DMP
- Or, to recognize Spanish:
pocketsphinx_continuous.exe -hmm ..\..\model\hmm\es_MX\hub4_spanish_itesm.cd_cont_2500 -dict ..\..\model\lm\es_MX\h4.dict -lm ..\..\model\lm\es_MX\H4.arpa.Z.DMP
- Either command should transcribe live speech from the microphone.
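The two commands above differ only in their model paths, so a small Python helper can assemble them. This is a minimal sketch, not part of the Sphinx distribution: the `MODELS` table and `build_command` are hypothetical names, and the paths are copied from the commands above and assume you are in the bin\Release directory.

```python
# Model files from the pocketsphinx-0.8-win32 package, relative to bin\Release.
# The paths are copied from the workshop commands; adjust them if your layout differs.
MODELS = {
    "en": {
        "hmm": r"..\..\model\hmm\en_US\hub4wsj_sc_8k",
        "dict": r"..\..\model\lm\en_US\cmu07a.dic",
        "lm": r"..\..\model\lm\en_US\hub4.5000.DMP",
    },
    "es": {
        "hmm": r"..\..\model\hmm\es_MX\hub4_spanish_itesm.cd_cont_2500",
        "dict": r"..\..\model\lm\es_MX\h4.dict",
        "lm": r"..\..\model\lm\es_MX\H4.arpa.Z.DMP",
    },
}


def build_command(language, exe="pocketsphinx_continuous.exe"):
    """Return the argument list for live recognition in the given language."""
    model = MODELS[language]
    return [exe,
            "-hmm", model["hmm"],    # acoustic model directory
            "-dict", model["dict"],  # pronunciation dictionary
            "-lm", model["lm"]]      # statistical language model
```

To start recognition from a script, pass the list to `subprocess.call`, for example `subprocess.call(build_command("es"))`.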
Language Models
Acoustic models versus language models.
Grammars versus Statistical Language Models.
Available language models. English, Mandarin, French, Spanish, German, Dutch and more: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
Training your own Models
Writing a grammar is trivial.
For a statistical language model (SLM), you can use online tools, or try the sphinxtrain packages.
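To illustrate how little a grammar needs, here is a minimal sketch that writes a JSGF grammar accepting exactly one phrase from a fixed list. The function name `make_jsgf` and the rule name `<phrase>` are hypothetical, and the sketch assumes every word appears in the pronunciation dictionary you pass to the recognizer.

```python
def make_jsgf(name, phrases):
    """Return JSGF text for a grammar that accepts any one of the phrases."""
    # Lowercase the phrases, on the assumption that dictionary entries are lowercase.
    alternatives = " | ".join(p.strip().lower() for p in phrases)
    return ("#JSGF V1.0;\n"
            "grammar {0};\n"
            "public <phrase> = {1};\n").format(name, alternatives)


def write_jsgf(path, name, phrases):
    """Save the grammar to a file the recognizer can load."""
    with open(path, "w") as handle:
        handle.write(make_jsgf(name, phrases))
```

Saving the result as, say, greetings.gram and passing it to pocketsphinx_continuous with the -jsgf option in place of -lm should restrict recognition to those phrases.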
Programming with Speech Recognition
Processing: use Sphinx4, the Java interface.
Python, C++, the command line, or Android: use pocketsphinx.
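One low-tech way to get recognition results into Python is to run the command-line recognizer and read its standard output. The sketch below assumes that pocketsphinx_continuous prints each result as a nine-digit utterance id, a colon, and the text (for example `000000000: hello world`); check this against the output of your own build. The names `extract_hypotheses` and `listen` are hypothetical.

```python
import re
import subprocess

# Assumed result format: "<nine-digit utterance id>: <recognized text>".
HYP_LINE = re.compile(r"^\d{9}: (.*)$")


def extract_hypotheses(lines):
    """Yield the recognized text from each result line, skipping everything else."""
    for line in lines:
        match = HYP_LINE.match(line.rstrip("\r\n"))
        if match and match.group(1).strip():
            yield match.group(1).strip()


def listen(command):
    """Run the recognizer (an argument list) and yield hypotheses as they arrive."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            universal_newlines=True)
    for text in extract_hypotheses(proc.stdout):
        yield text
```

A script can then loop over `listen([...])` with the same arguments used on the command line and react to each phrase as it is recognized.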
Text To Speech Synthesis
Engines
- Festival/Festvox. Festival is from the University of Edinburgh; Festvox is from the CMU speech group.
- FreeTTS. Wrapper for Processing - http://www.local-guru.net/blog/pages/ttslib
- MARY TTS. http://mary.dfki.de/
- Google TTS. http://amnonp5.wordpress.com/2011/11/26/text-to-speech/
- Mac OS X built-in speech synthesis
- MBROLA voices - http://tcts.fpms.ac.be/synthesis/
- Siri
Test them online
- Festival online demo - http://www.cstr.ed.ac.uk/projects/festival/onlinedemo.html
- Spanish (UVIGO Spanish Male)
- American English
- Others...
- MARY TTS online demo - http://mary.dfki.de:59125/
Voices
- http://festvox.org/dbs/index.html
- https://github.com/joseguerrero/festival-spanish-voices
- Spanish voices - http://sangonz.wordpress.com/2010/05/22/spanish-voices-for-festival/
Installing Festival
- http://festvox.org/packed/festival/2.1/festival-2.1-release.tar.gz
- Windows binaries: http://sourceforge.net/projects/e-guidedog/files/related%20third%20party%20software/0.3/festival-2.1-win.7z/download
- Voices: http://homepages.inf.ed.ac.uk/jyamagis/software/page54/page54.html
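Once Festival is installed, a script can ask it to speak by passing a Scheme expression in batch mode. This is a minimal sketch that assumes the festival executable is on your PATH; the helper names `say_text_command` and `speak` are hypothetical.

```python
import subprocess


def say_text_command(text):
    """Build the Festival Scheme expression that speaks the given text."""
    # Escape backslashes and double quotes so the text stays one Scheme string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '(SayText "{0}")'.format(escaped)


def speak(text, festival="festival"):
    """Run Festival in batch mode (-b) and have it speak the text."""
    return subprocess.call([festival, "-b", say_text_command(text)])
```

For example, `speak("Hello from Festival")` should play the phrase with the default voice.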
Tutorial
Making a Voice
- Portraiture?
Activity: Feedback Loop
A Conversation.