
UNTREF Speech Workshop

4,339 bytes added, 01:33, 21 December 2014




http://phenomenologyftw.files.wordpress.com/2011/03/saussure.gif

=Background Reading=

*Natalie Jeremijenko. "If Things Can Talk, What Do They Say? If We Can Talk To Things, What Do We Say?" 2005-03-05. [http://www.electronicbookreview.com/thread/firstperson/voicechip]

**Also see the responses by Simon Penny, Lucy Suchman, and Natalie Jeremijenko linked from that page.

*Mel Bochner. "Serial Art, Systems, Solipsism." ([[Media:Serial Art, Systems, Solipsism - Bochner 1967.pdf|pdf]])

=Automatic Speech Recognition=

https://engineering.purdue.edu/~ee649/notes/figures/ear.gif

*Google ASR.

*Google ASR wrapped for Processing: http://stt.getflourish.com/

==Hands-on with Processing==

===STT Library===

*Download and install the STT library. http://dl.dropbox.com/u/974773/_keepalive/stt.zip

*Download the library file, unzip it, and copy it to the ''Processing\libraries'' folder.

===Example - Listening with Google ASR ===

*Processing example:

**http://wiki.roberttwomey.com/images/c/c2/Google_listen.zip

*Try switching the recognition language: "es" (Spanish) versus "en" (English), "de" (German), or "fr" (French).

==Hands-on with Sphinx==

===Installation===

*Download from SourceForge: http://cmusphinx.sourceforge.net/wiki/download/

*If you are using Windows, you need the '''sphinxbase-0.8-win32.zip''' and '''pocketsphinx-0.8-win32.zip''' files. I have already downloaded these for you; they are in the '''untref_speech''' folder.

===Usage===

===Language Models===

*'''Acoustic models''' versus '''language models'''.

*'''Grammars''' versus '''Statistical Language Models'''.

*Available language models for Sphinx:

**English

**I can talk you through using the resultant model.
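The grammar versus statistical language model distinction can be sketched in a few lines of Java (the language Processing builds on). This is a toy illustration under stated assumptions, not how Sphinx stores or scores its models: a grammar accepts only the phrases it lists, while a trigram SLM gives every word a probability conditioned on the two words before it, estimated here by unsmoothed counts over a one-sentence-per-line corpus. The class and method names are hypothetical.

```java
import java.util.*;

// Toy contrast between a grammar and a trigram SLM (illustration only; not Sphinx internals).
public class GrammarVsSlm {

    // Grammar: the recognizer may only return one of the listed phrases.
    static boolean grammarAccepts(Set<String> phrases, String utterance) {
        return phrases.contains(utterance.trim().toUpperCase(Locale.ROOT));
    }

    // SLM: any word sequence is possible, scored by probabilities estimated from a corpus.
    static final class TrigramModel {
        private final Map<String, Integer> trigramCounts = new HashMap<>();
        private final Map<String, Integer> contextCounts = new HashMap<>();

        // Corpus format matches the online tool's input: one sentence per line.
        TrigramModel(List<String> sentences) {
            for (String sentence : sentences) {
                // <s> pads the start so the first words also have a two-word context; </s> marks the end.
                List<String> words = new ArrayList<>(List.of("<s>", "<s>"));
                for (String token : sentence.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (!token.isEmpty()) words.add(token);
                }
                words.add("</s>");
                for (int i = 2; i < words.size(); i++) {
                    String context = words.get(i - 2) + " " + words.get(i - 1);
                    trigramCounts.merge(context + " " + words.get(i), 1, Integer::sum);
                    contextCounts.merge(context, 1, Integer::sum);
                }
            }
        }

        // Unsmoothed maximum-likelihood estimate: P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2).
        double probability(String w1, String w2, String w3) {
            int context = contextCounts.getOrDefault(w1 + " " + w2, 0);
            if (context == 0) return 0.0;
            return trigramCounts.getOrDefault(w1 + " " + w2 + " " + w3, 0) / (double) context;
        }
    }

    public static void main(String[] args) {
        Set<String> grammar = Set.of("GO UPSTAIRS", "GO DOWNSTAIRS");
        System.out.println(grammarAccepts(grammar, "go upstairs"));     // true
        System.out.println(grammarAccepts(grammar, "go upstairs now")); // false

        TrigramModel slm = new TrigramModel(List.of("go upstairs", "go downstairs", "go upstairs now"));
        System.out.println(slm.probability("<s>", "go", "upstairs"));   // 0.666...
        System.out.println(slm.probability("go", "upstairs", "now"));   // 0.5
    }
}
```

Real SLM files such as the .lm output of the CMU tool typically add discounting and back-off weights so that word sequences never seen in the corpus still get a nonzero probability; this sketch leaves that out.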

==Hands-on with Sphinx4 Library for Processing==

*This section includes a wrapper of the CMU Sphinx4 recognizer for Processing. Read more about the CMU Sphinx project at http://cmusphinx.sourceforge.net/.

*Below are a library for Processing, an example that uses a grammar of phrases for recognition, and an example that uses a statistical language model.

===Library===

*JAR file and some necessary language and acoustic models to do Sphinx-based speech recognition.

*Download the zip file below and copy it to your ''Processing/libraries'' folder:

**Download file: http://wiki.dxarts.washington.edu/groups/general/wiki/d7564/attachments/d8bfa/sphinx.zip

===Example - Grammar-based Recognition===

*Simple grammar-based speech recognition with Sphinx4 in Processing.

**Download file: http://wiki.dxarts.washington.edu/groups/general/wiki/d7564/attachments/63c05/sphinxGrammarCustomdict.zip

*This example uses a simple grammar. Its data folder contains a grammar file (.gram), a dictionary file (.dict), and a config file (.xml).

*The grammar file (upstairs.gram) is a JSGF-format grammar file that lists the possible words your system can hear. Each word is written in upper-case letters, with a "|" mark between words. You should be able to edit this file and fill it with your own words.

*The dict file (upstairs.dict) is a pronunciation dictionary. It breaks each upper-case word from the grammar into phonemic units. The easiest way to make a new dictionary with your own words is to use the online language tool described below.

*Finally, the config file (upstairs.config.xml) specifies various parameters and file names for the speech recognition engine. In this file you will probably need to change the paths to your data files (such as the grammar and dict) and to the library files you installed above. If you open the XML file you will see that many of the paths are of the form "/Users/rtwomey/", which is my computer; replace each one with the path to the file on your system. Contact me if this doesn't work.

===Example - SLM-based Recognition===

*This example does Sphinx4 automatic speech recognition using a statistical language model (SLM) rather than a grammar.

*I have included two SLMs in the data directory: 3990.lm/3990.dict and 7707.lm/7707.dict.

*They were generated with the CMU Sphinx Knowledge Base Tool (see below).
*For each, I uploaded a plain-text file of sentences and saved the resulting TAR file containing the dict, lm, and other results.

*As above, you may need to change some file paths in the sphinx_config.xml file to match the setup on your system.

*Download file: http://wiki.dxarts.washington.edu/groups/general/wiki/d7564/attachments/01040/sphinxSLMTest.zip

===Online Tool for Training Language Models===

*This tool produces a statistical language model and dictionary (along with various other products) for the text you upload.

*Your source file should be plain text, one sentence per line.

*Upload the file and then click "Compile Knowledge Base."

*On the results screen, click on the .TAR file to download it. Unzip this file:

**The .dic file is your pronunciation dictionary. You may want to rename it to .dict to match the files in the sketch, or change your config file instead.

**The .lm file is a 3-gram SLM file. If you are trying the SLM example above you will need this as well.

*The grammar example above runs from a grammar (.gram) and a dictionary (.dict). This online tool generates the dictionary for your text but not the grammar, so you will need to write the grammar yourself.

*The SLM example above runs from a dictionary (.dict) and a language model (.lm). This online tool generates both files.

*Sphinx Knowledge Base Tool: http://www.speech.cs.cmu.edu/tools/lmtool-new.html

==Other programming==

*Python or C++

*command line

*Android

*'''pocketsphinx'''
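The grammar-based example depends on three plain-text data files (grammar, dict, config), and the online tool expects a plain-text corpus with one sentence per line, so small helpers can generate or check them. The Java sketch below is one possible approach, not part of the workshop downloads: all names are hypothetical, the grammar and rule names (<code>upstairs</code>, <code>command</code>) are assumptions that may not match the shipped upstairs.gram, and the dictionary lines are illustrative CMUdict-style entries rather than the contents of upstairs.dict. It builds a one-rule JSGF grammar, lists grammar words that have no dictionary entry, replaces the "/Users/rtwomey/" prefix in a config file, and naively splits text into one sentence per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical helpers for the Sphinx4 example data files; names and layout are assumptions.
public class SphinxDataTools {

    // Grammar (.gram): one public JSGF rule listing upper-case words separated by "|".
    static String buildJsgf(String grammarName, String ruleName, List<String> words) {
        StringJoiner alternatives = new StringJoiner(" | ");
        for (String w : words) {
            String word = w.trim().toUpperCase(Locale.ROOT);
            if (!word.isEmpty()) alternatives.add(word);
        }
        return "#JSGF V1.0;\n\ngrammar " + grammarName + ";\n\n"
                + "public <" + ruleName + "> = " + alternatives + ";\n";
    }

    // Dictionary (.dict): lines like "WORD  PH1 PH2 ..."; variants such as WORD(2) count as WORD.
    static Set<String> dictWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String head = trimmed.split("\\s+")[0];
            int paren = head.indexOf('(');
            if (paren > 0) head = head.substring(0, paren);
            words.add(head.toUpperCase(Locale.ROOT));
        }
        return words;
    }

    // Grammar words with no pronunciation entry; the recognizer cannot use these.
    static List<String> missingFromDict(List<String> grammarWords, Set<String> dict) {
        List<String> missing = new ArrayList<>();
        for (String w : grammarWords) {
            String word = w.trim().toUpperCase(Locale.ROOT);
            if (!word.isEmpty() && !dict.contains(word)) missing.add(word);
        }
        return missing;
    }

    // Config (.xml): swap the author's home-directory prefix for your own.
    static String rewritePaths(String configXml, String oldPrefix, String newPrefix) {
        return configXml.replace(oldPrefix, newPrefix);
    }

    // Corpus for the online tool: naive split at ., ! or ? followed by whitespace.
    static List<String> toCorpusLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            String line = s.trim().replaceAll("\\s+", " ");
            if (!line.isEmpty()) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<String> words = List.of("upstairs", "downstairs", "stop");
        System.out.print(buildJsgf("upstairs", "command", words));

        // Illustrative CMUdict-style entries, not copied from upstairs.dict.
        List<String> dict = List.of("UPSTAIRS  AH P S T EH R Z", "DOWNSTAIRS  D AW N S T EH R Z");
        System.out.println("Missing from dict: " + missingFromDict(words, dictWords(dict)));

        // Optional: java SphinxDataTools <config.xml> <your path prefix>  (overwrites the file)
        if (args.length == 2) {
            Path config = Paths.get(args[0]);
            String xml = new String(Files.readAllBytes(config), StandardCharsets.UTF_8);
            String fixed = rewritePaths(xml, "/Users/rtwomey/", args[1]);
            Files.write(config, fixed.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Run without arguments, it prints the generated grammar and reports <code>[STOP]</code> as missing from the sample dictionary. With two arguments it rewrites the given config file in place, so keep a copy of the original first.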

=Text To Speech Synthesis=

http://www.pixel-issue.net/wp-content/uploads/2011/11/voder-2.png

*Others...

*MARY TTS online demo - http://mary.dfki.de:59125/


==Hands-on with Processing==

*Google TTS requires no library and nothing to install. You just need an internet connection to talk to Google.

===Example 1. Speech===

*http://wiki.roberttwomey.com/images/d/d2/Google_speak.zip

===Example 2. Daisy Bell===

*Processing Daisy Bell example using Google Text To Speech. Requires an internet connection:

**http://wiki.roberttwomey.com/images/4/43/Google_daisy.zip

==Hands-on with Festival==

=== Installation ===

*http://festvox.org/packed/festival/2.1/festival-2.1-release.tar.gz

*Tutorial - http://homepages.inf.ed.ac.uk/jyamagis/misc/Practice_of_Festival_speech_synthesizer.html

*Windows binaries: http://sourceforge.net/projects/e-guidedog/files/related%20third%20party%20software/0.3/festival-2.1-win.7z/download

*Voices: http://homepages.inf.ed.ac.uk/jyamagis/software/page54/page54.html

*Copy the festival folder to C:\

===Usage===

*Open a terminal: Start Menu, Run -> Cmd.

*Switch to the festival directory:

**<code>cd C:\festival</code>

*Start festival:

**<code>festival</code>

*To say something:

**<code>(SayText "this is what I am going to say")</code>

*To render speech to a sound file:

*To switch voices:

**<code>(voice_rab_diphone)</code>

**<code>(voice_uw_us_rdt_clunits)</code>

*To exit festival:

**<code>(exit)</code>

*Festival's command interpreter uses Scheme, a dialect of LISP.
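The Festival commands above can also be sent from a program instead of typed at the prompt. The Java sketch below assumes that your Festival build accepts the <code>--pipe</code> option (read commands from standard input without printing a prompt) and that its Scheme reader accepts <code>\"</code> and <code>\\</code> escapes inside strings; check <code>festival --help</code> and try a quoted phrase by hand before relying on either. <code>FestivalCommands</code> and its methods are hypothetical names.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sketch: drive Festival from Java by sending the same Scheme commands used at the prompt.
public class FestivalCommands {

    // (SayText "...") with backslashes and double quotes escaped.
    // Assumption: Festival's Scheme reader accepts \\ and \" inside strings.
    static String sayText(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "(SayText \"" + escaped + "\")";
    }

    // Voice selection, e.g. voice("rab_diphone") gives (voice_rab_diphone).
    static String voice(String voiceName) {
        return "(voice_" + voiceName + ")";
    }

    // Starts festival in --pipe mode and writes one command per line to its stdin.
    // Assumption: your Festival build supports --pipe.
    static int run(String festivalExecutable, List<String> commands)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(festivalExecutable, "--pipe")
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (Writer stdin = new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8)) {
            for (String command : commands) {
                stdin.write(command);
                stdin.write("\n");
            }
        } // closing stdin should let festival reach end of input and exit
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> commands = List.of(
                voice("rab_diphone"),
                sayText("this is what I am going to say"));
        if (args.length == 1) {
            // args[0]: path to your festival executable
            System.exit(run(args[0], commands));
        } else {
            commands.forEach(System.out::println); // only print the commands
        }
    }
}
```

Without arguments it only prints the two commands, which you can paste into the festival prompt; pass the path to your festival executable to have it speak directly.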

==Voices==

*Robert Voice

=Activity: Feedback Loop=

http://phenomenologyftw.files.wordpress.com/2011/03/saussure.gif

==Processing Sketch==

http://wiki.roberttwomey.com/images/8/86/Listen_speak.zip


Rtwomey


Robert-Depot