Solipsist Development

[[Home | <<< back to Wiki Home]]
 
==Pocketsphinx by hand==
my pocketsphinx:
<code>
~/supercollider/solipsist/binaries/pocketsphinx-osx-continuous \
 -hmm ~/supercollider/solipsist/data/models/hmm/en_US/hub4wsj_sc_8k \
 -dict ~/supercollider/solipsist/data/models/script/script.dic \
 -lm ~/supercollider/solipsist/data/models/script/script.lm \
 -infile robertrauschenberg1_rs.wav
</code>
default pocketsphinx:
<code>
pocketsphinx_continuous \
 -hmm ~/supercollider/solipsist/data/models/hmm/en_US/hub4wsj_sc_8k \
 -dict ~/supercollider/solipsist/data/models/script/script.dic \
 -lm ~/supercollider/solipsist/data/models/script/script.lm \
 -infile robertrauschenberg1_rs.wav
</code>
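The same invocation can be driven from another program (e.g. for wiring recognition into the piece's software). A minimal Python sketch, assuming <code>pocketsphinx_continuous</code> is on the PATH; the line-filtering heuristic (log lines prefixed <code>INFO:</code>/<code>WARN:</code>/<code>ERROR:</code>, hypothesis text on its own lines) is an assumption about the tool's output, not documented behavior:

```python
import os
import subprocess

def recognize_file(wav_path,
                   hmm="~/supercollider/solipsist/data/models/hmm/en_US/hub4wsj_sc_8k",
                   dic="~/supercollider/solipsist/data/models/script/script.dic",
                   lm="~/supercollider/solipsist/data/models/script/script.lm"):
    """Run pocketsphinx_continuous on a wav file and return hypothesis lines."""
    result = subprocess.run(
        ["pocketsphinx_continuous",
         "-hmm", os.path.expanduser(hmm),
         "-dict", os.path.expanduser(dic),
         "-lm", os.path.expanduser(lm),
         "-infile", wav_path],
        capture_output=True, text=True)
    return parse_hypotheses(result.stdout)

def parse_hypotheses(stdout_text):
    """Keep non-empty lines that do not look like Sphinx log output
    (the prefix list is a guess; adjust to what the binary actually prints)."""
    hyps = []
    for line in stdout_text.splitlines():
        line = line.strip()
        if line and not line.startswith(("INFO:", "WARN:", "ERROR:")):
            hyps.append(line)
    return hyps
```

The returned lines could then be fed directly to the receipt printer or back into SuperCollider.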
 
==Proposal - Solipsist ==
I have been working with voice recognition technologies and Mel Bochner's text 'Serial Art, Systems, Solipsism', developing a device for performance and exchange between human and computer. The device consists of a microphone, a speech recognition system, software, and a receipt printer. The format of the conversation is a dialog between the human voice and printed text on receipts--the system transcribes (and validates) what it hears in terms of the words it knows. As is characteristic of voice recognition, face recognition, and other kinds of machine perception that operate within explicitly defined or statistically trained spaces of "perception", this is a solipsistic system, "denying the existence of anything outside the confines of its own mind" (Bochner, 1967). This character of the solipsist is one Mel Bochner evoked to describe the autonomy and denial of external reference in minimalist sculpture of the 1960s, but I find it particularly apt for current "smart" technologies--ultimately the agency of the system comes down to whatever agency the programmers have embedded in it, a sort of ventriloquism. This idea of a closed, narrowly parameterized space of perception in the machine is an interesting model for (and contrast to) issues of language, vocabulary, and free expression in humans--an exploration I intend to pursue through this project.
Finally, I need to address sonic properties of the piece and its time course as a composition. The voice of the viewer as they speak into the microphone is one sound source in the system, and the receipt printer has a very assertive (and retro) dot-matrix-ey sound as it prints out text and cuts rolls of paper. I need to make some decisions about how to use the sounds of the speaker and the printer over the course of the piece.

Also, will I add in additional sound sources such as more printers, pre-recorded voices, voices of past participants, or processed sounds? There are additional possibilities here for rhythmic exchanges between the percussive sound of the printer and the speaker, for long pauses, silences, and repetitions.

Additionally, I need to establish some overall arc for the piece--does an encounter with the system travel through to one pre-ordained conclusion? Are there multiple branching possibilities that change depending on what the viewer says and how they respond to the printouts?

Finally there is a relationship to be explored between speech as text and speech as sound--a parallel to the roles of printing as text and printing as sound. The fundamental distinctions between text, sound, and speech as kinds of communication and expression can be ripe territory for exploration. I suspect that these conceptual and compositional questions will occupy most of my time this quarter and comprise the bulk of the work that I need to do.
 
The most obvious technical challenges I foresee at this point are the implementation of sound input and pre-processing with SuperCollider, and interfacing from SuperCollider to the speech recognition library and receipt printer. Because part of this project is a critical investigation of the strengths and limitations of automatic speech recognition (ASR) technology, I intend to get more involved with the internal mechanisms of speech recognition as implemented in the Sphinx-4 library. A more comprehensive understanding of that technology is necessary to figure out how to tweak it and expose its internal character and assumptions.
**see [[#Code | Code]] section below.
'''<nowiki>*</nowiki>I do not have these items yet.'''
==Open Questions==
*Where else do we do this sort of projection and anthropomorphism? (projecting psychology or attributing intention to non-intelligent systems)
 
==Timeline==
'''Week 1 - 2'''
 
Introduction to course and project development.
#Find desk and desk lamp for "me vs. the computer" staging of microphone/speech recognition system.
'''Week 4 - 4/19''' 
Work time.
'''Week 5 - MILESTONE 1 - 4/26''' 
Working model of each of <s>two</s> one tracks:
#Participant speaking to computer.
#<s>Computer/printer speaking to itself (feedback loop). Interpreting printer sounds as speech. Or transforming them into speech.</s>
'''Week 6 - 5/3''' 
Realize that the best approach will combine elements of each of the two tracks above.
'''Week 7 - MILESTONE 2 - 5/10''' 
*Experiments with the characterization of the system:
**Software agent? agency. towards what goals?
I imagine these two go hand in hand--that the choice of particular texts will lend much of the character to the piece.
'''Week 8 - 5/17''' 
Have others experience the system, try it out.
'''Week 9 - MILESTONE 3 - 5/24''' 
Near-final form, near-final realization.
Viewer interaction tests.
'''Week 10 - 5/31''' 
Final changes, improvements, last minute blitz.
'''Presentation - 6/7'''
==Progress==
*Example code for Grammar-based recognition: http://svn.roberttwomey.com/processing/sphinxGrammarTest/
*Example code for Statistical Language Model (SLM) based recognition: http://svn.roberttwomey.com/processing/sphinxSLMTest/
 
===Supercollider===
*Speech Segmenter tool in supercollider http://svn.roberttwomey.com/supercollider/supercollider/speechtool.scd
===Pocketsphinx command-line Recognizer in OS X===
*http://svn.roberttwomey.com/supercollider/pocketsphinx-osx/
*http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/0.7/pocketsphinx-0.7.tar.gz/download
 
===Build a Language Model Online===
*upload a list of sentences (i.e. text file) here: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
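The lmtool page takes a plain text file with one sentence per line. A small Python sketch for preparing that file; the normalization choices (uppercase, strip punctuation, keep apostrophes) are assumptions modeled on common Sphinx .dic/.lm conventions, not lmtool requirements:

```python
import re

def prepare_corpus(sentences):
    """Normalize raw sentences into one-per-line lmtool input:
    drop punctuation/digits, collapse whitespace, uppercase."""
    out = []
    for s in sentences:
        s = re.sub(r"[^A-Za-z' ]+", " ", s)  # keep letters, apostrophes, spaces
        s = " ".join(s.upper().split())      # collapse runs of whitespace
        if s:
            out.append(s)
    return "\n".join(out) + "\n"

# Example: write the corpus file to upload (filenames are placeholders)
# with open("script_corpus.txt", "w") as f:
#     f.write(prepare_corpus(open("script.txt").readlines()))
```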
===Build a Language Model Locally===
*see week 10 and final week above... not yet done
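The local equivalent of the lmtool page is the CMU-Cambridge Language Modeling Toolkit (cmuclmtk). A sketch of that pipeline, assuming the toolkit is installed and <code>script_corpus.txt</code> holds one sentence per line (all filenames are placeholders):

```shell
# Build an ARPA-format language model from a sentence corpus with cmuclmtk.
text2wfreq < script_corpus.txt | wfreq2vocab > script.vocab    # word counts -> vocabulary
text2idngram -vocab script.vocab -idngram script.idngram < script_corpus.txt   # n-gram counts
idngram2lm -vocab_type 0 -idngram script.idngram -vocab script.vocab -arpa script.lm  # ARPA LM
```

The resulting <code>script.lm</code> can then be passed to pocketsphinx with <code>-lm</code> as in the commands above.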
 
===Secret Rabbit Code (libsamplerate)===
*libsamplerate (used for sndfile-resample): http://www.mega-nerd.com/SRC/download.html
*libsndfile: http://www.mega-nerd.com/libsndfile/#Download
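The <code>_rs</code> suffix on the wav file above suggests recordings are resampled with <code>sndfile-resample</code> before recognition, e.g. to the sample rate the acoustic model expects (commonly 16 kHz for wideband Sphinx models; the target rate is an assumption here). libsamplerate does bandlimited sinc interpolation; this Python sketch shows only the naive linear-interpolation idea, not libsamplerate's actual algorithm:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of floats from src_rate to dst_rate by
    linear interpolation (illustration only; libsamplerate uses sinc)."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate            # position in the source signal
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + frac * nxt)
    return out
```

On the command line the equivalent step would be something like <code>sndfile-resample -to 16000 input.wav output.wav</code>.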
== How To Compile on OS X ==
===Download and Install Homebrew===
from https://github.com/mxcl/homebrew/wiki/installation
<pre>
/usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"
</pre>
===Install libsamplerate===
<pre>
brew install libsamplerate
</pre>
this should install libsndfile as well (as a dependency of libsamplerate).
===Install sphinxbase-0.7===
*download sphinxbase-0.7: http://sourceforge.net/projects/cmusphinx/files/sphinxbase/0.7/sphinxbase-0.7.tar.gz/download
*configure, make, install:
<pre>
./configure --without-python
make
sudo make install
</pre>
===Install pocketsphinx-0.7===
*download pocketsphinx-0.7: http://sourceforge.net/projects/cmusphinx/files/pocketsphinx/0.7/pocketsphinx-0.7.tar.gz/download
*configure, make, install:
<pre>
./configure --without-python
make
sudo make install
</pre>
===Build pocketsphinx-osx-continuous===
*download: http://svn.roberttwomey.com/supercollider/pocketsphinx-osx/
*build in Xcode.
*copy /build/Debug/pocketsphinx-osx-continuous to your binaries/ folder. this is the command-line program used by the speechtool.scd program.
==sphinx-openal on OS X==
*use brew to install openal
*build this: https://gitorious.org/code-dump/sphinx-openal
===OUT OF DATE===
====Basic CMU Sphinx-4 Automatic Speech Recognition (ASR) library info====
*Reference Home http://cmusphinx.sourceforge.net/wiki/
*download http://sourceforge.net/projects/cmusphinx/files/
*training a SLM, http://www.speech.cs.cmu.edu/tools/lmtool-new.html
==== Compiling pocketsphinx as a universal static lib on OS X ====
*make x86_64 version of libsphinxbase:
*Charles O. Hartman. Virtual Muse: Experiments in Computer Poetry. http://www.amazon.com/Virtual-Muse-Experiments-Computer-Wesleyan/dp/0819522392/ref=ntt_at_ep_dpt_2 / http://www.upne.com/0-8195-2238-4.html
*Brief History of the Oulipo. Jean Lescure. In ''New Media Reader'', Noah Wardrip-Fruin, Nick Montfort 2003.
*Dennis Oppenheim. "Color Application for Chandra." 1971.
** "My two-and-a-half-year-old daughter is taught seven basic colors by repeated exposure to projected light and to my voice. In three hours she is able to associate the color symbol with the word symbol, thereby acquiring this data. Individual tape loops of Chandra's voice repeating the color names are played twenty four hours a day to a parrot in a separate room. The parrot eventually learns to mimic the color names. Here, color is not directly applied to a surface, but transmitted (abstracted from its source) and used to structure the vocal responses of a bird. It becomes a method for me to throw my voice." (in Dennis Oppenheim: Selected works 1967-90. Heiss. 1992)
