~/supercollider/solipsist/binaries/pocketsphinx-osx-continuous \
    -hmm ~/supercollider/solipsist/data/models/hmm/en_US/hub4wsj_sc_8k \
    -dict ~/supercollider/solipsist/data/models/script/script.dic \
    -lm ~/supercollider/solipsist/data/models/script/script.lm \
    -infile robertrauschenberg1_rs.wav
pocketsphinx_continuous \
    -hmm ~/supercollider/solipsist/data/models/hmm/en_US/hub4wsj_sc_8k \
    -dict ~/supercollider/solipsist/data/models/script/script.dic \
    -lm ~/supercollider/solipsist/data/models/script/script.lm \
    -infile robertrauschenberg1_rs.wav
I have been working with voice recognition technologies and Mel Bochner's text 'Serial Art, Systems, Solipsism', developing a device for performance and exchange between human and computer. The device consists of a microphone, a speech recognition system, software, and a receipt printer. The conversation takes the form of a dialogue between the human voice and text printed on receipts--the system transcribes (and validates) what it hears in terms of the words it knows. As is characteristic of voice recognition, face recognition, and other kinds of machine perception that operate within explicitly defined or statistically trained spaces of "perception", this is a solipsistic system, "denying the existence of anything outside the confines of its own mind" (Bochner, 1967). This character of the solipsist is one Mel Bochner evoked to describe the autonomy and denial of external reference in minimalist sculpture of the 1960s, but I find it particularly apt for current "smart" technologies--ultimately the agency of the system still comes down to whatever agency the programmers have embedded in it, a sort of ventriloquism. This idea of a closed, narrowly parameterized space of perception in the machine is an interesting model for (and contrast to) issues of language, vocabulary, and free expression in humans--an exploration I intend to pursue through this project.
There are multiple challenges in developing this piece as a performance/installation. The first and most concrete is to get a baseline speech recognition system working. I implemented this in the fall using the Sphinx-4 speech recognition library in Processing and Java. I have also acquired a receipt printer, ribbon, and paper, and can control the printer's behavior through a serial interface. Details are in the Code section. This is the "proof of concept".
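For reference, most receipt printers speak ESC/POS over the serial link. A minimal sketch of assembling a print-and-cut byte sequence -- the command bytes below are the common ESC/POS ones, but they are an assumption to check against the specific printer's manual:

```python
# Sketch: build one ESC/POS "utterance" for the receipt printer.
# ESC @  = initialize, ESC d n = feed n lines, GS V 1 = partial cut.
ESC, GS = b"\x1b", b"\x1d"

def receipt_bytes(text):
    """Return the raw bytes to print one line of text, feed, and cut."""
    return (ESC + b"@"                              # initialize printer
            + text.encode("ascii", "replace") + b"\n"
            + ESC + b"d\x03"                        # feed 3 lines
            + GS + b"V\x01")                        # partial cut

payload = receipt_bytes("WHAT DID YOU SAY")
```

The resulting bytes would then be written out over the serial interface to the printer.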
The development of roles for two characters in this piece--the system and the participant--is necessary to create the kind of encounter I have in mind. On the one hand, I would like this project to investigate the strengths and limitations of the speech recognition technology through viewer interaction, and on the other hand I would like to create a psychological investigation which highlights our human propensity to project psychology onto inanimate things and to attribute intention to them. Explicit attention in constructing the roles of both performers (human and machine) and in framing the situation will tease out some of the interesting ideas in both of these domains.
Finally, I need to address sonic properties of the piece and its time course as a composition. The voice of the viewer as they speak into the microphone is one sound source in the system, and the receipt printer has a very assertive (and retro) dot-matrix-ey sound as it prints out text and cuts rolls of paper. I need to make some decisions about how to use the sounds of the speaker and the printer over the course of the piece. Also, will I add in additional sound sources such as more printers, pre-recorded voices, voices of past participants, or processed sounds? There are additional possibilities here for rhythmic exchanges between the percussive sound of the printer and the speaker, for long pauses, silences, and repetitions. Additionally, I need to establish some overall arc for the piece--does an encounter with the system travel through to one pre-ordained conclusion? Are there multiple branching possibilities that change depending on what the viewer says and how they respond to the printouts? Finally there is a relationship to be explored between speech as text and speech as sound--a parallel to the roles of printing as text and printing as sound. The fundamental distinctions between text, sound, and speech as kinds of communication and expression can be ripe territory for exploration. I suspect that these conceptual and compositional questions will occupy most of my time this quarter and comprise the bulk of the work that I need to do.
Where else do we do this sort of projection and anthropomorphism? (projecting psychology or attributing intention to non-intelligent systems)
The most obvious technical challenges I foresee at this point are the implementation of sound input and pre-processing with supercollider, and interfacing from supercollider to the speech recognition library and receipt printer. As part of this project is a critical investigation of the strengths and limitations of automatic speech recognition (ASR) technology, I intend to get more involved with the internal mechanisms of speech recognition as implemented in the Sphinx-4 library. A more comprehensive understanding of that technology is necessary to figure out how to tweak it and expose its internal character and assumptions.
I will update the weekly Progress section as the quarter continues.
*I do not have these items yet.
Week 1 - 2
Introduction to course and project development.
Week 3 - Proposal - 4/12
Week 5 - MILESTONE 1
Working model of each of the two tracks:
Realize that the best approach will combine elements of each of the two tracks above.
Week 7 - MILESTONE 2
I imagine these two go hand in hand--that the choice of particular texts will lend much of the character to the piece.
Have others experience the system, try it out.
Week 9 - MILESTONE 3
Near-final form, near-final realization.
Viewer interaction tests.
Final changes, improvements, last minute blitz.
get system running again
svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx cmusphinx
Pictures (on my computer)
To Do - Conceptual:
To Do - Technical:
Sphinx Website Stuff:
In Class Discussion:
- put a coin in to purchase the receipt
- occupying roles
- master script: stelios
- learning from its own output: nico
- "that's what I said": juan
- more than just the filter: juan
- some analysis of the text; reconfigure the text
- what's the thing about the receipt printer?: nico
- coin-op justifies the receipt: commercial transaction, record of a transaction
- oracle, coin-op oracle: nico
- Trimpin, coin-op sculpture: juan
- automatic poetry generator; ask people to engage poetically with it, record them
- the idea of the residual: the parts/things that people say that are outside of the system (its closed language-field)
- site-specificity: administrative; scout out sites with speech rec system
- not be so one-dimensional; analyze tonality, intonation, rate; profiling affect
Josh P. suggests Secret Rabbit Code (libsamplerate) to do my supercollider -> 16KHz (sphinx) downsampling. The following three libraries can be built and installed:
Convert audio file to 16-bit little-endian audio raw:
ffmpeg -i solipsism1.wav -acodec pcm_s16le -ac 1 -ar 16000 solipsism1_pcm_s16le.wav
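As a sanity check, the converted file can be verified against what pocketsphinx expects (16 kHz, mono, 16-bit PCM) using Python's stdlib wave module. A sketch -- the tone file written here is just a stand-in for a real recording:

```python
import math
import struct
import wave

def is_sphinx_ready(path):
    """True if the WAV is 16 kHz, mono, 16-bit -- the format pocketsphinx expects."""
    with wave.open(path, "rb") as w:
        return (w.getframerate(), w.getnchannels(), w.getsampwidth()) == (16000, 1, 2)

# Write one second of a 440 Hz sine at 16 kHz mono so the check can be demonstrated.
with wave.open("tone_16k.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    frames = b"".join(struct.pack("<h", int(32000 * math.sin(2 * math.pi * 440 * t / 16000)))
                      for t in range(16000))
    w.writeframes(frames)

print(is_sphinx_ready("tone_16k.wav"))  # prints True
```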
Later in the week:
./pocketsphinx-osx-continuous -jsgf solipsist.gram -infile solipsism1_pcm_s16le.wav
./pocketsphinx-osx-continuous -dict 9316.dic -lm 9316.lm
All right! I made much progress on my speechtool SuperCollider code. (http://svn.roberttwomey.com/supercollider/supercollider/speechtool.scd)
sndfile-resample to resample 44.1KHz audio files to 16KHz for pocketsphinx.
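sndfile-resample does proper band-limited conversion; just to illustrate what the 44.1 kHz -> 16 kHz step does, here is a naive linear-interpolation resampler in plain Python (no anti-aliasing filter, so a sketch of the idea, not a substitute for libsamplerate):

```python
def resample(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampling (no anti-aliasing filter)."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # fractional index into the source
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out

one_second = [0.0] * 44100                  # one second of audio at 44.1 kHz
print(len(resample(one_second, 44100, 16000)))  # prints 16000
```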
./pocketsphinx-osx-continuous -jsgf solipsist.gram -infile numbers.raw
./pocketsphinx-osx-continuous -dict /Users/rtwomey/Documents/dxarts463_sp11/mmpi-2/9316.dic -lm /Users/rtwomey/Documents/dxarts463_sp11/mmpi-2/9316.lm
./pocketsphinx-osx-continuous -lm /Users/rtwomey/Documents/dxarts463_sp11/mmpi2_slm/mmpi2.arpa
Info on default models:
Rhyming + Homophonic sounds:
Building a Grammar:
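A grammar file for the -jsgf flag (like solipsist.gram above) follows the JSGF format. A minimal sketch -- the rule name and vocabulary here are made up for illustration, not the project's actual grammar:

```
#JSGF V1.0;
grammar solipsist;
public <utterance> = ( yes | no | i | do | not | know )+;
```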
Building a Language Model:
Generating a Dictionary:
perl make_pronunciation.pl -tools /Users/rtwomey/code/cmusphinx/trunk/logios/Tools/ -dictdir /Users/rtwomey/code/cmusphinx/trunk/logios/Tools/MakeDict/lib/dict -words /Users/rtwomey/Documents/dxarts463_sp11/mmpi2_slm/mmpi2.tmp.vocab -handdict NONE -dict results.dic
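The resulting results.dic should contain CMUdict-style entries, one word per line followed by its phone sequence, along these lines:

```
HELLO HH AH L OW
WORLD W ER L D
```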
git clone git://github.com/chokkan/liblbfgs.git liblbfgs
Contains SuperCollider code, OS X binaries, data files, etc. See the README inside for more info.
coming wed. am.
/usr/bin/ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)"
brew install libsamplerate
this should install libsndfile as well (as a dependency of libsamplerate).
./configure --without-python
make
sudo make install
./configure --without-python
make
sudo make install
./configure CFLAGS="-arch i386 -m32" LDFLAGS="-arch i386 -m32"
cd sphinxbase-0.7
./configure
make
libsphinxbase.x86_64.a in temp directory
cd pocketsphinx-0.7
./configure
make
libpocketsphinx.x86_64.a file in temp directory
export CFLAGS="-arch i386"
export LDFLAGS="-arch i386"
cd sphinxbase-0.7
make clean
./configure
make
libsphinxbase.i386.a in temp directory
cd pocketsphinx-0.7
make clean
./configure
make
libpocketsphinx.i386.a file in temp directory
lipo -create -output libsphinxbase.a libsphinxbase.x86_64.a libsphinxbase.i386.a
lipo -create -output libpocketsphinx.a libpocketsphinx.x86_64.a libpocketsphinx.i386.a
export CFLAGS="-I/usr/include/malloc"
./configure
make
sudo make install