=== Week 4 ===
**In Sphinx, SpeechClassifier:
***"uses Bent Schmidt Nielsen's algorithm. Each time audio comes in, the average signal level and the background noise level are updated, using the signal level of the current audio. If the average signal level is greater than the background noise level by a certain threshold value (configurable), then the current audio is marked as speech. Otherwise, it is marked as non-speech." from http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/frontend/endpoint/SpeechClassifier.html
***The threshold in the config.xml file I am using is "13" (a unitless thirteen).
***"Could anyone provide the link of the paper about that algorithm?" "I don't think this algorithm is worth a paper. It's very simple" http://sourceforge.net/projects/cmusphinx/forums/forum/382337/topic/4364498
**Bent K. Schmidt-Nielsen http://www.merl.com/people/?user=bent
*Another aspect that interests me, as a performer hearing my own voice, is "I Am Sitting in a Room" by Alvin Lucier [http://ubumexico.centro.org.mx/sound/source/Lucier-Alvin_Sitting.mp3]. I am listening to it again now, and I had never noticed that he had a stutter. This adds an interesting psychological dimension to the performance, where he states: "...so that any semblance of my speech, with perhaps the exception of rhythm, is destroyed. What you will hear, then, are the natural resonant frequencies of the room articulated by speech. I regard this activity not so much as a demonstration of a physical fact, but more as a way to smooth out any irregularities my speech might have."
*That second part, "What you will hear then are the natural resonant frequencies of the room," is analogous for me to sounding out the linguistic possibilities of the speech recognition system. I should explore the limits of the Wall Street Journal (WSJ_5k) model and the larger corpora of language/acoustic information. It is about *articulating* the limits of a system with your own voice.
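To make the SpeechClassifier description quoted above more concrete, here is a minimal Python sketch of a level-versus-background classifier. It is not the Sphinx source: the class name <code>SimpleSpeechClassifier</code>, the update rules, and the <code>average_number</code> and <code>adjustment</code> parameters are my own assumptions, chosen only to illustrate the idea. The one value taken from my notes is the threshold of 13, and the sketch treats it as a difference in log-RMS level, which is also an assumption.

```python
import math


class SimpleSpeechClassifier:
    """Hypothetical level-vs-background classifier (a sketch, not Sphinx's code)."""

    def __init__(self, threshold=13.0, average_number=1.0, adjustment=0.003):
        self.threshold = threshold            # required gap between level and background
        self.average_number = average_number  # assumed weight of the previous average level
        self.adjustment = adjustment          # assumed upward drift rate of the background
        self.level = None                     # running average signal level
        self.background = None                # running background noise estimate

    @staticmethod
    def log_rms(samples):
        """Log root-mean-square level of one audio frame (20 * log10 of the RMS)."""
        mean_square = sum(s * s for s in samples) / len(samples)
        return 20.0 * math.log10(max(math.sqrt(mean_square), 1.0))

    def classify(self, samples):
        """Update both running levels with this frame; return True if it counts as speech."""
        current = self.log_rms(samples)
        if self.background is None:
            # First frame: assume it is representative of the background.
            self.level = current
            self.background = current
        else:
            # Average signal level: weighted mean of the old level and the current frame.
            self.level = (self.level * self.average_number + current) / (self.average_number + 1.0)
            # Background: drop immediately to quieter frames, creep slowly toward louder ones.
            if current < self.background:
                self.background = current
            else:
                self.background += (current - self.background) * self.adjustment
        return (self.level - self.background) > self.threshold


clf = SimpleSpeechClassifier(threshold=13.0)
quiet = [10, -10] * 80       # low-amplitude frame
loud = [1000, -1000] * 80    # high-amplitude frame
print([clf.classify(frame) for frame in (quiet, quiet, loud, quiet)])
# prints [False, False, True, False]
```

With these assumed settings, raising the threshold makes the classifier slower to mark a frame as speech, which is one way to probe where a system stops "hearing" a voice.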
=== Week 5 - MILESTONE 1 ===
*Pictures to Come