Difference between revisions of "NLP"

Web 1T 5-gram Version 1 [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13 ldc upenn]

ldc new corpora [http://www.ldc.upenn.edu/About/newcorpora.shtml link]

Suffix Array Analysis [http://projectile.sv.cmu.edu/research/public/tools/salm/tutorial.pdf Tutorial of the SALM package]

Visual Wordnet [http://kylescholz.com/projects/wordnet]
 
== Word Vectors ==
Gloss vector overlap http://www.d.umn.edu/~tpederse/Pubs/ijcai03.pdf

Second-order co-occurrence vectors
* http://www.d.umn.edu/~tpederse/Pubs/eacl2006-vector.pdf
* http://www.d.umn.edu/~tpederse/Pubs/patwardhan.pdf
 
== Opinion Mining ==
http://wiki.cse.cuhk.edu.hk/irwin.king/kb/opinionmining

SentiWordNet: a publicly available lexical resource for opinion mining.
* http://nmis.isti.cnr.it/sebastiani/Publications/LREC06.pdf
 
== Natural Language Interface to a Video Data Model ==
http://etd.lib.metu.edu.tr/upload/12606251/index.pdf

http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?tp=&arnumber=4061311&isnumber=4061295

== Data mining ==
http://en.wikipedia.org/wiki/Principal_components_analysis
 
== NLP Language Research ==
=== generation of conjectures ===
* 'Automatic Conjecture Generation in the Digital Humanities' by Patrick Juola and Ashley Bernola
** http://twitter.com/conjecturator

=== resources ===
* Wordnet
* Framenet
* OpenConceptNet http://conceptnet.media.mit.edu/
* Cyc
* http://www.cse.ohio-state.edu/~dbyron/788AU06/initial-handout.pdf

=== basic Content Analysis ===
* http://www.williamlowe.net/software/ca-in-python.html

=== Identity as a Variable ===
* http://www.ucd.ie/euiteniba/pdf/Identity%20as%20a%20Variable.pdf

=== Closed Captioning ===
* http://www.dcmp.org/captioningkey/
* http://en.wikipedia.org/wiki/Closed_captioning
* Caption it yourself http://www.dcmp.org/ciy/

=== Teleprompter ===
Language for performance
[[Teleprompter]]
 
=== Wordnet similarity ===
WordNet::Similarity word vectors
* coffee#n#1: nutmeg, preparation, coffee_tree, caffeine, pulverized, coffee, arabia, tea, shelf_life, packed, hot_water, drinking, topped, java, lemon_peel, ordered, cognac, irish_whiskey, beverage, perforated, cup, sweetened, infusion, espresso, cream, boiling, dehydrated, finely, whipped_cream, stimulating, bitter, cinnamon, alkaloid
* cup#n#1: loaded, drunk, disposable, mustache, coffee, boxlike, tea, saucer, footed, drinking, eucharistic, collectively, tableware, bowl, cup, standardized, greece, rim, toast, missing, ancient_greek, drinking_vessel

* rifle#n#1: automatic_rifle, loads, loading, rifle, action_mechanism, bore, barrel, butt_end, breech, cartridge, fired, sliding, firearm, shotgun, lever, armored, shoulder_holster, portable, lifted, semiautomatic, forward_motion, rifled
* work#n#1: dry_rot, busywork, willing, machine_tool, technology, practical, barn, shiny, undertaking, productive, wages, inquiring, waiting, unit_of_time, preliminary, cleansing, substitute, incomplete, obliged, operations, a_great_deal, rubbing, washed, separately, ophelia, outstanding, interesting, waxing, stocks, attending, papers, labor, thoroughly, heavy_lifting, wasnt, municipal, succeed, missionary, hoped, further, mechanical, medical_care, barber, assigned, meager, budget, course_of_study, attempted, disadvantaged, boss, routine, damaging, close_to, done_for, grade, directed, systematically, recreational, duties, checked, shining, shoes, piece_of_work, polishing, sunday, soap, gawkers, leave_of_absence, improve, no_longer, cleaning, housewife, handling
* neck#n#1: sternum, collarbone, hanging, immunity, human_being, cartilaginous, externally, cervix, aids, elderly, body_part, glandular, rings, flex, mastoid, clavicle, admired, occipital_bone, arteries, larynx, membranous, inhaled, aorta, fold, obliquely, artery, ductless, spine, on_fire, chin, graceful
* rifle#n#1 <-> neck#n#1 = 0.0728235790450328
 
=== Generative Video through NLP ===
* making edits based on semantic content
** sequence
** cut points?
** decisions based on character names?
* aesthetic strategies for text
** side-scrolling text
** intertitles
** subtitles
** titles
* aesthetic strategies for photos
** Ken Burns effect
** simulated motion
*** motion blur [[Main_Page#Motion Estimation]]
*** camera shake
*** applying motion parameters extracted from real-life situations
* aesthetic strategies for video
** specifically... diverse collections of heterogeneous clips and documentation
** video collection strategies
 
=== similarity measures ===
Calculating the similarity of two phrases, with the goal of finding more closely related matches. Example: comparing "ocean at night" to "still life, objects around the studio" versus "cat tails at the ocean".
* http://www.google.com/search?q=calculate+similarity+of+two+phrases&hl=en&start=20&sa=N
* http://stackoverflow.com/questions/70560/how-do-i-compare-phrases-for-similarity
* http://en.wikipedia.org/wiki/Document_classification
* Phrase-based Document Similarity Based on an Index Graph Model http://www2007.org/papers/paper632.pdf
* http://pami.uwaterloo.ca/pub/hammouda/hammouda_icdm02.pdf
 
[[NLP]]

Optical Character Recognition
* tesseract-ocr [http://code.google.com/p/tesseract-ocr/]
* ocropus [http://code.google.com/p/ocropus/] layout analysis, tesseract is a plugin

[http://www.hpcwire.com/offthewire/University-of-Reading-Scientists-Study-Word-Evolution-40356217.html word evolution study at University of Reading]
* Professor Mark Pagel [http://www.evolution.reading.ac.uk/ http://www.evolution.reading.ac.uk/]
* ThamesBlue

[[language associations]]

Closed Captioning
* http://en.wikipedia.org/wiki/Closed_captioning

=== Electronic Literature as Performance ===
* http://www.drunkenboat.com/db10/05ele/elite.html

Latest revision as of 08:19, 30 September 2009
