<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7732226477253686468</id><updated>2011-07-08T06:52:46.629-07:00</updated><category term='ACL'/><category term='Graphical Models'/><category term='&quot;kernel methods&quot;'/><category term='experimentation'/><category term='datasets'/><category term='data mining'/><category term='sequence labeling'/><category term='ICML'/><category term='Matlab'/><category term='latex'/><category term='development'/><category term='Information Retrieval'/><category term='meaning'/><category term='graphs'/><category term='NAACL'/><category term='Parsing'/><category term='ontology'/><category term='NIPS'/><category term='smoothing'/><category term='multilingual dependency parsing'/><category term='Web'/><category term='Accuracy'/><category term='&quot;Machine Translation&quot;'/><category term='job'/><category term='Geek Humor'/><category term='results'/><category term='phd'/><category term='&quot;shared task&quot;'/><category term='WSD'/><category term='gradlife'/><category term='eacl'/><category term='AI'/><category term='SIGIR 2007'/><category term='MT'/><category term='spam'/><category term='dragon'/><category term='beryl'/><category term='todo'/><category term='Humor'/><category term='&quot;information extraction&quot;'/><category term='HTK'/><category term='ML'/><category term='learning'/><category term='India'/><category term='papers'/><category term='taxonomy'/><category term='coling'/><category term='theory'/><category term='math'/><category term='NLP'/><category term='charts'/><category term='linguistics'/><category term='research'/><category term='WWW'/><category term='semantic 
web'/><category term='TextGraph'/><category term='graduate students'/><category term='language'/><category term='&quot;machine learning&quot;'/><category term='font'/><category term='Google'/><category term='&quot;sentiment analysis&quot;'/><category term='patents'/><category term='language modeling'/><category term='TKDE'/><category term='KDD'/><category term='EMNLP'/><category term='HMM'/><category term='coding'/><category term='search'/><category term='CIA'/><category term='reading list'/><category term='&quot;AAAI 2007&quot;'/><category term='statistics'/><category term='IR'/><category term='ubuntu'/><category term='concordances'/><category term='&quot;CoNLL 2007&quot;'/><category term='machine learning'/><category term='writing'/><category term='vista'/><category term='estimation'/><title type='text'>Misc Research Stuff</title><subtitle type='html'>An online notebook for my jottings on NLP and machine learning.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>59</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6877143954934392418</id><published>2009-07-03T13:01:00.000-07:00</published><updated>2009-07-04T13:37:15.564-07:00</updated><title type='text'>Dealing 
with large scale graphs</title><content type='html'>&lt;div&gt;To a hammer everything looks like a nail but one great hammer to have in your toolbox is the graph. The &lt;a href="http://www.aclweb.org/anthology/index.html"&gt;ACL anthology&lt;/a&gt; alone lists more than 300 results for the query &lt;a href="http://www.google.com/custom?q=%22graph+based%22&amp;amp;btnG=Search&amp;amp;hl=en&amp;amp;client=google-coop-np&amp;amp;cof=AH%3Aleft%3BCX%3AACL%2520Anthology%2520search%3BL%3Ahttp%3A%2F%2Fwww.google.com%2Fcoop%2Fintl%2Fen%2Fimages%2Fcustom_search_sm.gif%3BLH%3A65%3BLP%3A1%3BVLC%3A%23551a8b%3BGFNT%3A%23666666%3BDIV%3A%23cccccc%3B&amp;amp;cx=011664571474657673452%3A4w9swzkcxiy&amp;amp;adkw=AELymgVkmTTk4qTXJrDVPNTR6g4ViEj-nAg-Nqo7jvuyoligGcMvib0rqxKsTBfQt6QMeJ0oC2s2Qq0e-eV8IKdKmlbX_YDsfpSHlZuUWX9Baq88Tjxz24BaobmQZZo2_wTS3EFlrDDBAX9FfPCf-vKxUqOHxyN5yUZpAGfJtl8SdTQaJ0Kj02E&amp;amp;boostcse=0&amp;amp;sa=2"&gt;"graph based"&lt;/a&gt;. Graph-based formalisms allow us to write down solutions in a succinct linear-algebra representation. However, implementing such solutions for large problems, or even for small datasets with blown-up graph representations, can be challenging in limited-resource environments. While some go for interesting &lt;a href="http://snowbird.djvuzone.org/2007/abstracts/139.pdf"&gt;approximate solutions&lt;/a&gt;, an alternative is to pool several limited-resource nodes into a map-reduce cluster and design a parallel algorithm to conquer scale with concurrency. This is easier said than done, since designing some parallel algorithms requires a different perspective on the problem. It is well worth the effort, though, as the new insights gained can reveal connections between things you already knew. 
For instance, in our &lt;a href="http://www.textgraphs.org/ws09/index.html"&gt;TextGraphs 2009&lt;/a&gt; paper we started out scaling up &lt;a href="http://learning.eng.cam.ac.uk/zoubin/papers/zgl.pdf"&gt;Label Propagation&lt;/a&gt; but eventually the connection to &lt;a href="http://google.stanford.edu/~backrub/pageranksub.ps"&gt;PageRank&lt;/a&gt; became obvious. To me this was a bigger learning moment than getting Label Propagation to work for large graphs. [&lt;a href="http://www.clsp.jhu.edu/~delip/nocrawl/textgraphs09.pdf"&gt;Preprint Copy&lt;/a&gt;]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For the actual implementation, we used &lt;a href="http://hadoop.apache.org/"&gt;Hadoop&lt;/a&gt; (surprise!) although &lt;a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html"&gt;bulk synchronous parallel models&lt;/a&gt; make more sense given the locality of the operations in most graph algorithms.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6877143954934392418?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6877143954934392418/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6877143954934392418' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6877143954934392418'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6877143954934392418'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2009/07/dealing-with-large-scale-graphs.html' title='Dealing with large scale graphs'/><author><name>Delip 
Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3346449798513838589</id><published>2009-03-31T15:04:00.000-07:00</published><updated>2009-03-31T15:26:24.994-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;information extraction&quot;'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;sentiment analysis&quot;'/><title type='text'>Sentiment Analysis is AI-Hard</title><content type='html'>In a &lt;a href="http://mags.acm.org/communications/200904/?pg=16"&gt;breezy article&lt;/a&gt; on sentiment analysis, Alex Wright quotes Bo Pang as saying:&lt;br /&gt;&lt;blockquote&gt;We are dealing with sentiment that can be expressed in subtle ways.&lt;/blockquote&gt;This is so true of the examples I've encountered in my work, and my favorite is this one I saw on iTunes recently.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_O_xB_EK1QO4/SdKVzy1k_hI/AAAAAAAAAxU/JJIdp0smXlo/s1600-h/Picture+4.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 61px;" src="http://4.bp.blogspot.com/_O_xB_EK1QO4/SdKVzy1k_hI/AAAAAAAAAxU/JJIdp0smXlo/s400/Picture+4.png" alt="" id="BLOGGER_PHOTO_ID_5319478826930339346" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;While I commend Alex for writing an informative yet accessible article on the topic, I disagree with the article's opinion that sentiment analysis is a series of "filters". That is clearly a euphemism. 
Any working sentiment analysis system is actually an engineering feat, often consisting of a series of hacks duct-taped together by glue code that handles special cases.&lt;br /&gt;&lt;br /&gt;The article also seems to suggest that extracting factual information is somehow easier than extracting opinions. I invite them to participate &lt;a href="http://apl.jhu.edu/%7Epaulmac/kbp.html"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3346449798513838589?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3346449798513838589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3346449798513838589' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3346449798513838589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3346449798513838589'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2009/03/sentiment-analysis-is-ai-hard.html' title='Sentiment Analysis is AI-Hard'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_O_xB_EK1QO4/SdKVzy1k_hI/AAAAAAAAAxU/JJIdp0smXlo/s72-c/Picture+4.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-5793850017551277084</id><published>2009-02-28T09:02:00.000-08:00</published><updated>2009-02-28T09:09:42.562-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='graduate students'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='Humor'/><title type='text'>On the way to Brewer's Art</title><content type='html'>&lt;div&gt;Never mind how we got to this topic:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;me: Parsing is for fogies.&lt;div&gt;Markus: What?&lt;/div&gt;&lt;div&gt;Jason: I think he means crusty old linguists.&lt;/div&gt;&lt;div&gt;Markus: You should probably use a shallow parser.&lt;/div&gt;&lt;div&gt;me: I'm shallower than that; I use n-grams.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-5793850017551277084?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/5793850017551277084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=5793850017551277084' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5793850017551277084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5793850017551277084'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2009/02/on-way-to-brewers-art.html' title='On the way to Brewer&apos;s Art'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-9163927998336591832</id><published>2008-12-19T11:45:00.000-08:00</published><updated>2008-12-19T11:47:38.501-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='eacl'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>EACL Reading</title><content type='html'>EACL 2009 &lt;a href="http://www.eacl2009.gr/conference/acceptedpapers"&gt;accepted paper list&lt;/a&gt; is up. Here's my reading list:&lt;br /&gt;&lt;br /&gt;WEAKLY SUPERVISED PART-OF-SPEECH TAGGING FOR RESOURCE-SCARCE LANGUAGES&lt;br /&gt;Kazi Saidul Hasan and Vincent Ng&lt;br /&gt;&lt;br /&gt;USING CYCLES AND QUASI-CYCLES TO DISAMBIGUATE DICTIONARY GLOSSES&lt;br /&gt;Roberto Navigli&lt;br /&gt;&lt;br /&gt;SYNTACTIC AND SEMANTIC KERNELS FOR SHORT TEXT PAIR CATEGORIZATION&lt;br /&gt;Alessandro Moschitti&lt;br /&gt;&lt;br /&gt;SENTIMENT SUMMARIZATION: EVALUATING AND LEARNING USER PREFERENCES&lt;br /&gt;Kevin Lerman, Sasha Blair-Goldensohn and Ryan McDonald&lt;br /&gt;&lt;br /&gt;PERSON IDENTIFICATION FROM TEXT AND SPEECH GENRE SAMPLES&lt;br /&gt;Jade Goldstein-Stewart, Ransom Winder and Roberta Sabin&lt;br /&gt;&lt;br /&gt;OUTCLASSING WIKIPEDIA IN OPEN-DOMAIN INFORMATION EXTRACTION: WEAKLY-SUPERVISED ACQUISITION OF ATTRIBUTES OVER CONCEPTUAL HIERARCHIES&lt;br /&gt;Marius Pasca&lt;br /&gt;&lt;br /&gt;GROWING FINELY-DISCRIMINATING TAXONOMIES FROM SEEDS OF VARYING QUALITY AND SIZE&lt;br /&gt;Tony Veale, Guofu Li and Yanfen Hao&lt;br /&gt;&lt;br /&gt;GENERATING A NON-ENGLISH SUBJECTIVITY LEXICON: RELATIONS THAT MATTER&lt;br /&gt;Valentin Jijkoun and Katja Hofmann&lt;br /&gt;&lt;br /&gt;CONTEXTUAL PHRASE-LEVEL POLARITY ANALYSIS USING LEXICAL AFFECT SCORING AND SYNTACTIC N-GRAMS&lt;br /&gt;Apoorv Agarwal, Fadi Biadsy and Kathleen Mckeown&lt;br /&gt;&lt;br /&gt;COMPANY-ORIENTED EXTRACTIVE SUMMARIZATION OF FINANCIAL 
NEWS&lt;br /&gt;Katja Filippova, Mihai Surdeanu, Massimiliano Ciaramita and Hugo Zaragoza&lt;br /&gt;&lt;br /&gt;ANALYSING WIKIPEDIA AND GOLD-STANDARD CORPORA FOR NER TRAINING&lt;br /&gt;Joel Nothman, Tara Murphy and James R. Curran&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-9163927998336591832?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/9163927998336591832/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=9163927998336591832' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9163927998336591832'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9163927998336591832'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/12/eacl-reading.html' title='EACL Reading'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-433062886769014657</id><published>2008-12-02T16:08:00.000-08:00</published><updated>2008-12-05T20:19:12.083-08:00</updated><title type='text'>And we're back ...</title><content type='html'>Sometime back I wrote about &lt;a href="http://resnotebook.blogspot.com/2008/07/quick-scan-at-acl.html"&gt;Wordle&lt;/a&gt; to visualize textual information using frequency counts. 
Change.gov, Obama's transition team website, &lt;a href="http://change.gov/newsroom/entry/join_the_discussion_daschles_healthcare_response/"&gt;uses it&lt;/a&gt; on the comments submitted in response to their health care discussion. This is very interesting, but I think Wordle should display the top 100 collocations instead of the top 100 words. But oh, we also learnt at the last ACL how to &lt;a href="http://www.blogger.com/www.aclweb.org/anthology-new/P/P08/P08-1075.pdf"&gt;learn collocation information from unigram frequencies&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-433062886769014657?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/433062886769014657/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=433062886769014657' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/433062886769014657'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/433062886769014657'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/12/and-we.html' title='And we&apos;re back ...'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4732591426486768248</id><published>2008-07-17T05:10:00.000-07:00</published><updated>2008-07-17T05:28:42.926-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ACL'/><category scheme='http://www.blogger.com/atom/ns#' 
term='NLP'/><title type='text'>Too many cooks?</title><content type='html'>Computational Linguistics is becoming like &lt;a href="http://www.sciencemag.org/current.dtl"&gt;Science&lt;/a&gt; or &lt;a href="http://www.nature.com/nature/index.html"&gt;Nature&lt;/a&gt;. For instance, see &lt;a href="http://www.mitpressjournals.org/doi/abs/10.1162/coli.2008.07-055-R2-06-29"&gt;this paper&lt;/a&gt; in the current issue. (In this case, the broth wasn't spoiled ;-)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_O_xB_EK1QO4/SH83NvewHQI/AAAAAAAAAlY/fKe9DdQPyfQ/s1600-h/Picture+2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_O_xB_EK1QO4/SH83NvewHQI/AAAAAAAAAlY/fKe9DdQPyfQ/s400/Picture+2.png" alt="" id="BLOGGER_PHOTO_ID_5223954801996340482" border="0" /&gt;&lt;/a&gt;Guess which paper has the largest number of authors in the &lt;a href="http://aclweb.org/anthology-new/"&gt;ACL anthology&lt;/a&gt;?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4732591426486768248?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4732591426486768248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4732591426486768248' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4732591426486768248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4732591426486768248'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/07/too-many-cooks.html' title='Too many 
cooks?'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_O_xB_EK1QO4/SH83NvewHQI/AAAAAAAAAlY/fKe9DdQPyfQ/s72-c/Picture+2.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-1539681135760948429</id><published>2008-07-08T10:55:00.000-07:00</published><updated>2008-07-08T11:34:33.508-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='theory'/><category scheme='http://www.blogger.com/atom/ns#' term='NIPS'/><category scheme='http://www.blogger.com/atom/ns#' term='ML'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>To theory or not to theory</title><content type='html'>I stumbled upon this paper "Reflections after Refereeing Papers for NIPS" by &lt;a href="http://www.stat.berkeley.edu/%7Ebreiman/"&gt;Leo Breiman&lt;/a&gt; that gives some really candid insights into theory papers. (Unfortunately, I could not find a soft copy to share, except &lt;a href="http://direct.bl.uk/bld/PlaceOrder.do?UIN=026632341&amp;amp;ETOC=EN&amp;amp;from=searchengine"&gt;this link&lt;/a&gt;.) Some noteworthy observations:&lt;br /&gt;&lt;blockquote&gt;&lt;/blockquote&gt;&lt;blockquote&gt;"No theorems" implies "No theory"&lt;br /&gt;&lt;br /&gt;"... 
more than 99% of the published papers are useless exercises."&lt;br /&gt;&lt;br /&gt;"Mathematical theory is not critical to development of machine learning."&lt;br /&gt;&lt;br /&gt;"Our fields would be better off with far fewer theorems, less emphasis on faddish stuff, and much more into scientific inquiry and engineering."&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;I really liked this article, especially coming from someone who has been working in theory all his life but I would still prefer reading papers giving theoretical insight, however useless, than pages and pages of feature engineering &amp;amp; experimentation using classifier X on problem Y -- the current trend at ACL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-1539681135760948429?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/1539681135760948429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=1539681135760948429' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1539681135760948429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1539681135760948429'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/07/to-theory-or-not-to-theory.html' title='To theory or not to theory'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-1264975633512369098</id><published>2008-07-07T06:49:00.000-07:00</published><updated>2008-07-07T06:58:00.163-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ACL'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>A quick scan at ACL</title><content type='html'>&lt;a href="http://mendicantbug.com/"&gt;Mendicant Bug&lt;/a&gt; writes about a new tag-cloud service called &lt;a href="http://wordle.net/"&gt;Wordle&lt;/a&gt;. Here is a look at this year's ACL. It gives a clear idea of what is going on! A larger image is available &lt;a href="http://wordle.net/gallery/wrdl/55900/ACL2008"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_O_xB_EK1QO4/SHIfnk7dNQI/AAAAAAAAAlQ/kK-ZqPhGcCk/s1600-h/Picture+1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_O_xB_EK1QO4/SHIfnk7dNQI/AAAAAAAAAlQ/kK-ZqPhGcCk/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5220269682864239874" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-1264975633512369098?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/1264975633512369098/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=1264975633512369098' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1264975633512369098'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1264975633512369098'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/07/quick-scan-at-acl.html' title='A quick scan at ACL'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_O_xB_EK1QO4/SHIfnk7dNQI/AAAAAAAAAlQ/kK-ZqPhGcCk/s72-c/Picture+1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3029458617144851675</id><published>2008-05-11T22:43:00.000-07:00</published><updated>2008-05-11T22:51:39.251-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Powerset Natural Language Search</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_O_xB_EK1QO4/SCfaF7BJ-NI/AAAAAAAAAj0/j2J_Hf_WmQs/s1600-h/Picture+1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_O_xB_EK1QO4/SCfaF7BJ-NI/AAAAAAAAAj0/j2J_Hf_WmQs/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5199364090099267794" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.powerset.com"&gt;Powerset&lt;/a&gt;, a company we only remember seeing as conference sponsors, now actually has something working. After receiving an email from them, I tried out several queries. 
At best, it seems to answer most Wh-questions and certain whole-part relations.&lt;br /&gt;&lt;br /&gt;Try out the &lt;a href="http://www.google.com/search?q=Who+is+Bart+Simpson%27s+father%3F"&gt;same query on Google&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3029458617144851675?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3029458617144851675/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3029458617144851675' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3029458617144851675'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3029458617144851675'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/05/powerset-natural-language-search.html' title='Powerset Natural Language Search'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_O_xB_EK1QO4/SCfaF7BJ-NI/AAAAAAAAAj0/j2J_Hf_WmQs/s72-c/Picture+1.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-8427848024892135427</id><published>2008-04-03T20:35:00.000-07:00</published><updated>2008-04-03T20:38:09.460-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='writing'/><title type='text'>Writing style</title><content type='html'>The sweetest thing ever written in a paper:  "The reader who is 
unfamiliar with this field or who has allowed his or her facility with some of its concepts to fall into disrepair may profit from a brief perusal of Feller (1950) and Gallagher (1968)."&lt;br /&gt;&lt;br /&gt; - Brown et al., "Class-Based n-gram Models of Natural Language", Computational Linguistics, 1992&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-8427848024892135427?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/8427848024892135427/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=8427848024892135427' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8427848024892135427'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8427848024892135427'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/04/writing-style.html' title='Writing style'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-1882041265773451165</id><published>2008-03-28T11:54:00.000-07:00</published><updated>2008-03-28T12:05:14.441-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ACL'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>Searching ACL anthology</title><content type='html'>If you are looking up the &lt;a href="http://acl.ldc.upenn.edu/"&gt;ACL 
anthology&lt;/a&gt; regularly, my friend &lt;a href="http://www.clsp.jhu.edu/%7Emarkus/"&gt;Markus&lt;/a&gt; has a nice firefox search plugin to do that. You can get that and others from &lt;a href="http://mycroft.mozdev.org/download.html?category=14&amp;amp;country=WW&amp;amp;language=all"&gt;this page&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_O_xB_EK1QO4/R-1BA3HSN0I/AAAAAAAAAi8/bzq5dn4BV-w/s1600-h/Picture+1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_O_xB_EK1QO4/R-1BA3HSN0I/AAAAAAAAAi8/bzq5dn4BV-w/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5182870229223618370" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-1882041265773451165?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/1882041265773451165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=1882041265773451165' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1882041265773451165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1882041265773451165'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/03/searching-acl-anthology.html' title='Searching ACL anthology'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://bp0.blogger.com/_O_xB_EK1QO4/R-1BA3HSN0I/AAAAAAAAAi8/bzq5dn4BV-w/s72-c/Picture+1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2848863492815245577</id><published>2008-03-27T21:26:00.000-07:00</published><updated>2008-03-27T22:20:52.966-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ACL'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>ACL accepted papers</title><content type='html'>&lt;span style="font-size:100%;"&gt;&lt;span style="font-family:georgia;"&gt;Hal posted a while back about the &lt;/span&gt;&lt;a style="font-family: georgia;" href="http://nlpers.blogspot.com/2008/03/acl-papers-up.html"&gt;ACL accepted papers&lt;/a&gt;&lt;span style="font-family:georgia;"&gt; that I just read now -- I've been living under a rock for some time. You can get a printer friendly version &lt;/span&gt;&lt;a style="font-family: georgia;" href="http://cs.jhu.edu/%7Edelip/misc/acl08.html"&gt;here&lt;/a&gt;&lt;span style="font-family:georgia;"&gt;. 
I know, my paper did not make it to that list :(&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;New additions to my reading list:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Distributional Identification of Non-Referential Pronouns&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Shane Bergsma, Dekang Lin and Randy Goebel&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;An Unsupervised Approach to Biography Production using Wikipedia&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Fadi Biadsy, Julia Hirschberg and Elena Filatova&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Resolving Personal Names in Email Using Context Expansion&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Tamer Elsayed, Douglas Oard and Galileo Namata&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Mining Wiki Resources for Multilingual Named Entity Recognition&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Alexander Richman and Patrick Schone&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Inducing Gazetteers for Named Entity Recognition by Large-scale Clustering of Dependency Relations&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Jun'ichi Kazama and Kentaro Torisawa&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Name Translation in 
Statistical Machine Translation - Learning When to Transliterate&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Ulf Hermjakob, Kevin Knight and Hal Daume&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;The Tradeoffs Between Open and Traditional Relation Extraction&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Michele Banko and Oren Etzioni&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;(Longest paper title)&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Unsupervised Discovery of Generic Relationships Using Pattern Clusters and its Evaluation by Automatically Generated SAT Analogy Questions&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Dmitry Davidov and Ari Rappoport&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Finding Contradictions in Text&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Marie-Catherine de Marneffe, Anna Rafferty and Christopher Manning&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Extracting Question-Context-Answer Triples from Online Forums&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Shilin Ding, Gao Cong, Chin-Yew Lin and Xiaoyan Zhu&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;EM Can Find Pretty Good HMM POS-Taggers (When Given a Good Start)&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span 
style="font-family:georgia;"&gt;Yoav Goldberg, Meni Adler and Michael Elhadad&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Extraction of Entailed Semantic Relations Through Syntax-based Comma Resolution&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Vivek Srikumar, Roi Reichart, Mark Sammons, Ari Rappoport and Dan Roth&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Learning Bigrams from Unigrams&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Xiaojin Zhu, Andrew Goldberg, Michael Rabbat and Robert Nowak&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-style: italic;font-family:georgia;font-size:100%;"  &gt;Evaluating Roget's Thesauri&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Alistair Kennedy and Stan Szpakowicz&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-family:georgia;" &gt;Randomized Language Models via Perfect Hash Functions&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;David Talbot and Thorsten Brants&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;font-family:georgia;" &gt;Solving Relational Similarity Problems Using the Web as a Corpus&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Preslav Nakov and Marti Hearst&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2848863492815245577?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2848863492815245577/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2848863492815245577' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2848863492815245577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2848863492815245577'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/03/acl-accepted-papers.html' title='ACL accepted papers'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4958967804814637532</id><published>2008-02-24T17:50:00.001-08:00</published><updated>2008-02-24T17:50:28.820-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='Geek Humor'/><title type='text'>What do you do?</title><content type='html'>&lt;span style="font-family: trebuchet ms;font-family:georgia;" &gt;As a grad student working on NLP, how do you explain what you are working on to friends and family? I inevitably end up referring to the Google search engine even though what I do is quite far from IR. Actually, that's not true. These days IR seems to consume everything, but that's another story.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;font-family:georgia;" &gt;This reminds me of a funny conversation at CLSP recently:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;font-family:georgia;" &gt;Sanjeev is telling us about an incident where a concerned parent of a young child with a speaking disability is asking him for his opinion. 
Apparently, she is confused about "Language and Speech Processing" in CLSP.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;font-family:georgia;" &gt;Keith butts in: "Run a few more iterations of EM and he'll be fine."&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4958967804814637532?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4958967804814637532/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4958967804814637532' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4958967804814637532'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4958967804814637532'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/02/what-do-you-do.html' title='What do you do?'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2511804754707907521</id><published>2008-02-14T01:23:00.000-08:00</published><updated>2008-02-14T01:59:11.602-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Parsing'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='Geek Humor'/><title type='text'>A song on parsing</title><content type='html'>We all know &lt;a href="http://cs.jhu.edu/%7Ejason/"&gt;Jason&lt;/a&gt;'s love for parsing from &lt;a 
href="http://cs.jhu.edu/%7Ejason/research.html"&gt;his work&lt;/a&gt; but it takes a different level of dedication to write a Valentine's Day &lt;a href="http://cs.jhu.edu/%7Ejason/fun/grammar-and-the-sentence/"&gt;song about parsing&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As Jason says, "Parsers just want to be appreciated, like everyone else."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2511804754707907521?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2511804754707907521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2511804754707907521' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2511804754707907521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2511804754707907521'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2008/02/song-on-parsing.html' title='A song on parsing'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6474563260346081323</id><published>2007-10-17T17:07:00.000-07:00</published><updated>2007-10-17T17:12:39.058-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='math'/><category scheme='http://www.blogger.com/atom/ns#' term='Geek Humor'/><category scheme='http://www.blogger.com/atom/ns#' term='Humor'/><title type='text'>Funny 
bone</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;The frequentist exclaimed, "All your Bayes are belong to us!" to which the Bayesian responded, "Well, it depends."&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6474563260346081323?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6474563260346081323/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6474563260346081323' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6474563260346081323'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6474563260346081323'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/10/funny-bone.html' title='Funny bone'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3497417276278114080</id><published>2007-09-20T22:58:00.000-07:00</published><updated>2007-09-20T23:09:21.150-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NIPS'/><category scheme='http://www.blogger.com/atom/ns#' term='ML'/><category scheme='http://www.blogger.com/atom/ns#' term='learning'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>NIPS papers are out</title><content type='html'>&lt;span style="font-family: verdana;"&gt;For a full list see &lt;/span&gt;&lt;a style="font-family: 
verdana;" href="http://nips07.stanford.edu/accepted_papers.html"&gt;here&lt;/a&gt;&lt;span style="font-family: verdana;"&gt;. Some papers I want to read based on my current interests:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Random Projections for Manifold Learning&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Chinmay Hegde, Michael Wakin, Richard Baraniuk&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;The Distribution Family of Similarity Distances&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Gertjan Burghouts, Arnold Smeulders, Jan-Mark Geusebroek&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Manifold Sculpting&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Michael Gashler, Dan Ventura, Tony Martinez&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;A learning framework for nearest neighbor search&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Lawrence Cayton, Sanjoy Dasgupta&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Learning Bounds for Domain Adaptation&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jennifer Wortman&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Convex Relaxations of EM&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Yuhong Guo, Dale Schuurmans&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;A Randomized Algorithm for Large Scale Support Vector Learning&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Krishnan Kumar, Chiru Bhattacharya, Ramesh Hariharan&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Bundle Methods for Machine Learning&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Alex 
Smola, S V N Vishwanathan, Quoc Le&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Regularized Boost for Semi-Supervised Learning&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Ke Chen, Shihai Wang&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Learning the structure of manifolds using random projections&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;A complexity measure for intuitive theories&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: verdana;"&gt;Charles Kemp, Noah Goodman, Joshua Tenenbaum&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3497417276278114080?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3497417276278114080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3497417276278114080' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3497417276278114080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3497417276278114080'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/09/nips-papers-are-out.html' title='NIPS papers are out'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3697259487579971802</id><published>2007-08-18T00:40:00.000-07:00</published><updated>2007-08-18T00:54:25.570-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='EMNLP'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='Humor'/><title type='text'>NLP and Global Warming</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Those of us who were at EMNLP-CONLL 2007 remember the "NLP and Global Warming" exchange between James Clarke, Jason Eisner, and Dan Bikel at the Q/A session of the &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://acl.ldc.upenn.edu/D/D07/D07-1001.pdf"&gt;Clarke and Lapata paper&lt;/a&gt;&lt;span style="font-family:verdana;"&gt;. The transcript of this funny conversation is now &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.cs.jhu.edu/%7Ejason/advice/conf/NLP-and-global-warming.html"&gt;online&lt;/a&gt;&lt;span style="font-family:verdana;"&gt;, thanks to Jason.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;I really liked Hal's ending remark.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3697259487579971802?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3697259487579971802/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3697259487579971802' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3697259487579971802'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3697259487579971802'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/08/nlp-and-global-warming.html' title='NLP and Global Warming'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6587353228043902504</id><published>2007-08-15T16:16:00.001-07:00</published><updated>2007-08-18T01:03:14.191-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='KDD'/><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='data mining'/><category scheme='http://www.blogger.com/atom/ns#' term='search'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>People Search on the Web</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Wired has an &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.wired.com/techbiz/startups/news/2007/08/spock_reputation"&gt;article&lt;/a&gt;&lt;span style="font-family:verdana;"&gt; about &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.spock.com/"&gt;spock.com&lt;/a&gt;&lt;span style="font-family:verdana;"&gt;, a people search engine that combines crawled and user-added content. From the few searches I did, it looks like this works better for celebrity names than for a regular person with web content. For instance, searching a name like "David Smith" produces these &lt;a href="http://www.spock.com/q/David-Smith"&gt;results&lt;/a&gt;. Of the top 10 results, only 3 actually have the name "David Smith" or something close, and the first result is not one of them. 
Compare this with a general purpose search engine like &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.google.com/search?source=ig&amp;hl=en&amp;amp;q=David+Smith"&gt;Google&lt;/a&gt;&lt;span style="font-family:verdana;"&gt;. Among a dozen random NLP/ML academic names (professors) I tried, it only got Jason Eisner and Tom Mitchell correct. One reason for this poor recall is probably that they don't get content from user home pages.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;(Sites from which this data is derived include MySpace, Friendster, IMDB, Wikipedia, ratemyprofessors.com, etc.)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Nevertheless, this website is representative of the interesting KDD-style problems that one could work on with people's names. It is also interesting because the names we look for fall in the "long tail", without sufficient data to support them, which calls for clever machine learning techniques.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6587353228043902504?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6587353228043902504/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6587353228043902504' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6587353228043902504'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6587353228043902504'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/08/people-search-on-web_15.html' title='People Search on the Web'/><author><name>Delip 
Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2268172507132234245</id><published>2007-08-12T12:30:00.000-07:00</published><updated>2007-08-12T13:33:40.681-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='patents'/><category scheme='http://www.blogger.com/atom/ns#' term='data mining'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>Digital Reasoning awarded contextual similarity patent?</title><content type='html'>&lt;span style="font-family:verdana;"&gt;I was led to &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.forbes.com/businesswire/feeds/businesswire/2007/07/31/businesswire20070731005886r1.html"&gt;this article&lt;/a&gt;&lt;span style="font-family:verdana;"&gt; on Forbes via &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.inma.ucl.ac.be/%7Efrancois/blog/entries/entry_594.php"&gt;Damien's post&lt;/a&gt;&lt;span style="font-family:verdana;"&gt;. The article is about a company, Digital Reasoning, getting a patent on what sounded to me like contextual similarity. Their "white paper" makes reference to &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://tinyurl.com/2h6nz4"&gt;patent number 7249117&lt;/a&gt;&lt;span style="font-family:verdana;"&gt; (via USPTO). Unlike a research paper, the patent document was quite difficult to read. 
Will get to it sometime later but here is an extract from their press release about what their technology can do.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;* Learn the meanings of words, classes of words, and other symbols based on how they are used in context in natural language&lt;br /&gt;* Create and manipulate models of this "meaning" - i.e. the mathematical patterns of usage - including the detection of groups or similar categories of words or development of hierarchies or creation of relationships between words&lt;br /&gt;* Improve the models based on human feedback or using other         structured information after model construction &lt;br /&gt;* The representation or sharing of this model or learning in an ontology, graph structure, or programming languages&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Anyone from the ACL/ML/AI community can immediately recognize this and start citing their favorite papers on these topics starting from at least a decade ago. A promotional video from the company on YouTube can be found &lt;/span&gt;&lt;a style="font-family: verdana;" href="http://www.youtube.com/watch?v=R5ihr4kx3dQ"&gt;here&lt;/a&gt;&lt;span style="font-family:verdana;"&gt;. Excerpt from the video: "... We treat the text representation of human language as a signal ... ". &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;I think everyone should stop taking patents seriously. 
Wishful thinking?&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2268172507132234245?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2268172507132234245/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2268172507132234245' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2268172507132234245'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2268172507132234245'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/08/digital-reasoning-awarded-contextual.html' title='Digital Reasoning awarded contextual similarity patent?'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-58766609740961746</id><published>2007-08-02T19:40:00.000-07:00</published><updated>2007-08-02T21:25:18.661-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Recommending scientific papers</title><content type='html'>&lt;div style="font-family: verdana;"&gt;I noticed a new feature in Citeseer which tries to suggest an "alternate document" for a paper.&lt;br /&gt;&lt;div&gt;&lt;img 
id="BLOGGER_PHOTO_ID_5094312320294261490" style="margin: 0px auto 10px; display: block; text-align: center; width: 463px; height: 94px;" alt="" src="http://bp1.blogger.com/_O_xB_EK1QO4/RrKiEIIJEvI/AAAAAAAAAEc/olirvV210xw/s400/recommendation.jpg" border="0" /&gt;Clearly it does not do what it claims to do, and it doesn't show up for all papers. (Experimental?) So, an interesting question is: how does one recommend scientific papers? Something more than mere document similarity is required. If I am reading a CRF paper then there is no point in listing all papers containing similar words. Just listing the papers connected to it by inward and outward links in the citation graph won't suffice either. Ideal recommendations for a paper would depend on the role the user is playing. When I am reading a paper about some new topic, I would like to be pointed to the original papers on the topic, some recent papers on the topic, and maybe some survey papers or books. On the other hand, when I am writing a paper, I would like to be pointed to all papers related to the topic (recall is more important than precision here, to avoid reviewer comments on "missing references") in some magical order that ranks the papers most relevant to my work first. Also, these papers might not be directly related through citations. If there is a recent related work in the Annals of Statistics, for instance, then it should show up when I am working on, say, approximate inference methods for graphical models. 
(Possible to deduce this from my previous queries?)&lt;br /&gt;&lt;br /&gt;In spite of more information being present in a scientific paper than its text, recommending or ranking papers appears to be quite challenging.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-58766609740961746?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/58766609740961746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=58766609740961746' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/58766609740961746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/58766609740961746'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/08/recommending-scientific-papers.html' title='Recommending scientific papers'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_O_xB_EK1QO4/RrKiEIIJEvI/AAAAAAAAAEc/olirvV210xw/s72-c/recommendation.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3685487613637899115</id><published>2007-07-26T22:14:00.000-07:00</published><updated>2007-07-26T23:31:05.068-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category 
scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>Google allows data binging for researchers</title><content type='html'>Google has now opened access to its search and MT systems for university researchers, as described in &lt;a href="http://googleresearch.blogspot.com/2007/07/drink-from-firehose-with-university.html"&gt;today's announcement&lt;/a&gt; on its &lt;a href="http://googleresearch.blogspot.com/"&gt;research blog&lt;/a&gt;. The search API documentation does not mention any restriction on the number of queries that can be submitted (the earlier limit was 1000). Whatever the number is, I am guessing it will be large (&lt;span style="font-style: italic;"&gt;Drinking from the firehose?&lt;/span&gt;). However, the MT API allows 1000 queries per day, with the documentation hinting that this need not be a hard limit.&lt;br /&gt;&lt;br /&gt;Looking at the search API output, the two things I really miss are the number of hits and the snippet for each search result. The number of hits has been used in several papers for &lt;a href="http://portal.acm.org/citation.cfm?id=1073153"&gt;interesting&lt;/a&gt; &lt;a href="http://www.cwi.nl/%7Epaulv/papers/amdug.pdf"&gt;results&lt;/a&gt;. The other useful feature is snippets. 
Every search result from Google is accompanied by a small snippet extracted from the page, as shown below for an example query "Dekang Lin".&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_O_xB_EK1QO4/RqmHhoIJEtI/AAAAAAAAAEM/BEERi5Y7WjM/s1600-h/snippet.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_O_xB_EK1QO4/RqmHhoIJEtI/AAAAAAAAAEM/BEERi5Y7WjM/s400/snippet.jpg" alt="" id="BLOGGER_PHOTO_ID_5091749865496056530" border="0" /&gt;&lt;/a&gt;The information in the snippets can serve as informative features in different tasks, such as this one on &lt;a href="http://www.cs.jhu.edu/%7Engarera/publications/snippetsSEMEVAL07.pdf"&gt;person name disambiguation&lt;/a&gt;. (BTW, Dekang is now at Google.)&lt;br /&gt;&lt;br /&gt;Despite these minor quibbles, these new APIs will be quite useful to all of us and will certainly result in more papers on &lt;a href="http://portal.acm.org/citation.cfm?id=1245144"&gt;Googleology&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;Later addition:&lt;/span&gt; It turns out we can sort of get the counts by repeatedly executing the request (only ten results per request) and counting the results returned, but the API caps the number of requests at 100. That means you can get a maximum of 1000 results, which is not quite the same as "&lt;span style=""&gt;Results &lt;b&gt;1&lt;/b&gt; - &lt;b&gt;10&lt;/b&gt; of about &lt;b&gt;779,000,000&lt;/b&gt;". &lt;/span&gt;Though that number is approximate, it is still indicative of how strong the query is w.r.t. the web. 
For example GoogleCount("Horse+animal") &gt;&gt; GoogleCount("Horse+truck").&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3685487613637899115?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3685487613637899115/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3685487613637899115' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3685487613637899115'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3685487613637899115'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/google-allows-data-binging-for.html' title='Google allows data binging for researchers'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_O_xB_EK1QO4/RqmHhoIJEtI/AAAAAAAAAEM/BEERi5Y7WjM/s72-c/snippet.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2936846724748152357</id><published>2007-07-24T09:53:00.000-07:00</published><updated>2007-07-24T09:58:26.666-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='SIGIR 2007'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>Readings from SIGIR 
2007</title><content type='html'>&lt;span style=";font-family:trebuchet ms;font-size:100%;"  &gt;&lt;a href="http://www.sigir2007.org/"&gt;SIGIR 2007&lt;/a&gt; is happening now at Amsterdam!&lt;br /&gt;&lt;br /&gt;Latent Concept Expansion Using Markov Random Fields, Donald Metzler, Bruce Croft&lt;br /&gt;&lt;br /&gt;Random Walks on the Click Graph, Nick Craswell, Martin Szummer&lt;br /&gt;&lt;br /&gt;Towards Automatic Extraction of Event and Place Semantics from Flickr Tags, Tye Rattenbury, Nathaniel Good, Mor Naaman&lt;br /&gt;&lt;br /&gt;Clustering of Documents with Local and Global Regularization, Fei Wang, Changshui Zhang, Tao Li&lt;br /&gt;&lt;br /&gt;Detecting, Categorizing and Clustering Entity Mentions in Chinese Text, Wenjie Li, Donglei Qian, Chunfa Yuan, Qin Lu&lt;br /&gt;&lt;br /&gt;Principles of Hash-based Text Retrieval, Benno Stein&lt;br /&gt;&lt;br /&gt;DiffusionRank: A Possible Penicillin for Web Spamming, Haixuan Yang, Irwin King, Michael R. Lyu&lt;br /&gt;&lt;br /&gt;Context Sensitive Stemming for Web Search, Fuchun Peng, Nawaaz Ahmed, Xin Li, Yumao Lu&lt;br /&gt;&lt;br /&gt;Combining Content and Link for Classification using Matrix Factorization, Shenghuo Zhu, Kai Yu, Yun Chi, Yihong Gong&lt;br /&gt;&lt;br /&gt;ARSA: A Sentiment-Aware Model for Predicting Sales Performance Using Blogs, Yang Liu, Jimmy Huang, Aijun An, Xiaohui Yu&lt;br /&gt;&lt;br /&gt;Heavy-Tailed Distributions and Multi-Keyword Queries, Arnd Konig, Surajit Chaudhuri, Liying Sui, Kenneth Church&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2936846724748152357?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2936846724748152357/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2936846724748152357' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2936846724748152357'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2936846724748152357'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/readings-from-sigir-2007.html' title='Readings from SIGIR 2007'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-7272624398397213395</id><published>2007-07-23T15:56:00.000-07:00</published><updated>2007-07-23T22:17:27.903-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;AAAI 2007&quot;'/><category scheme='http://www.blogger.com/atom/ns#' term='reading list'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>Readings from AAAI 2007</title><content type='html'>&lt;a style="font-family: trebuchet ms;" href="http://www.aaai.org/Conferences/AAAI/2007/aaai07program.php"&gt;AAAI 2007&lt;/a&gt;&lt;span style="font-family:trebuchet ms;"&gt; is now going on at &lt;/span&gt;Vancouver&lt;span style="font-family:trebuchet ms;"&gt;. 
Here is my selection of NLP and Learning papers I would like to know more about.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;&lt;span style="font-family:trebuchet ms;"&gt;Deriving a Large-Scale Taxonomy from Wikipedia, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Simone Paolo Ponzetto, Michael Strube&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Relation Extraction from Wikipedia Using Subtree Mining, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Dat P. T. Nguyen, Yutaka Matsuo, Mitsuru Ishizuka&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Finding Related Pages Using Green Measures: An Illustration with Wikipedia, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Yann Ollivier, Pierre Senellart&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Graph Partitioning Based on Link Distributions, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Bo Long, Mark (Zhongfei) Zhang, Philip S. 
Yu&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Semi-supervised Learning by Mixed Label Propagation, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Wei Tong, Rong Jin&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Semi-Supervised Learning with Very Few Labeled Training Examples, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Zhi-Hua Zhou, De-Chuan Zhan, Qiang Yang&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Clustering with Local and Global Regularization, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Fei Wang, Changshui Zhang, Tao Li&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Isometric Projection, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Deng Cai, Xiaofei He, Jiawei Han&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Improving Similarity Measures for Short Segments of Text, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Wen-tau Yih, Christopher Meek&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Gaël Dias, Elsa Alves, 
José Gabriel Pereira Lopes&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Robust Estimation of Google Counts for Social Network Extraction, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Yutaka Matsuo, Hironori Tomobe, Takuichi Nishimura&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Harvesting Relations from the Web - Quantifiying the Impact of Filtering Functions, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Sebastian Blohm, Philipp Cimiano, Egon Stemle&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Template-Independent News Extraction Based on Visual Consistency, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Shuyi Zheng, Ruihua Song, Ji-Rong Wen&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Comprehending and Generating Apt Metaphors: A Web-driven, Case-based Approach to Figurative Language, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Tony Veale, Yanfen Hao&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Mobile Service for Reputation Extraction from Weblogs - Public Experiment and Evaluation, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Takahiro Kawamura, Shinichi Nagano, Masumi Inaba, Yumiko Mizoguchi&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;The Impact of Time on the Accuracy of Sentiment Classifiers Created from a Web Log Corpus, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Kathleen T. Durant, Michael D. 
Smith&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Nectar: Learning by Combining Observations and User Edits, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Vittorio Castelli, Lawrence Bergman, Daniel Oblinger&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Multi-Label Learning by Instance Differentiation, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Min-Ling Zhang, Zhi-Hua Zhou&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Extracting Influential Nodes for Information Diffusion on a Social Network, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Masahiro Kimura, Kazumi Saito, Ryohei Nakano&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Temporal and Information Flow Based Event Detection from Social Text Streams, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Qiankun Zhao, Prasenjit Mitra, Bi Chen&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Analyzing Reading Behavior by Blog Mining, &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Tadanobu Furukawa, Mitsuru Ishizuka, Yutaka Matsuo, Ikki Ohmukai, Koki Uchiyama&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-7272624398397213395?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/7272624398397213395/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=7272624398397213395' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/7272624398397213395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/7272624398397213395'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/readings-from-aaai-2007.html' title='Readings from AAAI 2007'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3581776411456299111</id><published>2007-07-18T12:57:00.000-07:00</published><updated>2007-07-18T13:06:55.109-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='KDD'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='data mining'/><category scheme='http://www.blogger.com/atom/ns#' term='learning'/><category scheme='http://www.blogger.com/atom/ns#' term='reading list'/><title type='text'>Reading List from KDD 2007</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;&lt;a href="http://www.kdd2007.com/"&gt;KDD 2007&lt;/a&gt; will be on Aug 12-15 in the neighborhood at San Jose. Here is my selection:&lt;br /&gt;&lt;br /&gt;"Extracting Semantic Relations from Query Logs",  Ricardo Baeza-Yates and Alessandro Tiberi&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Efficient Incremental Clustering with Constraints",  Ian Davidson, S.S. 
Ravi, and Martin Ester&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"A Probabilistic Framework for Relational Clustering",  Bo Long, Zhongfei Zhang, and Philip S. Yu&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Tracking Multiple Topics for Finding Interesting Articles",  Raymond Pon, Alfonso Cardenas, David Buttler, and Terence Critchlow&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Feature Selection Methods for Text Classification",  Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, and Michael Mahoney&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Hierarchical Mixture Models: a Probabilistic Analysis",  Mark Sandler&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Information distance from a question to an answer",  Xian Zhang, Yu Hao, Xiaoyan Zhu, and Ming Li&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Statistical Change Detection for Multi-Dimensional Data",  Xiuyao Song, Mingxi Wu, Chris Jermaine, and Sanjay Ranka&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Constraint-Driven Clustering",  Rong Ge, Martin Ester, Wen Jin, and Ian Davidson&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;"Enhancing Semi-Supervised Clustering: A Feature Projection Perspective",  Wei Tang, Hui Xiong, Shi Zhong, and Jie Wu&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3581776411456299111?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3581776411456299111/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3581776411456299111' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3581776411456299111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3581776411456299111'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/reading-list-from-kdd-2007.html' title='Reading List from KDD 2007'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4224423250071692550</id><published>2007-07-17T18:40:00.000-07:00</published><updated>2007-07-18T00:09:04.862-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='India'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>NLP in India?</title><content type='html'>I was surprised to see &lt;span class="entry-author-name"&gt;&lt;a href="http://blog.outerthoughts.com/"&gt;Alex&lt;/a&gt;'s &lt;a href="http://blog.outerthoughts.com/2007/07/link-nlp-the-indian-perspective/"&gt;post&lt;/a&gt;, with which I don't fully agree.&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-style: italic;"&gt;... 
because NLP is so underdeveloped in India, even undergraduate-level projects may be contributing to the cutting edge of research.&lt;/span&gt;&lt;br /&gt;&lt;/blockquote&gt;It turns out he was referring to &lt;a href="http://technigal.wordpress.com/2007/07/16/natural-language-processing-the-indian-perspective/"&gt;this post&lt;/a&gt; from an undergrad, which tries to give "the Indian perspective", rather inaccurately. Having worked on NLP at one of the &lt;a href="http://www.iitm.ac.in/"&gt;IIT&lt;/a&gt;s, I am compelled to write from a grad student's perspective. Sunayana's post is interesting as it brings out several issues in Indic computing.&lt;br /&gt;&lt;br /&gt;1. Lack of annotated data - corpora, treebanks, and aligned texts, which are the sinews and bones of any language processing system. Resources exist, largely due to the efforts of &lt;a href="http://www.ciil.org/"&gt;CIIL&lt;/a&gt;, various universities, and other government agencies, but they are dwarfed by the resources that exist for other languages, such as English or other European languages.&lt;br /&gt;&lt;br /&gt;However, the rich morphology of Indian languages can be exploited to reduce the amount of annotated data required for certain tasks, for instance &lt;a href="http://www.cse.iitb.ac.in/%7Epb/papers/ACL-2006-Hindi-POS-Tagging.pdf"&gt;POS tagging&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;2. Encoding issues - As Sunayana rightly points out, before the adoption of Unicode, several data sources were locked up in the fonts they used. But things are changing: there is more Indian language content in Unicode today than ever. Websites like BBC and Wikipedia are spewing out a lot of content in Unicode for those interested in collecting monolingual, comparable corpora. 
A cursory glance at &lt;a href="http://en.wikipedia.org/wiki/Wikipedia:Multilingual_statistics"&gt;Wikipedia statistics&lt;/a&gt; shows that the number of articles in, say, &lt;a href="http://stats.wikimedia.org/EN/ChartsWikipediaHI.htm"&gt;Hindi&lt;/a&gt; or &lt;a href="http://stats.wikimedia.org/EN/ChartsWikipediaTA.htm"&gt;Tamil&lt;/a&gt; has more than doubled in the past six months.&lt;br /&gt;&lt;br /&gt;3. Visibility - While there has been an increasing trend to publish in reputed conferences like ICML or ACL, more participation is certainly desirable. &lt;a href="http://www.ijcai-07.org/"&gt;IJCAI 2007&lt;/a&gt; was held in India, and if you are around I highly recommend submitting to (submission deadline: Jul 31st) and/or attending &lt;a href="http://www.ijcnlp2008.org/"&gt;IJCNLP 2008&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This is an exciting time to do NLP research on Indian languages. There are both corporate and government motivations, which translate to grants and support for universities. The group at IIT Bombay, for example, implemented and deployed local-language systems to help farmers. Similar efforts have been undertaken by other institutes. Microsoft Research in Bangalore, and IBM Research in New Delhi and Bangalore, are working on various projects on Indian languages, including speech recognition.&lt;br /&gt;&lt;br /&gt;At the end of all this, I must partially agree with the passage I quoted from Alex's blog. Yes, &lt;span style="font-style: italic;"&gt;some&lt;/span&gt; undergrads do make brilliant contributions, but that is just because of what they have in their bones. 
This is true for any country or university.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4224423250071692550?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4224423250071692550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4224423250071692550' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4224423250071692550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4224423250071692550'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/i-was-surprised-to-see-alex-s-post-to.html' title='NLP in India?'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4660476835359204508</id><published>2007-07-17T09:42:00.000-07:00</published><updated>2007-07-23T21:40:27.097-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graduate students'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='gradlife'/><category scheme='http://www.blogger.com/atom/ns#' term='phd'/><title type='text'>Effect of Spouses on PhD</title><content type='html'>&lt;a style="font-family: trebuchet ms;" href="http://www.people.cornell.edu/pages/jpp34/"&gt;&lt;/a&gt;&lt;a href="http://www.people.cornell.edu/pages/jpp34/"&gt;Joseph Price&lt;/a&gt; studies the effect of marriage on graduation 
in his &lt;a href="http://www.ilr.cornell.edu/cheri/wp/cheri_wp94.pdf"&gt;paper&lt;/a&gt;, "Does a Spouse Slow You Down?: Marriage and Graduate Student Outcomes".&lt;br /&gt;&lt;br /&gt;Here is a quick abstract:&lt;br /&gt;&lt;blockquote style="font-style: italic;"&gt;Using data on 11,000 graduate students from 100 departments over a 20 year period, I test whether graduate student outcomes (graduation rates, time to degree, publication success, and initial job placement) differ based on a student’s gender and marital status. I find that married men have better outcomes across every measure than single men. Married women do no worse than single women on any measure and actually have more publishing success and complete their degree in less time. The outcomes of cohabiting students generally fall between those of single and married students.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4660476835359204508?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4660476835359204508/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4660476835359204508' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4660476835359204508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4660476835359204508'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/effect-of-spouses-on-phd.html' title='Effect of Spouses on PhD'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3310590613876336480</id><published>2007-07-02T10:57:00.000-07:00</published><updated>2007-07-02T10:59:24.342-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>Papers from COLT: Occam's Hammer</title><content type='html'>&lt;a href="http://hunch.net/?p=272"&gt;John Langford&lt;/a&gt; recommends:&lt;br /&gt;&lt;a href="http://ida.first.fraunhofer.de/%7Eblanchard/"&gt;&lt;/a&gt;&lt;blockquote&gt;&lt;a href="http://ida.first.fraunhofer.de/%7Eblanchard/"&gt;Gilles Blanchard&lt;/a&gt; and &lt;a href="http://cvlab.epfl.ch/%7Efleuret/"&gt;François Fleuret&lt;/a&gt;, &lt;a href="http://cvlab.epfl.ch/%7Efleuret/papers/blanchard-fleuret-colt2007.pdf"&gt;Occam’s Hammer&lt;/a&gt;. When we are interested in very tight bounds on the true error rate of a classifier, it is tempting to use a PAC-Bayes bound which can (empirically) be &lt;a href="http://hunch.net/%7Ejl/projects/prediction_bounds/nn_bound/not_bound_final.ps"&gt;quite tight&lt;/a&gt;. A disadvantage of the PAC-Bayes bound is that it applies to a classifier which is randomized over a set of base classifiers rather than a single classifier. This paper shows that a similar bound can be proved which holds for a single classifier drawn from the set. The ability to safely use a single classifier is very nice. 
This technique applies generically to any base bound, so it has other applications covered in the paper.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3310590613876336480?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3310590613876336480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3310590613876336480' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3310590613876336480'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3310590613876336480'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/papers-from-colt-occams-hammer.html' title='Papers from COLT: Occam&apos;s Hammer'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-1238819228492023191</id><published>2007-07-02T10:47:00.001-07:00</published><updated>2007-07-16T18:57:41.980-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ICML'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>ICML 2007 reading list</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Some papers I would like to read right away:&lt;br /&gt;&lt;br /&gt;Discriminative Learning for Differing Training and Test Distributions&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Steffen Bickel 
- Max Planck Institute for Computer Science, Germany&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Michael Brückner - Max Planck Institute for Computer Science, Germany&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Tobias Scheffer - Max Planck Institute for Computer Science, Germany &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Sparse Eigen Methods by D.C. Programming&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Bharath Sriperumbudur - University of California, San Diego, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;David Torres - University of California, San Diego, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Gert Lanckriet - University of California, San Diego, USA &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Graph Clustering With Network Structure Indices&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Matthew J. 
Rattigan - University of Massachusetts Amherst, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Marc Maier - University of Massachusetts Amherst, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;David Jensen - University of Massachusetts Amherst, USA&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Fast and Effective Kernels for Relational Learning from Texts&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Alessandro Moschitti - University of Trento, Italy&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Fabio Massimo Zanzotto - University of Rome, Italy &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Three New Graphical Models for Statistical Language Modelling&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Andriy Mnih - University of Toronto, Canada&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Geoffrey Hinton - University of Toronto, Canada &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Gideon S. Mann - University of Massachusetts, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Andrew McCallum - University of Massachusetts, USA &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;The Rendezvous Algorithm: Multiclass Semi-Supervised Learning with Markov Random Walks&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Arik Azran - University of Cambridge, UK &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Information-Theoretic Metric Learning  (one of the best paper awardees)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Jason V. 
Davis - University of Texas at Austin, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Brian Kulis - University of Texas at Austin, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Prateek Jain - University of Texas at Austin, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Suvrit Sra - University of Texas at Austin, USA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Inderjit S. Dhillon - University of Texas at Austin, USA &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Agnostic Active Learning - not from ICML 2007, but exciting: it was proposed last year, and theoretical bounds for it were proved this year at ICML 2007.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;http://hunch.net/~jl/projects/agnostic_active/agnostic-active.pdf&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;A Bound on the Label Complexity of Agnostic Active Learning&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Steve Hanneke - Carnegie Mellon University, USA &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-1238819228492023191?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/1238819228492023191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=1238819228492023191' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1238819228492023191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1238819228492023191'/><link rel='alternate' type='text/html' 
href='http://resnotebook.blogspot.com/2007/07/icml-2007-reading-list.html' title='ICML 2007 reading list'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-8668158270193950210</id><published>2007-07-02T09:53:00.000-07:00</published><updated>2007-07-16T18:56:53.494-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;kernel methods&quot;'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>Learning about Kernels</title><content type='html'>&lt;span style="font-family:verdana;"&gt;Stumbled on Alekh Agarwal's tech report on Kernels.  A good survey on kernel methods that includes recent work on this topic.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;http://www.cse.iitb.ac.in/~alekh/seminar/report.pdf&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:verdana;"&gt;Another place to begin would be Thomas Gartner's SIGKDD explorations survey paper.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-8668158270193950210?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/8668158270193950210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=8668158270193950210' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8668158270193950210'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8668158270193950210'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/07/learning-about-kernels.html' title='Learning about Kernels'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-541090882784593304</id><published>2007-06-07T13:35:00.000-07:00</published><updated>2007-07-02T09:53:22.418-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reading list'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>UMASS Statistical NLP reading list</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;http://ciir.cs.umass.edu/~fuchun/readlist_all/readlist.pdf&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-541090882784593304?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/541090882784593304/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=541090882784593304' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/541090882784593304'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/541090882784593304'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/06/statistical-nlp-reading-list.html' title='UMASS Statistical NLP reading list'/><author><name>Delip 
Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6024319750649237376</id><published>2007-04-30T13:33:00.000-07:00</published><updated>2007-04-30T13:35:12.566-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='todo'/><category scheme='http://www.blogger.com/atom/ns#' term='reading list'/><title type='text'>Reading List from EMNLP 2007</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;A GRAPH-BASED APPROACH TO NAMED ENTITY CATEGORIZATION IN WIKIPEDIA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;USING CONDITIONAL RANDOM FIELDS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;A TOPIC MODEL FOR WORD SENSE DISAMBIGUATION&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Jordan Boyd-Graber, Xiaojin Zhu and David Blei&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;BOOTSTRAPPING INFORMATION EXTRACTION FROM FIELD BOOKS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Sander Canisius and Caroline Sporleder&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;CROSS-LINGUAL DISTRIBUTIONAL PROFILES OF CONCEPTS FOR MEASURING&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;SEMANTIC DISTANCE&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Saif Mohammad, Iryna Gurevych, Graeme Hirst and Torsten Zesch&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;CRYSTAL: ANALYZING PREDICTIVE OPINIONS ON THE WEB&lt;/span&gt;&lt;br /&gt;&lt;span 
style="font-family: trebuchet ms;"&gt;Soo-Min Kim and Eduard Hovy&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;EXPLORATIONS IN AUTOMATIC BOOK SUMMARIZATION&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Rada Mihalcea and Hakan Ceylan&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;LARGE SCALE NAMED ENTITY DISAMBIGUATION BASED ON WIKIPEDIA DATA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Silviu Cucerzan&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;LEXICAL SEMANTIC RELATEDNESS WITH RANDOM GRAPH WALKS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Thad Hughes and Daniel Ramage&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;TOWARDS ROBUST UNSUPERVISED PERSONAL NAME DISAMBIGUATION&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Ying Chen and James Martin&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;WORD SENSE DISAMBIGUATION INCORPORATING LEXICAL AND STRUCTURAL SEMANTIC&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;INFORMATION&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Takaaki Tanaka, Francis Bond, Timothy Baldwin, Sanae Fujita and Chikara&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Hashimoto&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6024319750649237376?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6024319750649237376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6024319750649237376' title='0 
Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6024319750649237376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6024319750649237376'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/04/reading-list-from-emnlp-2007.html' title='Reading List from EMNLP 2007'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4654257218505112936</id><published>2007-04-15T17:47:00.000-07:00</published><updated>2007-04-30T13:42:58.446-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='latex'/><title type='text'>LaTex cool!</title><content type='html'>The constant &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cpi" align="middle" border="0" /&gt; is defined as &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cpi%20=%20%5Cint_%7B0%7D%5E%7B1%7D%20%5Cfrac%7B4%7D%7B1+x%5E%7B2%7D%7D" align="middle" border="0" /&gt;&lt;br /&gt;For more details on the setup, refer to &lt;a href="http://wolverinex02.googlepages.com/emoticonsforblogger2"&gt;this page&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4654257218505112936?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4654257218505112936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4654257218505112936' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4654257218505112936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4654257218505112936'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/04/latex-cool.html' title='LaTex cool!'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-5905000596278353042</id><published>2007-03-22T21:26:00.000-07:00</published><updated>2007-03-22T21:31:29.467-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graphs'/><category scheme='http://www.blogger.com/atom/ns#' term='charts'/><title type='text'>Swivel: For cool looking graphs</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.swivel.com/"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp0.blogger.com/_O_xB_EK1QO4/RgNX_QEKX2I/AAAAAAAAAC8/ElBKB8S9LdM/s400/graph.png" alt="" id="BLOGGER_PHOTO_ID_5044972751740886882" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-5905000596278353042?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/5905000596278353042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=5905000596278353042' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5905000596278353042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5905000596278353042'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/03/swivel-for-cool-looking-graphs.html' title='Swivel: For cool looking graphs'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp0.blogger.com/_O_xB_EK1QO4/RgNX_QEKX2I/AAAAAAAAAC8/ElBKB8S9LdM/s72-c/graph.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-9167660107529822884</id><published>2007-03-21T20:12:00.000-07:00</published><updated>2007-03-21T20:20:42.318-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='linguistics'/><category scheme='http://www.blogger.com/atom/ns#' term='language'/><title type='text'>Internet Language Resources</title><content type='html'>....&lt;br /&gt;&lt;pre&gt;&lt;b&gt;&lt;i&gt;&lt;b&gt;&lt;b&gt;&lt;i&gt;&lt;b&gt;&lt;a name="Bafia"&gt;Bafia&lt;/a&gt; (Mbam Cameroon)                        Wayumbe&lt;br /&gt;Bagesu (Central Africa)                      Watulire?&lt;br /&gt;Bagesu (Central Africa) [answer]             Natulire nili mlahi&lt;br /&gt;Bajawa (Indonesia) ['where are you going']   Male de?&lt;br /&gt;Bakitara (Central Africa) [morning]          Oirwota?&lt;br /&gt;Bakitara (Central Africa) [answer]           Ndabanta&lt;br /&gt;Bakitara (Central Africa) [after absense]    Mirembe&lt;br /&gt;&lt;a name="Bakweri"&gt;Bakweri&lt;/a&gt; (Cameroon) [morning]                 O wusi&lt;br /&gt;Balanta (Guinea-Bissau)                      Abala, lite utchole&lt;br /&gt;Balinese (Bali)                          
    Om swastyastu&lt;br /&gt;Balinese (Bali) [reply]                      Om shanti shanti shanti&lt;br /&gt;Balti (India, Pakistan)                      Yang chi halyo?&lt;br /&gt;Balti (India, Pakistan) [answer]             Lyakhmo&lt;/b&gt;&lt;/i&gt;&lt;/b&gt;&lt;/b&gt;&lt;/i&gt;&lt;/b&gt;&lt;/pre&gt;....&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;That was "hello" in some languages. Jennifer Runner has this page with "&lt;/span&gt;&lt;a style="font-family: trebuchet ms;" href="http://www.elite.net/%7Erunner/jennifers/hello.htm"&gt;Hello&lt;/a&gt;&lt;span style="font-family: trebuchet ms;"&gt;" and other pleasantries in a large number of languages.  Don't forget to check her &lt;/span&gt;&lt;a style="font-family: trebuchet ms;" href="http://www.elite.net/%7Erunner/jennifers/language.htm"&gt;Internet Language Resources&lt;/a&gt;&lt;span style="font-family: trebuchet ms;"&gt; page.&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;&lt;b&gt;&lt;b&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms; color: rgb(153, 153, 153);"&gt;Khau bulyghyz&lt;/span&gt;&lt;/b&gt;&lt;/b&gt;&lt;span style="font-family: trebuchet ms; color: rgb(153, 153, 153);"&gt;!&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-9167660107529822884?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/9167660107529822884/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=9167660107529822884' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9167660107529822884'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9167660107529822884'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/03/internet-language-resources.html' title='Internet Language Resources'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4646795877879167070</id><published>2007-03-18T21:34:00.000-07:00</published><updated>2007-03-18T22:10:52.989-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CIA'/><category scheme='http://www.blogger.com/atom/ns#' term='job'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>Job with "The Company"</title><content type='html'>&lt;div style="text-align: left;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_O_xB_EK1QO4/Rf4S4B_mdHI/AAAAAAAAAC0/_YuuUySi07I/s1600-h/lingjob.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp2.blogger.com/_O_xB_EK1QO4/Rf4S4B_mdHI/AAAAAAAAAC0/_YuuUySi07I/s400/lingjob.png" alt="" id="BLOGGER_PHOTO_ID_5043489386518705266" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;span style="font-family:trebuchet ms;"&gt;Showed up while reading my regular mails. 
Notice the typo in the headline.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4646795877879167070?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4646795877879167070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4646795877879167070' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4646795877879167070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4646795877879167070'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/03/job-with-company.html' title='Job with &quot;The Company&quot;'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp2.blogger.com/_O_xB_EK1QO4/Rf4S4B_mdHI/AAAAAAAAAC0/_YuuUySi07I/s72-c/lingjob.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4651159256062280103</id><published>2007-03-18T20:42:00.000-07:00</published><updated>2007-03-18T21:03:12.537-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='WWW'/><category scheme='http://www.blogger.com/atom/ns#' term='todo'/><category scheme='http://www.blogger.com/atom/ns#' term='papers'/><category scheme='http://www.blogger.com/atom/ns#' term='NAACL'/><title type='text'>Interesting papers from NAACL and WWW 2007</title><content type='html'>&lt;span style="font-family: trebuchet ms; 
font-weight: bold;"&gt;NAACL&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Computing Semantic Similarity between Skill Statements for Approximate Matching&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Feng Pan and Robert Farrell &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Extracting Appraisal Expressions&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Kenneth Bloom, Shlomo Argamon and Navendu Garg&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Unsupervised Resolution of Objects and Relations on the Web&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Alexander Yates and Oren Etzioni&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Near-Synonym Choice in an Intelligent Thesaurus&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Diana Inkpen &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Using Wikipedia for Automatic Word Sense Disambiguation&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Rada Mihalcea &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;An integrated approach to measuring Semantic Similarity between Words using Information available on the Web&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Danushka Bollegala, Yutaka Matsuo and Mitsuru Ishizuka&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Improving Relation Extraction Using Domain Information&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Alfio Massimiliano Gliozzo, Marco Pennacchiotti and Patrick Pantel&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;High-Performance, Language-Independent Morphological 
Segmentation&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Sajib Dasgupta and Vincent Ng &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;A Systematic Exploration of The Feature Space for Relation Extraction&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Jing Jiang and ChengXiang Zhai&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;    Andrei Alexandrescu and Katrin Kirchhoff &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms; font-weight: bold;"&gt;WWW&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Towards Domain-Independent Information Extraction from Web Tables&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Wolfgang Gatterbauer, Paul Bohunsky, Marcus Herzog, Bernhard Kroepl, Bernhard Pollak&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Organizing and Searching the World Wide Web of Facts - Step Two: Harnessing the Wisdom of the Crowds&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Marius Pasca&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;A New Suffix Tree Similarity Measure for Document Clustering&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Hung Chim, Xiaotie Deng&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Scaling Up All-Pairs Similarity Search&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Roberto Bayardo, Yiming Ma, Ramakrishnan Srikant&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Wherefore Art Thou R3579X? 
Anonymized Social Networks, Hidden Patterns, and Structural Steganography&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Lars Backstrom, Cynthia Dwork, Jon Kleinberg&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Measuring Semantic Similarity between Words Using Web Search Engines&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Using Google Distance to weight approximate ontology matches&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Risto Gligorov, Zharko Aleksovski, Warner ten Kate, Frank van Harmelen&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4651159256062280103?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4651159256062280103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4651159256062280103' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4651159256062280103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4651159256062280103'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/03/interesting-papers-from-naacl-and-www.html' title='Interesting 
papers from NAACL and WWW 2007'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-5016653277302636746</id><published>2007-03-18T20:38:00.000-07:00</published><updated>2007-03-18T21:15:54.217-07:00</updated><title type='text'>NLP on VVLC</title><content type='html'>Papers to read&lt;br /&gt;1. Banko and Brill, ACL 2001&lt;br /&gt;2. Deepak Ravichandran, ACL 2005&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-5016653277302636746?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/5016653277302636746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=5016653277302636746' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5016653277302636746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5016653277302636746'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/03/nlp-on-vvlc.html' title='NLP on VVLC'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-7929427880788187429</id><published>2007-01-26T21:41:00.000-08:00</published><updated>2007-01-26T21:53:21.299-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='beryl'/><category scheme='http://www.blogger.com/atom/ns#' term='ubuntu'/><title type='text'>Beryl, Oh yeah!</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;After much struggling with Ubuntu, I finally got &lt;a href="http://www.beryl-project.org/"&gt;Beryl&lt;/a&gt; running on my laptop.&lt;br /&gt;Getting it to work with ATI was always a problem (for me) until I found &lt;a href="http://lhansen.blogspot.com/2006/10/3d-desktop-beryl-and-xgl-on-ubuntu-edgy.html"&gt;this guide&lt;/a&gt;.&lt;br /&gt;&lt;/span&gt;&lt;object style="font-family: trebuchet ms;" height="350" width="425"&gt;&lt;param name="movie" value="http://www.youtube.com/v/dlhD_4pK4MM"&gt;&lt;param name="wmode" value="transparent"&gt;&lt;embed src="http://www.youtube.com/v/dlhD_4pK4MM" type="application/x-shockwave-flash" wmode="transparent" height="350" width="425"/&gt;&lt;/object&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;&lt;br /&gt;Incidentally, this is just a few days from the Vista launch. 
Who cares about Vista anymore?&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-7929427880788187429?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/7929427880788187429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=7929427880788187429' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/7929427880788187429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/7929427880788187429'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/01/beryl-oh-yeah.html' title='Beryl, Oh yeah!'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6184351140688842136</id><published>2007-01-25T19:06:00.000-08:00</published><updated>2007-01-25T19:17:09.052-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MT'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;Machine Translation&quot;'/><title type='text'>Multibabel</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;Ever wondered what happens when a message is translated from one language to another, and so on, and finally back to the source language?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Try: http://www.tashian.com/multibabel&lt;br /&gt;&lt;br /&gt;Even a simple sentence, &lt;span 
style="font-style: italic; font-weight: bold;"&gt;I am fine&lt;/span&gt;, gets distorted as:&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;tt&gt;They are much bond.&lt;br /&gt;&lt;/tt&gt;&lt;tt&gt;They are much plugging.&lt;br /&gt;&lt;/tt&gt;&lt;tt&gt;They are covering much.&lt;br /&gt;&lt;br /&gt;&lt;/tt&gt;&lt;span style="font-family: trebuchet ms;"&gt;Uses BabelFish underneath, I wouldn't be surprised if Google or any other MT system also shows similar output.&lt;/span&gt;&lt;tt&gt;&lt;br /&gt;&lt;/tt&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6184351140688842136?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6184351140688842136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6184351140688842136' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6184351140688842136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6184351140688842136'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/01/multibabel.html' title='Multibabel'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-8383132113791487290</id><published>2007-01-18T16:33:00.000-08:00</published><updated>2007-01-18T16:38:49.467-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ML'/><category scheme='http://www.blogger.com/atom/ns#' term='learning'/><category 
scheme='http://www.blogger.com/atom/ns#' term='&quot;machine learning&quot;'/><title type='text'>Interesting ML notes</title><content type='html'>Machine Learning. Where are we heading? Tom Mitchell says it all - &lt;a href="http://www.cs.cmu.edu/%7Etom/pubs/MachineLearning.pdf"&gt;machine learning != statistics.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;How to write a machine learning paper?&lt;br /&gt;http://www-csli.stanford.edu/icml2k/craft.html&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-8383132113791487290?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/8383132113791487290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=8383132113791487290' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8383132113791487290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8383132113791487290'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/01/interesting-ml-notes.html' title='Interesting ML notes'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2301992238893846451</id><published>2007-01-18T15:08:00.000-08:00</published><updated>2007-01-18T15:39:34.739-08:00</updated><title type='text'>Back from IJCAI</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;I spent much of the fall break attending &lt;/span&gt;&lt;a 
style="font-family: trebuchet ms;" href="http://www.ijcai-07.org"&gt;IJCAI 2007&lt;/a&gt;&lt;span style="font-family: trebuchet ms;"&gt; and &lt;/span&gt;&lt;a style="font-family: trebuchet ms;" href="http://www.iiit.net/icon2007/"&gt;ICON 2007&lt;/a&gt;&lt;span style="font-family: trebuchet ms;"&gt;.&lt;/span&gt;&lt;span style="font-family: trebuchet ms;"&gt; IJCAI had a record attendance of 1500 plus participants.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2301992238893846451?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2301992238893846451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2301992238893846451' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2301992238893846451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2301992238893846451'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2007/01/back-from-ijcai.html' title='Back from IJCAI'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-868752657543202473</id><published>2006-12-26T15:28:00.000-08:00</published><updated>2007-03-18T21:23:57.529-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='WSD'/><category scheme='http://www.blogger.com/atom/ns#' 
term='meaning'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>Book Review: Geometry and Meaning</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;Title: Geometry and Meaning&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Author: Dominic Widdows&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;URL : http://infomap.stanford.edu/book/&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a style="font-family: trebuchet ms;" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_O_xB_EK1QO4/RZGxu7AZS4I/AAAAAAAAACg/DT7Gj72mLAE/s1600-h/cover-small.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_O_xB_EK1QO4/RZGxu7AZS4I/AAAAAAAAACg/DT7Gj72mLAE/s400/cover-small.jpg" alt="" id="BLOGGER_PHOTO_ID_5012983279911521154" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;This is an excellent book for even high school students to learn about IR. 
However, if you have read a few papers in this field then reading this book is a waste of time, except for the jewels in boxes throughout the book.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-868752657543202473?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/868752657543202473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=868752657543202473' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/868752657543202473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/868752657543202473'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/book-review-geometry-and-meaning.html' title='Book Review: Geometry and Meaning'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_O_xB_EK1QO4/RZGxu7AZS4I/AAAAAAAAACg/DT7Gj72mLAE/s72-c/cover-small.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2972635447743261646</id><published>2006-12-21T17:09:00.000-08:00</published><updated>2006-12-23T16:12:47.842-08:00</updated><title type='text'>semantic ambiguity?</title><content type='html'>&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://bp3.blogger.com/_O_xB_EK1QO4/RYswdbAZS3I/AAAAAAAAACU/LYP1VxJsWU0/s1600-h/ambiguity.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp3.blogger.com/_O_xB_EK1QO4/RYswdbAZS3I/AAAAAAAAACU/LYP1VxJsWU0/s400/ambiguity.gif" alt="" id="BLOGGER_PHOTO_ID_5011152292403563378" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:78%;"&gt;(c) Bill Watterson&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2972635447743261646?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2972635447743261646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2972635447743261646' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2972635447743261646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2972635447743261646'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/semantic-ambiguity.html' title='semantic ambiguity?'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_O_xB_EK1QO4/RYswdbAZS3I/AAAAAAAAACU/LYP1VxJsWU0/s72-c/ambiguity.gif' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2156271669156858057</id><published>2006-12-21T13:03:00.000-08:00</published><updated>2006-12-21T13:13:03.157-08:00</updated><title type='text'>lectures on cognitive computing</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;A series of twelve lectures from the conference on cognitive computing at IBM's Almaden &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;Research lab with topics ranging from memory to consciousness to thought. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a style="font-family: trebuchet ms;" href="http://video.google.com/videosearch?q=almaden+cognitive+computing&amp;so=0&amp;amp;start=0"&gt;Lecture videos&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2156271669156858057?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2156271669156858057/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2156271669156858057' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2156271669156858057'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2156271669156858057'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/lectures-on-cognitive-computing.html' title='lectures on cognitive computing'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-2866804979200144589</id><published>2006-12-21T00:55:00.000-08:00</published><updated>2006-12-21T12:46:12.603-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='smoothing'/><category scheme='http://www.blogger.com/atom/ns#' term='estimation'/><title type='text'>smoothing</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;A good &lt;/span&gt;&lt;a href="http://ieeexplore.ieee.org/iel5/89/17730/00817452.pdf"&gt;&lt;span style="font-family: trebuchet ms;"&gt;review article on smoothing&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:trebuchet ms;"&gt; by Stan Chen and Roni Rosenfeld.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;A thorough treatment can be found &lt;/span&gt;&lt;span style="font-family: trebuchet ms;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Esfc/papers/h015a-techreport.ps.gz"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-2866804979200144589?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/2866804979200144589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=2866804979200144589' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2866804979200144589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/2866804979200144589'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/smoothing.html' 
title='smoothing'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-1783907732312575035</id><published>2006-12-20T18:39:00.000-08:00</published><updated>2006-12-20T18:41:57.075-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='experimentation'/><category scheme='http://www.blogger.com/atom/ns#' term='results'/><category scheme='http://www.blogger.com/atom/ns#' term='Accuracy'/><title type='text'>Accuracy vs. Perplexity</title><content type='html'>&lt;div style="direction: ltr; font-family: trebuchet ms;"&gt;If model A has higher accuracy than model B, does it necessarily imply&lt;br /&gt;perplexity(A) &lt; perplexity(B)?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Jason's reply&lt;/span&gt;:&lt;br /&gt;&lt;br /&gt;No, that is not implied.&lt;br /&gt;Accuracy = how correct is the highest-probability hypothesis?&lt;br /&gt;Perplexity = how probable is the correct hypothesis?&lt;br /&gt;              (or more generally, how probable is the observed data?)&lt;br /&gt;&lt;br /&gt;So they are really measuring different things.&lt;br /&gt;Accuracy is what you really care about, in a sense,&lt;br /&gt;but (1) it is only defined if you have supervised data,&lt;br /&gt;(2) it requires an evaluation method for measuring degree&lt;br /&gt;of correctness, (3) it is usually not a continuous function&lt;br /&gt;of the parameters (since an epsilon change in the parameters&lt;br /&gt;may not change which hypothesis has the highest probability)&lt;br /&gt;and is therefore hard to optimize.&lt;br /&gt;&lt;br /&gt;I usually recommend reporting both, which has become&lt;br /&gt;the convention in speech recognition, where people report&lt;br /&gt;WER (word error rate) and perplexity.&lt;br 
/&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-1783907732312575035?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/1783907732312575035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=1783907732312575035' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1783907732312575035'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1783907732312575035'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/accuracy-vs-perplexity.html' title='Accuracy vs. Perplexity'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-9010379996359965198</id><published>2006-12-20T16:50:00.000-08:00</published><updated>2006-12-20T16:54:37.861-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='concordances'/><category scheme='http://www.blogger.com/atom/ns#' term='coling'/><title type='text'>Finding concordances on the web</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;A cool website that does this: http://www.webcorp.org.uk&lt;br /&gt;Allows using patterns but awfully slow.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-9010379996359965198?l=resnotebook.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/9010379996359965198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=9010379996359965198' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9010379996359965198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9010379996359965198'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/finding-concordances-on-web.html' title='Finding concordances on the web'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-4151579340692531044</id><published>2006-12-20T09:18:00.000-08:00</published><updated>2006-12-20T09:31:15.001-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ontology'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='taxonomy'/><category scheme='http://www.blogger.com/atom/ns#' term='TKDE'/><title type='text'>TKDE special issue on semantic web</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;It's here ... slurrrp! yum yum.&lt;br /&gt;&lt;br /&gt;Interesting papers for my general reading list:&lt;br /&gt;&lt;/span&gt;  &lt;span style="font-family: trebuchet ms;" class="title"&gt;1. &lt;a href="http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/tk/&amp;toc=comp/trans/tk/2007/02/k2toc.xml&amp;amp;DOI=10.1109/TKDE.2007.31"&gt;From Wrapping to Knowledge&lt;/a&gt;&lt;br /&gt;2. 
&lt;a href="http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/tk/&amp;toc=comp/trans/tk/2007/02/k2toc.xml&amp;amp;DOI=10.1109/TKDE.2007.36"&gt;Mining Generalized Associations of Semantic Relations from Textual Web Content&lt;/a&gt;&lt;br /&gt;3. &lt;a href="http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/tk/&amp;toc=comp/trans/tk/2007/02/k2toc.xml&amp;amp;DOI=10.1109/TKDE.2007.21"&gt;A Taxonomy Learning Method and Its Application to Characterize a Scientific Web Community&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;You can access all articles &lt;a href="http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/tk/&amp;amp;toc=comp/trans/tk/2007/02/k2toc.xml"&gt;here&lt;/a&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-4151579340692531044?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/4151579340692531044/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=4151579340692531044' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4151579340692531044'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/4151579340692531044'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/tkde-special-issue-on-semantic-web.html' title='TKDE special issue on semantic web'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-8059926997559503767</id><published>2006-12-18T16:25:00.000-08:00</published><updated>2006-12-18T16:30:52.198-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Retrieval'/><title type='text'>New IR book</title><content type='html'>&lt;span style="font-family: trebuchet ms;"&gt;Just discovered &lt;a href="http://www-csli.stanford.edu/%7Eschuetze/information-retrieval-book.html"&gt;this book&lt;/a&gt; by Chris Manning, Prabhakar Raghavan, and Hinrich Schütze.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: trebuchet ms;"&gt;This is going to be in my reading list for the &lt;/span&gt;&lt;a style="font-family: trebuchet ms;" href="http://www.cs.jhu.edu/%7Eyarowsky/cs466.html"&gt;IR Course&lt;/a&gt;&lt;span style="font-family: trebuchet ms;"&gt;, next spring.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-8059926997559503767?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/8059926997559503767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=8059926997559503767' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8059926997559503767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/8059926997559503767'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/new-ir-book.html' title='New IR book'/><author><name>Delip 
Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-9169062634794495176</id><published>2006-12-18T15:21:00.000-08:00</published><updated>2006-12-21T12:38:13.361-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HMM'/><category scheme='http://www.blogger.com/atom/ns#' term='HTK'/><category scheme='http://www.blogger.com/atom/ns#' term='sequence labeling'/><title type='text'>Hidden Markov Models</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;After a brief venture in developing &lt;a href="http://www.cs.jhu.edu/%7Ejason/465/hw5/hw5.pdf"&gt;HMMs for sequence labeling&lt;/a&gt; in the NLP class, I am planning to use the HTK toolkit for more fun!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;Get it today from: http://htk.eng.cam.ac.uk/&lt;br /&gt;A tutorial-style manual on HTK can be &lt;a href="http://www.google.com/search?hl=en&amp;q=The+HTK+book&amp;amp;btnG=Google+Search"&gt;obtained here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Also don't forget to read Hal's &lt;a href="http://nlpers.blogspot.com/2006/11/getting-started-in-sequence-labeling.html"&gt;wonderful writeup&lt;/a&gt; on sequence labeling.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update&lt;/span&gt;: If you are planning to write an HMM tagger of your own, in addition to the above handout, have a look at the following:&lt;br /&gt;&lt;br /&gt;1. &lt;a href="http://acl.ldc.upenn.edu/A/A92/A92-1018.pdf"&gt;A practical Part-of-Speech tagger&lt;/a&gt;&lt;br /&gt;A general introduction. Involves the right mix of math and implementation details.&lt;br /&gt;2. 
&lt;a href="http://www.cs.brown.edu/people/ec/papers/equfortag.ps"&gt;Equations for Part-of-Speech tagging&lt;/a&gt;&lt;br /&gt;Derives all equations for PoS tagging using HMMs from first principles&lt;br /&gt;(smoothing &amp;amp; EM included)&lt;br /&gt;&lt;br /&gt;Though not related to HMMs, &lt;a href="http://www.cs.utah.edu/%7Ehal/TagChunk/"&gt;TagChunk&lt;/a&gt; by &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;Hal Daume &lt;/span&gt;&lt;span style="font-family:trebuchet ms;"&gt;is another way for sequence labeling (software included)&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-9169062634794495176?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/9169062634794495176/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=9169062634794495176' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9169062634794495176'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/9169062634794495176'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/hidden-markov-models.html' title='Hidden Markov Models'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3746453217655826591</id><published>2006-12-18T07:42:00.000-08:00</published><updated>2006-12-18T07:47:38.231-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='spam'/><title type='text'>Adversarial Information Retrieval 2007</title><content type='html'>Contents:&lt;br /&gt;&lt;br /&gt;   1. AIRWeb'07 Topics&lt;br /&gt;   2. Web Spam Challenge&lt;br /&gt;   3. Timeline&lt;br /&gt;   4. Organizers and Program Committee&lt;br /&gt;&lt;br /&gt;1. AIRWEB'07 TOPICS&lt;br /&gt;&lt;br /&gt;Adversarial Information Retrieval addresses tasks such as gathering,&lt;br /&gt;indexing, filtering, retrieving and ranking information from collections&lt;br /&gt;wherein a subset has been manipulated maliciously.  On the Web, the&lt;br /&gt;predominant form of such manipulation is "search engine spamming" or&lt;br /&gt;spamdexing, i.e., malicious attempts to influence the outcome of ranking&lt;br /&gt;algorithms, aimed at getting an undeserved high ranking for some items&lt;br /&gt;in the collection.&lt;br /&gt;&lt;br /&gt;We solicit both full and short papers on any aspect of adversarial&lt;br /&gt;information retrieval on the Web. Particular areas of interest include,&lt;br /&gt;but are not limited to:&lt;br /&gt;&lt;br /&gt;  * Link spam&lt;br /&gt;  * Content spam&lt;br /&gt;  * Cloaking&lt;br /&gt;  * Comment spam&lt;br /&gt;  * Spam-oriented blogging&lt;br /&gt;  * Click fraud detection&lt;br /&gt;  * Reverse engineering of ranking algorithms&lt;br /&gt;  * Web content filtering&lt;br /&gt;  * Advertisement blocking&lt;br /&gt;  * Stealth crawling&lt;br /&gt;  * Malicious tagging&lt;br /&gt;&lt;br /&gt;Proceedings of the workshop will be included in the ACM Digital Library.&lt;br /&gt;Full papers are limited to 8 pages; work-in progress will be permitted 4&lt;br /&gt;pages.&lt;br /&gt;&lt;br /&gt;For more information, see &lt;http://airweb.cse.lehigh.edu/2007/&gt;&lt;br /&gt;&lt;br /&gt;2. 
WEB SPAM CHALLENGE&lt;br /&gt;&lt;br /&gt;This year, we are introducing a novel element: a Web Spam Challenge for&lt;br /&gt;testing web spam detection systems. We will be using the WEBSPAM-UK2006&lt;br /&gt;collection for Web Spam Detection (http://www.yr-bcn.es/webspam).&lt;br /&gt;&lt;br /&gt;The collection includes a large set of web pages, a web graph, and&lt;br /&gt;human-provided labels for a set of hosts. We will also provide a set of&lt;br /&gt;features extracted from the contents and links in the collection, which&lt;br /&gt;may be used by the participant teams in addition to any automatic&lt;br /&gt;technique they choose to use.&lt;br /&gt;&lt;br /&gt;We ask that participants of the Web Spam Challenge submit predictions&lt;br /&gt;(normal/spam) for all unlabeled hosts in the collection. Predictions&lt;br /&gt;will be evaluated and results will be announced at the AIRWeb 2007&lt;br /&gt;workshop.&lt;br /&gt;&lt;br /&gt;For more information, see http://webspam.lip6.fr/&lt;br /&gt;&lt;br /&gt;3. 
TIMELINE&lt;br /&gt;&lt;br /&gt;  - 7 February 2007: E-mail intention to submit a workshop paper&lt;br /&gt;                     (optional, but helpful)&lt;br /&gt;  - 15 February 2007: Deadline for workshop paper submissions&lt;br /&gt;  - 15 March 2007: Notification of acceptance of workshop papers&lt;br /&gt;  - 30 March 2007: Camera-ready copy due&lt;br /&gt;  - 30 March 2007: Challenge submissions due&lt;br /&gt;  - 8 May 2007: Date of workshop&lt;br /&gt;&lt;br /&gt;...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3746453217655826591?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3746453217655826591/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3746453217655826591' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3746453217655826591'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3746453217655826591'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/adversarial-information-retrieval-2007.html' title='Adversarial Information Retrieval 2007'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3970839972600634960</id><published>2006-12-16T21:25:00.000-08:00</published><updated>2006-12-16T21:32:38.788-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='development'/><category 
scheme='http://www.blogger.com/atom/ns#' term='vista'/><category scheme='http://www.blogger.com/atom/ns#' term='font'/><category scheme='http://www.blogger.com/atom/ns#' term='coding'/><title type='text'>eye candy for developers</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;This is not directly related to research, but if you are like me, spending a long time in front of the screen staring at the console (coding or even playing hangman!), check out the cool new fonts that come with Vista. Although I have been Windows-free (just like my lab) for a year now, these Vista fonts are something I can vouch for. Search Google for "six new vista fonts" to download them. Since they are TTF fonts, they work perfectly on my Ubuntu machine (not sure if this is right, but heck, I enjoy the font. Thanks, Bill!).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:trebuchet ms;"&gt;I have been using the Consolas font for my GNOME terminal and it rocks!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3970839972600634960?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3970839972600634960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3970839972600634960' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3970839972600634960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3970839972600634960'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/this-is-not-directly-related-to.html' title='eye candy for developers'/><author><name>Delip 
Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-5240979511604393255</id><published>2006-12-16T20:47:00.001-08:00</published><updated>2006-12-19T19:07:46.511-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Matlab'/><title type='text'>MIT EECS Matlab tutorial</title><content type='html'>Nice introduction to &lt;a href="http://ocw.mit.edu/NR/rdonlyres/Electrical-Engineering-and-Computer-Science/6-050JInformation-and-EntropySpring2003/8F1E260E-E643-48A9-A143-0D79266AA77A/0/matlabnew.pdf"&gt;Matlab&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-5240979511604393255?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/5240979511604393255/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=5240979511604393255' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5240979511604393255'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5240979511604393255'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/mit-eecs-matlab-tutorial.html' title='MIT EECS Matlab tutorial'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-5953032520028387085</id><published>2006-12-16T20:40:00.000-08:00</published><updated>2006-12-19T19:08:10.365-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Graphical Models'/><title type='text'>Graphical models reading group</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;A collection of some basic reading material on the subject. Warning: it's old (2004).&lt;br /&gt;http://www.cs.cmu.edu/~zhuxj/courseproject/graphical.html&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-5953032520028387085?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/5953032520028387085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=5953032520028387085' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5953032520028387085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/5953032520028387085'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/graphical-models-reading-group.html' title='Graphical models reading group'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-450278059457979316</id><published>2006-12-14T06:13:00.000-08:00</published><updated>2006-12-14T06:15:18.957-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='language modeling'/><category scheme='http://www.blogger.com/atom/ns#' term='dragon'/><category scheme='http://www.blogger.com/atom/ns#' term='IR'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>The Dragon Toolkit</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;The Dragon Toolkit is a Java-based development package for academic research use in language modeling (LM) and information retrieval (IR). Language modeling has recently emerged as an attractive new framework for text information retrieval and text mining (TM). However, most free Java-based search engines, such as Lucene, do not support LM very well. The Lemur toolkit is designed for LM and IR, but it is written in C and C++, which may be a hindrance to people who prefer Java programming. Basically, the Dragon Toolkit is tailored for researchers who work on large-scale LM and IR and prefer Java programming. Moreover, unlike Lucene and Lemur, it provides built-in support for semantic-based IR and TM. The Dragon Toolkit seamlessly integrates and implements a set of NLP tools, which enable the toolkit to index text collections with various representation schemes including words, phrases, ontology-based concepts and relationships. However, to minimize the learning time, we intentionally keep the package small and simple. 
The toolkit lacks some features, including distributed IR and cross-language IR, that are part of the Lemur toolkit.&lt;br /&gt;&lt;br /&gt;How to Cite Dragon Toolkit&lt;br /&gt;&lt;br /&gt;If you are using the Dragon Toolkit for research work, please cite it in your published papers:&lt;br /&gt;&lt;br /&gt;Zhou, X., Zhang, X., and Hu, X., The Dragon Toolkit, Data Mining &amp; Bioinformatics Lab, iSchool at Drexel University, http://www.ischool.drexel.edu/dmbio/dragontool&lt;br /&gt;&lt;br /&gt;Download Dragon Toolkit&lt;br /&gt;&lt;br /&gt;Get the Dragon Toolkit source code and binary libraries (including external libraries) and necessary supporting data. Click http://www.ischool.drexel.edu/dmbio/dragontool/default.asp to download.&lt;br /&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-450278059457979316?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/450278059457979316/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=450278059457979316' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/450278059457979316'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/450278059457979316'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/dragon-tooolkit.html' title='The Dragon Toolkit'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-1896342079848900822</id><published>2006-12-14T06:12:00.000-08:00</published><updated>2006-12-14T06:16:12.459-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='TextGraph'/><category scheme='http://www.blogger.com/atom/ns#' term='NLP'/><title type='text'>Graph-based Methods for Natural Language Processing</title><content type='html'>&lt;div align="left"&gt;&lt;span style="font-family:trebuchet ms;"&gt;NAACL/HLT 2007 Workshop&lt;br /&gt;Graph-based Methods for Natural Language Processing&lt;br /&gt;&lt;br /&gt;http://www.textgraphs.org/ws07&lt;br /&gt;&lt;br /&gt;Rochester, NY, April 26, 2007&lt;br /&gt;.................................................&lt;br /&gt;&lt;br /&gt;Recent years have shown an increased interest in bringing the field of&lt;br /&gt;graph theory into Natural Language Processing. In many NLP&lt;br /&gt;applications entities can be naturally represented as nodes in a graph&lt;br /&gt;and relations between them can be represented as edges. Recent&lt;br /&gt;research has shown that graph-based representations of linguistic&lt;br /&gt;units as diverse as words, sentences and documents give rise to novel&lt;br /&gt;and efficient solutions in a variety of NLP tasks, ranging from part&lt;br /&gt;of speech tagging, word sense disambiguation and parsing to&lt;br /&gt;information extraction, semantic role assignment, summarization and&lt;br /&gt;sentiment analysis.&lt;br /&gt;&lt;br /&gt;This workshop builds on the success of the first TextGraphs workshop at &lt;/span&gt;&lt;/div&gt;&lt;div align="left"&gt;&lt;span style="font-family:trebuchet ms;"&gt;HLT-NAACL 2006. 
The aim of this workshop is to bring together researchers &lt;/span&gt;&lt;/div&gt;&lt;div align="left"&gt;&lt;span style="font-family:trebuchet ms;"&gt;working on problems related to the use of graph-based algorithms for natural&lt;br /&gt;language processing and on the theory of graph-based methods.&lt;br /&gt;It will address a broader spectrum of research areas to foster&lt;br /&gt;exchange of ideas and help to identify principles of using the graph&lt;br /&gt;notions that go beyond an ad-hoc usage.&lt;br /&gt;Unveiling these principles will give rise to applying generic graph&lt;br /&gt;methods to many new problems that can be encoded in this framework.&lt;br /&gt;&lt;br /&gt;We invite submissions of papers on graph-based methods applied to&lt;br /&gt;NLP-related problems. Topics include, but are not limited to:&lt;br /&gt;&lt;br /&gt;- Graph representations for ontology learning and word sense disambiguation&lt;br /&gt;- Graph algorithms for Information Retrieval, text mining and understanding&lt;br /&gt;- Graph matching for Information Extraction&lt;br /&gt;- Random walk graph methods and Spectral graph clustering&lt;br /&gt;- Graph labeling and edge labeling for semantic representations&lt;br /&gt;- Encoding semantic distances in graphs&lt;br /&gt;- Ranking algorithms based on graphs&lt;br /&gt;- Small world graphs in natural language&lt;br /&gt;- Semi-supervised graph-based methods&lt;br /&gt;- Statistical network analysis and methods for NLP&lt;br /&gt;&lt;br /&gt;Submission format:&lt;br /&gt;&lt;br /&gt;Submissions will consist of regular full papers of max. 8 pages and&lt;br /&gt;short papers of max. 4 pages, formatted following the NAACL 2007&lt;br /&gt;guidelines. 
Papers should be submitted using the online submission&lt;br /&gt;form: http://www.cs.rochester.edu/meetings/hlt-naacl07/workshops.shtml&lt;br /&gt;&lt;br /&gt;Important dates:&lt;br /&gt;&lt;br /&gt;Regular paper submission January 29&lt;br /&gt;Short paper submissions February 4&lt;br /&gt;Notification of acceptance February 22&lt;br /&gt;Camera-ready papers March 1&lt;br /&gt;Workshop April 26&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-1896342079848900822?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/1896342079848900822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=1896342079848900822' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1896342079848900822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/1896342079848900822'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/graph-based-methods-for-natural.html' title='Graph-based Methods for Natural Language Processing'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-3107631587478934376</id><published>2006-12-14T06:10:00.001-08:00</published><updated>2006-12-14T06:10:56.071-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Web'/><category scheme='http://www.blogger.com/atom/ns#' term='AI'/><category 
scheme='http://www.blogger.com/atom/ns#' term='&quot;AAAI 2007&quot;'/><title type='text'>AAAI 2007 track on AI and the Web</title><content type='html'>AAAI 2007 (July 22-26, Vancouver, Canada) will have a special&lt;br /&gt;technical track on Artificial Intelligence and the Web.  The&lt;br /&gt;track seeks research papers on AI techniques, systems and&lt;br /&gt;concepts involving or applied to the Web. Papers should&lt;br /&gt;describe Web-related research or clearly explain how the&lt;br /&gt;work addresses problems, opportunities or issues underlying&lt;br /&gt;the Web or Web-based systems. See [1] for suggested topics&lt;br /&gt;and more track information and [2] for information on the&lt;br /&gt;conference and details on submitting. Relevant deadlines are:&lt;br /&gt;&lt;br /&gt; - Jan 25: student abstracts&lt;br /&gt; - Feb  1: technical paper abstracts&lt;br /&gt; - Feb  2: doctoral consortium applications&lt;br /&gt; - Feb  6: technical papers&lt;br /&gt; - Feb 27: nectar and senior member papers&lt;br /&gt; - Apr  3: intelligent systems demo proposals&lt;br /&gt;&lt;br /&gt;[1] http://cs.umbc.edu/aaai07/&lt;br /&gt;[2] http://www.aaai.org/Conferences/AAAI/aaai07.php&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-3107631587478934376?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/3107631587478934376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=3107631587478934376' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3107631587478934376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/3107631587478934376'/><link 
rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/aaai-2007-track-on-ai-and-web.html' title='AAAI 2007 track on AI and the Web'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6217653914209743607</id><published>2006-12-14T06:06:00.000-08:00</published><updated>2006-12-14T06:08:12.881-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;shared task&quot;'/><category scheme='http://www.blogger.com/atom/ns#' term='multilingual dependency parsing'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;CoNLL 2007&quot;'/><title type='text'>CoNLL Shared Task 2007: multilingual dependency parsing</title><content type='html'>CoNLL Shared Task 2007&lt;br /&gt;       &lt;br /&gt;                 Pre-Announcement&lt;br /&gt;                 &lt;br /&gt;                 &lt;br /&gt;          &lt;br /&gt;Keeping up the successful tradition, the Conference on&lt;br /&gt;Computational Natural Language Learning (CoNLL) 2007 will&lt;br /&gt;as usual include a shared task. For the second year running,&lt;br /&gt;the task will be multilingual dependency parsing. The first&lt;br /&gt;call for participation is scheduled to appear later in&lt;br /&gt;December, with release of training data in late January and&lt;br /&gt;submission of test results in late March. 
The CoNLL&lt;br /&gt;conference is scheduled to take place in June 2007.&lt;br /&gt;&lt;br /&gt;The website for the shared task will be&lt;br /&gt;http://nextens.uvt.nl/depparse-wiki/SharedTaskWebsite.&lt;br /&gt;&lt;br /&gt;Enquiries about the shared task can be sent to&lt;br /&gt;conll07st@uvt.nl.&lt;br /&gt;&lt;br /&gt;The organizers&lt;br /&gt;&lt;br /&gt;Joakim Nivre&lt;br /&gt;Johan Hall&lt;br /&gt;Sandra Kübler&lt;br /&gt;Ryan McDonald&lt;br /&gt;Jens Nilsson&lt;br /&gt;Sebastian Riedel&lt;br /&gt;Deniz Yuret&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6217653914209743607?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6217653914209743607/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6217653914209743607' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6217653914209743607'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6217653914209743607'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/conll-shared-task-2007-multilingual.html' title='CoNLL Shared Task 2007: multilingual dependency parsing'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7732226477253686468.post-6880741271414200511</id><published>2006-12-14T06:03:00.000-08:00</published><updated>2006-12-14T06:05:40.808-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='datasets'/><category scheme='http://www.blogger.com/atom/ns#' term='&quot;sentiment analysis&quot;'/><title type='text'>The congressional speech corpus</title><content type='html'>&lt;span style="font-family:trebuchet ms;"&gt;The "congressional speech" corpus and associated graph information&lt;br /&gt;used in our "Get out the vote: Determining support or opposition from&lt;br /&gt;Congressional floor-debate transcripts" EMNLP 2006 paper is now&lt;br /&gt;available.&lt;br /&gt;&lt;br /&gt;Specifically, the data includes speeches as individual documents,&lt;br /&gt;together with:&lt;br /&gt;&lt;br /&gt;    * automatically-derived labels for whether the speakers supported&lt;br /&gt;      the legislation under discussion or not, allowing for&lt;br /&gt;      experiments with this kind of sentiment analysis&lt;br /&gt;&lt;br /&gt;    * indications of which debate each speech comes from (and the&lt;br /&gt;      position within the debate), allowing for consideration of&lt;br /&gt;      conversational structure&lt;br /&gt;&lt;br /&gt;    * indications of by-name references between speakers, allowing for&lt;br /&gt;      experiments with agreement classification (if one determines the&lt;br /&gt;      "true" labels from the support/oppose labels assigned to the&lt;br /&gt;      pair of speakers in question)&lt;br /&gt;&lt;br /&gt;    * the edge weights and other information we derived to create the&lt;br /&gt;      graphs we used for our experiments upon this data, facilitating&lt;br /&gt;      implementation of alternative graph-based classification methods&lt;br /&gt;      upon the graphs we constructed&lt;br /&gt;&lt;br /&gt;The download site is:&lt;br /&gt;http://www.cs.cornell.edu/home/llee/data/convote.html&lt;br /&gt;&lt;br /&gt;Matt Thomas, Bo Pang, and Lillian Lee&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/7732226477253686468-6880741271414200511?l=resnotebook.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://resnotebook.blogspot.com/feeds/6880741271414200511/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7732226477253686468&amp;postID=6880741271414200511' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6880741271414200511'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7732226477253686468/posts/default/6880741271414200511'/><link rel='alternate' type='text/html' href='http://resnotebook.blogspot.com/2006/12/congressional-speech-corpus.html' title='The congressional speech corpus'/><author><name>Delip Rao</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
