Saturday, August 18, 2007
NLP and Global Warming
Those of us who were at EMNLP-CoNLL 2007 remember the "NLP and Global Warming" exchange between James Clarke, Jason Eisner, and Dan Bikel at the Q/A session of the Clarke and Lapata paper. The transcript of this funny conversation is now online, thanks to Jason.
I really liked Hal's ending remark.
- Delip Rao at 12:40 AM
Wednesday, August 15, 2007
People Search on the Web
Wired has an article about spock.com, a people search engine that combines crawled and user-added content. From the few searches I did, it looks like this works better for celebrity names than for regular people with some web presence. For instance, searching for a name like "David Smith" produces these results. Of the top 10 results, only 3 actually contain the name "David Smith" or something close to it, and the first result is not one of them. Compare this with a general-purpose search engine like Google. Of a dozen random NLP/ML academics (professors) I tried, it got only Jason Eisner and Tom Mitchell right. One likely reason for this poor recall is that it does not get content from people's home pages.
(Sites this data is derived from include MySpace, Friendster, IMDB, Wikipedia, ratemyprofessors.com, etc.)
Nevertheless, this website is representative of the interesting KDD-style problems one could work on with people names. It is also interesting because most of the names we look for fall in the "long tail", with too little data for each one, which calls for clever machine learning techniques.
- Delip Rao at 4:16 PM
Principal Components: data mining, IR, KDD, NLP, search
Sunday, August 12, 2007
Digital Reasoning awarded contextual similarity patent?
I was led to this article on Forbes via Damien's post. The article is about a company, Digital Reasoning, being granted a patent on what sounded to me like contextual similarity. Their "white paper" refers to patent number 7249117 (via USPTO). Unlike a research paper, the patent document was very difficult to read. I will get to it sometime later, but here is an extract from their press release about what their technology can do.
* Learn the meanings of words, classes of words, and other symbols based on how they are used in context in natural language
* Create and manipulate models of this "meaning" - i.e. the mathematical patterns of usage - including the detection of groups or similar categories of words or development of hierarchies or creation of relationships between words
* Improve the models based on human feedback or using other structured information after model construction
* The representation or sharing of this model or learning in an ontology, graph structure, or programming languages
Anyone from the ACL/ML/AI community will immediately recognize these ideas and could start citing favorite papers on these topics going back at least a decade. A promotional video from the company on YouTube can be found here. Excerpt from the video: "... We treat the text representation of human language as a signal ... ".
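The first capability in the press release, learning word meanings from how words are used in context, is what the research literature usually calls distributional similarity. Below is a minimal textbook sketch, assuming a toy corpus, whitespace tokenization, and raw co-occurrence counts within a fixed window (the `window` size and the helper names are my own choices). It is an illustration of the general idea, not the method described in patent 7249117.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (missing keys count as 0)."""
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: "cat" and "dog" appear in similar contexts, "stocks" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on wall street",
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # -> 0.917
print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # -> 0.0
```

Words that share contexts ("the", "drinks", "chases") end up with similar vectors; grouping words into categories or hierarchies, as the press release describes, is typically done by clustering vectors like these.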
I think everyone should stop taking patents seriously. Wishful thinking?
- Delip Rao at 12:30 PM
Principal Components: "machine learning", data mining, NLP, patents
Thursday, August 2, 2007
Recommending scientific papers
Even though a scientific paper carries more information than just its text (its authors, venue, and citations, for example), recommending or ranking papers appears to be quite challenging.
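As a toy illustration of combining text with non-text signals, here is a minimal sketch that scores candidate papers by blending bag-of-words cosine similarity with Jaccard overlap of their reference lists. The paper records, their `text` and `refs` fields, the reference ids, and the weight `alpha` are all hypothetical; this is a naive baseline, and choosing the features and weights well is exactly where the difficulty lies.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(c * v[w] for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def jaccard(a, b):
    """Overlap between two reference lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def score(query, candidate, alpha=0.5):
    """Hypothetical blend: alpha * text similarity + (1 - alpha) * shared references."""
    text_sim = cosine(Counter(query["text"].lower().split()),
                      Counter(candidate["text"].lower().split()))
    ref_sim = jaccard(query["refs"], candidate["refs"])
    return alpha * text_sim + (1 - alpha) * ref_sim

# Hypothetical toy data: "p1", "p2", "p3" stand in for cited papers.
query = {"text": "sentence compression with integer programming", "refs": ["p1", "p2"]}
candidates = {
    "A": {"text": "integer programming for sentence compression", "refs": ["p2", "p1"]},
    "B": {"text": "parsing with integer programming", "refs": ["p3"]},
}
ranked = sorted(candidates, key=lambda k: score(query, candidates[k]), reverse=True)
print(ranked)  # -> ['A', 'B']
```

A fixed linear blend like this ignores authors, venue, and citation counts entirely, which hints at why a good ranking function is hard to design.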
- Delip Rao at 7:40 PM
Principal Components: Information Retrieval, IR, NLP, research