Friday, July 3, 2009

Dealing with large scale graphs

To a hammer everything looks like a nail but one great hammer to have in your toolbox is the graph. The ACL anthology alone lists more than 300 results for the query "graph based". Graphs based formalisms allow us to write down solutions in succinct linear algebra representation. However implementation of such solutions for large problems, or even for small datasets with blown-up graph representations can be challenging in limited resource environments. While some go for interesting approximate solutions, an alternative solution is to pool in several limited resource nodes into a map-reduce cluster and design a parallel algorithm to conquer scale with concurrency. This is easier said than done since designing some parallel algorithms requires a different perspective of the problem. This is well worth the effort as the new insights gained will reveal connections between things you already knew. For instance, in our TextGraphs 2009 paper we started out scaling up Label Propagation but eventually the connection to PageRank became obvious. To me this was a bigger learning moment than getting Label Propagation work for large graphs. [Preprint Copy]

For the actual implementation, we used Hadoop (surprise!) although bulk synchronous parallel models make more sense given the locality of the operations in most graph algorithms.

Tuesday, March 31, 2009

Sentiment Analysis is AI-Hard

In a breezy article on sentiment analysis, Alex Wright quotes Bo Pang saying:

We are dealing with sentiment that can be expressed in subtle ways.
This is so true with the examples I've encountered while working and my favorite is this one I saw on iTunes recently.

While I commend Alex for writing an informative yet accessible article on the topic, I disagree with the article's opinion that sentiment analysis is a series of "filters". That is clearly an euphemism. Any working sentiment analysis system is actually an engineering feat often consisting of a series of hacks duct-taped by a glue handling special cases.

The article also seems to suggest that extracting factual information is somehow easier than opinions. I invite them to participate here.

Saturday, February 28, 2009

On the way to Brewer's Art

Never mind how we got to this topic:

me: Parsing is for fogies.
Markus: What?
Jason: I think he means crusty old linguists.
Markus: You should probably use a shallow parser.
me: I'm shallower than that; I use n-grams.