Thursday, December 14, 2006

The congressional speech corpus

The "congressional speech" corpus and associated graph information
used in our EMNLP 2006 paper, "Get out the vote: Determining support or
opposition from Congressional floor-debate transcripts", are now
available.

Specifically, the data includes speeches as individual documents,
together with:

* automatically-derived labels for whether the speakers supported
the legislation under discussion or not, allowing for
experiments with this kind of sentiment analysis

* indications of which debate each speech comes from (and the
position within the debate), allowing for consideration of
conversational structure

* indications of by-name references between speakers, allowing for
experiments with agreement classification (if one determines the
"true" labels from the support/oppose labels assigned to the
pair of speakers in question)

* the edge weights and other information we derived to create the
graphs used in our experiments on this data, facilitating
implementation of alternative graph-based classification methods
on those same graphs
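
As a rough illustration of the agreement-classification idea above, here
is a minimal Python sketch of how "true" agree/disagree labels might be
derived from the support/oppose labels of a pair of speakers. The data
layout is an assumption made for illustration only: the dictionary, the
speaker IDs, the label strings, and the function name agreement_labels
are hypothetical and do not reflect the corpus's actual file format.

```python
from typing import Dict, List, Tuple


def agreement_labels(
    speaker_stance: Dict[str, str],
    references: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Label each by-name reference within one debate as 'agree' or 'disagree'.

    speaker_stance maps a (hypothetical) speaker ID to 'support' or 'oppose'.
    references lists (referring speaker, referenced speaker) pairs.
    """
    labeled = []
    for source, target in references:
        # Skip references where either speaker has no stance label.
        if source not in speaker_stance or target not in speaker_stance:
            continue
        # Two speakers "agree" when their stance labels match.
        same = speaker_stance[source] == speaker_stance[target]
        labeled.append((source, target, "agree" if same else "disagree"))
    return labeled


# Toy data, invented for this example (not taken from the corpus).
stance = {"speaker_a": "support", "speaker_b": "oppose", "speaker_c": "support"}
refs = [
    ("speaker_a", "speaker_b"),
    ("speaker_a", "speaker_c"),
    ("speaker_b", "speaker_d"),  # speaker_d has no label and is skipped
]
result = agreement_labels(stance, refs)
```

With this toy input, the reference from speaker_a to speaker_b comes out
as "disagree" and the one from speaker_a to speaker_c as "agree".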

The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html

Matt Thomas, Bo Pang, and Lillian Lee
