Misc Research Stuff: datasets

Thursday, December 14, 2006

The congressional speech corpus

The "congressional speech" corpus and associated graph information
used in our "Get out the vote: Determining support or opposition from
Congressional floor-debate transcripts" EMNLP 2006 paper is now
available.

Specifically, the data includes speeches as individual documents,
together with:

* automatically derived labels for whether each speaker supported
the legislation under discussion or not, allowing for
experiments with this kind of sentiment analysis

* indications of which debate each speech comes from (and the
position within the debate), allowing for consideration of
conversational structure

* indications of by-name references between speakers, allowing for
experiments with agreement classification (if one determines the
"true" labels from the support/oppose labels assigned to the
pair of speakers in question)

* the edge weights and other information we derived to construct the
graphs used in our experiments on this data, so that alternative
graph-based classification methods can be implemented on the same
graphs we built
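The agreement-classification use above derives "true" agreement labels from the support/oppose labels of the two speakers involved in a by-name reference. The following is a minimal sketch of that derivation. It assumes the labels have already been loaded into a plain dict mapping a speaker ID to "Y" (supports) or "N" (opposes), and that references are (speaker, referenced speaker) pairs. The IDs, the dict layout and the function name `agreement_labels` are all hypothetical. They are not the corpus's actual file format.

```python
from typing import Dict, List, Tuple


def agreement_labels(
    support: Dict[str, str],
    references: List[Tuple[str, str]],
) -> Dict[Tuple[str, str], bool]:
    """Label each by-name reference (speaker, referenced) as agreement (True)
    when both speakers share the same support/oppose label, and as
    disagreement (False) otherwise. Pairs where either speaker lacks a
    label are skipped."""
    labels: Dict[Tuple[str, str], bool] = {}
    for speaker, referenced in references:
        if speaker in support and referenced in support:
            labels[(speaker, referenced)] = support[speaker] == support[referenced]
    return labels


# Hypothetical toy data: "Y" = supports the bill, "N" = opposes it.
support = {"s1": "Y", "s2": "Y", "s3": "N"}
refs = [("s1", "s2"), ("s1", "s3"), ("s3", "s4")]  # s4 has no label
print(agreement_labels(support, refs))
# {('s1', 's2'): True, ('s1', 's3'): False}
```

Skipping unlabeled pairs, rather than guessing a label for them, keeps the derived agreement labels tied strictly to the support/oppose labels that are available.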

The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html

Matt Thomas, Bo Pang, and Lillian Lee