The "congressional speech" corpus and associated graph information
used in our "Get out the vote: Determining support or opposition from
Congressional floor-debate transcripts" EMNLP 2006 paper are now
available.
Specifically, the data includes speeches as individual documents,
together with:
* automatically derived labels for whether the speakers supported
the legislation under discussion or not, allowing for
experiments with this kind of sentiment analysis
* indications of which debate each speech comes from (and the
position within the debate), allowing for consideration of
conversational structure
* indications of by-name references between speakers, allowing for
experiments with agreement classification (if one determines the
"true" labels from the support/oppose labels assigned to the
pair of speakers in question)
* the edge weights and other information we derived to create the
graphs used in our experiments on this data, facilitating
implementation of alternative graph-based classification methods
on those same graphs
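As a rough illustration of the agreement-classification idea in the third item, here is a minimal sketch of how one might derive "true" agreement labels for by-name references from the support/oppose labels of the two speakers involved. The speaker identifiers, the dictionary layout, and the function name are assumptions made for this example only; they do not reflect the corpus's actual file format, which is described at the download site.

```python
from typing import Dict, List, Tuple


def agreement_labels(
    stance: Dict[str, bool],
    references: List[Tuple[str, str]],
) -> List[Tuple[str, str, bool]]:
    """Label each by-name reference as agreement (True) or disagreement (False).

    stance maps a hypothetical speaker id to True (supports the legislation)
    or False (opposes it). references is a list of (referring speaker,
    referenced speaker) pairs. A pair counts as agreement when both
    speakers carry the same support/oppose label.
    """
    labels = []
    for source, target in references:
        # Skip references where either speaker has no support/oppose label.
        if source not in stance or target not in stance:
            continue
        labels.append((source, target, stance[source] == stance[target]))
    return labels


# Toy data, invented for illustration; not taken from the corpus.
stance = {"speaker_a": True, "speaker_b": True, "speaker_c": False}
references = [
    ("speaker_a", "speaker_b"),
    ("speaker_b", "speaker_c"),
    ("speaker_c", "speaker_d"),  # speaker_d has no label, so this pair is skipped
]
pairs = agreement_labels(stance, references)
```

Treating "same label" as agreement is the simplest reading of the item above; a real experiment would first need to parse the reference and label information from the distributed files.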
The download site is:
http://www.cs.cornell.edu/home/llee/data/convote.html
Matt Thomas, Bo Pang, and Lillian Lee
Thursday, December 14, 2006
The congressional speech corpus
- Delip Rao at 6:03 AM
Principal Components: "sentiment analysis", datasets