Twitter corpora

The Internet Argument Corpus

The Internet Argument Corpus (IAC) is a corpus for research in political debate on internet forums. It consists of ~11,000 discussions, ~390,000 posts, and ~73,000,000 words. Subsets of the data have been annotated for topic, stance, agreement, sarcasm, and nastiness, among others.

The data is stored in JSON files, with most annotations in CSV format (see the included readme for details). Python code to load and use the data is included. The zip archive is 158 MB.
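
For readers who want to inspect the files without the bundled loader, the following is a minimal sketch using only the Python standard library. It is not the code shipped with the corpus: the function names `load_discussion` and `load_annotations` and the `key_field` parameter are hypothetical, and the actual file names, JSON layout, and CSV column names must be taken from the included readme.

```python
import csv
import json


def load_discussion(path):
    """Read one JSON file and return the parsed Python object.

    The structure of the returned object depends on the corpus files;
    nothing about its fields is assumed here.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_annotations(path, key_field):
    """Read a CSV annotation file into a dict keyed by one column.

    `key_field` is a column name chosen by the caller (hypothetical here;
    check the readme for the real header names). Each value is a dict
    mapping column names to strings, as produced by csv.DictReader.
    If several rows share the same key, the last one wins.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key_field]: row for row in csv.DictReader(f)}
```

Since `csv.DictReader` returns every value as a string, numeric annotation scores would need an explicit conversion (for example with `float`) before any arithmetic.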