Besides the Movie Corpus, other corpora from Brigham Young University include:
- The TV Corpus is based on TV episodes from the 1950s to the present. It includes American, British and Australian television programmes.
- The SOAP Corpus is based on American soap operas from the early 2000s.
- The TIME Corpus is based on articles published in TIME magazine from 1923 to 2006.
- The Wikipedia Corpus contains the full text of Wikipedia – 1.9 billion words in more than 4.4 million articles.
- The iWeb Corpus contains 14 billion words in 22 million web pages.
Full list here.