More BYU corpora

Besides the Movie Corpus, these are some other corpora from Brigham Young University:

  • The TV Corpus is based on TV episodes from the 1950s to the present. It includes American, British and Australian television programmes.
  • The SOAP Corpus is based on American soap operas from the early 2000s.
  • The TIME Corpus is based on articles from TIME magazine from 1923 to 2006.
  • The Wikipedia Corpus contains the full text of Wikipedia – 1.9 billion words in more than 4.4 million articles.
  • The iWeb Corpus contains 14 billion words in 22 million web pages.

Full list here.