Language            Corpus name                                     Words
Chinese Simplified  Guangwai - Lancaster Chinese Learner Corpus     1,289,060
English             ACL Anthology Reference Corpus (ARC)            62,196,334
English             British Academic Spoken English Corpus (BASE)   1,186,290
English             British Academic Written English Corpus (BAWE)  6,968,089
English             Brown                                           1,007,299
English             EcoLexicon English (Environment)                23,169,446
N'Ko                Corpus Nko ߒߞߏ ߝߊ߬ߘߌ߬ߞߋ߬ߟߋ߲߬ߡߊ  4,102,593

