Collins Memorial Library: ENGL 433: History of the English Language: Texts & Corpora

Corpus Linguistics

A corpus is a collection of texts or text extracts assembled to serve as a sample of a language or language variety. It consists of texts produced in "natural contexts" (published books, ordinary conversation, letters, newspapers, lectures, etc.), so it reflects language as it is actually used.

A reference corpus (created to be a balanced sample of a language variety) can serve as a baseline for comparing a particular text or genre with the standard language.
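One common way to use a reference corpus is to compare how often a word occurs in a text with how often it occurs in the reference corpus. Both counts are normalized to a rate per million words so that samples of different sizes can be compared. The Python sketch below illustrates the idea. The function names, the simple regex tokenizer, and the two tiny "corpora" are hypothetical stand-ins invented for illustration, not taken from any particular corpus tool.

```python
import re
from collections import Counter

def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def per_million(tokens):
    # Convert raw counts to frequency per million words so samples of different sizes are comparable.
    total = len(tokens)
    return {word: n * 1_000_000 / total for word, n in Counter(tokens).items()}

def compare(word, text, reference):
    # Return (rate in the text, rate in the reference corpus) for one word.
    text_rates = per_million(tokenize(text))
    ref_rates = per_million(tokenize(reference))
    return text_rates.get(word, 0.0), ref_rates.get(word, 0.0)

# Toy data, invented for illustration only.
sample_text = "Thee and thou and thee"
reference_sample = "You and I and you and they"

print(compare("thee", sample_text, reference_sample))  # prints (400000.0, 0.0)
```

A real study would use a much larger reference corpus and often a statistical measure (for example, log-likelihood) rather than a simple comparison of rates.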

Specialized corpora can be used to examine or compare different language varieties, such as the language of a particular region, a certain genre or text type, or a particular group of language users.

Corpora can be synchronic (covering a single time period) or diachronic (covering several time periods). They can also cover different media (written or spoken language) and include one or more languages.

Annotated corpora have extra information added, usually linguistic information (such as part-of-speech tags) or metadata (information about the material in the corpus, its speakers or authors, the situation, other extra-linguistic information, etc.).
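To make part-of-speech annotation concrete, the sketch below parses a sentence in which each word carries a tag, then counts the tags and pulls out the nouns. The `word/TAG` format, the Penn Treebank-style tag names, and the example sentence are assumptions chosen for illustration; real annotated corpora use a variety of formats and tagsets.

```python
from collections import Counter

# Hypothetical annotated sentence: each token is written as word/TAG.
annotated = "The/DT scribe/NN copied/VBD the/DT old/JJ manuscript/NN ./."

def parse(line):
    # Split on whitespace, then split each token at its LAST slash,
    # so the tag is whatever follows the final "/".
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

pairs = parse(annotated)
tag_counts = Counter(tag for _, tag in pairs)
# Penn-style noun tags all begin with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in pairs if tag.startswith("NN")]

print(tag_counts["DT"], nouns)  # prints 2 ['scribe', 'manuscript']
```

Queries like this are what annotation makes possible: you can search by grammatical category rather than by exact word form.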

There are corpora that can be consulted online, via a custom-built interface, and ones that you explore with stand-alone tools that you install on your computer.

The Routledge Handbook of Corpus Linguistics by Michael McCarthy (Editor); Anne O'Keeffe (Editor)

ISBN: 0415464897

Publication Date: 2010

Provides a timely overview of a dynamic and rapidly growing area with a widely applied methodology. Bringing together experts in the key areas of development and change, the handbook is structured around six themes, which take the reader from building and designing a corpus to using a corpus to study literature and translation.

Corpora of the English Language

American National Corpus
The Open American National Corpus (OANC) is a massive electronic collection of American English, including texts of all genres and transcripts of spoken data produced from 1990 onward. All data and annotations are fully open and unrestricted for any use.
English-Corpora.org
Created by Professor Mark Davies at Brigham Young University. Each corpus contains between 45 and 450 million words. Includes access to the British National Corpus.
Google Books NGram Viewer
Use Google Books to chart word usage! (See example below).
Corpus of Middle English Prose and Verse
Part of the University of Michigan's Middle English Compendium, the Corpus of Middle English Prose and Verse is a searchable collection of Middle English texts. It is not an exhaustive linguistic corpus, but it lets you see examples of word, phrase, and collocation usage across a large body of material.

Google Books NGram Viewer

Is it linguistics or philology? Take a look at word usage frequency over the past three centuries!

Featured Resource: EEBO

Early English Books Online (EEBO)
Contains digital facsimile page images of virtually every work printed in England, Ireland, Scotland, Wales, and British North America, and of works in English printed elsewhere, between 1473 and 1700.

HathiTrust Digital Library

HathiTrust contains millions of digital books, journals, government documents, and other volumes, all digitized from research libraries. The collection includes both public domain and in-copyright works across a full range of subjects. Over half of the content is in English, but hundreds of languages are represented, including large amounts of material in German, French, Chinese, Russian and Spanish. Although all items are discoverable, viewability depends on the rights status of the individual item.

Check out our guide to using HathiTrust Digital Library for more information.

HathiTrust Digital Library
A partnership of academic and research institutions, offering a collection of millions of titles digitized from libraries around the world.

To access content, select "LOG IN" and choose the University of Puget Sound as your member library. You will be prompted to enter your Puget Sound credentials, which unlocks access to available content.