The OED in two minutes

Modern English includes words from a wide variety of sources around the world. Patterns of word-borrowing over time reflect the changing demography of its speakers; cultural and economic influences on Britain; the spread of English-speaking explorers, traders, and settlers; and encounters with other cultures.

Each data point shown here represents the first recorded use of a word in English, positioned according to the language from which the word was borrowed. The size of the data point indicates the frequency of the word: larger bubbles for higher-frequency words, smaller bubbles for lower-frequency words. The progress bar at the bottom tracks the growth of English, subdivided into the major language groups from which words are derived.

Click here to access the feature, or read on to learn more about how the interactive map works:

Dating

The date shown for a given word is based on the date of first recorded use, as evidenced in the word’s entry in the OED. The date of first recorded use may not always be quite the same as first actual use – see here for a discussion of OED’s use of evidence – but should be close enough in most cases for this to give a good picture of trends in borrowing.

Note that the date of a word’s appearance in the map above may be slightly different from the first date given in the OED entry. Particularly in the early period (pre-1500), when precise dating can be difficult, first dates tend to be artificially clumped on round numbers (c.1350, c.1400, etc.). When timing the appearance of data points on the map above, a small amount of dither is applied to such dates in order to spread them more evenly.
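The dithering described above can be sketched roughly as follows. This is only an illustration: the cutoff year, the definition of a "round" date (a multiple of 50), and the maximum shift of 10 years are all assumptions chosen for the example, not values taken from the map itself.

```python
import random


def dithered_year(first_date: int, rng: random.Random, cutoff: int = 1500,
                  round_to: int = 50, max_shift: int = 10) -> int:
    """Return a display year, nudging round pre-cutoff dates slightly.

    cutoff, round_to and max_shift are hypothetical parameters.
    """
    if first_date < cutoff and first_date % round_to == 0:
        # Spread clumped dates (c.1350, c.1400, ...) over nearby years.
        return first_date + rng.randint(-max_shift, max_shift)
    # Later or non-round dates are shown unchanged.
    return first_date


rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = [dithered_year(1400, rng) for _ in range(5)]
print(years)
```

Each of the five words dated c.1400 is placed somewhere between 1390 and 1410, so they no longer all appear at the same instant.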

Why start at 1150?

The history of English goes back well before 1150, but before that date the first appearance of a word in written texts becomes a much less reliable indication of when it actually entered the English language. A word like mighty, for example, occurs in the earliest written records of English, and has cognates in numerous Germanic languages; it almost certainly formed part of the language that the earliest Germanic settlers brought with them to the British Isles. In a case like this, the date of first recorded use tells us more about when English starts to be documented in written sources than it does about when this word first entered the language.

For more on this period, see Old English—an overview and Old English in the OED.

Geography

Words borrowed from a given language are positioned in the geographical area associated with that language. For some languages, e.g. Arabic, this area may be quite extensive; the exact position of the data point within that area is essentially random, and should not be taken to indicate a specific site.

Of course, the fact that a language is associated with a particular area does not mean that the borrowing of a word into English actually occurred in that area. For example, data points for words acquired from Dutch all appear in the area of the modern Netherlands, even though a number of these words were acquired through encounters between British and Dutch traders in the Far East in the 17th and 18th centuries.

Size of data points

The size of a bubble on the map is a rough indication of the frequency of the word in modern English: larger bubbles for higher-frequency words, smaller bubbles for lower-frequency words.

Note that this is a logarithmic relationship: the radius of the bubble is proportional to the log of the word’s frequency. This means that a small increase in radius represents a very large increase in frequency. Hover over the bubble to see the actual frequency.
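A minimal sketch of a logarithmic size scale is given below. The unit of frequency (occurrences per million words), the scale factor, and the floor value are assumptions made for the example; the map's own constants are not stated here.

```python
import math


def bubble_radius(frequency_per_million: float, scale: float = 4.0,
                  floor: float = 0.001) -> float:
    """Radius proportional to the log of frequency (hypothetical constants).

    Frequencies are measured relative to `floor` so the radius is never negative.
    """
    freq = max(frequency_per_million, floor)
    return scale * math.log10(freq / floor)


for freq in (0.01, 1, 100):
    print(freq, bubble_radius(freq))
```

With this scale, every tenfold increase in frequency adds the same fixed amount (`scale`) to the radius, which is why a bubble that looks only slightly larger can represent a far more common word.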

Number of words and word frequency

The progress bar at the bottom of the map indicates how much of the vocabulary of modern English was established by a given date, and how that breaks down between major language families.

When considering the influence on English of a particular language or region of the world, we need to take into account not only the number of words borrowed, but also the frequency of those words: a small number of common words may have had more influence on English than a larger number of very rare words.

For example, in the early 19th century there are many borrowings from Maori – words like hongi, kakapo, and Pakeha. But most of these are very low-frequency; hongi occurs only about once every 100 million words in modern English. In spite of the large number of words, then, it may be argued that the total influence of Maori on English (in terms of word borrowing, at least) is very small.

To reflect this, the progress bar at the bottom of the map represents the summed frequencies (in modern English) of words borrowed from different areas, not a simple count of word borrowings. This makes clear that English has remained overwhelmingly dominated by the major language families of western Europe – Germanic, Romance, and Latin. The small yellow component at the end of the progress bar represents the summed frequencies of all borrowings from languages outside Europe: at 2010, this includes about 5200 words recorded in the OED, but accounts for only about 0.2% of all English usage. By contrast, the 7700 words derived from Germanic languages account for 49% of all English usage. On average, a word derived from a Germanic language is likely to be about 200 times as common in English as a word derived from a non-European language.
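The per-word comparison can be checked with a short calculation using the rounded figures quoted above. The variable names are illustrative only.

```python
# Rounded figures quoted in the text (at 2010).
non_european_words = 5200
non_european_share = 0.002   # about 0.2% of all English usage
germanic_words = 7700
germanic_share = 0.49        # 49% of all English usage

# Average share of usage contributed by a single word in each group.
per_word_non_european = non_european_share / non_european_words
per_word_germanic = germanic_share / germanic_words

ratio = per_word_germanic / per_word_non_european
print(round(ratio))  # about 165
```

The rounded inputs give a ratio of roughly 165, which is of the same order as the "about 200" quoted above; the difference presumably comes from rounding in the published figures, though that is an assumption.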

It’s also striking to see how much of modern English was already established by 1150. Although the language at this stage contains relatively few words that have survived into modern English, these include most of the core words that we use all the time (the, run, head, etc.). So the summed frequency of these words is very high, which is why at 1150 the progress bar is already over half-way to its modern total.

Where does the frequency data come from?

The frequency data used here is derived from the Google Ngrams data set. Specifically, frequency ‘in modern English’ is based on average frequency between 1970 and 2008. For a given word, this is calculated by adding together the average frequencies of all the surface forms for that word; these include inflections (e.g. mend + mends + mended + mending for the verb mend) and alternative spellings (e.g. theatre + theatres + theater + theaters).
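The calculation described above might look something like the sketch below. The data structure (a mapping from surface form to yearly frequencies) and all the numbers are invented for illustration; they are not real Google Ngrams values, and the function names are hypothetical.

```python
from statistics import mean


def average_frequency(yearly: dict[int, float],
                      start: int = 1970, end: int = 2008) -> float:
    """Mean frequency of one surface form over the years start..end."""
    values = [f for year, f in yearly.items() if start <= year <= end]
    return mean(values) if values else 0.0


def word_frequency(surface_forms: dict[str, dict[int, float]]) -> float:
    """Sum the average frequencies of all surface forms of one word."""
    return sum(average_frequency(yearly) for yearly in surface_forms.values())


# Invented yearly frequencies for the surface forms of the verb mend.
mend_forms = {
    "mend": {1969: 9.0, 1970: 2.0, 2008: 4.0},   # 1969 is outside the window
    "mends": {1970: 1.0, 2008: 1.0},
    "mended": {1990: 2.0},
    "mending": {2000: 1.5, 2009: 50.0},          # 2009 is outside the window
}
print(word_frequency(mend_forms))
```

Years outside 1970 to 2008 are ignored, so the stray 1969 and 2009 values do not affect the result.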

There are a number of difficulties in mapping n-grams to OED entries. Homographs pose a particular problem: given frequency values for the n-gram pen, for example, it’s not easy to determine how these values should be subdivided between OED’s different entries for the writing implement, the animal enclosure, and the female swan. We’ve attempted to address this problem using a number of heuristics, and by cross-checking with secondary data sources; but the results often involve approximation. Frequencies given for individual words should therefore be treated with caution. However, this does not significantly affect aggregated values (overall rates of growth, etc.).

Rate of growth

The grey circle in the bottom left indicates the rate of growth of the language at each point in time. Again, this is measured in terms of the summed frequencies of words being acquired. The circle is larger when large numbers of words are being added to the language – or a smaller number of relatively high-frequency words – and smaller when fewer or lower-frequency words are being added.
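One way to picture this measure is to total the frequencies of the words first recorded in each period. The sketch below assumes 50-year periods and uses invented (year, frequency) pairs; neither is taken from the map.

```python
from collections import defaultdict


def growth_by_period(words: list[tuple[int, float]],
                     period: int = 50) -> dict[int, float]:
    """Sum word frequencies by the period in which each word is first recorded."""
    totals: dict[int, float] = defaultdict(float)
    for first_year, frequency in words:
        start = (first_year // period) * period  # e.g. 1673 -> 1650
        totals[start] += frequency
    return dict(sorted(totals.items()))


# Invented data: (year of first recorded use, frequency in modern English).
sample = [(1605, 10.0), (1640, 5.0), (1673, 0.5),
          (1820, 0.01), (1830, 0.01), (1845, 0.01)]
growth = growth_by_period(sample)
print(growth)
```

In this made-up sample, the three rare words added in the early 19th century contribute far less growth than the single moderately common word added after 1650.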

Direct and indirect borrowings

The attribution of a borrowed word to a given language of origin tends to reflect the direct source of the borrowing, i.e. the language from which the word was adopted or adapted into English (as indicated in the OED’s etymology). This may not always reflect the word’s earlier origin, and can sometimes give unexpected results.

For example, the OED’s etymology for mongoose indicates that the word’s route into English was via the Portuguese mangus. (The first recorded use of mongoose in English is in a 1673 letter by Gerald Aungier, who served as governor of Bombay a few years after its transfer from Portuguese control.) Of course, mangus was not originally a Portuguese word; Portuguese had in turn borrowed it from the Marathi language of western India. Nevertheless, because the direct source is Portuguese, mongoose is attributed to Portuguese in the map above.

As this example illustrates, direct borrowing often tells only a small part of a word’s history. Reading the OED’s etymology will give a more complete account.

Coverage

The map does not display every entry in the OED. Omitted entries fall into several categories:

By far the largest category consists of entries for words formed in English (not borrowed from other languages). These include:

derivatives – words formed by adding prefixes and suffixes to existing English words, e.g. regimental from regiment, literalness from literal, decommission from commission
compounds – words formed by combining existing English words, e.g. quarterback, moonlight, penfriend

However, these are counted towards the overall growth of English in the progress bar at the bottom of the map: words formed in English constitute the white component of the bar.

Various other types of word have origins which cannot be represented in a map of this kind. These include words of uncertain origin (nifty, reggae), eponyms and toponyms (macadam, Darwinian), words formed by onomatopoeia (moo, pow), and invented words (jabberwock, quark).

Where OED has separate entries for different parts of speech of the same root word (e.g. muscle, n. and muscle, v.), these are registered as a single data point (using the date of the earlier entry, and combining the frequencies of both).
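The merging of part-of-speech entries into a single data point can be sketched as below. The Entry class and its field names are hypothetical, entries are grouped simply by headword (an assumption that would not separate homographs such as pen), and the dates and frequencies are invented for illustration rather than taken from the OED.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headword: str
    part_of_speech: str
    first_year: int
    frequency: float


def merge_entries(entries: list[Entry]) -> dict[str, tuple[int, float]]:
    """Combine entries sharing a headword: earliest date, summed frequency."""
    merged: dict[str, tuple[int, float]] = {}
    for entry in entries:
        if entry.headword in merged:
            year, freq = merged[entry.headword]
            merged[entry.headword] = (min(year, entry.first_year),
                                      freq + entry.frequency)
        else:
            merged[entry.headword] = (entry.first_year, entry.frequency)
    return merged


# Invented values, not OED data.
entries = [
    Entry("muscle", "v.", 1900, 1.0),
    Entry("muscle", "n.", 1400, 20.0),
]
points = merge_entries(entries)
print(points)
```

The two entries collapse into one data point that carries the earlier of the two dates and the sum of the two frequencies.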