Vocabulary size | Text coverage |
1000 | 72.0% |
2000 | 79.7% |
3000 | 84.0% |
4000 | 86.8% |
5000 | 88.7% |
6000 | 89.9% |
15,851 | 97.8% |
Vocabulary size | % coverage | Density of unknown words |
2000 words | 90% | 1 in every 10 |
2000 + proper nouns | 93.7% | 1 in every 16 |
2600 words | 96% | 1 in every 25 |
5000 words | 98.5% | 1 in every 67 |
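The density figures follow arithmetically from the coverage figures: if a vocabulary covers c% of the running words, roughly one running word in every 100 / (100 - c) is unknown. A minimal Python sketch of that arithmetic (the function name is illustrative and does not come from the studies cited):

```python
def unknown_word_density(coverage_percent: float) -> float:
    """Return N such that about 1 running word in every N is unknown."""
    if not 0 <= coverage_percent < 100:
        raise ValueError("coverage must be at least 0 and below 100 percent")
    return 100.0 / (100.0 - coverage_percent)


for coverage in (90.0, 93.7, 96.0, 98.5):
    n = unknown_word_density(coverage)
    print(f"{coverage}% coverage: about 1 unknown word in every {round(n)}")
```

Rounded to whole words, the four coverage figures give 10, 16, 25 and 67, matching the density column.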
Researchers | 1st 1000 | 2nd 1000 | Total |
Sutarsyah (1993), academic texts | 74.1% | 4.3% | 78.4% |
Sutarsyah (1993), a long economics text | 77.7% | 4.8% | 82.5% |
Hwang (1989), a range of texts | 77.2% | 4.9% | 82.1% |
Hirsh (1992), short novels | 84.8% | 5.8% | 90.6% |
What is also interesting is the number of different words (word types)
from the second 1000 that actually occurred in a mixture of different kinds
of texts compared with more homogeneous texts. In any one text, such as
a novel or a textbook, around 400 to 550 of the second 1000 words from the
GSL actually occurred. When a mixture of texts was looked at, however, around
700 to 800 of the second 1000 words occurred (Hirsh and Nation, 1992; Sutarsyah,
Nation and Kennedy, 1994).
The second 1000 words behave in this way because they are lower frequency
words than the first 1000 words and have a narrower range of occurrence.
That is, their occurrence is more closely related to the topic or subject
area of a text than that of the wide-ranging, more general-purpose words in the
first 1000. But given a range of topics and genres, and enough texts, the second
1000 words are more generally useful than other lists of words.
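The comparison between a single text and a mixture of texts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-word list and the two sentences are invented stand-ins (not the actual second 1000 of the GSL, and not the corpora used in the studies cited), and the count is of word types, with no grouping into word families.

```python
import re


def word_types(text: str) -> set[str]:
    """Lower-cased word types (distinct word forms) in a text."""
    return set(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def list_types_found(texts: list[str], word_list: set[str]) -> set[str]:
    """Types from word_list that occur in at least one of the texts."""
    found: set[str] = set()
    for text in texts:
        found |= word_types(text) & word_list
    return found


# Invented stand-in for a lower-frequency list; NOT the real GSL second 1000.
sample_list = {"harvest", "engine", "debt", "shore"}

# Invented one-sentence "texts" on different topics.
novel = "The boat reached the shore before the harvest."
textbook = "The debt was paid after the engine was sold."

print(len(list_types_found([novel], sample_list)))            # one text
print(len(list_types_found([novel, textbook], sample_list)))  # a mixture
```

With this toy data the single text contains two of the four listed types and the mixture contains all four, which mirrors on a very small scale the pattern reported above for topic-bound words.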
After the 2000 high frequency words of the GSL, what vocabulary does a second
language learner need? The answer to this question depends on what the language
learner intends to use English for. If the learner has no special academic
purpose, then the learner should work on the strategies for dealing with
low frequency words. If, however, the learner intends to go on to academic
study in upper high school or at university, then there is a clear need
for general academic vocabulary. This can be found in the 836-word list
called the University Word List (UWL) (Xue and Nation, 1984; Nation, 1990).
The UWL consists of words that are not in the first 2000 words of the GSL
but which are frequent and of wide range in academic texts. Wide range means
that the words occur not just in one or two disciplines like economics or
mathematics, but occur across a wide range of disciplines. The word "frustrate",
for example, which is in the UWL, can be found in many different disciplines.
The UWL is really a compilation from four separate studies: Lynn (1973),
Ghadessy (1979), Campion and Elley (1971), and Praninskas (1972). Here are
some items from it.
accompany formulate index major objective
biology genuine indicate maintain occur
comply hemisphere individual maximum passive
deficient homogeneous job modify persist
edit identify labour negative quote
feasible ignore locate notion random
(Nation, 1990)
The value of the UWL can be seen when we look at the coverage of academic
text that it provides.
Researchers | 1st 2000 | UWL | Total |
Hwang (1989), academic texts | 78.1% | 8.5% | 86.6% |
Sutarsyah (1993), an economics text | 82.5% | 8.7% | 91.2% |
Source | 1st 2000 (GSL) | UWL | Total |
Academic | 78.1% | 8.5% | 86.6% |
Newspapers | 80.3% | 3.9% | 84.2% |
Magazines | 82.9% | 4.0% | 86.9% |
Fiction | 87.4% | 1.7% | 89.1% |
Note the low coverage the UWL has of fiction. Newspapers and magazines,
which are more formal, make use of more of the UWL. Very formal academic
text makes the greatest use of the UWL. The UWL is thus a word list for
learners with specific purposes, namely academic reading. The purpose behind
the setting up of the UWL is to create a list of high frequency words for
learners with academic purposes, so that these words can be taught and directly
studied in the same way as the words from the GSL can.
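Coverage figures of the kind shown in the tables above are percentages of running words (tokens) accounted for by each list in turn. The Python sketch below illustrates one way such a count could be made; it is not the procedure used in the studies cited. The first list is a tiny invented stand-in for the first 2000 words, the second holds four of the UWL items quoted earlier, and the sentence is invented. Tokens are matched as plain word forms, with no word-family grouping.

```python
import re


def layered_coverage(text, layers):
    """Percentage of running words credited to each list, taken in order.

    layers is a list of (name, set_of_words) pairs. Each token is credited
    to the first list that contains it; tokens in no list count as "other".
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {name: 0 for name, _ in layers}
    counts["other"] = 0
    for token in tokens:
        for name, words in layers:
            if token in words:
                counts[name] += 1
                break
        else:  # no list contained the token
            counts["other"] += 1
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy stand-in for the first 2000 words; NOT the real GSL.
gsl_toy = {"we", "and", "the", "of"}
# Four items taken from the UWL sample quoted above.
uwl_toy = {"identify", "modify", "index", "random"}

sample_text = "We identify and modify the index of the random sample."
result = layered_coverage(sample_text, [("gsl", gsl_toy), ("uwl", uwl_toy)])
print(result)
```

Because each token is credited to the first list that contains it, the percentages for the successive lists add up without double counting, which is how the Total column in the tables is formed.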
Word frequency lists
The major theme of this paper has been that we need to have clear, sensible
goals for vocabulary learning. Frequency information provides a rational
basis for making sure that learners get the best return for their vocabulary
learning effort. Vocabulary frequency lists which take account of range
have an important role to play in curriculum design and in setting learning
goals.
This does not necessarily mean that learners must be provided with large
vocabulary lists as the major source of their vocabulary learning. It does
mean, however, that course designers should have lists to refer to when they
consider the vocabulary component of a language course, and teachers need
to have reference lists to judge whether a particular word deserves attention
or not, and whether a text is suitable for a class.
The availability of powerful computers and very large corpora now makes the
development of such lists a much easier job than it was when Thorndike and
Lorge (1944) and their colleagues manually counted 18,000,000 running words.
The making of a frequency list, however, is not simply a mechanical task,
and judgements based on well-established criteria need to be made. The following
list suggests several of the factors that would need to be considered in
the development of a resource list of high frequency words.
1. Representativeness: The corpora that the list is based on should
adequately represent the wide range of uses of language. In the past, most
word lists have been based on written corpora. There needs to be a substantial
spoken corpus involved in the development of a general service list. The
spoken and written corpora used should also cover a range of representative
text types. Biber's (1990) corpus studies have shown how particular language
features cluster in particular text types. The corpora used should contain
a wide range of useful types so that the biases of a particular text type
do not unduly influence the resulting list.
2. Frequency and range: Most frequency studies have given recognition
to the importance of range of occurrence. A word should not become part
of a general service list merely because it occurs frequently. It should occur
frequently across a wide range of texts. This does not mean that its frequency
has to be roughly the same across the different texts, but means that it
should occur in some form or other in most of the different texts or groupings
of texts.
3. Word families: The development of a general service list needs to
make use of a sensible set of criteria regarding what forms and uses are
counted as being members of the same family. Should "governor" be counted
as part of the word family represented by "govern"? When making this
decision, the purposes of the list and the learners for which it is intended
need to be considered. As well as basing the decision on features such as
regularity, productivity, and frequency (Bauer and Nation, 1993), the likelihood
of learners seeing these relationships needs to be considered (Nagy and
Anderson, 1984).
4. Idioms and set expressions: Some items larger than a word behave
like high frequency words. That is, they occur frequently as a unit (Good
morning, Never mind), and their meaning is not clear from the
meaning of the parts (at once, set out). If the frequency
of such items is high enough to get them into a general service list in
direct competition with single words, then perhaps they should be there.
Certainly the arguments for idioms are strong, whereas set expressions could
be included under one of their constituent words (but see Nagy, this volume).
5. Range of information: To be of full use in course design, a list
of high frequency words would need to include the following information
for each word: the forms and parts of speech included in a word family,
frequency, the underlying meaning of the word, variations of meaning and
collocations and the relative frequency of these meanings and uses, and
restrictions on the use of the word with regard to politeness, geographical
distribution, etc. Some dictionaries, notably the revised edition of the
COBUILD dictionary, include much of this information, but still do not go
far enough. This variety of information needs to be set out in a way that
is readily accessible to teachers and learners.
6. Other criteria: West (1953: ix) found that frequency and range alone
were not sufficient criteria for deciding what goes into a word list designed
for teaching purposes. West made use of ease or difficulty of learning (it
is easier to learn another related meaning for a known word than to learn
another word), necessity (words that express ideas that cannot be expressed
through other words), cover (it is not efficient to be able to express the
same idea in different ways; it is more efficient to learn a word that covers
a quite different idea), stylistic level, and emotional words (West saw second
language learners as initially needing neutral vocabulary). One of the many
interesting findings of the COBUILD project was that different forms of
a word often behave in different ways, taking their own set of collocates
and expressing different shades of meaning (Sinclair, 1991). Careful consideration
would need to be given to these and other criteria in the final stages of
making a general service list.
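The frequency and range criterion in the list above lends itself to a short illustration. The Python sketch below counts, for each word form, how often it occurs overall (frequency) and in how many texts it occurs (range), and keeps only forms that meet both thresholds. The function names, the thresholds and the three toy texts are assumptions made for the example; a real list would also need the decisions about representativeness, word families, set expressions and the other criteria discussed above, none of which is modelled here.

```python
import re
from collections import Counter


def frequency_and_range(texts):
    """Return (frequency, range) counters over a collection of texts.

    frequency[w] is the total number of occurrences of w;
    range[w] is the number of texts in which w occurs at least once.
    """
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        frequency.update(tokens)
        text_range.update(set(tokens))  # count each text once per word
    return frequency, text_range


def select_candidates(texts, min_frequency, min_range):
    """Word forms that are both frequent enough and wide enough in range."""
    frequency, text_range = frequency_and_range(texts)
    return sorted(
        word for word in frequency
        if frequency[word] >= min_frequency and text_range[word] >= min_range
    )


# Invented toy texts: "market" is frequent but confined to one text.
toy_texts = [
    "the market market market rose",
    "the price rose",
    "the price fell",
]
print(select_candidates(toy_texts, min_frequency=2, min_range=2))
```

In the toy data "market" occurs three times but in only one text, so it is excluded, while "price" and "rose" occur less often but in two texts each and are kept. Frequency alone would have ranked "market" above both.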
With a continuing emphasis on communication in language teaching, there is
a tendency to give less attention to the selection and checking of language
forms in course design. Now that the benefits of form-focused instruction
are being positively reassessed, we may see a change in attitude towards
vocabulary lists and frequency studies. The benefits of giving attention
to principles of selection and gradation in teaching, however, remain important
no matter what approach to teaching is being used. A goal of this review
of the findings of research on vocabulary size and frequency is to show
that this information can result in considerable benefits for both teachers
and learners.
References
Bauer, L. and I.S.P. Nation. 1993. Word families. International Journal
of Lexicography 6, 4: 253-279.
Biber, D. 1990. A typology of English texts. Linguistics 27: 3-43.
Campion, M.E. and W.B. Elley. 1971. An Academic Vocabulary List.
Wellington: NZCER.
Carroll, J.B., P. Davies and B. Richman. 1971. The American Heritage
Word Frequency Book. New York: American Heritage Publishing Co.
Carter, R. and M. McCarthy (eds.) 1988. Vocabulary and Language Teaching.
London: Longman.
D'Anna, C.A., E.B. Zechmeister and J.W. Hall. 1991. Toward a meaningful
definition of vocabulary size. Journal of Reading Behavior 23: 109-122.
DeRocher, J.E. 1973. The Counting of Words: A Review of the History,
Techniques and Theory of Word Counts with Annotated Bibliography. New
York: Syracuse University Research Corp.
Dupuy, H.J. 1974. The Rationale, Development and Standardization of a
Basic Word Vocabulary Test. Washington, D.C.: U.S. Government Printing
Office.
Ellis, R. 1990. Instructed Second Language Acquisition. London: Blackwell.
Engels, L.K. 1968. The fallacy of word counts. IRAL 6: 213-231.
Fox, J. and J. Mahood. 1982. Lexicons and the ELT materials writer. English
Language Teaching Journal 36, 2: 125-129.
Francis, W.N. and H. Kucera. 1982. Frequency Analysis of English Usage.
Boston: Houghton Mifflin Company.
Fries, C.C. and A.A. Traver. 1960. English Word Lists. Ann Arbor:
George Wahr.
Ghadessy, M. 1979. Frequency counts, word lists, and materials preparation:
a new approach. English Teaching Forum 17, 1: 24-27.
Goulden, R., P. Nation and J. Read. 1990. How large can a receptive vocabulary
be? Applied Linguistics 11: 341-363.
Hazenburg, S. and J. Hulstijn. 1996. Defining a minimal receptive second-language
vocabulary for non-native university students: An empirical investigation.
Applied Linguistics 17, 1: in press.
Hirsh, D. 1992. The vocabulary demands and vocabulary learning opportunities
in short novels. Unpublished MA thesis, Victoria University of Wellington,
New Zealand.
Hirsh, D. and P. Nation. 1992. What vocabulary size is needed to read unsimplified
texts for pleasure? Reading in a Foreign Language 8, 2: 689-696.
Hwang, K. 1989. Reading newspapers for the improvement of vocabulary and
reading skills. Unpublished MA thesis, Victoria University of Wellington,
New Zealand.
Hwang, K. and P. Nation. 1989. Reducing the vocabulary load and encouraging
vocabulary learning through reading newspapers. Reading in a Foreign
Language 6, 1: 323-35.
Hwang, K. and I.S.P. Nation. 1995. Where would general service vocabulary
stop and special purposes vocabulary begin? System 23, 1: 35-41.
Jamieson, P. 1976. The acquisition of English as a second language by young
Tokelau children living in New Zealand. Unpublished Ph.D. thesis, Victoria
University of Wellington.
Joe, A., P. Nation, and J. Newton. 1996. Speaking activities and vocabulary
learning. English Teaching Forum 34, 1: in press.
Judd, E. L. 1978. Vocabulary teaching and TESOL: a need for re-evaluation
of existing assumptions. TESOL Quarterly 12, 1: 71-76.
Kucera, H. 1982. The mathematics of language. In The American Heritage
Dictionary. Boston: Houghton Mifflin. 2nd ed.
Laufer, B. 1989. What percentage of text-lexis is essential for comprehension?
In C. Lauren and M. Nordman (eds.), Special Language: From Humans Thinking
to Thinking Machines. Clevedon: Multilingual Matters.
Liu Na and I.S.P. Nation. 1985. Factors affecting guessing vocabulary in
context. RELC Journal 16, 1: 33-42.
Long, M. 1988. Instructed interlanguage development. In L. Beebe (ed.) Issues
in Second Language Acquisition. New York: Newbury House.
Lorge, I. and J. Chall. 1963. Estimating the size of vocabularies of children
and adults: an analysis of methodological issues. Journal of Experimental
Education 32, 2: 147-157.
Lynn, R.E. 1973. Preparing word lists: a suggested method. RELC Journal
4, 1: 25-32.
McKeown, M.G. and M.E. Curtis (eds.) 1987. The Nature of Vocabulary Acquisition.
Hillsdale, N.J.: Erlbaum.
Meara, P. and G. Jones. 1990. The Eurocentres Vocabulary Size Tests.
10KA. Zurich: Eurocentres.
McIntosh, X., M. Halliday and P. Strevens. 1961.
Milton, J. and P. M. Meara. 1995. How periods abroad affect vocabulary growth
in a foreign language. ITL 107-108: 17-34.
Nagy, W.E. and R.C. Anderson. 1984. How many words are there in printed school
English? Reading Research Quarterly 19: 304-330.
Nagy, W.E., P. Herman, and R.C. Anderson. 1985. Learning words from context.
Reading Research Quarterly 20: 233-253.
Nation, I.S.P. 1982. Beginning to learn foreign vocabulary: a review of
the research. RELC Journal 13: 14-36.
Nation, I.S.P. 1990. Teaching and Learning Vocabulary. New York:
Newbury House.
Nation, I.S.P. 1993a. Using dictionaries to estimate vocabulary size: essential,
but rarely followed, procedures. Language Testing 10, 1: 27-40.
Nation, I.S.P. 1993b. Vocabulary size, growth and use. In The Bilingual
Lexicon. ed. R. Schreuder and B. Weltens, Amsterdam/Philadelphia: John
Benjamins. pp. 115-134.
Nation, I. S. P. forthcoming. Teaching Listening and Speaking.
Paivio, A. and A. Desrochers. 1981. Mnemonic techniques in second language
learning. Journal of Educational Psychology. 73, 6: 780-795.
Praninskas, J. 1972. American University Word List. London: Longman.
Pressley, M., J.R. Levin and H. Delaney. 1982. The mnemonic keyword method.
Review of Educational Research 52: 61-91.
Richards, J.C. 1974. Word lists: problems and prospects. RELC Journal
5: 69-84.
Rosenweig, M.R. and D. McNeill. 1962. Inaccuracies in the semantic count
of Lorge and Thorndike. American Journal of Psychology 75: 316-319.
Schmitt, N. and D. Schmitt. 1995. Vocabulary notebooks: theoretical underpinnings
and practical suggestions. English Language Teaching Journal 49,
2: 133-143.
Schonell, F.J., I.G. Meddleton and B.A. Shaw. 1956. A Study of the Oral
Vocabulary of Adults. Brisbane: University of Queensland Press.
Seashore, R.H. and L.D. Eckerson. 1940. The measurement of individual differences
in general English vocabularies. Journal of Educational Psychology
31: 14-38.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford
University Press.
Sternberg, R.J. 1987. Most vocabulary is learned from context. In McKeown
and Curtis, 89-105.
Sutarsyah, C. 1993. The Vocabulary of Economics and Academic English. Unpublished
MA thesis, Victoria University of Wellington, New Zealand.
Sutarsyah, C., I.S.P. Nation and G. Kennedy. 1994. How useful is EAP vocabulary
for ESP? A corpus based case study. RELC Journal 25, 2: 34-50.
Thorndike, E.L. and I. Lorge. 1944. The Teacher's Word Book of 30,000
Words. Teachers College, Columbia University.
Thorndike, E.L. 1924. The vocabularies of school pupils. In J. Carelton
Bell (ed.) Contributions to Education. New York: World Book Co.
Webster's Third New International Dictionary. 1963. Massachusetts: G.
& C. Merriam Co.
West, Michael. 1953. A General Service List of English Words. London:
Longman, Green & Co.
Xue Guoyi and I.S.P. Nation. 1984. A university word list. Language Learning
and Communication 3: 215-229.
Contact Info:
Rob Waring
Notre Dame Seishin University, 2-16-9 Ifuku-cho, Okayama, Japan 700
Tel 086 252 1155 Fax 255 7663 Home 086 223 0341