Speech Recognition Mechanism of Second Language Learners in Short-Term Memory

Kazuko TOJO

Department of English Literature, Kyushu Women's University
Yahatanishi-ku, Kitakyushu, Fukuoka 807 JAPAN

In the effort to realize effective man-machine interaction, the analysis of human dialogue in a natural environment offers useful insights. No less important in the area of human dialogue is the study of communicative behavior in one's second language, especially because the acquisition process is expected to reveal important aspects of each developmental phase. Among the so-called four skills of language proficiency (reading, writing, speaking and listening), listening has direct relevance to the treatment of speech recognition. Few studies deal with the learner's speech recognition process because of its latent and receptive nature. The purpose of this study is to clarify the speech recognition mechanism of second language learners by analyzing how Japanese learners of English at various stages of listening proficiency recognize spoken English sentences and increase their comprehension.

{Framework of Study}
In comprehending a spoken English sentence, provided that linguistic elements such as semantics and syntax are under the learner's control, speech rate (speed) and the length of the sentence play a crucial role. Faster and longer sentences pose more difficulty in comprehension, especially when the learner's speech recognition skill is not sufficient. Here, short-term memory plays a very important role and imposes a finite capacity on the learner's auditory information processing. 7+/-2 is the widely known number of perceptual information units that a person can store in short-term memory (Miller, 1956). If a word in a sentence is taken as a unit, the proposed number fails to account for the fact that a person can process much longer sentences with ease in a natural environment. To resolve this seeming contradiction, the following are hypothesized.
(1) A word is regarded as the smallest information unit.
(2) Speech recognition is restricted by a finite human channel capacity imposed by short-term memory.
(3) When the number of the units reaches the upper limit, more information will be obtained by expanding the capacity within each unit.

For example, the sentence "Why/ don't/ you/ meet/ me/ at/ 10/ o'clock/ in/ front/ of/ the/ library" contains 13 words, which exceeds the proposed number of units. But if the words are recognized in chunks as "Why don't you meet me/ at 10 o'clock/ in front of the library," the number of units is reduced to 3 and hence the burden of storing them in short-term memory is greatly reduced. This, in other words, increases the efficiency of auditory information processing.
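The unit counts in this example can be checked with a short sketch. It assumes that unit boundaries are marked with "/" as in the example above, and it takes 9 (the upper end of 7+/-2) as an illustrative limit. The names count_units, within_span and SPAN_UPPER_LIMIT are hypothetical and are not part of the study.

```python
# Illustrative upper limit: the upper end of Miller's (1956) 7+/-2 span.
# Treating 9 as a hard cutoff is an assumption made for this sketch only.
SPAN_UPPER_LIMIT = 9


def count_units(segmented: str) -> int:
    """Count the units in a sentence whose unit boundaries are marked with '/'."""
    return len([unit for unit in segmented.split("/") if unit.strip()])


def within_span(segmented: str, limit: int = SPAN_UPPER_LIMIT) -> bool:
    """Return True if the number of units does not exceed the assumed limit."""
    return count_units(segmented) <= limit


word_by_word = "Why/ don't/ you/ meet/ me/ at/ 10/ o'clock/ in/ front/ of/ the/ library"
chunked = "Why don't you meet me/ at 10 o'clock/ in front of the library"

print(count_units(word_by_word), within_span(word_by_word))
print(count_units(chunked), within_span(chunked))
```

With words as units the sentence exceeds the assumed limit, whereas the chunked version stays well within it.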

In order to make the latent and receptive process of speech recognition observable, an elicited repetition task was given to the subjects. This task is appropriate for examining the learner's channel capacity, which is observed in the overlap between input (a taped sentence) and output (the subject's reproduction). The subjects chosen for the study were 80 Japanese college students learning English. 30 English sentences were sampled from natural conversational American English. The number of words in a sentence varied from 3 to 17. The vocabulary, grammar and topics of the sentences were carefully controlled so that they would not pose any difficulties to the subjects. The test sentences were tape-recorded by a native speaker (an American female). The test, which took 6 minutes, was conducted in a language laboratory, and the subjects' reproductions were tape-recorded. The recorded reproductions were scored by the tester according to the number of words correctly reproduced. The acceptability criterion was the minimum recognizability necessary to retrieve the original words, with pronunciation accuracy being of secondary importance.
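The scoring by number of words correctly reproduced can be approximated as follows. This is only a minimal sketch on transcribed text: in the study the tester scored the recordings by ear under the acceptability criterion described above, whereas this sketch assumes exact word matches after lowercasing and punctuation removal, and it ignores word order. The function names normalize and score_reproduction are hypothetical.

```python
from collections import Counter
import string


def normalize(sentence: str) -> list[str]:
    """Lowercase a sentence and strip surrounding punctuation, keeping apostrophes."""
    strip_chars = string.punctuation.replace("'", "")
    words = []
    for token in sentence.lower().split():
        token = token.strip(strip_chars)
        if token:
            words.append(token)
    return words


def score_reproduction(original: str, reproduction: str) -> int:
    """Count the words of the original that also occur in the reproduction.

    Assumption: each word is credited at most as many times as it occurs
    in the original, and word order is not considered.
    """
    target = Counter(normalize(original))
    produced = Counter(normalize(reproduction))
    return sum((target & produced).values())


original = "Why don't you meet me at 10 o'clock in front of the library?"
reproduction = "Why don't you meet me in the library"  # invented example response
print(score_reproduction(original, reproduction))
```

The reproduction shown is an invented response used for illustration, not data from the study.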

{Results and Discussion}
The following results were obtained.

(1) Mechanical responses to the given sentences were found. The subjects reproduced the beginning of the sentence most successfully, followed by the end. The middle section of the sentence was the first to be dropped. These characteristic responses to the sections of the sentence were confirmed with test sentence variations in which an adverbial phrase was moved from the middle or the end section to the beginning of the sentence. Comparison of the results for the variations indicated that the subjects' responses were determined by the location of the words in the sentence rather than by the meaning or function of the words.

(2) As the sentences got longer, some of the subjects started to respond only to the end section of the sentence.

(3) Reproduction errors appeared characteristically among the more successful subjects. Some errors resulted from semantic reprocessing and others from syntactic reprocessing.

(4) The most successful subjects' reproductions were characterized by a longer reproduction span.
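The positional pattern described in result (1) can be examined with a sketch such as the following. It assumes that each reproduction has been coded as one true/false flag per word position in the original sentence, and that the sentence is divided into beginning, middle and end by approximate thirds. The study does not state how the sections were delimited, so this division and the function name section_recall are assumptions.

```python
def section_recall(recalled: list[bool]) -> dict[str, float]:
    """Return the proportion of words reproduced in each section of a sentence.

    `recalled` holds one flag per word position in the original sentence.
    Splitting into approximate thirds is an assumption of this sketch.
    """
    n = len(recalled)
    if n < 3:
        raise ValueError("need at least 3 word positions to form three sections")
    first_cut = n // 3
    second_cut = n - n // 3
    sections = {
        "beginning": recalled[:first_cut],
        "middle": recalled[first_cut:second_cut],
        "end": recalled[second_cut:],
    }
    return {name: sum(flags) / len(flags) for name, flags in sections.items()}


# Invented flags for a 13-word sentence, for illustration only (not study data).
flags = [True] * 4 + [False] * 5 + [True, False, True, True]
print(section_recall(flags))
```

Averaging these proportions over subjects and sentences would show whether the beginning is reproduced best, then the end, then the middle.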

It is safe to presume, therefore, that speech recognition is constrained by the limitation imposed by short-term memory when the subject's language proficiency is not sufficient. The fact that some of the subjects responded only to the end section as the sentences got longer is regarded as one of the strategies used to cope with this limitation. Better performance is realized by recognizing words in a longer span, which supports the hypothesis that the amount of information is increased by expanding the capacity of each unit. Also, speech recognition is more successful when processing takes place at the semantic and syntactic levels simultaneously, rather than at the phonetic level alone. Drawing on various kinds of linguistic knowledge can support accurate speech recognition.
The findings regarding the speech recognition mechanism of second language learners are meaningful for successful man-machine interaction. The limitations that short-term memory imposes on a person are a dimension that should be taken into account in modeling man-machine dialogue.

Keywords: speech recognition, short-term memory, information unit, language acquisition