Welcome to the Multilingual Corpus of Second Language Speech
The Multilingual Corpus of Second Language Speech is being developed by researchers at the University of Utah’s Second Language Teaching & Research Center. It provides researchers and teachers with an unprecedentedly large and varied set of transcribed and tagged L2 speech samples as well as access to the original MP3 recordings.
When complete, the corpus will include samples from three learning contexts (child classroom immersion, adult classroom, adult immersive) across six languages: Chinese, French, German, Portuguese, Russian and Spanish. For each speech sample, a user can listen to the audio file and access both a basic transcription and a transcription tagged according to CHAT protocols established by CHILDES.1 The tagged transcripts can be used to run various analyses in CLAN. All samples come from testing situations: the ACTFL Assessment of Performance toward Proficiency in Languages (AAPPL) online tool for child samples and the ACTFL Oral Proficiency Interview by computer (OPIc) for adult samples.
The corpus is searchable using various filters, e.g., age, gender, language, learning context and topic. Because the samples come from testing, each has been independently rated, so samples can also be searched by proficiency rating.
This is an ongoing project, and we welcome feedback and suggestions. New samples will continue to be added, so check back regularly.
1 MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
We are grateful for seed funding from the VP for Research and the College of Humanities at the University of Utah, as well as funding to support corpus development from the Language Flagship. Here is the link to our pilot site.