ABOUT MUSSEL

The Multilingual Corpus of Second Language Speech (MuSSeL) is being developed by researchers at the University of Utah’s Second Language Teaching & Research Center. MuSSeL offers both researchers and educators a uniquely extensive and diverse collection of transcribed and annotated second language speech samples.

The corpus includes samples from three learning contexts (child classroom immersion, adult classroom, and adult immersive) across five languages: French, Mandarin Chinese, Portuguese, Russian, and Spanish. For each speech sample, users can listen to the audio file and access the transcription in two file formats: CHAT and TXT. The transcriptions follow the CHAT conventions established by CHILDES (MacWhinney, 2000), so users can run a variety of analyses with the CLAN software. All samples come from two testing situations: the ACTFL Assessment of Performance toward Proficiency in Languages (AAPPL) online tool for child samples, and the ACTFL Oral Proficiency Interview by computer (OPIc) for adult samples. The corpus can be searched with various filters, such as language, age group, gender, learning context, topic, and proficiency level. For more detailed information about how the corpus was built, the transcription conventions, and other topics, see the FAQ page.
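Because the CHAT files are plain text, they can also be inspected outside CLAN. The following is a minimal Python sketch, assuming the general CHAT layout: "@" header lines, "*" main tiers that begin with a speaker code and a colon, "%" dependent tiers, and tab-indented continuation lines. The speaker code STU and the sample transcript are hypothetical, and the word count is a rough approximation (it skips bracketed codes, "&"-prefixed fillers and fragments, and xxx/yyy/www), not the method used to produce the word totals reported for MuSSeL. CLAN's own commands (e.g., FREQ) remain the reference tools for analysis.

```python
import re

# Hypothetical CHAT transcript; the speaker code STU is an assumption.
SAMPLE = """@Begin
@Languages:\tspa
@Participants:\tSTU Student
*STU:\tyo tengo [/] tengo &-uh un perro .
%com:\tthis is a dependent-tier comment
*STU:\tmi perro se llama
\tMax .
@End
"""

BRACKETED = re.compile(r"\[[^\]]*\]")  # CHAT codes such as [/] or [: word]
NON_WORDS = {"xxx", "yyy", "www"}      # unintelligible / untranscribed markers


def main_tier_utterances(chat_text: str) -> list[tuple[str, str]]:
    """Return (speaker, utterance) pairs from the '*' main tiers only."""
    utterances = []
    in_main = False  # True while the current tier is a main tier
    for line in chat_text.splitlines():
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append([speaker.strip(), text.strip()])
            in_main = True
        elif line.startswith("\t") and in_main:
            # Tab-indented line continues the previous main tier.
            utterances[-1][1] += " " + line.strip()
        else:
            # Header ('@') or dependent tier ('%'): not part of the speech.
            in_main = False
    return [(speaker, text) for speaker, text in utterances]


def rough_word_count(utterance: str) -> int:
    """Approximate word count: drop codes, fillers, and punctuation tokens."""
    text = BRACKETED.sub(" ", utterance)
    count = 0
    for token in text.split():
        if token.startswith("&") or token in NON_WORDS:
            continue
        if any(ch.isalpha() for ch in token):
            count += 1
    return count


if __name__ == "__main__":
    # For a downloaded file, read it with:
    # pathlib.Path("sample.cha").read_text(encoding="utf-8")
    utts = main_tier_utterances(SAMPLE)
    for speaker, text in utts:
        print(speaker, rough_word_count(text), text)
    print("total:", sum(rough_word_count(t) for _, t in utts))  # total: 10
```

Restricting the count to main tiers keeps comments and other dependent-tier annotations out of the total; how fillers, repetitions, and retracings should be counted is a methodological choice that this sketch does not settle.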

MuSSeL is a corpus in progress, and we regularly update it with new samples. Check back often for the latest additions.

How to Cite MuSSeL

Rubio, F., Kia, E., Schnur, E., & Hacking, J. (2021). Multilingual corpus of second language speech (MuSSeL) [Learner corpus]. University of Utah. Retrieved Month Day, Year, from https://l2trec.utah.edu/learner-corpora/mussel/

Distribution of Texts across Languages in Sample Corpus

Languages	# of Students	# of Texts	# of Words
Chinese	1	16	225
French	1	7	349
Portuguese	1	9	408
Russian	1	18	1,329
Spanish	1	21	412
Total	5	71	2,723
