Multilingual Corpus of Second Language Speech (MuSSeL) - Learner Corpora - L2TReC

About MuSSeL

The Multilingual Corpus of Second Language Speech (MuSSeL) is being developed by researchers at the University of Utah’s Second Language Teaching & Research Center. It provides researchers and teachers with an unprecedentedly large and varied set of transcribed and tagged L2 speech samples as well as access to the original MP3 recordings.

Learn More About MuSSeL

Interested in the MuSSeL corpus?

Before applying for full access, we invite you to explore our sample corpus. This preview offers a glimpse into the variety of texts and the organizational structure of the corpus.

We encourage you to examine the sample corpus to ensure its suitability for your research objectives. Should it meet your requirements, detailed instructions are available to guide you through the process of gaining access to the full MuSSeL corpus.

Explore Sample Corpus

Already Registered? Search MuSSeL Database

Register for MuSSeL

STEP 1

Choose Access Level

Temporary Online Access: Lets users open individual speech samples (MP3 and TXT) in a web browser. Users cannot download the corpus for linguistic analyses. Temporary Online Access terminates automatically three months after the registration date.

Required Materials:

Upload a signed Non-Disclosure Agreement (NDA).
Project Description for Temporary Access

Provide a brief description of your project, specifically focusing on how your project intends to explore second language development with the use of MuSSeL (100-300 words).

Full Access: Lets users download the whole corpus for offline use. Users must show proof of IRB approval or exemption to receive full offline access. Online access to the corpus expires on the expected project completion date unless an extension is requested.

Required Materials:

Upload a signed Non-Disclosure Agreement (NDA).
Project Description for Full Access

Provide a concise yet detailed description of your research purpose, specifically focusing on how your project intends to explore second language development with the use of MuSSeL. Emphasize how the anticipated outcomes will contribute to the field of second language learning and/or teaching (200-500 words).

Expected Project Completion Date

Please be aware that your online access to the corpus files is set to expire on the date you have chosen. Should you require continued access beyond this date, you must submit a request for an extension.

Proof of IRB Approval or Exemption for Full Access

Please submit a PDF copy of the Institutional Review Board (IRB) approval or exemption document. For guidance on the application process for IRB approval, click here.

STEP 2

Fill Out Database Registration

Link to Registration Form

NOTICE: You must prepare the required materials for the desired access level listed in Step 1 before filling out the form.

STEP 3

Sign a Material Transfer Agreement

Only for Non-UofU Affiliates

MuSSeL file names consist of the student ID and a file number separated by an underscore (“_”). The student ID consists of 11 characters and specifies the student’s language, unique ID, age group, and the overall rating assigned on the ACTFL test. The file number counts the speech files each student produced; each student usually has multiple speech files. For example, the file name c0002cadr01_1 indicates that the speaker is a Chinese adult learner with an ILR rating of 1.

[Image: MuSSeL file name table]

Other character combinations or codes in file names:

Character 1: Language

c for Chinese, f for French, p for Portuguese, r for Russian, and s for Spanish.

Characters 7 & 8: Context or Learner age group

“ad” marks adult students.
If the learner is a child, a 2-digit code indicates the child’s grade level (i.e., 03, 04, 05, 06, 07, 08, 09, or 10).

Characters 10 & 11: Rating

Child Ratings

“ba” marks a child rating below N1: “b” stands for “below,” and “a” stands for Form A of the AAPPL test. N1 is the lowest possible rating on Form A, so the AAPPL rating “below N1” means that raters were unable to rate the speaker’s performance on the ACTFL rating scale.
“bb” marks a child rating below N4: the first “b” stands for “below,” and the second “b” stands for Form B of the AAPPL test. N4 is the lowest possible rating on Form B, so the AAPPL rating “below N4” means that raters were unable to assign a rating because the performance fell below that level.
“aa” marks an Advanced child rating. The character “a” is repeated to keep file names the same length.
Other child rating codes are n1, n2, n3, n4, i1, i2, i3, i4, i5.

Adult Ratings

“00”, “01”, “02”, and “03” represent the adult ratings 0, 1, 2, and 3, respectively (ILR rating scale for the OPIc test).
“0p”, “1p”, and “2p” correspond to the adult ratings 0+, 1+, and 2+, respectively (ILR rating scale for the OPIc test).
“aa” specifies the adult rating AL-AH. This rating is given when the rater cannot choose a specific sub-level (Low, Mid, High) under the Advanced level.
“ss” marks the adult rating S, or Superior.
Other adult rating codes are nl, nm, nh, il, im, ih, al, am, ah.
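
The naming scheme above can be sketched as a small Python parser. This is a minimal illustration, not an official MuSSeL tool: the names MusselName and parse_mussel_name are hypothetical, characters 2-5 are assumed to hold the unique ID, and characters 6 and 9 are kept as raw values because their meaning is not described in the text above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Character 1: language codes listed in the text.
LANGUAGES = {"c": "Chinese", "f": "French", "p": "Portuguese",
             "r": "Russian", "s": "Spanish"}

# Characters 7 & 8: "ad" for adults, or a 2-digit grade level for children.
CHILD_GRADES = {"03", "04", "05", "06", "07", "08", "09", "10"}

# Characters 10 & 11: adult codes that map to ILR ratings.
ILR_CODES = {"00": "0", "0p": "0+", "01": "1", "1p": "1+",
             "02": "2", "2p": "2+", "03": "3"}

# 11-character student ID, underscore, file number, optional extension.
_NAME = re.compile(r"^(?P<sid>[a-z0-9]{11})_(?P<num>[0-9]+)(?:\.[a-z0-9]+)?$")


@dataclass(frozen=True)
class MusselName:
    language: str
    learner_id: str    # characters 2-5 (assumed to be the unique ID)
    char6: str         # meaning not described in the text
    age_group: str     # "adult" or "grade N"
    char9: str         # meaning not described in the text
    rating_code: str   # characters 10 & 11
    file_number: int

    @property
    def ilr_rating(self) -> Optional[str]:
        """ILR rating for adult ILR codes, otherwise None."""
        if self.age_group != "adult":
            return None
        return ILR_CODES.get(self.rating_code)


def parse_mussel_name(name: str) -> MusselName:
    """Split a MuSSeL-style file name into its documented parts."""
    match = _NAME.match(name.strip().lower())
    if match is None:
        raise ValueError(f"not a MuSSeL-style file name: {name!r}")
    sid = match.group("sid")
    if sid[0] not in LANGUAGES:
        raise ValueError(f"unknown language code: {sid[0]!r}")
    age = sid[6:8]
    if age == "ad":
        age_group = "adult"
    elif age in CHILD_GRADES:
        age_group = f"grade {int(age)}"
    else:
        raise ValueError(f"unknown age-group code: {age!r}")
    return MusselName(
        language=LANGUAGES[sid[0]],
        learner_id=sid[1:5],
        char6=sid[5],
        age_group=age_group,
        char9=sid[8],
        rating_code=sid[9:11],
        file_number=int(match.group("num")),
    )


info = parse_mussel_name("c0002cadr01_1")
print(info.language, info.age_group, info.ilr_rating, info.file_number)
```

Rating codes other than the ILR ones (for example “aa”, “ss”, or “n1”) are returned unchanged in rating_code, so a caller can interpret them with the lists above.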

c0003cadr0p_1

c0003cadr0p_3

c0003cadr0p_4

c0003cadr0p_6

c0003cadr0p_7

c0003cadr0p_8

When a file number is missing from the database, it could mean one of the following:

The file was empty or fully unintelligible.
The file was corrupt and would not open, so we were unable to transcribe it (a rare case).
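
A quick way to spot such gaps in a list of file names is sketched below. The function name missing_file_numbers is hypothetical, and the sketch assumes that file numbers start at 1 and run consecutively; a gap after a student's highest surviving file number cannot be detected this way.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def missing_file_numbers(names: Iterable[str]) -> Dict[str, List[int]]:
    """Map each student ID to the file numbers absent below its highest number."""
    seen = defaultdict(set)
    for name in names:
        stem = name.split(".", 1)[0]  # drop an extension such as ".txt"
        student_id, _, number = stem.rpartition("_")
        seen[student_id].add(int(number))
    return {
        sid: sorted(set(range(1, max(numbers) + 1)) - numbers)
        for sid, numbers in seen.items()
    }


files = ["c0003cadr0p_1", "c0003cadr0p_3", "c0003cadr0p_4",
         "c0003cadr0p_6", "c0003cadr0p_7", "c0003cadr0p_8"]
print(missing_file_numbers(files))  # {'c0003cadr0p': [2, 5]}
```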

The MP3 file is the original speech file the speaker produced on the test. The MP3 files were transcribed according to the 2021 CHAT transcription protocols established by CHILDES (MacWhinney, 2000). CHA files are the transcriptions written in CLAN; including them gives users access to the many tools for tagging and linguistic analysis available in the CLAN program. The TXT file is a copy of the CHA transcription with two modifications: 1) the file headers are enclosed in angle brackets to separate them from the main text, so that corpus analysis tools can analyze the main text alone, and 2) the bullets (time stamps) that link the audio files to the CHA files have been removed, since they serve no purpose in TXT files. TXT files are intended for corpus users who are unfamiliar with the CLAN program or prefer other corpus analysis tools. Finally, the PDF file lets users preview a file in the browser before downloading it. Including the additional formats makes MuSSeL more accessible.
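
For users working with the TXT files outside a corpus tool, the header convention described above could be handled roughly as follows. This is a sketch under a stated assumption: each header is taken to occupy a whole line wrapped in angle brackets, which the actual files may or may not follow. The function name split_txt_headers is hypothetical, and the sample string is made up for illustration rather than taken from the corpus.

```python
from typing import List, Tuple


def split_txt_headers(text: str) -> Tuple[List[str], str]:
    """Separate angle-bracketed header lines from the main transcript text."""
    headers, body = [], []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a header occupies a whole line wrapped in < and >.
        if len(stripped) >= 2 and stripped.startswith("<") and stripped.endswith(">"):
            headers.append(stripped[1:-1].strip())
        else:
            body.append(line)
    return headers, "\n".join(body).strip()


# Invented sample in CHAT-like style, for illustration only.
sample = "<@Begin>\n<@Languages: zho>\n*STU: wo xihuan pingguo .\n<@End>"
headers, main_text = split_txt_headers(sample)
print(headers)
print(main_text)
```

Matching whole lines rather than every bracketed span is deliberate: CHAT also uses angle brackets inside utterances, and those should stay in the main text.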

Gender data was not collected from adult speakers in earlier years. More recent adult data usually include gender information.

The adult files in MuSSeL come from the Oral Proficiency Interview by Computer (OPIc) tests. “An OPIc can be rated according to the ACTFL scale, the Interagency Language Roundtable (ILR) scale, or the Common European Framework of Reference for Languages (CEFR) scale” (Language Testing International, n.d.). In MuSSeL, each adult file has either an ACTFL rating or an ILR rating. “An ACTFL OPIc reports a rating between Novice and Superior on the ACTFL scale. An ILR OPIc rating reported is between ILR 0 (No Proficiency) and ILR 3 (Professional Proficiency)” (Language Testing International, n.d.). The following table shows the correspondence between ACTFL and CEFR ratings on OPIc tests.

ACTFL and CEFR Rating Alignment on OPIc (adapted from the ACTFL report, Assigning CEFR Ratings to ACTFL Assessments)

ACTFL Proficiency Scale | ACTFL Rating on OPIc | Corresponding CEFR Rating
Superior | |
Advanced | Advanced (AL-AH) | B2-C1
 | Advanced High |
 | Advanced Mid | B2.2
 | Advanced Low | B2.1
Intermediate | Intermediate High | B1.2
 | Intermediate Mid | B1.1
 | Intermediate Low |
Novice | Novice High |
 | Novice Mid |
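
The rows above that pair an ACTFL rating with a CEFR rating can be turned into a small lookup keyed by the adult rating codes used in file names. This is only a sketch: the name cefr_for_adult_code is hypothetical, the reading of the two-letter codes (for example “am” as Advanced Mid) is an assumption, and ratings with no CEFR value shown above are deliberately left out.

```python
from typing import Optional

# Only the pairs shown above; other ratings are left out.
CEFR_BY_ADULT_CODE = {
    "aa": "B2-C1",  # Advanced (AL-AH)
    "am": "B2.2",   # Advanced Mid (code expansion assumed)
    "al": "B2.1",   # Advanced Low (code expansion assumed)
    "ih": "B1.2",   # Intermediate High (code expansion assumed)
    "im": "B1.1",   # Intermediate Mid (code expansion assumed)
}


def cefr_for_adult_code(rating_code: str) -> Optional[str]:
    """Return the CEFR rating for an adult ACTFL code, or None if not listed."""
    return CEFR_BY_ADULT_CODE.get(rating_code.lower())


print(cefr_for_adult_code("am"))
print(cefr_for_adult_code("0p"))
```

ILR codes such as “0p” return None. The lookup applies to adult files only, because “aa” is also used as a child rating code.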

As for the ILR rating scale, we have not provided the corresponding CEFR or ACTFL ratings since there is no consensus in the literature on the alignment between the ILR-scaled score (0, 0+, 1, 1+, 2, 2+, 3) and the other two scales.

AAPPL Rating Alignment with ACTFL and CEFR Scales (ACTFL Proficiency Guidelines, 2012)

ACTFL Proficiency Scale | Corresponding ACTFL Rating | AAPPL Score | Corresponding CEFR Rating
Advanced | Advanced Low-High | | B2-C1
Intermediate | Intermediate High | | B1.2
 | Intermediate Mid | | B1.1
 | Intermediate Low | |
Novice | Novice High | |
 | Novice Mid | |
 | Novice Low | |
 | | Below N4 |
 | | Below N1 |

The lists of topics in MuSSeL were created in five steps: 1) transcribers identified the subjects of the speech files and added annotations to the transcription files, 2) all topics were collected from the files and the database spreadsheets and ordered by frequency, 3) lower-frequency topics were merged into broader categories, 4) the list of topics under each broad category was recorded and shared again with the transcribers, and 5) the assigned topics were revised on the database spreadsheet to match the finalized list of topics. The following lists of topics emerged from the child and adult sub-corpora.

List of Topics in the Child Sub-Corpus

About Yourself
Activities
Clothes
Colors
Current Affairs
Family
Food
Holidays
Introductions
Locations
Other People
Routines
School
Sports
Time, Seasons, Climate
Unspecified (Unable to Specify a Topic, Unintelligible Content)

List of Topics in the Adult Sub-Corpus

About Yourself
Business & Technology
Current Affairs
Education
Entertainment
Events & Activities
Family
Food
Jobs
Locations
People
Questions
Routines
Sports
Travel
Unspecified (Unable to Specify a Topic, Unintelligible Content)

Tutorials

Disclaimer: The following tutorials introduce the pilot version of MuSSeL and the old search filters, which do not match the current status of the corpus.

MuSSeL Corpus Tutorial: Introducing AntConc Software and Basic Corpus Searches

Speaker: Dr. Erin Schnur

Date: June 2019

Citation (APA 7th Ed.): Schnur, E. (2019, June 5). MuSSeL corpus tutorial: Introducing AntConc software and basic corpus searches [Video Tutorial]. University of Utah. https://mediaspace.utah.edu/media/t/1_c5x9e2

Tutorials for Language Teachers: Using the Multilingual Spoken Second Language (MuSSeL) Corpus

Speaker: Dr. Erin Schnur

Date: Feb. 2019

Citation (APA 7th Ed.): Schnur, E. (2019, February 5). Tutorials for language teachers: Using the multilingual spoken second language (MuSSeL) corpus [Video Tutorial]. University of Utah. https://mediaspace.utah.edu/media/t/1_k3o5di

The Multilingual Corpus of Second Language Speech (MuSSeL)

Speaker: Dr. Erin Schnur

Date: Nov. 2018

This five-minute tutorial introduces the multilingual spoken second language (MuSSeL) corpus, explains the pilot corpus search filters, and uses examples to show how to explore MuSSeL with AntConc (free corpus analysis software by Laurence Anthony).

Citation (APA 7th Ed.): Schnur, E. (2018, November 9). The multilingual corpus of second language speech (MuSSeL) [Video Tutorial]. University of Utah. https://mediaspace.utah.edu/media/t/1_y8lostzz


1. What do MuSSeL file names mean?

2. Why are some file numbers missing from the database? For instance, files number 2 and 5 are missing from c0003’s list of associated files: c0003cadr0p_1, c0003cadr0p_3, c0003cadr0p_4, c0003cadr0p_6, c0003cadr0p_7, c0003cadr0p_8.

3. Each file in MuSSeL is presented in four file formats: CHA, MP3, PDF, and TXT. What is each file format used for?

1. Most adult files in MuSSeL do not have gender information; why is that?

2. What are the two options under the Adult Proficiency Level filter (i.e., OPIc-ACTFL Rating, OPIc-ILR Rating)?

3. What are the corresponding proficiency ratings of the AAPPL performance scores in the MuSSeL child database?

4. What do child and adult topics tell us?

