Accent classification with DNN

July 11, 2019

Over the last 15 years, the Internet has reinvented education. Educational portals such as coursera.com and udemy.com, along with online universities, provide excellent opportunities to learn a wide range of topics. Some processes, however, such as testing, are hard to automate. Checking mathematical calculations or grammar automatically is easy, but assessing essay writing or spoken language still requires a human coach.

Data science can open up additional possibilities in this area, for example assessing how clearly a language is spoken. This is a well-known problem in language learning. Speaking clearly, without the accent of one's native language (for example, a Norwegian or Indian accent), is one of the professional language skills. It matters to language schools and to companies such as call centers.

Technology

The goal of the project is to recognize the speaker's accent in an audio recording.

A Deep Neural Network approach was chosen to solve this task. The basic solution distinguishes “native English” from “non-native English” speakers; during the research, classification of individual accents (e.g. French, Arabic, etc.) was excluded.

The Stella dataset, which provides 30-second audio files, one speaker per file, covering a few accents, was used to train the Deep Learning model. The BeautifulSoup library was used to scrape the data from the webpage.
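The scraping step can be sketched roughly as follows. This is only an illustration: the page layout of the dataset site is not described here, so the sample HTML, the example.org base URL, the list of audio extensions, and the helper name extract_audio_links are all assumptions made for the example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: audio files are linked with one of these extensions.
AUDIO_EXTENSIONS = (".mp3", ".wav")


def extract_audio_links(html, base_url):
    """Collect absolute URLs of audio files referenced in an HTML page.

    Hypothetical helper: it looks at href/src attributes of <a>, <audio>
    and <source> tags and keeps those that end with an audio extension.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag in soup.find_all(["a", "audio", "source"]):
        target = tag.get("href") or tag.get("src")
        if target and target.lower().endswith(AUDIO_EXTENSIONS):
            # Resolve relative paths against the page URL.
            links.append(urljoin(base_url, target))
    return links


# Made-up page fragment standing in for a real listing page.
SAMPLE_HTML = """
<html><body>
  <a href="/audio/english1.mp3">english1</a>
  <a href="/about.html">about</a>
  <audio><source src="clips/french2.mp3"></audio>
</body></html>
"""

if __name__ == "__main__":
    for url in extract_audio_links(SAMPLE_HTML, "https://example.org/browse/"):
        print(url)
```

In a real run, the HTML would come from an HTTP request to the dataset page, and each collected URL would then be downloaded to disk.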

Data preprocessing converts each audio file into vectors of 13 features. All samples were processed with the Mel-frequency cepstral coefficients (MFCC) technique, which is available in the Librosa Python library.

The processed data was used as training data for a Convolutional Neural Network with 6 layers.
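The exact architecture is not given here, so the following Keras model is only one possible reading of "6 layers": two convolutional, two pooling, and two dense layers, with a flatten step in between. The framework, the input shape of 13 coefficients by 128 frames, the filter counts, and the two-class output (native vs. non-native) are assumptions for the sake of the example.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_accent_cnn(n_mfcc=13, n_frames=128, n_classes=2):
    """Build a small CNN over MFCC matrices (hypothetical architecture)."""
    model = keras.Sequential(
        [
            keras.Input(shape=(n_mfcc, n_frames, 1)),
            layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),  # reshaping step between pooling and dense layers
            layers.Dense(64, activation="relu"),
            layers.Dense(n_classes, activation="softmax"),
        ]
    )
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    model = build_accent_cnn()
    # Dummy batch of four MFCC matrices, only to check the shapes.
    dummy = np.zeros((4, 13, 128, 1), dtype="float32")
    print(model.predict(dummy, verbose=0).shape)
```

Training would then call model.fit on the stacked MFCC arrays with labels 0 and 1 for the two classes.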

The classification accuracy of the solution is about 80%.