
Automatic Speech Recognition for Maltese - Part 1

September 22, 2024

As part of my Master’s Degree in AI, one of my assignments was to fine-tune the OpenAI Whisper model for Maltese.

Maltese is a low-resource language spoken by approximately half a million people, most of whom reside on the tiny island of Malta.

I was inspired by this paper that did the same process for Javanese, another low-resource language spoken mainly by residents of the island of Java.

Using my trusty and toasty Nvidia 1080 Ti and a Jupyter notebook provided by Hugging Face, I was able to fine-tune whisper-large-v2 in around 20 hours on 32.9 hours of transcribed Maltese audio, obtained by combining the Mozilla Common Voice and FLEURS datasets.
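
For reference, the data preparation looked roughly like the sketch below, using the Hugging Face `datasets` library. The exact dataset versions and config names ("mozilla-foundation/common_voice_13_0" with config "mt", "google/fleurs" with config "mt_mt") are assumptions rather than a record of precisely what I used, and Common Voice requires accepting its terms on the Hub first.

```python
# Sketch: combine the Maltese splits of Common Voice and FLEURS.
# Dataset versions and config names below are assumptions.
from datasets import Audio, concatenate_datasets, load_dataset

common_voice = load_dataset("mozilla-foundation/common_voice_13_0", "mt", split="train+validation")
fleurs = load_dataset("google/fleurs", "mt_mt", split="train+validation")

# Keep only audio and text, under a common column name.
common_voice = common_voice.select_columns(["audio", "sentence"])
fleurs = fleurs.select_columns(["audio", "transcription"]).rename_column("transcription", "sentence")

# Whisper expects 16 kHz audio, so resample both sources before merging.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))

maltese = concatenate_datasets([common_voice, fleurs])
print(maltese)
```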

I used LoRA parameters of r=32 and alpha=64, which retrains approximately 1% of the original parameters, and the results were very encouraging (a sketch of the setup follows the table):

| Model | Maltese WER % (original Whisper) | Maltese WER % (fine-tuned Whisper) | Diff |
|-------|----------------------------------|------------------------------------|------|
| Base | 114.66 | 67.88 | -46.78 |
| Small | 111.18 | 44.67 | -66.51 |
| Large V2 | 89.67 | 32.00 | -57.67 |
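
For the curious, wrapping Whisper with these LoRA settings via the PEFT library looks roughly like this. The target modules and the dropout value are assumptions on my part (they follow Hugging Face's PEFT example for Whisper), not necessarily my exact configuration.

```python
# Minimal sketch of the r=32 / alpha=64 LoRA setup with PEFT.
# target_modules and lora_dropout are assumptions, not my exact settings.
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports roughly 1% trainable parameters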

After completing my assignment, I decided to take it further and see how far I could push the LoRA parameters to improve the model. From my research, the current wisdom (at least for LLMs) is to always set alpha to double the rank. I could not find any supporting evidence for this in ASR, so I varied that assumption too.
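
One way to make sense of that heuristic: standard LoRA (as implemented by default in PEFT) scales its update by alpha / r, so alpha = 2 x rank simply keeps the scaling factor fixed at 2 regardless of how large the rank grows. The snippet below just prints that factor for a few of the combinations I tried.

```python
# The LoRA update is scaled by alpha / r, so alpha = 2 * r keeps the
# scaling constant at 2 no matter the rank.
combos = [(32, 64), (512, 1024), (1024, 1024), (1024, 16), (1024, 64)]

for rank, alpha in combos:
    print(f"r={rank:>4}, alpha={alpha:>4} -> scaling factor {alpha / rank:g}")
```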

Here are the results of tuning whisper-large-v3 (a sketch of how the WER and CER figures are computed follows the table):

| Rank | Alpha | Dropout | WER % | CER % |
|------|-------|---------|-------|-------|
| 512 | 1024 | 0.05 | 38.98 | 11.43 |
| 1024 | 1024 | 0.05 | 285.1 | 210.98 |
| 1024 | 16 | 0.05 | 34.74 | 9.70 |
| 1024 | 24 | 0.1 | 32.67 | 8.93 |
| 1024 | 48 | 0.1 | 31.68 | 8.61 |
| 1024 | 48 | 0.2 | 29.59 | 7.86 |
| 1024 | 64 | 0.3 | 29.51 | 7.94 |
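
The WER and CER figures above are the standard word and character error rates; a minimal sketch of how they can be computed with the Hugging Face `evaluate` library (the strings are placeholders, not real model output):

```python
# Sketch of the WER / CER computation; the strings are placeholders only.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["placeholder reference transcription"]
predictions = ["placeholder predicted transcription"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
cer = 100 * cer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```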

Increasing the number of trainable parameters was expected to bring an improvement (rank 1024 results in about 25% of the model's parameters being retrained), but in the end it was increasing the dropout that improved both WER and CER the most. A back-of-the-envelope calculation of where that 25% figure comes from follows below.
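
This is only a rough estimate, assuming LoRA is applied to the q_proj and v_proj matrices of every attention block in whisper-large (d_model = 1280, 32 encoder and 32 decoder layers, so 96 attention blocks counting cross-attention) and taking the base model at roughly 1.55B parameters:

```python
# Back-of-the-envelope LoRA parameter count for whisper-large, assuming
# q_proj and v_proj are adapted in every attention block (an assumption).
d_model = 1280                    # hidden size of whisper-large
attention_blocks = 32 + 32 + 32   # encoder self-attn + decoder self-attn + cross-attn
adapted_matrices = attention_blocks * 2  # q_proj and v_proj per block
base_params = 1.55e9              # approximate size of whisper-large

for rank in (32, 1024):
    lora_params = adapted_matrices * rank * (d_model + d_model)
    fraction = lora_params / (base_params + lora_params)
    print(f"r={rank:>4}: {lora_params / 1e6:.1f}M LoRA params, {fraction:.1%} of the total")
```

With these assumptions, r=32 comes out at about 1% trainable and r=1024 at about 25%, which matches the figures quoted above.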

Testing this on data outside of the test set produces output that looks phonetically like Maltese but is not correct. My current intuition is that a lack of data is what is stopping the model from improving further, which will be addressed in part 2.
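
For anyone who wants to run the same spot check, this is roughly how new audio can be pushed through the fine-tuned adapter. The adapter path and audio file are placeholders, and loading via PeftModel is one reasonable approach rather than a record of my exact script.

```python
# Sketch: transcribe a new audio clip with the fine-tuned LoRA adapter.
# "path/to/lora-adapter" and "clip.wav" are placeholders.
import torch
from datasets import Audio, Dataset
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter").eval()

# Load and resample one clip to the 16 kHz Whisper expects.
clip = Dataset.from_dict({"audio": ["clip.wav"]}).cast_column("audio", Audio(sampling_rate=16_000))[0]["audio"]

inputs = processor(clip["array"], sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(input_features=inputs.input_features, language="mt", task="transcribe")

print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```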

