Research students at the Department of Computer Science and Engineering, University of Moratuwa, have developed the country’s first large-scale large language model (LLM) focused exclusively on Sinhala, a breakthrough in advancing local language computing.
This project was jointly supervised by Dr Surangika Ranathunga (Massey University, New Zealand), Dr Nisansa de Silva (University of Moratuwa) and Dr Rishemjit Kaur (Central Scientific Instruments Organisation, India).
The model, named “SinLlama,” was built by continually pre-training Llama-3-8B on nearly 10 million Sinhala sentences. According to the research team, SinLlama is the largest Sinhala LLM to date and has already outperformed Llama-3-8B on Sinhala text classification benchmarks.