Fourth-year Data Science students publish innovative paper on text difficulty classification for the Irish language
The paper, entitled ‘Exploring Text Classification for Enhancing Digital Game-Based Language Learning for Irish’, focuses on developing text difficulty classifiers for Irish, a feature that is lacking for under-resourced languages. Developers of digital game-based language learning (DGBLL) have incorporated text difficulty classifiers into their games for many of the most commonly spoken languages, but the researchers did not find this to be the case for under-resourced languages such as Irish.
Leona Mc Cahill and Thomas Baltazar explored several methods of developing text classifiers for the Irish language. One approach applied linguistic analysis to estimate the level of text complexity. Another applied machine learning-based text classification, examining how various machine learning algorithms address the problem. While these text classification models are still at a preliminary stage, they demonstrate potential, especially in situations with limited resources.
Both students worked on this compelling paper with the DCU School of Computing’s Liang Xu, Dr Monica Ward, and Dr Jennifer Foster. Student Sally Bruen and Dr Elaine Ui Dhonnchadha from Trinity College also contributed to the paper.
Leona and Thomas presented this research alongside PhD student Liang Xu at the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages at the LREC-Coling conference.
The full research paper is available at: https://aclanthology.org/2024.sigul-1.12/