On March 27th, the Department of Computer Science at the University of Hamburg held a vibrant exhibition where ten groups of students showcased their academic achievements through Bachelor's and Master's theses, as well as findings from various projects and internships.
This event provided a platform for students to present their innovative work to a broader audience, including peers, faculty members, potential students, and representatives from the corporate world.
Among the highlights was the CRAMT project, a result of collaborative efforts within the Digital and Data Literacy in Teaching Lab (DDLitLab). The project, led by students Christian Schuler, Tramy Thi Tran, Deepesha Saurty, Anran Wang, and Raman Ahmad and supervised by Dr. Seid Muhie Yiman of the House of Computing & Data Science (HCDS), was recognized for its novel approach to creating a text corpus that links languages previously unconnected in research.
This work secured second place at the EXPO, underscoring the university's commitment to fostering cross-disciplinary research and digital literacy.
More about CRAMT
The MTACR project (Multilingual Text As Corpus Repository for Machine Translation of Low-Resource Languages) began with a mission to address the challenges faced by low-resource languages, that is, languages with a limited presence on the internet. The project's goal was to collect and curate language data to support natural language processing, particularly the development of robust translation systems for low-resource languages such as Mauritian Creole, the Kurdish dialect Kobani, Vietnamese, and Chinese.
Despite €10,000 in funding from DDLitLab at the University of Hamburg, collecting data for the target languages proved challenging. This led to the inception of a complementary project, CRAMT (Cross-Lingual Resource Aggregation of Low-Resource Machine Translation and Metadata). CRAMT is a tool that facilitates the creation of multilingual aligned text data for extremely low-resource languages and enhances the quality of datasets through an annotation schema involving monolingual native speakers.
The project team's efforts culminated in a presentation of CRAMT at EXPO-2024, held at the University of Hamburg, where it received the second-place award and a prize of €200. This recognition underscores the significance of the project's contributions to the field.
For those interested in exploring CRAMT further, the GitHub repository is available at: https://github.com/christianschuler8989/CRAMT.
Additionally, information about the initial MTACR project can be found on the digital platform of DDLitLab at the University of Hamburg: https://www.isa.uni-hamburg.de/en/ddlitlab/data-literacy-studierendenprojekte/third-round/textcorpus.html.