3. Linguistics - Computing for Human and Animal Communication

Natural language understanding (NLU) is one of the main challenges of artificial intelligence and a major driver of research in machine learning and deep neural networks. SMASH partners are active members of EU research networks and lead or participate in research projects at the EU and national levels. These projects address general language understanding challenges by integrating linguistic and common-sense knowledge into deep learning, and apply the resulting methods to less-resourced languages. The partners' excellence has been recognised by several awards to research groups and individuals, such as the award for the most outstanding research achievement of UL in 2018, given to its Centre for Language Resources and Technologies.

Some ongoing research topics in the consortium include:

Research sub-areas

3.1 Linguistic knowledge injection into deep learning

UL FRI is investigating readability assessment (providing valuable information in the educational, news media, legal, financial, and public service domains) and explainable machine learning models (several robust approaches that explain the decisions of machine learning models, making otherwise opaque complex models understandable and providing the inspection and trust that AI tools require).

Host institution: UL FRI

3.2 Animal communication

Compositionality is a defining characteristic of human language, allowing for an infinitude of linguistic expressions used in everyday communication. Marine mammals, including dolphins, are also believed to possess a rich system of vocalizations used in communication among conspecifics. Exploring the compositional power of their communication system is essential for understanding the nature of language, communication, human and animal cognition, and evolution. This is an ambitious task that requires a concerted interdisciplinary effort by linguists, experts in signal analysis, (marine) biologists, and computer scientists. Machine learning techniques are expected to boost this inquiry dramatically by systematizing large arrays of vocalization data and detecting emerging patterns of signal co-occurrence and possible hierarchical structure.

Host institution: UNG-CCSL
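
The idea of detecting patterns of signal co-occurrence in sub-area 3.2 can be illustrated with a minimal sketch. It assumes that recordings have already been segmented and labelled as sequences of discrete units, which is itself a hard problem not addressed here. The function name bigram_pmi, the unit labels, and the toy sequences are hypothetical and are not taken from any SMASH project. Pointwise mutual information (PMI) is used as one simple association measure among many, not as the consortium's method.

```python
import math
from collections import Counter


def bigram_pmi(sequences, min_count=2):
    """Score adjacent unit pairs by pointwise mutual information (PMI).

    sequences: list of lists of unit labels (hypothetical, pre-segmented).
    min_count: pairs seen fewer times than this are ignored as too sparse.
    """
    unit_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        unit_counts.update(seq)
        # Adjacent pairs within one sequence, e.g. (seq[0], seq[1]), ...
        pair_counts.update(zip(seq, seq[1:]))

    total_units = sum(unit_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue
        p_pair = n / total_pairs
        p_a = unit_counts[a] / total_units
        p_b = unit_counts[b] / total_units
        # PMI > 0: the pair occurs more often than independence predicts.
        scores[(a, b)] = math.log2(p_pair / (p_a * p_b))
    return scores


# Hand-made toy data; the labels are illustrative only.
sequences = [
    ["whistle_A", "click", "whistle_B"],
    ["whistle_A", "click", "burst"],
    ["burst", "whistle_A", "click"],
    ["whistle_B", "burst", "whistle_B"],
]

scores = bigram_pmi(sequences)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A high positive score flags a pair of units that tends to occur together and may merit closer inspection. On its own it says nothing about hierarchical structure or meaning, which would require far richer models and data.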

3.3 Cross-lingual transfer for less-resourced languages

SMASH partners are involved in the development of large language resources (e.g., an annotation system of morphosyntactic specifications that offers a linguistically informed systematisation of multilingual language data) and computationally intensive cross-lingual semantic technologies (e.g., neural pretrained language models for less-resourced languages, such as the English-Croatian-Slovene model) that enable cross-lingual transfer from English.

Host institutions: JSI KT, UL FRI, UNG-CCSL
