Presentation title:
Towards More Frugal Speech Technologies
Presentation description:
Trained on gigantic amounts of data and using seemingly infinite computing resources, End-to-End models and LLMs have contributed to significantly improved performance for speech technologies over the last decade. However, balancing performance across varied data types and languages remains challenging, and performance still degrades significantly on types of data unseen during training. Many applications do not benefit from this new era of AI. This is the case for military applications, where data is scarce, highly confidential, and extremely costly to annotate, yet systems must process varied types of data such as rare dialects, highly degraded radio signals, or communications based on specific phraseologies. Vocapia's quest for frugal AI speech processing aims to implement highly accurate and portable solutions using reasonable resources, both in training and at runtime. This presentation will address strategies to tackle these frugality challenges in speech processing use cases, and the integration of these advancements into Vocapia's software products. The work presented is partially conducted in the context of projects funded by the European Defence Fund with the aim of bridging the performance gap between civil and military applications.