Monolingual Subtitling

Producing monolingual subtitles, intended in particular to make programs accessible to deaf and hard-of-hearing viewers, has been a legal requirement for all television broadcasts since 2005, which has considerably increased the number of subtitled hours. It is also often the first step towards producing foreign-language subtitles, which widen the audience for French-language programs. Subtitles are usually produced at the end of the broadcasting process, in one of two very different ways: live captioning, for TV news programs, talk shows or live events (e.g. political shows or sports competitions); and offline captioning of pre-recorded programs such as game shows, documentaries and fiction. In the first case, real-time constraints are critical, and the subtitler must adapt to spontaneous speech and, more generally, to the unpredictability of live broadcasting. In the second case, a greater variety of sound events may have to be processed and transcribed: songs, laughter, ambient sounds, passages in a foreign language, etc.

In this study, we examine the possibility of designing automated processing chains for the audio streams of TV programs, using modern artificial intelligence (AI) methods, whose recent advances have significantly improved the quality of applications such as speech transcription and machine translation. Two main questions are addressed: (a) Is it possible to fully automate subtitle production for television programs? (b) Is the answer the same for all types and styles of programs, or are some harder to process than others?

To try to answer these questions, we developed algorithms and fully automated closed captioning systems. Starting from an automatic transcription of the soundtrack, these algorithms use methods inspired by machine translation to compress the utterances and compute their layout on the screen, so that they meet the strictest standards of display and legibility. These algorithms were then specialized and adapted to the different types of programs, which correspond to various uses and audiences: you may want to watch a program to learn new information, to entertain yourself, or to learn new skills, and subtitles do not play exactly the same role in all three situations.
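The layout step described above can be illustrated with a minimal Python sketch. It is not the method used in the study: the compression stage, which the study handles with machine translation-inspired models, is left out, and a simple greedy line wrap stands in for the layout computation. The limits (37 characters per line, 2 lines per subtitle, 15 characters per second) and the function names `layout` and `is_readable` are assumptions chosen for illustration only.

```python
import textwrap

# All three limits are illustrative assumptions, not values from the study.
MAX_CHARS_PER_LINE = 37
MAX_LINES_PER_SUBTITLE = 2
MAX_CHARS_PER_SECOND = 15.0


def layout(text, max_chars=MAX_CHARS_PER_LINE, max_lines=MAX_LINES_PER_SUBTITLE):
    """Greedily wrap a transcript into lines, then group lines into subtitle blocks."""
    lines = textwrap.wrap(text, width=max_chars)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def is_readable(block, duration_s, max_cps=MAX_CHARS_PER_SECOND):
    """Check that a block shown for duration_s seconds respects the reading-speed limit."""
    n_chars = sum(len(line) for line in block)
    return duration_s > 0 and n_chars / duration_s <= max_cps


if __name__ == "__main__":
    transcript = ("The subtitler must adapt to the spontaneity of the speeches "
                  "and to live broadcasting hazards")
    for block in layout(transcript):
        # Assume each block stays on screen for 3 seconds.
        print(block, is_readable(block, duration_s=3.0))
```

In a pipeline of this kind, a block that fails the `is_readable` check would be a natural candidate for compression before being laid out again.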

These algorithms were evaluated systematically in two ways: on a test set covering a diverse range of programs, and through an analysis by the specialists of france.tv access. The evaluations showed that end-to-end captioning systems adapted to the type of program are feasible, but they also exposed the limitations of current systems: while quality for some live shows approaches the required level, average quality remains well below the target and is unusable in practice, particularly for many programs whose subtitles are produced offline. One reason for this poor quality is transcription errors, which remain too frequent; another lies in the closed captioning systems themselves, which fail to identify which elements of meaning in the audio stream need to be preserved.

Related publications:

François Buet. Analyse de la régulation de la longueur dans un système neuronal de compression de phrase : une étude du modèle LenInit. Actes de la conférence des jeunes chercheurs en traitement automatique des langues (RECITAL 2020), 2020.

François Buet, François Yvon. Toward Genre Adapted Closed Captioning. Interspeech 2021, Aug 2021, Brno (virtual), Czech Republic. pp.4403-4407.

François Buet, François Yvon. Vers la production automatique de sous-titres adaptés à l’affichage. Traitement Automatique des Langues Naturelles, 2021, Lille, France. pp.91-104.