Multilingual Subtitling

SYSTRAN worked on two variants of the automatic multilingual subtitle generation task:

  • Starting from the French reference subtitles produced by france.tv access (subtitle translation; requires the video to already be subtitled in a source language)
  • Starting from the aligned transcription produced by LISN (subtitle generation from the soundtrack; applicable to any video)

Three generations of models were developed and evaluated over the course of the project:

V0 – August 2020 – Proof of Concept – Subtitle Translation Model

  • Single multilingual system, from French to all other languages
  • Amount of training data: 135 million parallel lines (0.74% annotated with subtitle-specific segmentation)
  • Takes as source the French subtitles from france.tv access
  • Integration: development of a first processing pipeline, from source TTML to target TTML (see the sketch after this list)
  • Evaluation: internal (all languages)
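
A minimal sketch of what such a TTML-to-TTML pipeline can look like, in Python. The translate function is a hypothetical placeholder for the multilingual model, and the flattening of line breaks inside a cue is a simplification; only the general shape (parse, translate cue text, preserve timecodes and attributes, serialize) reflects the task described above.

    import xml.etree.ElementTree as ET

    TTML_NS = "http://www.w3.org/ns/ttml"
    ET.register_namespace("", TTML_NS)  # keep TTML as the default namespace on output

    def translate(text: str, tgt_lang: str) -> str:
        """Hypothetical stand-in for the multilingual subtitle translation model."""
        return text  # identity placeholder so the sketch runs end to end

    def translate_ttml(src_path: str, tgt_path: str, tgt_lang: str) -> None:
        tree = ET.parse(src_path)
        for p in tree.getroot().iter(f"{{{TTML_NS}}}p"):
            # Flatten the cue text (<br/> line breaks are simplified away here);
            # the begin/end timecodes and styling attributes of <p> are kept as-is.
            source_text = " ".join(t.strip() for t in p.itertext() if t.strip())
            if not source_text:
                continue
            for child in list(p):
                p.remove(child)
            p.text = translate(source_text, tgt_lang)
        tree.write(tgt_path, xml_declaration=True, encoding="utf-8")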

V1 – April 2021 – Subtitle Generation Model from Speech Recognition

  • Single multilingual system, from French and English to all other languages
  • Amount of training data: 14.3 million parallel lines (20% annotated with subtitle-specific segmentation)
  • Takes as source the segmented speech recognition output generated by LISN: a more interesting task, since it makes it possible to generate subtitles for a video that has none
  • Integration: addition of colors indicating the type of speech (person on screen, off screen, voice-over, foreign language, song…); see the sketch after this list
  • Evaluation: internal (all languages), business (English, Spanish), users (Spanish)
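
The exact color scheme used in the project is not specified above; the sketch below assumes the conventional French color code for subtitle speech types (white for a speaker on screen, yellow off screen, cyan for voice-over, green for a foreign language, magenta for songs) and shows how such a color can be attached to a TTML cue as a tts:color styling attribute.

    import xml.etree.ElementTree as ET

    TTS_NS = "http://www.w3.org/ns/ttml#styling"

    # Assumed mapping from detected speech type to subtitle color
    # (based on the usual French convention, not on project documentation).
    SPEECH_TYPE_COLOR = {
        "on_screen": "white",    # person speaking on screen
        "off_screen": "yellow",  # person speaking off screen
        "voice_over": "cyan",    # voice-over / narrator
        "foreign": "green",      # foreign language
        "song": "magenta",       # song
    }

    def set_cue_color(p: ET.Element, speech_type: str) -> None:
        """Attach a tts:color attribute to a TTML <p> cue."""
        p.set(f"{{{TTS_NS}}}color", SPEECH_TYPE_COLOR.get(speech_type, "white"))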
 
V2 – July 2021 – Context-Assisted Subtitle Translation Model
 
  • Single-language-pair system, from French to English, the standard setting in machine translation
  • Amount of training data: 1.2 million parallel lines (22% annotated with subtitle-specific segmentation, 79% contextualized by concatenating the previous source and target to the current example)
  • Takes as source the French subtitles from france.tv access, in order to evaluate translation and segmentation quality without the edge effects introduced by speech recognition
  • Considers the previous source and target as context
  • Integration: improved colors and introduction of headers (hyphens, speaker's name) carried over from the French; timecodes adjusted so that subtitles are neither too short (minimum duration of 19 frames = 760 ms) nor too close together (minimum interval of 8 frames = 320 ms); see the sketch after this list. The timecodes used are those of the French subtitles for all stock broadcasts, and those of speech recognition – closer to real time – for all live broadcasts.
  • Evaluation: internal (English), business (English), users (English)
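
The two timing constraints above imply a 25 fps frame rate (19 frames = 760 ms and 8 frames = 320 ms both correspond to 40 ms per frame). A minimal sketch of such a post-processing step, where the cue representation and the conflict resolution (minimum duration takes precedence over minimum interval) are assumptions of this illustration:

    FRAME_MS = 40                     # 25 fps, consistent with 19 frames = 760 ms
    MIN_DURATION_MS = 19 * FRAME_MS   # 760 ms
    MIN_INTERVAL_MS = 8 * FRAME_MS    # 320 ms

    def adjust_timecodes(cues: list[tuple[int, int]]) -> list[tuple[int, int]]:
        """cues: (start_ms, end_ms) pairs sorted by start time."""
        # Stretch subtitles that are displayed for too short a time.
        adjusted = [[start, max(end, start + MIN_DURATION_MS)] for start, end in cues]
        # Pull back the end of a cue that sits too close to the next one,
        # but never below the minimum duration (which takes precedence here).
        for prev, cur in zip(adjusted, adjusted[1:]):
            if cur[0] - prev[1] < MIN_INTERVAL_MS:
                prev[1] = max(prev[0] + MIN_DURATION_MS, cur[0] - MIN_INTERVAL_MS)
        return [tuple(c) for c in adjusted]

For example, cues (0, 500) and (900, 2000) would become (0, 760) and (900, 2000): the first cue is stretched to the minimum duration, and the resulting 140 ms gap is left as-is because shortening it further would violate that minimum.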

Demonstration of the French > English subtitling systems in three domains:

  • top left: original subtitles
  • top right: system V0
  • bottom left: system V1
  • bottom right: system V2

MOOC (learning)

Series (entertainment)

8 p.m. newscast (news)

The internal and business evaluations, complemented by the end-user evaluations conducted by LUTIN and HC, consistently show measurable progress over the course of the project, notably in English between the first version of the system (developed in April 2021, evaluated in May–July 2021) and the second version (developed in July 2021, evaluated in August–November 2021), and in particular for stock programs such as documentaries and magazines. Advances are visible both in the content and in its integration into the subtitles (synchronization, color display, speakers), which matters greatly for the user experience.

In view of its quality criteria, france.tv access concludes that the outputs of the automatic models developed here are not directly usable for television broadcast. SYSTRAN nonetheless considers these automatic outputs useful in other contexts – contexts where specialized human resources are too scarce to meet the translation need, which currently results in content going unsubtitled and inaccessible – and in other production chains (with post-editing, training, program-specific information, etc.).
