I had this cool idea recently and figured it's worth mentioning here.
A weird but common issue with films is that there can be scenes where characters speak in a language foreign to the primary language of the film. This is often done intentionally, to preserve the thematic intent and obscure what the characters are saying, but sometimes it's not clear whether it is intentional, and extended stretches of such dialogue can confuse the viewer.
Most importantly, this audio often isn't represented in subtitle files at all, and when it is, leaving subtitles on during dialogue the viewer already understands is distracting.
The idea was to be able to intelligently display subtitles for foreign languages only while they are actually being spoken.
To achieve this, one could possibly:
- Track the film's primary language via metadata
- Train a neural network for spoken language classification
- Read ahead and check whether the spoken language deviates from the film's language
- If so, display subtitles only during that period, potentially checking for deviation in the subtitle text as well (a rough sketch of this read-ahead loop follows the list); in fact, some deeper analysis of different .srt files across languages might prove valuable to this prediction
- Alternatively, offer an option to transcribe the flagged audio using STT, potentially even offline with a model like DeepSpeech, and then feed the result through a translation model like Google Translate (also sketched below)
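
To make the read-ahead idea a bit more concrete, here's a minimal sketch in Python. It assumes a hypothetical `classify_language(samples)` helper backed by whatever pretrained spoken-language-ID network ends up being used, and audio that has already been cut into fixed-length windows; the window length and confidence threshold are illustrative assumptions, not tuned values.

```python
# Sketch of the read-ahead loop: compare each audio window's predicted language
# against the film's metadata language and emit spans where subtitles should show.

from dataclasses import dataclass

WINDOW_SECONDS = 5.0   # assumed length of each read-ahead audio window
MIN_CONFIDENCE = 0.8   # ignore low-confidence predictions (music, silence, etc.)


@dataclass
class SubtitleSpan:
    start: float  # seconds from the start of the film
    end: float


def classify_language(samples):
    """Placeholder for a real spoken-language-ID model; returns (lang_code, confidence)."""
    raise NotImplementedError


def foreign_language_spans(audio_windows, film_language):
    """Walk (start_time, samples) windows and return spans where the spoken
    language deviates from the film's metadata language."""
    spans = []
    current = None
    for start, samples in audio_windows:
        lang, conf = classify_language(samples)
        is_foreign = conf >= MIN_CONFIDENCE and lang != film_language
        end = start + WINDOW_SECONDS
        if is_foreign:
            if current is None:
                current = SubtitleSpan(start, end)  # open a new subtitle span
            else:
                current.end = end                   # extend the running span
        elif current is not None:
            spans.append(current)                   # language returned to normal
            current = None
    if current is not None:
        spans.append(current)
    return spans
```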
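And the STT fallback could look roughly like this. DeepSpeech's Python API does expose `Model(path)` and `model.stt(audio)` over 16 kHz mono int16 PCM, but the model path below and the `translate_to` helper are placeholders for whatever foreign-language acoustic model and translation backend (Google Translate API, a local seq2seq model, etc.) one chooses.

```python
# Sketch of the offline STT + translation fallback for a flagged audio span.

import numpy as np
import deepspeech

FOREIGN_MODEL_PATH = "models/deepspeech-foreign.pbmm"  # placeholder model path


def translate_to(text: str, target_lang: str) -> str:
    """Placeholder for the chosen translation backend."""
    raise NotImplementedError


def subtitle_for_span(samples_16k: np.ndarray, film_language: str) -> str:
    """Transcribe a flagged span (16 kHz, mono, int16 PCM) and translate the
    result back into the film's primary language."""
    model = deepspeech.Model(FOREIGN_MODEL_PATH)
    transcript = model.stt(samples_16k)  # offline speech-to-text
    return translate_to(transcript, film_language)
```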
I totally get that this is a non-trivial idea, but I wanted to write it down anyway since it stands out to me as something truly intelligent that has only really become possible with very recent advances in deep learning and edge hardware.