The latest information is at this GitHub repo.
The project lead is Alan (alan.heirich@gmail.com). The end-user platform he has in mind is a TV-side Raspberry Pi running a model that translates English to Spanish on the fly. The critical part is training the model on a large VCTK corpus provided by Alan.
I started with ‘Listen_Attend_Spell’ and this PyTorch model on an Ubuntu system equipped with an Nvidia GPU.
The project was placed on hold in 2020 due to technical issues and a lack of support.
However, in July 2021 I found a possible new approach using Nvidia NeMo. This report on Automatic Speech Recognition (ASR) explains the details and preliminary results, and I also published an early report on LinkedIn. The model is relatively small, which bodes well for deployment on simple edge devices.