Natural Language Processing using Nvidia NeMo

The latest information is at this GitHub repo.

The lead for this project is alan.heirich@gmail.com. His intended end-user platform is a TV-side Raspberry Pi running a model that translates English to Spanish on the fly. The critical part is training the model on a large VCTK corpus provided by Alan.

I started with ‘Listen_Attend_Spell’ and this PyTorch model on an Ubuntu system equipped with an Nvidia GPU.

The project was, however, placed on hold in 2020 because of technical issues and a lack of support.

In July 2021, I found a possible new approach using Nvidia NeMo. This report on Automatic Speech Recognition (ASR) explains the details and preliminary results, and I also published an early report on LinkedIn. The model is relatively small, which bodes well for deployment on simple edge devices.
