A speech-to-text app built with Flask: upload a video or audio file and get a transcript of the speech it contains.
When you upload a video file, the app uses ffmpeg-python, a Python wrapper around ffmpeg, to extract the audio track together with file information such as the sampling rate. Based on that information, it converts the audio to a 1-D NumPy array and feeds it into the DeepSpeech model, a speech-recognition model trained with machine learning techniques based on Baidu's Deep Speech research paper. The DeepSpeech output is then scored with a language model to improve prediction accuracy.
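The extraction and recognition steps above can be sketched as follows. This is a minimal sketch, not the repository's code: it assumes the model expects 16 kHz mono 16-bit PCM, uses `ffmpeg.probe` and a raw `s16le` pipe output from ffmpeg-python, and assumes the DeepSpeech 0.7+ Python API (`Model`, `enableExternalScorer`, `stt`). The helper names `audio_sample_rate`, `pcm_bytes_to_array`, `load_audio` and `transcribe` are hypothetical.

```python
import numpy as np

# Assumption: the released English DeepSpeech models expect 16 kHz, mono, 16-bit PCM.
TARGET_RATE = 16000


def audio_sample_rate(probe: dict) -> int:
    """Return the sample rate of the first audio stream in an ffmpeg.probe() result."""
    for stream in probe.get("streams", []):
        if stream.get("codec_type") == "audio":
            return int(stream["sample_rate"])
    raise ValueError("no audio stream found")


def pcm_bytes_to_array(raw: bytes) -> np.ndarray:
    """Interpret raw signed 16-bit little-endian PCM bytes as a 1-D int16 array."""
    return np.frombuffer(raw, dtype="<i2").astype(np.int16)


def load_audio(path: str, rate: int = TARGET_RATE) -> np.ndarray:
    """Extract the audio track of `path` (audio or video) as mono 16-bit PCM at `rate` Hz."""
    import ffmpeg  # ffmpeg-python; imported here so the helpers above work without it

    options = {"format": "s16le", "acodec": "pcm_s16le", "ac": 1}
    if audio_sample_rate(ffmpeg.probe(path)) != rate:
        options["ar"] = rate  # ask ffmpeg to resample only when the source rate differs
    out, _ = (
        ffmpeg.input(path)
        .output("pipe:", **options)
        .run(capture_stdout=True, capture_stderr=True)
    )
    return pcm_bytes_to_array(out)


def transcribe(audio: np.ndarray, model_path: str, scorer_path: str) -> str:
    """Decode a 16 kHz int16 array with DeepSpeech plus an external language-model scorer."""
    import deepspeech  # assumption: DeepSpeech 0.7+ Python API

    model = deepspeech.Model(model_path)
    model.enableExternalScorer(scorer_path)  # language model used during decoding
    return model.stt(audio)
```

Asking ffmpeg for raw PCM on stdout avoids writing a temporary WAV file, and `np.frombuffer` turns the bytes straight into the 1-D array that DeepSpeech consumes. A call would look like `transcribe(load_audio(path), model_path, scorer_path)`, where the model and scorer paths point to whichever DeepSpeech release files you downloaded.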
For more information, please visit my site.
- Clone this repository to your local machine.
git clone https://github.com/boblef/auto_transcript
- Set up the environment and run the application
You can set up the environment for the Flask application either with Docker or by creating a conda or pip environment yourself. Docker is strongly recommended; otherwise, you need to install Sox and ffmpeg on your machine.
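If you skip Docker, a quick way to confirm that both tools are installed is a small check like the one below. It is a sketch with a hypothetical helper name, `missing_tools`; it only verifies that `ffmpeg` and `sox` executables are on `PATH`, not which versions they are.

```python
import shutil


def missing_tools(tools=("ffmpeg", "sox")) -> list:
    """Return the names in `tools` that have no matching executable on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing: " + ", ".join(missing) if missing else "ffmpeg and sox found on PATH")
```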
- Docker
Build the image:
docker build -t auto_transcript:latest .
Run a container from the image:
docker run -d -p 5000:5000 auto_transcript:latest
Open your browser and go to the link below; the application should start.
http://localhost:5000/
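After `docker run -d`, the container may need a few seconds before Flask accepts connections. A hypothetical helper like the one below polls the URL until the server answers; it uses only the standard library and treats any HTTP response, even an error status, as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:5000/", timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5.0):
                return True
        except urllib.error.HTTPError as err:
            err.close()
            return True  # the server answered, even if with an error status
        except OSError:
            time.sleep(interval)  # not accepting connections yet (URLError is an OSError)
    return False
```

Calling `wait_for_server()` right after `docker run` returns `True` once the app responds on port 5000, or `False` after 30 seconds without an answer.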
- Upload a file
There are sample audio files in the samples/ directory. You can pick one of them or upload an mp4 file of your own.
- Click "Create transcripts"
After you click the button, it takes some time for the transcript to appear (about a minute for a 30-second video).
- Once the transcript is printed, a button appears that lets you download a zip file. It contains a JSON file listing each word with its start time and duration, and a text file with all the words joined into a single sentence separated by spaces.
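The download bundle described above can be sketched like this. The input is an assumption: a list of `(character, start_time)` pairs such as DeepSpeech's character-level metadata provides. The helper names and the archive member names `transcript.json` and `transcript.txt` are hypothetical, and measuring a word's duration up to the following space is one possible convention, not necessarily the one the app uses.

```python
import io
import json
import zipfile


def group_words(tokens):
    """Group (character, start_time) pairs into words with a start time and duration.

    Assumption: a word ends where the following space starts; the last word
    ends at the start time of its final character.
    """
    words, chars, start, last = [], [], 0.0, 0.0
    for char, t in tokens:
        if char == " ":
            if chars:  # a space closes the current word
                words.append({"word": "".join(chars), "start_time": start,
                              "duration": round(t - start, 3)})
                chars = []
        else:
            if not chars:  # first character of a new word
                start = t
            chars.append(char)
            last = t
    if chars:  # the final word has no trailing space
        words.append({"word": "".join(chars), "start_time": start,
                      "duration": round(last - start, 3)})
    return words


def build_transcript_zip(words) -> bytes:
    """Pack a JSON word list and a plain-text sentence into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("transcript.json", json.dumps(words, indent=2))
        archive.writestr("transcript.txt", " ".join(w["word"] for w in words))
    return buffer.getvalue()
```

With DeepSpeech 0.7 or later, such pairs could come from `[(t.text, t.start_time) for t in model.sttWithMetadata(audio).transcripts[0].tokens]`, and the returned bytes could be served from a Flask view with `send_file`.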
- Add a feature that detects specific motions of the user and marks the corresponding frames, so the user can easily find where they want to cut.
- Deploy as C++ software, since I want to build software in C++.