GitHub - boblef/auto_transcript: A Speech-To-Text application with Flask in which we can upload a video or an audio file and can get transcripts of the speech in the file we upload.

Speech To Text app with Flask

A Speech-To-Text app built with Flask: upload a video or audio file and get a transcript of the speech it contains.

How it works

Once a video file is uploaded, the app extracts the audio track along with file information such as the sampling rate, using ffmpeg-python, a wrapper around ffmpeg. Based on that information, it converts the audio to a 1-D NumPy array, which is fed into the DeepSpeech model, a model trained with machine learning techniques based on Baidu's Deep Speech research paper. The output of the DeepSpeech model is then passed through a language model to improve prediction accuracy.
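The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: it assumes the `ffmpeg-python` and Mozilla `deepspeech` (0.7 or later) packages, a 16 kHz mono target format, and hypothetical function names and model/scorer paths.

```python
import numpy as np

# Assumption: the released DeepSpeech English models expect 16 kHz mono audio.
TARGET_SAMPLE_RATE = 16000


def pcm16_to_array(raw: bytes) -> np.ndarray:
    """Interpret raw signed 16-bit little-endian PCM bytes as a 1-D array."""
    return np.frombuffer(raw, dtype="<i2")


def extract_audio(path: str, sample_rate: int = TARGET_SAMPLE_RATE) -> np.ndarray:
    """Decode the audio track of a media file to mono 16-bit PCM samples."""
    # ffmpeg-python is imported lazily so pcm16_to_array works without it.
    import ffmpeg

    raw, _ = (
        ffmpeg.input(path)
        .output("pipe:", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(capture_stdout=True, capture_stderr=True)
    )
    return pcm16_to_array(raw)


def transcribe(audio: np.ndarray, model_path: str, scorer_path: str = "") -> str:
    """Run DeepSpeech on the samples, optionally with an external scorer."""
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path:
        # The scorer plays the role of the language model mentioned above.
        model.enableExternalScorer(scorer_path)
    return model.stt(audio)
```

A call such as `transcribe(extract_audio("samples/example.mp4"), "model/model.pbmm")` would then return the transcript as a string; both paths here are placeholders.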

For more information, please visit my site.

How to use

Clone this repository to your local machine.

git clone https://github.com/boblef/auto_transcript

Set up the environment and run the application
You can set up the environment for the Flask application either with Docker or by creating a conda or pip environment yourself. Docker is strongly recommended; otherwise, you need to install SoX and ffmpeg on your machine.
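If you go without Docker, a quick check like the one below can confirm that the external tools are on your `PATH` before starting the app. It is only a convenience sketch; the helper name `missing_tools` is hypothetical and not part of this repository.

```python
import shutil


def missing_tools(names=("sox", "ffmpeg")):
    """Return the command-line tools from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Please install: " + ", ".join(missing))
else:
    print("SoX and ffmpeg were found.")
```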

Docker

Build the image
```
docker build -t auto_transcript:latest .
```
Run the image
```
docker run -d -p 5000:5000 auto_transcript:latest
```
Open your browser and go to the address below. The application should load.
```
http://localhost:5000/
```
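Because the container is started in detached mode, the page may not respond immediately. A small polling helper such as the sketch below can tell you when the server is ready. The function names are hypothetical, and the retry count and delay are arbitrary defaults.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed reply, or a non-2xx status.
        return False


def wait_until_up(url, attempts=30, delay=1.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds in between."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, `wait_until_up("http://localhost:5000/")` returns `True` as soon as the Flask app answers, or `False` if it never responds within the allowed attempts.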

Upload a file. There are sample audio files in samples/. You can use one of them or upload an mp4 file of your own.
Click "Create transcripts". After you click the button, it takes some time for the transcript to appear (about a minute for a 30-second video).
Once the transcript is printed, a button appears that lets you download a zip file. The zip contains a JSON file with a list of words, each with its start time and duration, and a text file with the words joined by single spaces.
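To make the shape of the download concrete, here is a rough sketch of how per-character timings could be grouped into words and packed into a zip. It assumes the timings arrive as `(character, start_time)` pairs; the field names (`word`, `start_time`, `duration`) and file names (`transcript.json`, `transcript.txt`) are illustrative guesses, not necessarily what the app produces.

```python
import io
import json
import zipfile


def words_from_tokens(tokens):
    """Group (character, start_time) pairs into words with timing info.

    A word's duration is measured from its first character to the space that
    follows it, or to its last character if no space follows.
    """
    words = []
    chars, start, last = [], 0.0, 0.0

    def flush(end):
        words.append({
            "word": "".join(chars),
            "start_time": round(start, 2),
            "duration": round(end - start, 2),
        })

    for text, time in tokens:
        if text == " ":
            if chars:
                flush(time)
                chars = []
            continue
        if not chars:
            start = time
        chars.append(text)
        last = time
    if chars:
        flush(last)
    return words


def build_zip(words):
    """Return the bytes of a zip holding the word list and the plain text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("transcript.json", json.dumps(words, indent=2))
        zf.writestr("transcript.txt", " ".join(w["word"] for w in words))
    return buf.getvalue()
```

In a Flask view, the bytes returned by `build_zip` could be wrapped in `io.BytesIO` and passed to `send_file` to serve the download.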

Further Work

Add a feature that detects specific motions of the user and marks the corresponding frames, so that the user can easily find where they want to cut.
Deploy the app as C++ software, since I want to build software in C++.

