How to generate text from a video file using Python

Natural language processing, speech recognition and text generation are getting a lot of attention these days. In this article, we will discuss how to get text from video or audio files.

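Pre-requisites:
>> Python 3.7
>> ffmpeg (installed and available on the PATH)
>> Libraries: os (part of the standard library) and speech_recognition (installed with pip as SpeechRecognition)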
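
Before starting, it can help to confirm that the prerequisites listed above are actually present. The following is a minimal sketch, not part of the original walkthrough: the function name missing_prerequisites is hypothetical, and it only checks that ffmpeg is on the PATH and that the speech_recognition module can be found.

```python
import importlib.util
import shutil


def missing_prerequisites(tools=("ffmpeg",), modules=("speech_recognition",)):
    """Return a list of required tools and modules that could not be found."""
    missing = []
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH
        if shutil.which(tool) is None:
            missing.append(tool)
    for module in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("All prerequisites found.")
```

If anything is reported as missing, install it before moving on to the next steps.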

Step 1: Prepare directory
Create a new folder and add some video files to it. For instance, I have created a folder ‘SpeechConversion’ that contains one video song (in .mp4 format).

Step 2: Import libraries
Import the required libraries as shown in the code below:
import os
import speech_recognition as sr

Step 3: Command for video conversion
I am using ffmpeg to convert the video file to audio. First, I convert it to mp3 format and then transform that to wav format, because the AudioFile class in speech_recognition reads WAV, AIFF and FLAC files but not mp3.
Here, my video file name is Bolna.mp4; I convert this to Bolna.mp3 and then to Bolna.wav.
Below are the commands for the conversion process.
Let’s save them in variables as below.
command2mp3 = "ffmpeg -i Bolna.mp4 Bolna.mp3"
command2wav = "ffmpeg -i Bolna.mp3 Bolna.wav"
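
The two-step conversion above can also be collapsed into a single ffmpeg call, since ffmpeg can write wav directly from a video file. The sketch below is an alternative, not the method used in this article: the helper name build_wav_command and the choice of 16 kHz mono output are assumptions (a common setup for speech audio). The flags used (-y, -i, -vn, -ac, -ar) are standard ffmpeg options.

```python
from pathlib import Path


def build_wav_command(video_path, sample_rate=16000):
    """Build an ffmpeg argument list that extracts mono wav audio from a video."""
    wav_path = Path(video_path).with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one (mono) channel
        "-ar", str(sample_rate),  # resample to the given rate in Hz
        str(wav_path),
    ]


command2wav_direct = build_wav_command("Bolna.mp4")
print(" ".join(command2wav_direct))
```

The command is built as a list of arguments rather than one string, so that file names containing spaces do not need extra quoting when the list is handed to subprocess.run.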

Step 4: Execute video conversion commands
Let us now execute these commands using the ‘os’ library, as below:
os.system(command2mp3)
os.system(command2wav)
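
os.system runs the command but makes it easy to overlook a failed conversion. As a hedged alternative, the sketch below uses subprocess.run from the standard library and checks the exit code. The helper name run_command is hypothetical, and the demo uses the Python interpreter as a stand-in command so that it runs even where ffmpeg is not installed.

```python
import subprocess
import sys


def run_command(args):
    """Run a command given as an argument list; return True if it exited with 0."""
    try:
        result = subprocess.run(args, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the executable itself (for example ffmpeg) cannot be found
        print(f"Executable not found: {args[0]}")
        return False
    if result.returncode != 0:
        # Show the end of stderr, which is where ffmpeg reports its errors
        print(result.stderr[-500:])
        return False
    return True


# Stand-in commands so the sketch runs anywhere; in this article's setting
# args would be, for example, ["ffmpeg", "-i", "Bolna.mp4", "Bolna.mp3"].
print(run_command([sys.executable, "-c", "print('ok')"]))              # exits with 0
print(run_command([sys.executable, "-c", "import sys; sys.exit(2)"]))  # exits with 2
```

With a check like this, the script can stop early instead of trying to load a wav file that was never created.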

Step 5: Load the wav file
Now, let us load the wav file that was created in the previous step, using the code below.
r = sr.Recognizer()
audio = sr.AudioFile('Bolna.wav')
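
It can be useful to look at the wav file before sending it for recognition, for example to see how long it is. The following is a small sketch using the standard library wave module, which reads uncompressed PCM wav files; the helper name describe_wav is hypothetical.

```python
import os
import wave


def describe_wav(path):
    """Return channel count, sample rate and duration in seconds of a PCM wav file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    # Only inspect the file if the conversion step has already produced it
    if os.path.exists("Bolna.wav"):
        print(describe_wav("Bolna.wav"))
```

The duration reported here tells you how much audio is available when choosing how many seconds to process.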

Step 6: Process the wav file
Lastly, set the duration of the audio you want to process.
I am keeping this at 100 seconds for test purposes.
You can change it as per your convenience.
with audio as source:
    audio = r.record(source, duration=100)
print(r.recognize_google(audio))

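Voila, you now have the text for the first 100 seconds of the video or audio file. Note that recognize_google sends the audio to Google's Web Speech API, so an internet connection is required.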
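
To go beyond the first 100 seconds, one option is to process the file in shorter consecutive chunks and join the results. The code below is a rough sketch under stated assumptions, not a definitive implementation: the names chunk_plan and transcribe_in_chunks and the 30-second default are hypothetical, and the total duration is expected to be supplied by the caller. It uses the offset and duration parameters of r.record, and catches sr.UnknownValueError (no speech understood) and sr.RequestError (service unreachable or request rejected).

```python
def chunk_plan(total_seconds, chunk_seconds=30):
    """Split a duration into consecutive (offset, length) pairs, in seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    plan = []
    offset = 0.0
    while offset < total_seconds:
        length = min(chunk_seconds, total_seconds - offset)
        plan.append((offset, length))
        offset += length
    return plan


def transcribe_in_chunks(wav_path, total_seconds, chunk_seconds=30):
    """Transcribe a wav file chunk by chunk, skipping chunks with no recognizable speech."""
    import speech_recognition as sr  # imported here so chunk_plan works without it

    recognizer = sr.Recognizer()
    pieces = []
    for offset, length in chunk_plan(total_seconds, chunk_seconds):
        # Reopen the file for each chunk so offset is measured from the start
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source, offset=offset, duration=length)
        try:
            pieces.append(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            # The service could not make out any speech in this chunk
            continue
        except sr.RequestError as error:
            # The service could not be reached or rejected the request
            print(f"Request failed at {offset:.0f}s: {error}")
            break
    return " ".join(pieces)


print(chunk_plan(100, 30))
```

Keep in mind that cutting at fixed boundaries can split a word across two chunks, so the joined text may contain small errors at the seams.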

Further enhancements:
The generated text can later be used for Natural language understanding and Natural language generation tasks.

Hope this helps. Do share your comments below.

Thank you..!!

How to generate text from a video file using python was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/how-to-generate-text-from-a-video-file-using-python-261f59e95b5f?source=rss—-5e5bef33608a—4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/how-to-generate-text-from-a-video-file-using-python

Published by 365Data Science
