Greetings everyone, I am back for another blog update!

This is out of the blue, but isn’t transcribing audio data a bit of a hassle?


As someone who prefers grasping the overview through text rather than listening, I really wish there were a tool that could effortlessly transcribe an audio file into written text just by being given the file.

With that in mind, I searched for terms like ‘Transcription Audio File,’ ‘Automatic Transcription Tool,’ and ‘Audio Data Transcription.’ However, the results were mainly:

  • Tools that convert speech picked up by a microphone into text.
  • Helper tools for typing out the text manually while the audio plays back.

Typing manually seems like a challenging task, and playing the audio at high volume so the PC’s microphone can pick it up could disturb my colleagues. Listening through headphones and repeating the audio aloud into the microphone for voice recognition would also be quite embarrassing at the office.

I mean… What I want is a tool that automatically transcribes the audio file through voice recognition and converts it into text.

And if possible, I find downloading software a bit troublesome, so I prefer to do it in the browser. Also, since I just want to get an overview, it would be great if it’s free…

Still, I did try one of the tools I found for typing out the text while listening to the audio. I couldn’t keep up with the typing, though, so I gave up early on. It was just too challenging.

So I set the problem aside for a while, and then, surprisingly, it was solved effortlessly one day.

The solution was to use YouTube’s subtitle feature!

Are you aware of the feature on YouTube that automatically adds subtitles? In fact, using this feature makes transcription super easy. Truly, it’s the prowess of Google. I love Google Senpai, always there to help when I’m in trouble.

So, let me explain how to use it in a simple way.



How to use it?


Prepare a Google account

If you don’t have a Google account, please create one by clicking here.


Log in to YouTube with your Google account

Please log in to YouTube.


Convert the file to a format like .MP4 or .MOV and upload it to YouTube

Since YouTube is typically used for hosting video content, you cannot upload audio-only files like .MP3 (Reference: Supported file formats on YouTube).

If your content is only audio, try converting it to a video format like .MP4 or .MOV before uploading it to YouTube.
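For anyone comfortable with a little scripting, the conversion step can be sketched roughly as follows. This is only a minimal example that assumes the free ffmpeg command-line tool is installed and on your PATH; it is not the tool I used, and the function names (build_ffmpeg_command, audio_to_video) and the black 1280x720 background are just choices made for illustration.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(audio_path, video_path):
    """Build an ffmpeg command that pairs the audio with a plain black video track."""
    return [
        "ffmpeg",
        "-f", "lavfi",
        "-i", "color=c=black:s=1280x720:r=1",  # generated black background, 1 frame per second
        "-i", audio_path,                       # the audio-only source file
        "-c:v", "libx264",
        "-tune", "stillimage",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                            # stop encoding when the audio ends
        video_path,
    ]


def audio_to_video(audio_path):
    """Convert e.g. meeting.mp3 into meeting.mp4 next to it (assumes ffmpeg is installed)."""
    video_path = str(Path(audio_path).with_suffix(".mp4"))
    subprocess.run(build_ffmpeg_command(audio_path, video_path), check=True)
    return video_path
```

Calling audio_to_video("meeting.mp3") should leave a meeting.mp4 in the same folder, which can then be uploaded like any other video.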


Also, if you don’t want to make the video public for everyone, you can set its visibility to ‘Unlisted.’ This way, only users who have the URL can view the video.



Wait a few hours

Once you’ve uploaded the video, just leave it, and YouTube will automatically generate subtitles for you.

When the subtitles are generated, you will see an ‘Open transcript’ option, as shown in the image below, in the menu that appears when you click […] under the video.

When I tried it, subtitles were generated in about an hour.


Copy and paste subtitles

When you click the [Open transcript] button, the transcribed text will be displayed on the right side of the video.

By default, timestamps are shown, and if you copy and paste directly, the timestamps will be included. To copy only the text without timestamps, select [Toggle timestamps] from the [⋮] menu in the top right of the transcript panel.

With this, you can now copy and paste the automatically transcribed text of your audio file.
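If you have already copied the text with the timestamps still in it, they can also be cleaned up afterwards. Here is a small sketch of that idea. It assumes the pasted text has each timestamp (such as 0:04 or 1:02:15) alone on its own line, which may not match what your browser actually copies, and strip_timestamps is just a name I made up for the example.

```python
import re

# A line that holds nothing but a timestamp, e.g. "0:04", "12:30" or "1:02:15".
TIMESTAMP = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")


def strip_timestamps(transcript):
    """Drop timestamp-only and empty lines, then join the remaining text with spaces."""
    lines = [line.strip() for line in transcript.splitlines()]
    kept = [line for line in lines if line and not TIMESTAMP.match(line)]
    return " ".join(kept)


pasted = "0:00\nhello everyone\n0:04\nwelcome back\n1:02:15\nthanks"
print(strip_timestamps(pasted))  # hello everyone welcome back thanks
```

Lines that merely contain a time inside a sentence are left alone, because the pattern only matches lines consisting of a timestamp and nothing else.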


How about the accuracy?

If the speaker is far from the microphone, the generated text can become quite garbled from start to finish.

One particularly surprising moment was when it misread our company name ‘Next System’ as ‘Sh*t System.’

Simple but harsh criticism.


Not surprisingly, recordings with only one person speaking in a clear, easily audible voice tend to work well.

I hope you were able to grasp the general idea of the conversation in this blog.


Conclusion

Transcribing audio by typing can indeed be time-consuming. By using YouTube’s subtitle feature, I was able to streamline the process easily. Considering that it’s a tool from the mighty Google family, it seems hard to find one with higher accuracy at the moment.

When I discovered this method, I was so impressed that I wrote a blog post about it in excitement. However, when I searched for ‘YouTube transcription,’ it turned out to be quite a common approach. Nevertheless, when I searched extensively for terms like ‘Automatic Transcription of Audio Data,’ it didn’t show up. So I’m sharing it to make it easier to find for lost souls like me and to save them from wasting time.

I hope it’s helpful for everyone reading this blog.

Although it’s completely unrelated to this post, I’ll introduce it anyway. At Next System, we develop cutting-edge technology, including AI and xR (AR/VR/MR). One example is the posture estimation AI engine ‘VisionPose,’ which can detect human skeletal information using only a camera. We’re involved in various projects, so please take a look at our website sometime!

For inquiries, please contact us here!