How To Build a Multilingual Text-to-Audio Converter With Python | HackerNoon

“To have another language is to possess a second soul.”
— Charlemagne

Imagine traveling to a new country and being able to hold a seamless conversation in the local language. That is what we will work toward in this article by building a simple text-to-audio converter app using Python, the googletrans library for translation, and gTTS for text-to-speech conversion. We will go over the complete code, how the different components work, and how to use each library to translate English text into another language and then convert it to audio in that language.

The different components

There are three components:

  • Translation – googletrans, a Python library that uses Google Translate to translate text
  • Text-to-speech – gTTS (Google Text-to-Speech), which converts text to audio in the language of our choice
  • Audio playback – pygame, which is primarily used for developing games, but which we will use here to play back the audio generated by gTTS

Prerequisites

We can use the pip command in a terminal to install the required libraries:

pip install gTTS googletrans==4.0.0-rc1 pygame

Note: Sometimes you might encounter the below error when running the actual Python code –

AttributeError: 'coroutine' object has no attribute 'text'
sys:1: RuntimeWarning: coroutine 'Translator.translate' was never awaited

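Fix – The error means translate returned a coroutine instead of a result, which happens when the installed googletrans release exposes translate as an asynchronous method. Make sure you have version 4.0.0-rc1 installed, as pinned in the pip command above; it works with the synchronous calls used in this article.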
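
If you are unsure which version is installed, a small check can help. The following is a minimal sketch using only the standard library; the helper names installed_version and is_expected_version are illustrative and not part of googletrans, and the comparison assumes that the installed metadata may report the version either as 4.0.0-rc1 or in the normalized form 4.0.0rc1.

```python
from importlib import metadata

EXPECTED = "4.0.0rc1"  # normalized form of the version pinned in this article


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def is_expected_version(version):
    """Compare a version string with the pinned one, ignoring '-' and letter case."""
    if version is None:
        return False
    return version.replace("-", "").lower() == EXPECTED


if __name__ == "__main__":
    found = installed_version("googletrans")
    if found is None:
        print("googletrans is not installed")
    elif is_expected_version(found):
        print(f"googletrans {found} matches the pinned version")
    else:
        print(f"googletrans {found} is installed; this article was written against 4.0.0-rc1")
```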

Implementation

translate_text

The translate_text function uses the googletrans library for text translation. It takes two parameters: text, the string that needs to be translated, and dest_language, the target language code (e.g., 'es' for Spanish). Inside the function, we create a Translator object and call its translate method, which returns a result object whose text attribute holds the translated text.
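
Because the same language code is later passed to gTTS, it can help to confirm up front that both libraries accept it. The following is a rough sketch, not part of the original app: is_supported_by_both and load_library_codes are hypothetical helper names, and the comparison is case-insensitive on the assumption that the two libraries may spell some codes with different letter case. It relies on googletrans.LANGUAGES and gtts.lang.tts_langs() for the code tables.

```python
def normalize_codes(codes):
    """Lower-case language codes so that, e.g., 'zh-CN' and 'zh-cn' compare equal."""
    return {code.lower() for code in codes}


def is_supported_by_both(language, translation_codes, speech_codes):
    """Return True only if the code appears in both collections of language codes."""
    code = language.lower()
    return (code in normalize_codes(translation_codes)
            and code in normalize_codes(speech_codes))


def load_library_codes():
    """Fetch the language-code tables from the installed libraries.

    The imports are kept local so the helpers above work without the libraries.
    """
    from googletrans import LANGUAGES   # dict mapping code -> language name
    from gtts.lang import tts_langs     # function returning a dict of code -> name
    return list(LANGUAGES.keys()), list(tts_langs().keys())
```

With both libraries installed, calling is_supported_by_both('es', *load_library_codes()) before translate_text lets the app reject an unsupported code early instead of failing halfway through.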

text_to_audio

The text_to_audio function converts the text to audio using gTTS and plays it with pygame. It takes two parameters: text and language. The language value is the same as the dest_language input, because we want the audio to be in the language the text was translated to. The function creates the audio with gTTS and saves it as an MP3 file. Then we initialize pygame.mixer to handle audio playback, load the MP3, and play it. A loop waits until the audio has finished playing, after which the audio file is deleted if should_clean_up_file is set to True.
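
The wait-for-playback step can also be written as a small reusable helper. This is a sketch under stated assumptions rather than the article's own code: wait_until_idle is a hypothetical name, and the optional timeout is an addition so that a stuck player cannot block forever.

```python
import time


def wait_until_idle(is_busy, poll_interval=0.1, timeout=None):
    """Poll is_busy() until it returns False.

    Returns True when playback finished, or False if the timeout elapsed first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while is_busy():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)  # avoid spinning the CPU while audio plays
    return True
```

In this app the call would look like wait_until_idle(pygame.mixer.music.get_busy, timeout=60), since get_busy reports whether music is still playing.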

Below is the complete code –

from gtts import gTTS
from googletrans import Translator
import pygame
import os

def translate_text(text, dest_language):
    translator = Translator()
    translation = translator.translate(text, dest=dest_language)
    return translation.text

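def text_to_audio(text, language):
    mp3_file = f'{language}_output.mp3'
    should_clean_up_file = True
    try:
        # Generate the speech audio and save it as an MP3 file
        tts_file = gTTS(text=text, lang=language, slow=False)
        tts_file.save(mp3_file)
        # Load and play the MP3 through pygame's mixer
        pygame.mixer.init()
        pygame.mixer.music.load(mp3_file)
        pygame.mixer.music.play()
        # Poll until playback finishes, reusing one Clock to throttle the loop
        clock = pygame.time.Clock()
        while pygame.mixer.music.get_busy():
            clock.tick(15)
    finally:
        # Shut down the mixer so the file is released before it is deleted
        pygame.mixer.quit()
        if os.path.exists(mp3_file) and should_clean_up_file:
            os.remove(mp3_file)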


def main(english_text, target_language='en'):

    translated_text = translate_text(english_text, target_language)
    print(f"English Text: {english_text}")
    print(f"Translated Text: {translated_text}")

    text_to_audio(translated_text, target_language)


if __name__ == "__main__":
    english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
    target_language = 'es'  # Spanish
    main(english_text, target_language)

Input1 – English to Spanish:

english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
target_language = 'es'  # Spanish
main(english_text, target_language)

Output:

English to Spanish translation

Audio output:

Spanish Audio file

This creates es_output.mp3 in your current folder, which pygame then plays. Because should_clean_up_file is True, the file is deleted once playback finishes; set it to False to keep the file.

Input2 – English to Japanese:

english_text = "Hello, welcome to the world of text-to-speech conversion using Python."
target_language = 'ja'  # Japanese
main(english_text, target_language)

Output:

English to Japanese translation

Audio output:

Japanese Audio file

This creates ja_output.mp3 in your current folder, which pygame then plays. Because should_clean_up_file is True, the file is deleted once playback finishes; set it to False to keep the file.
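
Both runs above change the language by editing the script. One way to avoid that is to read the text and language code from the command line. The following is a minimal sketch using the standard argparse module; build_parser and parse_cli are hypothetical names, and the default of 'es' is an assumption chosen to match the first example.

```python
import argparse


def build_parser():
    """Create a parser that accepts the English text and a target language code."""
    parser = argparse.ArgumentParser(
        description="Translate English text and read the translation aloud."
    )
    parser.add_argument("text", help="English text to translate")
    parser.add_argument(
        "-l", "--language", default="es",
        help="target language code, e.g. 'es' or 'ja' (default: es)",
    )
    return parser


def parse_cli(argv=None):
    """Return (text, language) parsed from argv, or from sys.argv when argv is None."""
    args = build_parser().parse_args(argv)
    return args.text, args.language
```

In the script, the last lines could then become main(*parse_cli()), so that each run takes its text and language code as command-line arguments.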

Applications and Use Cases

  • Accessibility – This can be integrated into a tourism app or website to help people travel with confidence in a country whose language they don’t speak
  • Language Learning – Anyone learning a new language can use this tool for self-study: enter the text to translate and get back the translated text along with audio, which also helps with pronunciation
  • Content Consumption – For people who want to multitask, say by listening to an audiobook while driving, this tool is handy because it can read content aloud at normal or slow speed (the slow option in gTTS)
  • Multilingual Communication – In a world where multinational deals are common, being able to articulate your thoughts and business proposals in another language is a powerful asset that can make or break a deal

Conclusion

Many fields can benefit from this application. It’s simple to build, but its benefits are broad. By developing this tool, we have not only addressed a real-world problem that many people face but have also learned how to use Python to call external services through libraries, initialize objects, invoke methods, organize code into functions, and use try/finally to clean up files after use. Once you have mastered these and want a challenge, you can try building an interactive GUI and hosting it on a web server to make it more user-friendly, and add features such as options to change pronunciation and pace. The possibilities are endless, and we hope you keep pushing the boundaries of how we can use technology and coding to advance humankind.
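
As a starting point for the pace and pronunciation options, gTTS accepts a slow flag and a tld argument that selects which Google Translate domain is queried; this can change the accent for some languages. The following is a rough sketch: speech_options and synthesize are hypothetical helpers, and the effect of a given tld value on the accent is an assumption to verify against the gTTS documentation.

```python
def speech_options(language, slow=False, tld="com"):
    """Build keyword arguments for gTTS: language code, pace, and Google domain."""
    return {"lang": language, "slow": slow, "tld": tld}


def synthesize(text, mp3_path, **options):
    """Generate speech with gTTS and save it as an MP3 (requires gTTS and network access)."""
    from gtts import gTTS  # imported here so speech_options works without gTTS installed
    gTTS(text=text, **options).save(mp3_path)
```

For example, synthesize("Hello", "hello.mp3", **speech_options("en", slow=True, tld="co.uk")) would request slower English speech through the co.uk domain.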