If, like me, you’re passionate about artificial intelligence and speech synthesis, the Bark project developed by Suno will surely pique your interest. It’s an innovative and intriguing development in the realm of text-to-speech.
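While most traditional text-to-speech models work from phonemes, Bark takes a different approach: it is a fully generative, transformer-based text-to-audio model that turns text directly into audio without an intermediate phoneme step. It produces realistic multilingual speech and can also generate music, background noise, simple sound effects, and even non-verbal expressions like laughter, sighs, and crying.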
Imagine this scenario: you want to create an audio clip where your character talks about their love for pizza and suddenly bursts into laughter. With Bark, you can input text like ‘Hello, my name is Mohamed. And, uh – and I like pizza. [laughs]’ and get an audio output that faithfully captures that emotion.
Impressive, isn’t it? Bark also detects the language of the input automatically, and when a single prompt mixes several languages, it attempts to use the native accent for each one. For now, English gives the highest-quality output, but I expect other languages such as French and Spanish to improve over time.
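Now, you might be wondering how to create a clip that includes music. Bark does not strictly distinguish between speech and music, so you can nudge it toward singing by adding music notes around your lyrics, like this: “♪ In the jungle, the mighty jungle, the lion died tonight ♪”. Bark will then usually sing the line, often with music in the background, although the result varies from one generation to the next.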
Here are some keywords you can use to enhance your text-to-speech output with Bark:
- [laughter]
- [laughs]
- [sighs]
- [music]
- [gasps]
- [clears throat]
- — (or ellipsis) to mark hesitations
- ♪ around song lyrics
- CAPITAL LETTERS to emphasize a word
- [MAN] and [WOMAN] to bias Bark toward a male or female speaker
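Because these cues are just plain text inside the prompt, a misspelled one will not raise an error; it simply won't reliably produce the intended sound. The sketch below is a hypothetical helper (the function names are mine, not part of Bark's API): it wraps lyrics in ♪ markers and lists any bracketed tag that is not one of the cues above, so typos can be caught before spending GPU time.

```python
import re

# Bracketed cues from the list above; other tags may work, but these are the documented ones.
KNOWN_CUES = {
    "[laughter]",
    "[laughs]",
    "[sighs]",
    "[music]",
    "[gasps]",
    "[clears throat]",
    "[MAN]",
    "[WOMAN]",
}


def as_song(lyrics: str) -> str:
    """Wrap each non-empty line of lyrics in ♪ markers to nudge Bark toward singing."""
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    return "\n".join(f"♪ {line} ♪" for line in lines)


def unknown_cues(prompt: str) -> list[str]:
    """Return bracketed tags in the prompt that are not in KNOWN_CUES, in order of appearance."""
    return [tag for tag in re.findall(r"\[[^\[\]]+\]", prompt) if tag not in KNOWN_CUES]


prompt = "[WOMAN] Hello! I like pizza. [laughs] [giggles]"
print(unknown_cues(prompt))  # ['[giggles]']
print(as_song("In the jungle, the mighty jungle\nthe lion died tonight"))
```

The check is deliberately advisory: it only reports unfamiliar tags and leaves it to you whether to keep them, since Bark may still respond to cues outside this list.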
Suno recently released Bark as open source under the MIT License, so researchers and companies can use the model in their own projects, including commercial ones.
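To start using Bark, install it with pip directly from the GitHub repository (alternatively, clone the repository and run `pip install .` inside it). Note that `pip install bark` installs an unrelated package that is not maintained by Suno.

```shell
pip install git+https://github.com/suno-ai/bark.git
```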
Once installed, you can import Bark into a Python script, download and load its models with `preload_models()`, and then generate audio from text with `generate_audio()`. If you work in a Jupyter notebook, IPython's `Audio` display class even lets you listen to the result inline.
Here’s an example of how to integrate it:
```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
from IPython.display import Audio

# download and load all models
preload_models()

# generate audio from text
text_prompt = """
    Hello, my name is Mohamed. And, uh — and I like pizza. [laughs]
    But I also have other interests such as playing tic-tac-toe.
"""
audio_array = generate_audio(text_prompt)

# save audio to disk
write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)

# play the audio in the notebook
Audio(audio_array, rate=SAMPLE_RATE)
```
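Bark is geared toward short clips, so for a longer script one common pattern is to split the text into sentence-sized chunks, generate each chunk separately, and join the pieces with a short pause. The sketch below assumes that pattern: `split_into_chunks` and `generate_long` are hypothetical helpers, the sentence splitter is a naive regex, and the 0.25-second pause and 30-word chunk size are arbitrary choices. The generator function is passed in as a parameter, so the helper can be tried out without downloading Bark's models.

```python
import re
from typing import Callable

import numpy as np


def split_into_chunks(text: str, max_words: int = 30) -> list[str]:
    """Greedily pack sentences into chunks of at most max_words words (a single longer sentence stays whole)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = current + [sentence]
        # Flush the current chunk once adding this sentence would exceed the word budget.
        if current and sum(len(s.split()) for s in candidate) > max_words:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


def generate_long(
    text: str,
    generate: Callable[[str], np.ndarray],
    sample_rate: int,
    pause_seconds: float = 0.25,
    max_words: int = 30,
) -> np.ndarray:
    """Generate audio chunk by chunk and concatenate the pieces with silence in between."""
    silence = np.zeros(int(pause_seconds * sample_rate), dtype=np.float32)
    pieces = []
    for i, chunk in enumerate(split_into_chunks(text, max_words)):
        if i > 0:
            pieces.append(silence)  # short pause between consecutive chunks
        pieces.append(np.asarray(generate(chunk), dtype=np.float32))
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)
```

With Bark installed, you would pass its own function and sample rate, for example `generate_long(long_text, generate_audio, SAMPLE_RATE)`, and save the result with `write_wav` exactly as in the example above.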
You can also use it directly from the command line like this:
```shell
python -m bark --text "Hello, my name is Mohamed." --output_filename "example.wav"
```
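If you have several prompts to render, the same command can be driven from a script. Below is a minimal sketch using Python's standard `subprocess` module; `cli_command` and `render_all` are hypothetical helpers, and only the `--text` and `--output_filename` flags shown above are used. Passing the arguments as a list avoids shell-quoting problems with prompts that contain quotes or brackets.

```python
import subprocess
import sys


def cli_command(text: str, output_filename: str) -> list[str]:
    """Build the argument list for one `python -m bark` invocation."""
    return [sys.executable, "-m", "bark", "--text", text, "--output_filename", output_filename]


def render_all(prompts: dict[str, str]) -> None:
    """Run the Bark CLI once per prompt; keys are output filenames, values are prompt texts."""
    for output_filename, text in prompts.items():
        # check=True raises CalledProcessError if a run fails, instead of silently continuing
        subprocess.run(cli_command(text, output_filename), check=True)
```

For example, `render_all({"greeting.wav": "Hello, my name is Mohamed.", "pizza.wav": "I like pizza. [laughs]"})`. Each invocation starts a fresh process that has to load the models again, so for many prompts the Python API is likely to be faster.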
For developers who want more control over the voice, Bark supports more than 100 speaker presets across its supported languages. The Bark GitHub repository links to a library where you can browse them.
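Presets are selected through the `history_prompt` argument of `generate_audio`. The sketch below assumes the `v2/<language>_speaker_<n>` naming seen in Bark's examples (names such as `v2/en_speaker_1`), with speakers numbered 0 to 9; check the preset library for the exact names available in your version. `preset_name` and `speak` are hypothetical helpers, and Bark is imported lazily so the helpers can be loaded without it.

```python
import re


def preset_name(language: str, speaker: int) -> str:
    """Build a preset ID such as 'v2/en_speaker_1' (naming scheme assumed, see the preset library)."""
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"expected a two-letter language code, got {language!r}")
    if not 0 <= speaker <= 9:
        raise ValueError(f"expected a speaker index from 0 to 9, got {speaker}")
    return f"v2/{language}_speaker_{speaker}"


def speak(text: str, language: str = "en", speaker: int = 1):
    """Generate audio in a fixed preset voice; requires Bark to be installed when called."""
    from bark import generate_audio  # lazy import so this module loads without Bark

    return generate_audio(text, history_prompt=preset_name(language, speaker))
```

Without a `history_prompt`, Bark picks a voice at random for each generation, so fixing a preset is also the simplest way to keep the same speaker across several clips.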
Demos are also available on Hugging Face and on Replicate.
Bark is a truly exciting innovation in the field of text-to-speech. It’s like a Swiss Army knife for developers and researchers working on text-to-speech, content creation, music, or AI projects.
If you’re interested, head over to the Bark GitHub repository at https://github.com/suno-ai/bark and dive into the fascinating world of advanced text-to-speech.