How do I generate speech from text?
Follow these steps to convert text to speech:
1. Click the "Create TTS Project" button at the bottom of the project list view
2. Select a text file (TXT, SRT, or VTT), or type your text directly
3. The text loads into the editor with speakers detected automatically
4. Speaker assignment:
• Single speaker: Automatically appears as "Narrator"
• Multi-speaker: Tag each speaker's lines with [Name]; untagged text is assigned to "Narrator"
5. In the Voice Library on the right, assign a voice actor to each speaker
6. Once all speakers have voices assigned, click "Generate Speech"
7. Audio plays as it is generated, and you can stop at any time
Note: If you stop early, only the generated portion counts toward your usage quota.
Is there a limit on the length of text for TTS?
There is no hard limit on text length.
However, your monthly TTS usage quota applies:
• Longer text consumes more of your monthly TTS hours
• Check your current usage in the toolbar
• Upgrade your plan for higher monthly quotas if needed
You can generate speech from text of any length as long as you have sufficient hours remaining in your monthly quota.
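Here is a minimal Python sketch of the quota arithmetic. It rests on two hypothetical assumptions. First, usage is metered in seconds of generated audio. Second, a run stopped early is charged only for the audio it produced. QuotaTracker and its methods are illustrative names and are not part of VoClone.

```python
from dataclasses import dataclass


@dataclass
class QuotaTracker:
    """Hypothetical model of a monthly TTS quota, metered in seconds of generated audio."""

    monthly_hours: float
    used_seconds: float = 0.0

    @property
    def remaining_seconds(self) -> float:
        # Quota left this month, never reported as negative
        return max(0.0, self.monthly_hours * 3600 - self.used_seconds)

    def record_generation(self, generated_seconds: float) -> None:
        # Only audio that was actually produced is charged, so a run that is
        # stopped early adds just the portion generated before the stop.
        self.used_seconds += generated_seconds

    def can_generate(self, estimated_seconds: float) -> bool:
        # Text length itself is unlimited; the only gate is the remaining quota.
        return estimated_seconds <= self.remaining_seconds


tracker = QuotaTracker(monthly_hours=10)
tracker.record_generation(1800)  # 30 minutes of audio generated
tracker.record_generation(900)   # a longer run stopped early after 15 minutes
print(tracker.remaining_seconds / 3600)  # 9.25
```

In this model, a long script is fine as long as its estimated audio duration fits in the remaining seconds.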
How do I quickly select a voice?
Each voice in the Voice Library displays key information at a glance:
Voice Information Display:
• Actor name abbreviation (e.g., JD for John Doe)
• Gender indicator: Pink for female, Blue for male
• Language code (e.g., EN for English)
• Accent indicator with national flag (e.g., 🇺🇸 for American English)
• Full actor name
• Voice type: Solid circle for built-in voices, Dotted circle for cloned voices
Filtering Options:
Use the filter controls to narrow down your search:
• Filter by voice type (built-in or cloned)
• Filter by accent/language
• Filter by gender
These visual indicators and filters help you find the perfect voice quickly.
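The Python sketch below models the icon data and the combined filters. The Voice fields mirror the indicators listed above. The class, the filter_voices function and the sample voices ("Jane Roe", "Amelia Stone") are hypothetical illustrations, not VoClone's data model. Each filter left unset matches any voice. Setting several filters narrows the list, just as combining the filter controls does.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Voice:
    name: str      # full actor name, e.g. "John Doe"
    gender: str    # "female" (pink) or "male" (blue)
    language: str  # language code, e.g. "EN"
    accent: str    # accent/region code shown as a flag, e.g. "US"
    cloned: bool   # False = built-in (solid circle), True = cloned (dotted circle)

    @property
    def initials(self) -> str:
        # Abbreviation shown on the voice icon, e.g. "JD" for John Doe
        return "".join(part[0].upper() for part in self.name.split())


def filter_voices(
    voices: Iterable[Voice],
    *,
    cloned: Optional[bool] = None,
    language: Optional[str] = None,
    accent: Optional[str] = None,
    gender: Optional[str] = None,
) -> list[Voice]:
    """Return voices matching every filter that is set; None means 'any'."""
    return [
        v
        for v in voices
        if (cloned is None or v.cloned == cloned)
        and (language is None or v.language == language)
        and (accent is None or v.accent == accent)
        and (gender is None or v.gender == gender)
    ]


library = [
    Voice("John Doe", "male", "EN", "US", cloned=False),
    Voice("Jane Roe", "female", "EN", "US", cloned=False),
    Voice("Amelia Stone", "female", "EN", "GB", cloned=True),
]
print([v.initials for v in filter_voices(library, gender="female", accent="US")])  # ['JR']
```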
How do I assign voices to speakers?
Follow these steps to assign voice actors to your speakers:
1. In the Voice Assignment section (bottom right), click on a speaker name
2. The selected speaker will be highlighted
3. In the Voice Library, click on a voice actor icon to select it
4. The selected voice will show a border with two buttons at the bottom:
• Play button: Preview the voice actor's demo sound
• Checkmark button: Assign this voice to the selected speaker
5. After assignment, you'll see the pairing displayed (e.g., "Alice: Amelia ✅")
6. Repeat for all speakers until no "No voice" labels remain
Note: One-to-one mapping:
• Each speaker can have only one voice
• Each voice can be assigned to only one speaker
This ensures clear voice distinction in your conversation.
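A two-way dictionary is one way to model this one-to-one rule. In the Python sketch below, reassigning a speaker releases that speaker's old voice. Claiming a voice already held by another speaker raises an error. VoiceAssignment is a hypothetical class, and "Oliver" is a made-up voice name. The app itself may handle the conflict differently, for example by disabling the checkmark button.

```python
from typing import Dict, List, Optional


class VoiceAssignment:
    """Hypothetical one-to-one mapping between speakers and voices."""

    def __init__(self, speakers: List[str]) -> None:
        self._voice_of: Dict[str, Optional[str]] = {s: None for s in speakers}  # None = "No voice"
        self._speaker_of: Dict[str, str] = {}  # reverse index: voice -> speaker

    def assign(self, speaker: str, voice: str) -> None:
        if speaker not in self._voice_of:
            raise KeyError(f"unknown speaker: {speaker}")
        owner = self._speaker_of.get(voice)
        if owner is not None and owner != speaker:
            # Each voice may belong to only one speaker.
            raise ValueError(f"{voice} is already assigned to {owner}")
        previous = self._voice_of[speaker]
        if previous is not None:
            # Each speaker has only one voice, so release the old one.
            del self._speaker_of[previous]
        self._voice_of[speaker] = voice
        self._speaker_of[voice] = speaker

    def unassigned(self) -> List[str]:
        # Speakers still showing the "No voice" label
        return [s for s, v in self._voice_of.items() if v is None]

    def ready(self) -> bool:
        # True once every speaker has a voice, i.e. generation can start
        return not self.unassigned()


cast = VoiceAssignment(["Alice", "Bob"])
cast.assign("Alice", "Amelia")
print(cast.unassigned())  # ['Bob']
```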
How do I create a multi-speaker conversation?
To create conversations with multiple speakers:
1. Format your text with speaker labels: [John] Hello there!
2. VoClone will automatically detect speaker patterns
3. In the Voice Library, assign different voices to each speaker
Example:
[Alice] How are you today?
[Bob] I'm doing great, thanks for asking!
[Alice] That's wonderful to hear.
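The Python sketch below approximates how such speaker labels can be detected. It assumes at most one [Name] tag, placed at the start of a line, and sends untagged lines to "Narrator" as described earlier. VoClone's real detector may accept other layouts (for example SRT or VTT cues). parse_script and detected_speakers are hypothetical names.

```python
import re
from typing import List, Tuple

# One speaker tag at the start of a line, e.g. "[Alice] How are you today?"
LINE_TAG = re.compile(r"^\s*\[([^\]]+)\]\s*(.*)$")


def parse_script(text: str, default: str = "Narrator") -> List[Tuple[str, str]]:
    """Split a script into (speaker, text) pairs; untagged lines go to `default`."""
    pairs = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        m = LINE_TAG.match(raw)
        if m:
            pairs.append((m.group(1).strip(), m.group(2).strip()))
        else:
            pairs.append((default, raw.strip()))
    return pairs


def detected_speakers(pairs: List[Tuple[str, str]]) -> List[str]:
    # Unique speaker names in order of first appearance
    return list(dict.fromkeys(name for name, _ in pairs))


script = """[Alice] How are you today?
[Bob] I'm doing great, thanks for asking!
[Alice] That's wonderful to hear."""
print(detected_speakers(parse_script(script)))  # ['Alice', 'Bob']
```

Each detected name is one speaker that needs a voice in the Voice Assignment section.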
Is there a limit on the number of speakers in a conversation?
The number of speakers you can have is limited by the total number of available voices:
Maximum speakers = Built-in voices + Cloned voices
For example:
β’ If you have 10 built-in voices and 5 cloned voices, you can have up to 15 speakers in a conversation
• Each speaker requires a unique voice assignment
• Create more cloned voices to increase your speaker capacity
Your subscription plan determines how many cloned voices you can create (see Voice Cloning limits).
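The capacity rule is simple arithmetic. The short Python sketch below also reports how many extra cloned voices a script would need. The function names are hypothetical. The sketch does not model per-plan cloning limits, which depend on your subscription.

```python
def max_speakers(builtin_voices: int, cloned_voices: int) -> int:
    # Every speaker needs a distinct voice, so the cap is the total voice count.
    return builtin_voices + cloned_voices


def voices_short(speaker_count: int, builtin_voices: int, cloned_voices: int) -> int:
    # Additional cloned voices needed to cast every speaker (0 if there are enough)
    return max(0, speaker_count - max_speakers(builtin_voices, cloned_voices))


print(max_speakers(10, 5))      # 15
print(voices_short(17, 10, 5))  # 2
```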
Can the same voice be used for multiple speakers?
No, each voice can only be assigned to one speaker per project. This ensures clear distinction between different characters in your conversation.
Can I edit text in my TTS project?
Yes! VoClone provides a simple yet powerful text editor for your TTS scripts:
Features:
• Edit individual sentences directly in the editor
• Delete the current sentence
• Add or remove speaker labels with tags (e.g., [Alice] Hello!)
• Search/Replace text
• Adjust font size
• Export text
• View word frequency table
Note:
• Press Enter after editing a sentence or changing speaker labels to apply changes
• Changes are automatically saved to the project once applied
• All edits are immediately reflected when you generate speech
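Here is a minimal Python sketch of what a word frequency table computes. It assumes that [Speaker] tags are excluded and that words are compared case-insensitively. Both are assumptions, and the editor's own table may tokenize differently. word_frequencies is a hypothetical name.

```python
import re
from collections import Counter
from typing import List, Tuple

TAG = re.compile(r"\[[^\]]+\]")
WORD = re.compile(r"[a-z0-9']+")


def word_frequencies(text: str, top: int = 10) -> List[Tuple[str, int]]:
    # Drop [Speaker] tags so speaker names are not counted as spoken words
    # (an assumption about how the editor counts).
    spoken = TAG.sub(" ", text).lower()
    return Counter(WORD.findall(spoken)).most_common(top)


script = "[Alice] The cat saw the dog.\n[Bob] The dog ran."
print(word_frequencies(script, top=2))  # [('the', 3), ('dog', 2)]
```

A table like this helps spot overused words before you spend quota generating speech.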
What do the icons at the top of the editor mean?
The editor displays real-time statistics with visual icons:
• Character count: Total number of characters in your script
• Word count: Total number of words
• Sentence count: Number of sentences detected
• Speaker count: Number of unique speakers in your conversation
These statistics update automatically as you edit your text, helping you track the length and complexity of your script.
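The Python sketch below approximates these four counters. It rests on these assumptions:
• Speaker tags are excluded from the character and word counts
• Sentences end at ".", "!" or "?"
• Untagged lines belong to "Narrator"

VoClone's own counting rules may differ. ScriptStats and script_stats are hypothetical names. The naive sentence split would over-count abbreviations such as "e.g.".

```python
import re
from dataclasses import dataclass

LINE_TAG = re.compile(r"^\[([^\]]+)\]\s*(.*)$")


@dataclass
class ScriptStats:
    characters: int
    words: int
    sentences: int
    speakers: int


def script_stats(text: str, default_speaker: str = "Narrator") -> ScriptStats:
    speakers = set()
    spoken_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_TAG.match(line)
        if m:
            speakers.add(m.group(1).strip())
            spoken_lines.append(m.group(2))
        else:
            # Untagged text is attributed to the default speaker
            speakers.add(default_speaker)
            spoken_lines.append(line)
    spoken = " ".join(spoken_lines)
    # Naive split on terminal punctuation
    sentences = [s for s in re.split(r"[.!?]+", spoken) if s.strip()]
    return ScriptStats(
        characters=len(spoken),
        words=len(spoken.split()),
        sentences=len(sentences),
        speakers=len(speakers),
    )


script = """[Alice] How are you today?
[Bob] I'm doing great, thanks for asking!
[Alice] That's wonderful to hear."""
print(script_stats(script))  # ScriptStats(characters=80, words=14, sentences=3, speakers=2)
```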
Why do I hear unrelated speech or random noise in my generated audio?
This is called "hallucination," a common phenomenon in generative AI:
What causes it:
• The model starts from a random initialization each time
• This can occasionally produce unintended sounds or speech fragments
• Each generation produces different results
How to fix it:
• Regenerate the audio: Click "Generate Speech" again. Because each generation produces different results, this often resolves the issue
• Edit the audio: Remove unwanted noise using the audio editing tools (see "How do I edit generated audio?")
How do I edit generated audio?
To edit problematic portions of your generated audio:
1. In the audio player view, press and drag to select the portion you want to edit
2. The selected portion will highlight in red
3. To change the selection, simply make a new selection
4. Right-click the selection and choose:
• Play Selection: Preview the selected part to confirm it's the portion you want to edit
• Silence Selection: Mute the selected portion (audio length stays the same)
• Remove Selection: Cut the portion out entirely (audio length becomes shorter)
This tool is especially useful for removing AI hallucination noise or unwanted segments from your generated speech.
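The two edit commands differ only in what they do to the selected samples. The Python sketch below models a mono clip as a plain list of floats at VoClone's 44.1 kHz output rate. At that rate, one second of audio is 44,100 samples. The function names are hypothetical, and this is not how VoClone implements editing internally.

```python
from typing import List, Tuple

SAMPLE_RATE = 44_100  # 44.1 kHz: samples per second of audio


def _bounds(n: int, start_s: float, end_s: float) -> Tuple[int, int]:
    # Convert a selection in seconds to sample indices, ordered and clamped to the clip.
    a, b = sorted((round(start_s * SAMPLE_RATE), round(end_s * SAMPLE_RATE)))
    return min(max(a, 0), n), min(max(b, 0), n)


def silence_selection(samples: List[float], start_s: float, end_s: float) -> List[float]:
    a, b = _bounds(len(samples), start_s, end_s)
    # Replace the selection with zeros: the duration is unchanged.
    return samples[:a] + [0.0] * (b - a) + samples[b:]


def remove_selection(samples: List[float], start_s: float, end_s: float) -> List[float]:
    a, b = _bounds(len(samples), start_s, end_s)
    # Cut the selection out: the duration shrinks by the selected length.
    return samples[:a] + samples[b:]


clip = [0.5] * SAMPLE_RATE  # one second of placeholder audio
print(len(silence_selection(clip, 0.25, 0.5)) / SAMPLE_RATE)  # 1.0
print(len(remove_selection(clip, 0.25, 0.5)) / SAMPLE_RATE)   # 0.75
```

Silencing suits a noise burst inside a pause, where the timing should stay the same. Removing suits a spurious fragment whose gap would otherwise sound unnatural.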
How do I export generated audio?
To export your generated speech audio:
1. Locate the audio player view at the bottom of the TTS panel
2. Click the Export button in the audio player
3. Choose a location to save the audio file
4. Click Save to complete the export
What is the format of the generated audio?
VoClone generates audio in M4A format with the following specifications:
• Format: M4A (MPEG-4 Audio)
• Sample Rate: 44.1 kHz
• Quality: High-quality AAC encoding
What are the best practices for TTS?
Follow these tips to make the most of VoClone TTS:
Start Small:
• Begin with short projects to familiarize yourself with the tool
• Once comfortable, move on to longer projects
Master Audio Editing:
• Practice the selection and silence/remove features to fine-tune your audio
• This skill saves time and quota by fixing issues instead of regenerating
Regenerate Strategically:
• If only part of the generated speech is unsatisfactory, regenerate just that section
• No need to regenerate the entire sequence when most of it sounds good
• This approach conserves your monthly usage quota
General Tips:
• Use well-formatted text with proper speaker labels
• Preview voices before assigning them to speakers
• Export your audio when satisfied with the results
These practices help you create high-quality speech efficiently while managing your monthly quota.