Frequently Asked Questions

Everything you need to know about VoClone

📑 Quick Navigation

• 🚀 Getting Started
• 🔒 Privacy & Security
• 📁 Project Management
• 🎤 Speech to Text (STT)
• 🔊 Text to Speech (TTS)
• 👤 Voice Cloning (VC)
• 💳 Subscription & Usage
• ⚙️ Technical & Troubleshooting

🚀 Getting Started

What is VoClone?

VoClone is your all-in-one voice solution and private voice studio for macOS:
• Speech-to-Text (STT): Transcribe audio files with timestamps
• Text-to-Speech (TTS): Generate natural-sounding speech
• Voice Cloning (VC): Create custom voice models

Your Private Voice Studio:
• All processing happens 100% locally on your Mac
• Your audio files are NEVER uploaded online
• Your text content is NEVER uploaded online
• Complete privacy and security for your voice work
• Maximum performance with local processing

How do I get started?

Getting started with VoClone:

1. First Launch:
• The app downloads the required AI models (one-time)
• Sign in with your Apple ID for multi-device sync (one-time unless you sign out)

2. Choose Your Feature:
• Use the tab selector at the top to switch between STT, TTS, and VC
• Each tab shows its projects in the sidebar

Speech-to-Text (STT):
• Transcribe audio with sentence-level timestamps
• Export in TXT, SRT, VTT, or JSON format
• Batch processing (Pro and Max plans)

Text-to-Speech (TTS):
• Generate speech with multi-speaker conversations
• Use built-in or cloned voices
• Use the audio editing tools to fix generation errors

Voice Cloning (VC):
• Record yourself reading a provided script
• Clone your voice with automatic quality checking
• Use cloned voices in TTS projects

3. Create Your First Project:
• STT: Click "Create STT Project" to import audio
• TTS: Click "Create TTS Project" to start a new project
• VC: Click "Clone a Voice" to start recording and cloning

Tip: Start with a simple project to familiarize yourself with the interface.

How fast does VoClone run?

VoClone runs faster than real time for both transcription and speech generation.

Performance Benchmarks:

M1 MacBook Air (16GB RAM):
• STT: 52x real-time
• TTS: 1.4x real-time
• VC: ~3.5 sec

M4 MacBook Air (16GB RAM):
• STT: 81x real-time
• TTS: 1.8x real-time
• VC: ~2 sec

What does this mean?

STT (Speech-to-Text):
• For example, 81x real-time means that transcribing 1 hour of audio takes only 60/81 ≈ 0.74 minutes (about 44 seconds)
• Higher numbers are faster

TTS (Text-to-Speech):
• For example, 1.8x real-time means that generating 30 minutes of audio takes 30/1.8 ≈ 16.7 minutes
• Higher numbers are faster

Note: Partial results appear immediately, so you can start reviewing transcriptions or listening to generated speech while processing continues.

Speed may vary depending on factors such as audio length and system load.
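The speed-factor arithmetic above can be checked with a short calculation. This is only an illustrative sketch: the function name `processing_minutes` is made up for this example and is not part of VoClone.

```python
def processing_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Estimate processing time in minutes from a real-time speed factor."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    # A factor of Nx means N minutes of audio are handled per minute of work.
    return audio_minutes / speed_factor


# Benchmark figures quoted above for the M4 MacBook Air.
stt = processing_minutes(60, 81)    # 1 hour of audio at 81x
tts = processing_minutes(30, 1.8)   # 30 minutes of speech at 1.8x
print(f"STT: {stt:.2f} min, TTS: {tts:.2f} min")
```

Swapping in the M1 figures (52x and 1.4x) gives the corresponding estimates for that machine.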

What are the hardware requirements?

VoClone requires:
• Apple Silicon chip (M1 or newer)
• At least 8GB of RAM
• 4GB of free disk space for models and projects

Important: VoClone does not run on Intel-based Macs. An Apple Silicon chip is required for optimal AI model performance.

What languages are supported?

Currently, VoClone supports English for both transcription (STT) and speech generation (TTS). Additional languages are planned for future releases. Stay tuned for updates as we expand language support!

Can I use the generated speech for commercial purposes?

Yes! You hold the copyright to all speech and audio content generated by VoClone. You are free to use your generated materials for any purpose, including commercial projects, without additional licensing fees.

Voice Cloning Responsibility & Legal Disclaimer:

You are solely responsible for ensuring you have proper authorization before cloning any voice. You must obtain explicit consent from the voice owner, and comply with all applicable copyright laws, privacy rights, and intellectual property regulations.

Prohibited Uses:
• Cloning voices of public figures without authorization
• Using cloned voices for impersonation, fraud, or harassment
• Any illegal activities involving voice cloning technology

Liability:
VoClone and its developers assume no liability for misuse of the voice cloning features. By using VoClone, you agree to indemnify and hold harmless VoClone from any claims, damages, or legal actions arising from your use of this technology.

Use VoClone responsibly and ethically.

🔒 Privacy & Security

Is my data private and secure?

Yes! VoClone is committed to protecting your privacy and personal data.

Key Privacy Principles:
• Local Processing: All audio processing happens locally on your device
• No Cloud Storage: Your audio and text files are never uploaded to our servers
• No Data Collection: We don't collect or track your personal information
• Secure by Design: Your privacy is built into every feature

What is shared with us:
• Usage status for multi-device sync
• Subscription status for account management

What is NEVER shared:
• Your audio files and recordings
• Your transcriptions
• Your generated speech
• Your cloned voices
• Your project data

See our full privacy policy at privacy.html

Why do I need to sign in with Apple ID?

Apple ID sign-in is required to sync usage data across your devices: • Use VoClone on multiple Macs with the same account • Multi-device usage synchronization • Sign-in handled securely by macOS • macOS provides anonymous ID only for privacy

What is being downloaded on first launch?

VoClone downloads AI models for STT and TTS on first launch:
• Models are downloaded automatically from secure cloud storage
• The download happens in the background with a progress indicator
• Models are cached locally for offline use
• Total download size: approximately 2GB (model version 1)

This one-time download ensures the best performance and enables offline functionality. After it completes, all processing happens 100% locally on your Mac for privacy protection and fast performance.

Can I use VoClone offline?

VoClone works best with an internet connection for usage synchronization and subscription management. Core features can technically work offline (you can test by turning off your network): • Speech-to-text transcription • Text-to-speech generation • Voice cloning However, internet connection is needed for: • Usage synchronization across devices • Subscription validation and management • Initial model downloads (one-time) • Model updates (automatic when available) We recommend staying connected to ensure proper usage tracking and subscription features work correctly.

📁 Project Management

How do I organize my work?

VoClone uses a project-based system: • Projects automatically save your settings and work as you go • Each transcription, speech generation, or cloned voice is saved as a project • Use the sidebar to browse and select projects

How do I quickly locate a project?

VoClone provides several ways to find your projects: Search: • Use the search bar at the top of the sidebar • Type any part of the project name to filter results instantly Project Sorting: • Projects are sorted chronologically by creation date • Newest projects appear at the top • Scroll down to find older projects Project Information Display: Each project shows key details at a glance: STT Projects (Speech-to-Text): • 📅 Creation date • ⏱️ Audio duration • Audio file type (📄 imported file, 🎤 recorded) TTS Projects (Text-to-Speech): • 📅 Creation date • 📝 Sentence count • 👥 Speaker count VC Projects (Voice Clone): • 📅 Creation date This information helps you quickly identify and find the right project.

How do I rename a project?

To rename a project: 1. Right-click on the project in the sidebar 2. Select "Change Title" from the context menu 3. Enter the new name and press Enter Project names help you stay organized, especially when managing multiple projects.

How do I delete a project?

To delete a project:
1. Right-click on the project in the sidebar
2. Select "Delete" from the context menu

Important:
• Deleting a project removes all associated data:
  - Transcriptions (for STT projects)
  - Generated speech (for TTS projects)
  - Cloned voices (for VC projects)
• This action cannot be undone

Tip: Export your content before deleting if you want to keep it.

🎤 Speech to Text (STT)

How do I transcribe an audio file?

Follow these steps to transcribe audio:
1. Click the "Create STT Project" button at the bottom of the project list view
2. Choose your audio source:
• Import an existing audio file from your Mac
• Record live speech using your microphone
3. If importing a file:
• Select your audio file (WAV, MP3, M4A, or MP4)
• The audio will be loaded into the project
4. If recording live:
• Grant microphone permission if prompted
• Click the record button to start
• Speak clearly with minimal background noise
• Click stop when finished
5. Click "Transcribe" to start the transcription process
6. Review the transcription
7. Export your transcription in your preferred format (TXT, SRT, VTT, or JSON)

Note:
• You can stop transcription at any time during processing
• Only the transcribed portion will count toward your usage quota
• Your transcription is automatically saved with the project

What audio formats are supported for transcription?

VoClone supports common audio formats:

Input Formats:
• WAV: Uncompressed audio
• MP3: Compressed audio
• M4A: AAC compressed audio
• MP4: MPEG-4 files with audio tracks

Export Formats:
• TXT: Plain text transcription
• SRT: Subtitle format with timestamps (for video subtitles)
• VTT: Web video text tracks format
• JSON: Structured data with timestamps and metadata
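To make the SRT export format concrete, here is a minimal sketch of how timestamped segments map to SRT text. The `(start, end, text)` tuple layout and the function names are assumptions made for this example; they do not describe VoClone's internal data or its exact output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(cues) + "\n"


# Hypothetical segments, for illustration only.
print(to_srt([(0.0, 2.5, "Hello there."), (2.5, 5.0, "Welcome back.")]))
```

WebVTT (VTT) cues look similar, but the file starts with a `WEBVTT` header line and timestamps use a period instead of a comma before the milliseconds.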

What can I do with transcribed sentences?

VoClone provides several functions for working with individual transcribed sentences: Playback: • Click the play button on the left side of any segment to listen to that specific portion • Perfect for reviewing transcription accuracy Context Menu Options: Right-click on any segment to access: • Copy Segment to Clipboard: Copies both the timestamp and text for easy reference • Play Sequentially from Here: Plays all segments in order starting from the selected one • Remove This Segment: Deletes the segment from your transcription • Export This Segment as Audio Clip: Saves just that segment as a separate audio file (premium feature ⭐) These tools give you precise control over your transcription results.

How do I export my transcription?

To export your completed transcription: 1. Click the "Export Transcription" button in the STT panel 2. Choose your preferred export format: • TXT: Plain text for simple documents • SRT: Subtitle format with timestamps for video subtitles • VTT: Web video text tracks format for web players • JSON: Structured data with timestamps and metadata 3. Select a location on your Mac to save the file 4. Click "Save" to complete the export Your transcription is now ready to use in other applications or workflows.

How do I split my transcription into audio clips?

You can split your transcribed audio into individual clips based on sentence boundaries:
1. Complete your transcription
2. Click the "Split into Audio Clips" button
3. Select a location on your Mac to save the files
4. VoClone will create separate audio files for each transcribed segment
5. Each clip is saved with its corresponding timestamp

Note: This is a premium feature ⭐ available only for paid plan subscribers (Plus, Pro, and Max).

This feature is perfect for:
• Creating podcast highlights
• Extracting specific quotes from interviews
• Organizing audio content for editing

Can I transcribe multiple audio files at once?

Yes! Batch processing is available for Pro and Max plan subscribers:
1. Click the batch processing button in the STT panel
2. Select an input folder containing your audio files
3. Select an output folder where transcriptions will be saved
4. VoClone will process all audio files sequentially
5. Each transcription is exported directly to the output folder

Note:
• This is a premium feature ⭐ available only for Pro and Max plans
• You can pause or stop batch processing at any time
• Completed transcriptions are saved even if you stop mid-batch

This feature is perfect for processing large volumes of audio content efficiently.

How do I export my recorded audio?

To export the audio from your transcription project: 1. Locate the audio player view in the STT panel 2. Click the Export button in the audio player 3. Choose a location to save the audio file 4. Click Save to complete the export This allows you to save your recorded or imported audio separately from the transcription.

Is there a limit on audio length or transcription length?

There is no limit on audio file length or transcription length. However, your monthly usage quota applies: • Longer audio files consume more of your monthly STT hours • Check your current usage in the toolbar • Upgrade your plan for higher monthly quotas if needed You can transcribe audio of any length as long as you have sufficient hours remaining in your monthly quota.

How do I improve transcription accuracy?

Audio quality is the key to accurate transcriptions: For All Audio Types (Files and Recordings): • Use high-quality audio with clear speech • Maintain consistent volume levels (avoid audio that's too quiet or too loud) For Recording Live Speech: • Speak clearly and steadily • Pause briefly between sentences • Keep background noise to a minimum • Position yourself close to the microphone • Press stop when finished to avoid capturing extra noise Better audio quality directly translates to more accurate transcriptions.

Does VoClone support speaker diarization?

Speaker diarization (automatic speaker detection and labeling): Speaker diarization is not currently available in this version. This feature is planned for a future update. Stay tuned for updates!

🔊 Text to Speech (TTS)

How do I generate speech from text?

Follow these steps to convert text to speech:
1. Click the "Create TTS Project" button at the bottom of the project list view
2. Select a text file (TXT, SRT, or VTT), or type your text directly
3. The text loads into the editor with speakers detected automatically
4. Speaker assignment:
• Single speaker: Automatically appears as "Narrator"
• Multi-speaker: Specify speaker names with [Name] tags; untagged text becomes "Narrator"
5. In the Voice Library on the right, assign a voice actor to each speaker
6. Once all speakers have voices assigned, click "Generate Speech"
7. Audio plays as it generates, and you can stop at any time

Note: If you stop early, only the generated portion counts toward your usage quota.

Is there a limit on the length of text for TTS?

There is no hard limit on text length. However, your monthly TTS usage quota applies: • Longer text consumes more of your monthly TTS hours • Check your current usage in the toolbar • Upgrade your plan for higher monthly quotas if needed You can generate speech from text of any length as long as you have sufficient hours remaining in your monthly quota.

How do I quickly select a voice?

Each voice in the Voice Library displays key information at a glance: Voice Information Display: • Actor name abbreviation (e.g., JD for John Doe) • Gender indicator: Pink for female, Blue for male • Language code (e.g., EN for English) • Accent indicator with national flag (e.g., 🇺🇸 for American English) • Full actor name • Voice type: Solid circle for built-in voices, Dotted circle for cloned voices Filtering Options: Use the filter controls to narrow down your search: • Filter by voice type (built-in or cloned) • Filter by accent/language • Filter by gender These visual indicators and filters help you find the perfect voice quickly.

How do I assign voices to speakers?

Follow these steps to assign voice actors to your speakers:
1. In the Voice Assignment section (bottom right), click on a speaker name
2. The selected speaker will be highlighted
3. In the Voice Library, click on a voice actor icon to select it
4. The selected voice will show a border with two buttons at the bottom:
• Play button: Preview the voice actor's demo sound
• Checkmark button: Assign this voice to the selected speaker
5. After assignment, you'll see the pairing displayed (e.g., "Alice: Amelia ✅")
6. Repeat for all speakers until no "No voice" labels remain

Note: Voices and speakers map one-to-one:
• Each speaker can have only one voice
• Each voice can be assigned to only one speaker

This ensures clear voice distinction in your conversation.

How do I create a multi-speaker conversation?

To create conversations with multiple speakers:
1. Format your text with speaker labels, for example: [John] Hello there!
2. VoClone will automatically detect speaker patterns
3. In the Voice Library, assign different voices to each speaker

Example:
[Alice] How are you today?
[Bob] I'm doing great, thanks for asking!
[Alice] That's wonderful to hear.
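For readers who prepare scripts programmatically, the [Name] tag convention can be sketched as a small parser. This illustrates the labeling format only and is not VoClone's actual detection logic. The function name is hypothetical, and the sketch assumes a tag stays in effect until the next tag, with text before any tag assigned to "Narrator".

```python
import re

# A speaker tag is any bracketed name, such as [Alice].
SPEAKER_TAG = re.compile(r"\[([^\[\]]+)\]")


def split_by_speaker(script: str, default: str = "Narrator"):
    """Split a script into (speaker, text) turns based on [Name] tags."""
    turns = []
    speaker = default
    position = 0
    for match in SPEAKER_TAG.finditer(script):
        text = script[position:match.start()].strip()
        if text:
            turns.append((speaker, text))
        # Assumption: the tag applies to everything up to the next tag.
        speaker = match.group(1).strip()
        position = match.end()
    tail = script[position:].strip()
    if tail:
        turns.append((speaker, tail))
    return turns


script = """[Alice] How are you today?
[Bob] I'm doing great, thanks for asking!
[Alice] That's wonderful to hear."""

for speaker, text in split_by_speaker(script):
    print(f"{speaker}: {text}")
```

Counting the distinct speaker names in the result tells you how many voices you will need to assign before generating speech.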

Is there a limit on the number of speakers in a conversation?

The number of speakers you can have is limited by the total number of available voices:

Maximum speakers = Built-in voices + Cloned voices

For example:
• If you have 10 built-in voices and 5 cloned voices, you can have up to 15 speakers in a conversation
• Each speaker requires a unique voice assignment
• Create more cloned voices to increase your speaker capacity

Your subscription plan determines how many cloned voices you can create (see "How many cloned voices can I create?").
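The speaker limit can be worked through in a few lines. The cloned-voice slot counts below are the per-plan limits listed under Voice Cloning in this FAQ. The built-in voice count of 10 is just the illustrative number from the example above, not a statement about how many built-in voices VoClone ships with.

```python
# Cloned-voice slots per plan, as listed in this FAQ.
CLONED_VOICE_SLOTS = {"Free": 1, "Plus": 10, "Pro": 80, "Max": 400}


def max_speakers(built_in_voices: int, cloned_voices: int) -> int:
    """Each speaker needs a unique voice, so the limit is the voice total."""
    return built_in_voices + cloned_voices


def speaker_ceiling(plan: str, built_in_voices: int) -> int:
    """Upper bound on speakers if every cloned-voice slot of a plan is used."""
    return max_speakers(built_in_voices, CLONED_VOICE_SLOTS[plan])


print(max_speakers(10, 5))           # the example from the text
print(speaker_ceiling("Plus", 10))   # hypothetical: 10 built-in voices
```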

Can the same voice be used for multiple speakers?

No, each voice can only be assigned to one speaker per project. This ensures clear distinction between different characters in your conversation.

Can I edit text in my TTS project?

Yes! VoClone provides a simple yet powerful text editor for your TTS scripts: Features: • Edit individual sentences directly in the editor • Delete current sentence • Add or remove speaker labels with tags (e.g., [Alice] Hello!) • Search/Replace text • Adjust font size • Export text • View word frequency table Note: • Press Enter after editing a sentence or changing speaker labels to apply changes • Changes are automatically saved to the project once applied • All edits are immediately reflected when you generate speech.

What do the icons at the top of the editor mean?

The editor displays real-time statistics with visual icons: • Character count: Total number of characters in your script • Word count: Total number of words • Sentence count: Number of sentences detected • Speaker count: Number of unique speakers in your conversation These statistics update automatically as you edit your text, helping you track the length and complexity of your script.

Why do I hear unrelated speech or random noise in my generated audio?

This is called "hallucination," a common phenomenon in generative AI.

What causes it:
• The AI starts from a random initialization each time
• It sometimes produces unintended sounds or speech fragments
• Results differ with each generation

How to fix it:
• Regenerate the audio: Click Generate Speech again. Each generation produces different results, so this often resolves the issue
• Edit the audio: Remove unwanted noise using the audio editing tools (see "How do I edit generated audio?")

How do I edit generated audio?

To edit problematic portions of your generated audio:
1. In the audio player view, press and drag to select the portion you want to edit
2. The selected portion will highlight in red
3. To change the selection, simply select again
4. Right-click on the selection and choose:
• Play Selection: Preview the selected part to confirm it's the portion you want to edit
• Silence Selection: Mute the selected portion (audio length stays the same)
• Remove Selection: Cut the portion out entirely (audio length becomes shorter)

This tool is especially useful for removing AI hallucination noise or unwanted segments from your generated speech.
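The difference between Silence Selection and Remove Selection can be pictured with a toy model in which audio is a plain list of sample values. This is only a conceptual sketch with made-up function names, not how VoClone edits audio internally. It assumes `0 <= start <= end <= len(samples)`.

```python
def silence_selection(samples, start, end):
    """Replace samples[start:end] with zeros; total length is unchanged."""
    return samples[:start] + [0] * (end - start) + samples[end:]


def remove_selection(samples, start, end):
    """Cut samples[start:end] out; the result is shorter."""
    return samples[:start] + samples[end:]


audio = [1, 2, 3, 4, 5, 6]
print(silence_selection(audio, 2, 4))
print(remove_selection(audio, 2, 4))
```

Silencing preserves timing, which matters when the audio must stay aligned with something else, while removing tightens the result.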

How do I export generated audio?

To export your generated speech audio: 1. Locate the audio player view at the bottom of the TTS panel 2. Click the Export button in the audio player 3. Choose a location to save the audio file 4. Click Save to complete the export

What is the format of the generated audio?

VoClone generates audio in M4A format with the following specifications: • Format: M4A (MPEG-4 Audio) • Sample Rate: 44.1 kHz • Quality: High-quality AAC encoding

What are the best practices for TTS?

Follow these tips to make the most of VoClone TTS: Start Small: • Begin with short projects to familiarize yourself with the tool • Once comfortable, move on to longer projects Master Audio Editing: • Practice the selection and silence/remove features to fine-tune your audio • This skill saves time and quota by fixing issues instead of regenerating Regenerate Strategically: • If only part of the generated speech is unsatisfactory, regenerate just that section • No need to regenerate the entire sequence when most of it sounds good • This approach conserves your monthly usage quota General Tips: • Use well-formatted text with proper speaker labels • Preview voices before assigning them to speakers • Export your audio when satisfied with the results These practices help you create high-quality speech efficiently while managing your monthly quota.

👤 Voice Cloning (VC)

How do I create a cloned voice?

Getting Started:
1. Switch to the Voice Clone tab
2. Click the "Clone a Voice" button at the bottom of the project list
3. A new project opens with a random script to read
• You'll see one of 10 fun, easy-to-read sentences (5-8 seconds each)
• Click the refresh button to try a different script

Recording Your Voice:
4. Click "Start Recording" and read the script clearly
• Speak naturally at a steady pace
• Use a proper volume level
• Record in a quiet place with no background noise
5. Click "Stop Recording" when finished
• The recording appears in the audio player
• Transcription starts automatically

Review & Clone:
6. Review the matching score
• Green checkmark (✓): Score ≥ 50%, good to proceed
• "Try recording again": Score < 50%, record again for better results
7. Click "Clone Voice" when satisfied with the score
• Cloning takes only a few seconds (about 2 to 3.5 seconds in the benchmarks under "How fast does VoClone run?")
• A demo voice plays automatically to preview the result

Configure & Save:
8. Fill in the voice attributes (fields become editable after cloning)
• Voice Name: Give your voice a unique name
• Language: Select the language (currently English only)
• Accent: Choose the accent (American, British, etc.)
• Sex: Select Male or Female
9. Click "Add to Library" to save
10. Your cloned voice is now ready to use in TTS projects!

Tips for Best Results:
• Read the script naturally without rushing
• Maintain consistent volume throughout
• Avoid pausing for too long between words
• If the matching score is low, try recording again
• A higher matching score generally produces better voice quality

Note: Voice cloning and the transcription used during cloning are completely free and don't count toward your STT/TTS usage limits. Only the number of voice slots is limited by your subscription plan.

How many cloned voices can I create?

Voice slot limits depend on your subscription plan:
• Free Plan: 1 cloned voice
• Plus Plan: 10 cloned voices
• Pro Plan: 80 cloned voices
• Max Plan: 400 cloned voices

Note:
• Built-in voices don't count toward your slot limit
• Only custom cloned voices use your voice slots

Upgrade your plan to unlock more voice slots.

How do I delete a voice?

To delete a cloned voice and free up a voice slot: 1. In the Voice Library list, find the cloned voice you want to delete 2. Right-click on the voice 3. Select "Delete" from the context menu Important Restrictions: • You can only delete cloned voices (built-in voices cannot be deleted) • If the voice is currently being used in any project, the delete option will not be available • This prevents breaking existing projects that rely on that voice To delete a voice that's in use: • Assign another voice to the speaker in those projects, or • Delete the projects using that voice • Then try deleting the voice again

How do I rename a voice?

To rename a cloned voice: 1. In the Voice Library list, right-click on the voice you want to rename 2. Select "Change Name" from the context menu 3. Enter the new name and confirm Note: The voice name will be updated across the entire system, including in all existing projects that use this voice.

Does VoClone support emotion in TTS?

VoClone does not directly support emotion control in TTS. However, you can achieve emotional variety manually with a workaround.

How to Create Emotional Voices:
1. Clone the same voice actor expressing different emotions:
• Record or find audio samples of the voice with various emotions (happy, angry, terrified, sad, excited, etc.)
• Clone each emotional variation as a separate voice
2. Use descriptive naming conventions:
• JohnHappy, JohnAngry, JohnTerrified, JohnSad, JohnExcited, etc.
3. Assign emotions in your TTS script:
• Use speaker labels to switch between emotional variations

Example script:
[JohnHappy] What a beautiful day!
[JohnAngry] I can't believe this happened!
[JohnTerrified] Watch out!

Benefits:
• Full control over emotional expression in dialogue
• Natural-sounding emotional variation
• Create complex, emotionally rich conversations

Tip: For best results, ensure the original audio samples clearly express the intended emotion.

💳 Subscription & Usage

What are the benefits of paid plans?

Paid plans offer significant advantages over the free plan:

Higher Quotas:
• Higher usage hour quotas for both STT and TTS
• More cloned voice slots for creating custom voices

Premium Features:
• Advanced tools and capabilities
• Batch STT processing for transcribing multiple files at once (Pro and Max plans)

Exceptional Value:
• Pricing is only a fraction of what competitors charge
• Even the most affordable paid plan (Plus monthly) costs:
  - 10 cents per hour for STT
  - 50 cents per hour for TTS
• With the annual Max plan, costs drop to:
  - As low as 1.5 cents per hour for STT
  - As low as 7.5 cents per hour for TTS

Multi-Device Support:
• Plans are shared across multiple Macs with the same account
• Usage syncs automatically online

Upgrade to access higher quotas, premium features, and exceptional pricing.

How is my usage tracked?

Usage tracking: • STT and TTS hours are tracked separately • Usage accumulates throughout the month • Limits reset monthly based on your subscription start date • Usage syncs across all your devices online • View your current usage in the toolbar or Usage panel

What happens when I reach my usage limit?

When you reach your monthly limit: • You'll see a notification when approaching the limit • STT/TTS features will be disabled until reset or upgrade • You can upgrade to a higher plan for more hours • Your existing projects and data remain accessible • Usage resets automatically on your renewal date

When does my monthly usage reset?

Usage resets monthly based on your subscription: • Free Plan: Resets monthly (one month from when you first signed in) • Paid Plans: Resets on your subscription renewal date You can view your exact next reset date in the toolbar or Usage panel.

⚙️ Technical & Troubleshooting

How much RAM does VoClone use?

VoClone uses approximately 4GB of RAM during operation. Ensure your Mac has sufficient available memory for smooth performance.

What are the indicators in the bottom right corner?

The progress indicators show real-time status for the STT (Speech-to-Text) and TTS (Text-to-Speech) services.
• Top row: STT progress
• Bottom row: TTS progress
• Each row has two circles:
  - Left: Model loading
  - Right: Processing (transcription/generation)

Circle States:
• Empty: Service is idle
• Partial green: Job in progress
• Full green + text: Job completed with performance metrics

Performance Metrics (displayed after completion):
The text shows two measurements:
• Processing time: How long the job took to complete
• Speed factor: How much faster than real-time (e.g., "52x" means 52 times faster than real-time)

For example, "0.7m 81x" means:
• The job took 0.7 minutes to complete
• Processing speed was 81 times faster than real-time

See "How fast does VoClone run?" for detailed performance benchmarks on different Mac models.

Note: Model loading happens once per session and takes a few seconds.
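Reading a metrics label such as "0.7m 81x" can be sketched as follows. The pattern handles only the minutes-plus-factor form shown in the example above. Whether the app ever displays other units is not stated here, so treat the format as an assumption. The function names are hypothetical.

```python
import re

# Matches labels like "0.7m 81x": minutes taken, then the speed factor.
METRIC_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)m\s+(\d+(?:\.\d+)?)x\s*$")


def parse_metrics(label: str) -> tuple[float, float]:
    """Return (processing_minutes, speed_factor) from a metrics label."""
    match = METRIC_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized metrics label: {label!r}")
    return float(match.group(1)), float(match.group(2))


def audio_minutes(label: str) -> float:
    """Approximate length of the processed audio implied by the label."""
    minutes, speed = parse_metrics(label)
    return minutes * speed


print(parse_metrics("0.7m 81x"))
print(round(audio_minutes("0.7m 81x"), 1))
```

Multiplying the two numbers recovers the approximate audio length: 0.7 minutes at 81x corresponds to a little under an hour of audio. The result is approximate because the displayed time is rounded.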

What happens if I reinstall the app?

Reinstalling VoClone is safe and straightforward: • Projects are stored locally on your Mac • Reinstalling the app does NOT delete your projects • Your projects remain safe in their storage location • Usage data syncs online (if signed in with Apple ID) • Subscription status syncs automatically Your work is preserved even if you need to reinstall the application.

How do I report a bug?

To report a bug you've encountered:
1. Click the bug report button in the Customer Support panel
2. Describe the issue in the text field provided
3. Click "Add" to include your description in the report
4. Review the message body. Everything is fully transparent to you:
• The latest log entries are included to help us debug the issue
• You can review and edit any content before sending
• You may delete any information you prefer not to share, but the logs are very helpful for debugging
5. Copy the email recipient, subject, and message body:
• Click the copy button next to "To: voclonesupport@agileedgeai.com"
• Click the copy button next to the subject line
• Click the copy button next to the message body
6. Open your email application
7. Paste the recipient, subject, and message body into a new email
8. Send the email

The bug report automatically includes:
• Your description of the issue
• System information
• Recent log entries to help diagnose the problem

Your feedback helps us improve VoClone for everyone!