1. Overview

ConvertAudioToText provides a RESTful API for programmatic access to our transcription and media processing services. This disclosure outlines how the API works, what third-party services are involved, and how your data is handled during API interactions.

2. Third-Party Services

Our API relies on the following third-party services to deliver functionality:

Speech-to-text engine: Provides the underlying speech-to-text AI. Audio data is sent to this engine for processing and is not retained after transcription completes.
Encrypted object storage: Used for temporary and permanent file storage. All stored files are encrypted at rest.
Dodo Payments: Handles all payment processing and subscription management as our Merchant of Record.

3. Data Processing

When you submit a file or URL through the API, your media is processed through an automated pipeline. Audio is extracted from video files, then transcribed by our speech-to-text engine. The full response, including word-level timestamps, speaker labels, and confidence scores, is stored in our database and made available through the API.
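As an illustration of how a client might consume such a result, the sketch below merges word-level entries into speaker turns. The field names (word, start, end, speaker, confidence) and the sample payload are assumptions made for this example; they are not the documented response schema.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    text: str


def group_by_speaker(words, min_confidence=0.0):
    """Merge consecutive words from the same speaker into turns.

    `words` is a list of dicts with hypothetical keys:
    word, start, end, speaker, confidence.
    """
    turns = []
    for w in words:
        if w["confidence"] < min_confidence:
            continue  # skip words below the requested confidence
        if turns and turns[-1].speaker == w["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            last = turns[-1]
            last.end = w["end"]
            last.text += " " + w["word"]
        else:
            turns.append(Turn(w["speaker"], w["start"], w["end"], w["word"]))
    return turns


# Hypothetical sample payload, not real API output.
sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.4, "speaker": "A", "confidence": 0.98},
    {"word": "there", "start": 0.4, "end": 0.8, "speaker": "A", "confidence": 0.95},
    {"word": "Hi", "start": 1.0, "end": 1.2, "speaker": "B", "confidence": 0.91},
]

for turn in group_by_speaker(sample_words):
    print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

Passing a min_confidence value drops low-confidence words before grouping, which may be useful when only a clean reading transcript is needed.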

4. Rate Limits

API usage is subject to the following rate limits:

Global: 100 requests per minute per IP address
Transcription: 10 jobs per minute per authenticated user
File uploads: Maximum file size depends on your subscription plan
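
To stay under the limits listed above, a client can throttle itself before sending requests. The following is a minimal sketch of a client-side sliding-window limiter. The class and variable names are hypothetical, and how the server actually counts requests (fixed or sliding window) is not stated in this disclosure, so the sketch takes the conservative approach of never exceeding the limit in any 60-second span.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window` seconds."""

    def __init__(self, max_calls, window=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self.calls = deque()

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def try_acquire(self):
        """Record a call and return True if one is allowed right now."""
        if self.wait_time() > 0:
            return False
        self.calls.append(self.clock())
        return True


# Limits taken from the list above.
global_limiter = SlidingWindowLimiter(max_calls=100, window=60.0)
transcription_limiter = SlidingWindowLimiter(max_calls=10, window=60.0)
```

Before submitting a transcription job, a caller would check transcription_limiter.try_acquire() and, if it returns False, sleep for transcription_limiter.wait_time() seconds before retrying.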

5. Authentication

API access requires authentication with either a JSON Web Token (JWT) or an API key. JWTs are obtained through the login endpoint and expire after a configurable period. API keys are long-lived credentials available to users on paid plans, with granular scope controls and per-key usage tracking.
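
The sketch below shows one way a client might attach credentials and detect an expiring JWT before making a call. It relies on the standard JWT layout (three base64url segments, with an optional exp claim in the payload). The header names are assumptions: sending the JWT as "Authorization: Bearer" is a common convention, and "X-API-Key" is a hypothetical name; neither is confirmed by this disclosure.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) from a JWT payload, or None.

    This only decodes the payload; it does not verify the signature.
    """
    payload_b64 = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token, now=None, leeway=30):
    """True if the token expires within `leeway` seconds of `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no exp claim; treat as non-expiring
    now = time.time() if now is None else now
    return now >= exp - leeway


def auth_headers(jwt=None, api_key=None):
    """Build request headers; both header names are assumptions."""
    if api_key is not None:
        return {"X-API-Key": api_key}  # hypothetical header name
    if jwt is not None:
        return {"Authorization": f"Bearer {jwt}"}  # assumed scheme
    raise ValueError("either jwt or api_key is required")
```

Checking is_expired before each request lets a client log in again proactively rather than waiting for a rejected call. Because the token lifetime is configurable, reading exp from the token avoids hard-coding a duration.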

6. Data Retention

Transcription results are retained as long as your account is active. You can delete individual transcriptions or your entire account at any time. For unauthenticated tool usage, all data is automatically deleted within hours of processing.

7. Contact

For questions about API usage or this disclosure, contact us at support@convertaudiototext.com.

API Disclosure