Rate Limits

Understand your API quotas, monitor usage, and handle rate limit errors gracefully.

Lesan AI enforces rate limits to ensure fair usage and service stability. This page explains the different limits, how to monitor your usage, and how to handle 429 errors.

Rate Limit Overview

Limits are applied per API key and vary by endpoint type:

  • REST API requests — 60 requests per minute per key
  • Transcription jobs — 10 concurrent jobs per key
  • Batch processing — 100 files per batch request
  • WebSocket connections — 5 concurrent connections per key
  • File upload size — 500 MB per file

Rate Limit Headers

Every API response includes headers to help you track your usage:

```text
X-RateLimit-Limit: 60          # Max requests per window
X-RateLimit-Remaining: 45      # Requests remaining in current window
X-RateLimit-Reset: 1709472000  # Unix timestamp when the window resets
Retry-After: 2                 # (Only on 429) Seconds to wait before retrying
```
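These headers let a client throttle itself before it ever sees a 429. The helper below is a minimal sketch (the function name and the min_remaining threshold are our own choices, not part of the API) that turns response headers into a pause duration:

```python
import time


def seconds_until_reset(headers, min_remaining=5, now=None):
    """Given rate-limit response headers (a dict-like object), return how
    long to pause before the next request; 0 means no pause is needed."""
    remaining = int(headers.get("X-RateLimit-Remaining", min_remaining + 1))
    if remaining > min_remaining:
        return 0.0
    # Quota nearly exhausted: wait until the window resets
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)
```

Call it on response.headers after each request and sleep for the returned number of seconds before sending the next one.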

Checking Your Usage

Use the GET /v1/usage endpoint to check your current usage and remaining quota:

```bash
curl https://asr.lesan.ai/v1/usage \
  -H "Authorization: Bearer YOUR_API_KEY"
```

Example response:

```json
{
  "requests_used": 15,
  "requests_limit": 60,
  "requests_reset": "2024-03-15T12:01:00Z",
  "concurrent_jobs": 2,
  "concurrent_jobs_limit": 10,
  "daily_minutes_used": 45.2,
  "daily_minutes_limit": 500
}
```
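To decide programmatically whether to slow down, you can compute the remaining headroom from this payload. A hypothetical helper (quota_headroom is our own name, not part of any official SDK):

```python
def quota_headroom(usage):
    """Return the fraction of each quota still available, given a parsed
    /v1/usage response payload."""
    return {
        "requests": 1 - usage["requests_used"] / usage["requests_limit"],
        "jobs": 1 - usage["concurrent_jobs"] / usage["concurrent_jobs_limit"],
        "daily_minutes": 1 - usage["daily_minutes_used"] / usage["daily_minutes_limit"],
    }
```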

Handling 429 Errors

When you exceed a rate limit, the API returns a 429 status code with a retry_after field in the error body and a Retry-After header. Use exponential backoff to retry:

```python
import requests
import time


def request_with_backoff(method, url, max_retries=5, **kwargs):
    """Make an API request with automatic retry on rate limits."""
    for attempt in range(max_retries):
        response = method(url, **kwargs)

        if response.status_code != 429:
            return response

        # Prefer the server-provided retry time (retry_after in the error
        # body, or the Retry-After header); fall back to exponential
        # backoff capped at 30 seconds
        error = response.json().get("error", {})
        wait = error.get("retry_after") or response.headers.get("Retry-After")
        wait = float(wait) if wait else min(2 ** attempt, 30)
        print(f"Rate limited. Retrying in {wait}s (attempt {attempt + 1})")
        time.sleep(wait)

    return response  # Return last response after all retries


# Usage
response = request_with_backoff(
    requests.post,
    "https://asr.lesan.ai/transcribe",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"audio_url": "https://example.com/audio.mp3", "language": "am"}
)
```

WebSocket Rate Limits

WebSocket connections have separate limits:

  • Max concurrent connections — 5 per API key
  • Max session duration — 30 minutes per connection
  • Message rate — 100 messages per second per connection
  • Audio data rate — 1 MB/s per connection

Exceeding WebSocket limits will result in a close frame with code 4029. See the Streaming guide for details.
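To stay under the per-connection message rate, a client can meter its own sends. A minimal token-bucket sketch (our own illustration, not an SDK feature; the now parameter exists only to make the logic testable):

```python
import time


class TokenBucket:
    """Simple token bucket to stay under a per-connection message rate."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def try_send(self, now=None):
        """Return True if a message may be sent now, consuming one token."""
        now = time.monotonic() if now is None else now
        # Refill tokens based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Before each WebSocket send, call try_send() and buffer the message briefly if it returns False.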

Best Practices

  • Monitor your usage — Check X-RateLimit-Remaining headers to proactively throttle requests
  • Use exponential backoff — Start at 1 second, double each retry, cap at 30 seconds
  • Batch when possible — Use the batch endpoint to reduce the number of API calls
  • Use async mode — For long audio files, use async mode to avoid tying up concurrent job slots
  • Cache results — Store transcription results to avoid re-processing the same files
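The caching advice can be as simple as keying stored results on a hash of the audio content. A hypothetical sketch (transcribe_fn stands in for your actual API call; the cache layout is our own choice):

```python
import hashlib
import json
from pathlib import Path


def cached_transcribe(audio_bytes, transcribe_fn, cache_dir="transcript_cache"):
    """Return a cached transcript for this audio content if one exists;
    otherwise call transcribe_fn and store the result on disk."""
    key = hashlib.sha256(audio_bytes).hexdigest()
    cache_path = Path(cache_dir) / f"{key}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    result = transcribe_fn(audio_bytes)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result))
    return result
```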

See the Error Codes reference for all error types, or the Best Practices guide for production patterns.