Providers

Gobbler uses a provider abstraction system that allows pluggable backends for content conversion. This enables swapping between local and cloud-based services, adding new backends without changing application code, and graceful fallback between multiple providers.

Overview

Providers are organized by category (the type of content they process):

| Category | Purpose | Available Providers |
|---|---|---|
| transcription | Audio/video to text | whisper-local, openai-whisper |
| document | PDF, DOCX, PPTX, XLSX to markdown | docling |
| webpage | Web pages to markdown | crawl4ai |

Each category has a default provider that can be overridden via configuration or CLI flags.
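
The override behaviour described above can be pictured as a small resolution function. This is a minimal sketch, not Gobbler's actual code: the function name `resolve_provider`, the `CATEGORY_DEFAULTS` dict, and the precedence order (CLI flag, then configuration file, then built-in default) are assumptions made for illustration. The default names are taken from the example configuration later on this page.

```python
from typing import Optional

# Hypothetical built-in defaults, copied from the example config on this page.
CATEGORY_DEFAULTS = {
    "transcription": "whisper-local",
    "document": "docling",
    "webpage": "crawl4ai",
}


def resolve_provider(
    category: str,
    config: dict,
    cli_provider: Optional[str] = None,
) -> str:
    """Pick a provider name: CLI flag first, then config default, then built-in."""
    if category not in CATEGORY_DEFAULTS:
        raise ValueError(f"unknown category: {category!r}")
    if cli_provider:
        # Assumed precedence: an explicit flag wins over the config file.
        return cli_provider
    configured = config.get("providers", {}).get(category, {}).get("default")
    return configured or CATEGORY_DEFAULTS[category]


config = {"providers": {"transcription": {"default": "openai-whisper"}}}
print(resolve_provider("transcription", config))
print(resolve_provider("document", config))
```

With this config, the first call picks up the configured `openai-whisper`, while the second falls back to the assumed built-in default for documents.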

YouTube Transcripts Use a Separate Provider System

YouTube transcript providers (youtube-transcript-api, transcriptapi, auto) are not part of this generic provider registry. They have their own provider system optimized for handling IP blocking and fallback logic.

See YouTube Transcription for details on YouTube providers.

Available Providers

Transcription Providers

| Provider | Description | Requirements |
|---|---|---|
| whisper-local | Local transcription using faster-whisper with CoreML acceleration on M-series Macs | ffmpeg |
| openai-whisper | Cloud-based transcription using OpenAI Whisper API | OPENAI_API_KEY |

whisper-local

Uses the faster-whisper library for fast, local transcription. Automatically uses CoreML acceleration on Apple Silicon Macs.

Features:

- Fully offline, no API keys required
- CoreML/Metal acceleration on M-series Macs
- Multiple model sizes (tiny, base, small, medium, large)
- Language auto-detection

Model sizes:

| Model | Speed | Accuracy | Memory | Use Case |
|---|---|---|---|---|
| tiny | ~32x realtime | Lower | ~1GB | Quick drafts |
| base | ~16x realtime | Moderate | ~1GB | General use |
| small | ~6x realtime | Good | ~2GB | Default |
| medium | ~2x realtime | Better | ~5GB | Important content |
| large | ~1x realtime | Best | ~10GB | Critical accuracy |
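
To make the speed column concrete, the sketch below turns the approximate realtime factors from the table into a rough processing-time estimate. The helper name `estimate_minutes` is hypothetical, and the factors are the table's approximations, so actual throughput will vary with hardware.

```python
# Approximate speed factors copied from the table above (multiples of realtime).
REALTIME_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}


def estimate_minutes(audio_minutes: float, model: str) -> float:
    """Rough processing time: audio length divided by the realtime factor."""
    return audio_minutes / REALTIME_FACTOR[model]


for model in REALTIME_FACTOR:
    minutes = estimate_minutes(60, model)
    print(f"{model:>6}: ~{minutes:.1f} min for a 60-minute recording")
```

By these figures, a 60-minute recording takes roughly 10 minutes with small and about an hour with large.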

openai-whisper

Uses the OpenAI Whisper API for cloud-based transcription with high accuracy.

Features:

- High-quality cloud transcription
- Automatic language detection
- Word-level timestamps
- No local hardware requirements
- Automatic audio extraction and compression for large files

Configuration:

| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | $OPENAI_API_KEY | OpenAI API key |
| model | string | whisper-1 | OpenAI Whisper model |
| timeout | float | 120.0 | Request timeout in seconds |

File size handling:

- OpenAI API has a 25MB file limit
- Files exceeding 25MB are automatically compressed via ffmpeg
- Audio is extracted to mono 16kHz MP3 at 64kbps for optimal compression
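
The compression step can be approximated with a plain ffmpeg invocation. The sketch below only builds the command line and does the size arithmetic. The function names are hypothetical, the 25MB limit is read as 25 MiB (an assumption), and the exact flags Gobbler passes to ffmpeg are not documented here. The flags used (`-vn`, `-ac`, `-ar`, `-b:a`) are standard ffmpeg options that match the settings listed above.

```python
from pathlib import Path
from typing import List

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25MB limit, read here as 25 MiB (assumption)


def needs_compression(size_bytes: int) -> bool:
    """True when a file is larger than the upload limit."""
    return size_bytes > MAX_UPLOAD_BYTES


def build_ffmpeg_command(source: Path, target: Path) -> List[str]:
    """Build an ffmpeg command that extracts mono 16 kHz audio at 64 kbps."""
    return [
        "ffmpeg",
        "-y",            # overwrite the target file if it already exists
        "-i", str(source),
        "-vn",           # drop any video stream, keep audio only
        "-ac", "1",      # one channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        "-b:a", "64k",   # 64 kbps audio bitrate
        str(target),     # .mp3 extension selects the MP3 muxer
    ]


def max_minutes_at_bitrate(bitrate_bps: int = 64_000) -> float:
    """Longest audio, in minutes, that fits under the limit at a given bitrate."""
    return MAX_UPLOAD_BYTES * 8 / bitrate_bps / 60


print(" ".join(build_ffmpeg_command(Path("talk.mp4"), Path("talk.mp3"))))
print(f"~{max_minutes_at_bitrate():.0f} minutes fit under the limit at 64 kbps")
```

At 64 kbps the limit corresponds to a little under an hour of audio, ignoring container overhead. This page does not say how longer recordings are handled.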

Document Providers

| Provider | Description | Requirements |
|---|---|---|
| docling | Docling Docker service for document conversion with OCR | Docling running |

docling

Uses the Docling service running in Docker for enterprise-grade document conversion.

Features:

- PDF, DOCX, PPTX, XLSX support
- Optional OCR for scanned documents
- Table extraction with structure preservation
- Markdown output with formatting

Supported formats:

| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | With optional OCR for scanned pages |
| Word | .docx | Microsoft Word documents |
| PowerPoint | .pptx | Presentations with slide structure |
| Excel | .xlsx | Spreadsheets as markdown tables |

Webpage Providers

| Provider | Description | Requirements |
|---|---|---|
| crawl4ai | Crawl4AI Docker service with JavaScript rendering | Crawl4AI running |

crawl4ai

Uses the Crawl4AI service running in Docker for web scraping with full JavaScript support.

Features:

- JavaScript rendering via Playwright
- Clean markdown extraction
- CSS selector support
- Link and image extraction

Configuration

Configure providers in ~/.config/gobbler/config.yaml:

# Provider configuration
providers:
  transcription:
    default: whisper-local
    whisper-local:
      model: small
      device: auto

  document:
    default: docling
    docling:
      service_url: http://localhost:5001
      timeout: 120

  webpage:
    default: crawl4ai
    crawl4ai:
      service_url: http://localhost:11235
      api_token: gobbler-local-token

Provider-Specific Configuration

whisper-local

providers:
  transcription:
    default: whisper-local
    whisper-local:
      model: small          # tiny, base, small, medium, large
      device: auto          # auto, cpu, cuda, mps

openai-whisper

providers:
  transcription:
    default: openai-whisper
    openai-whisper:
      api_key: ${OPENAI_API_KEY}  # Or set directly
      model: whisper-1
      timeout: 120
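
The `${OPENAI_API_KEY}` placeholder above implies that environment variables are substituted when the config is loaded. How Gobbler performs that substitution is not described here. The sketch below shows one plausible approach, with a hypothetical helper `expand_env` that fails loudly when a referenced variable is unset.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace each ${NAME} with the variable's value; raise if it is unset."""
    env = os.environ if env is None else env

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


# A plain dict stands in for the real environment in this example.
print(expand_env("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-example"}))
```

Raising on a missing variable, instead of silently leaving the placeholder in place, surfaces a misconfigured key before any request is sent.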

docling

providers:
  document:
    default: docling
    docling:
      service_url: http://localhost:5001
      timeout: 120          # Request timeout in seconds

crawl4ai

providers:
  webpage:
    default: crawl4ai
    crawl4ai:
      service_url: http://localhost:11235
      api_token: gobbler-local-token

Environment Variables

Provider configuration can also be set via environment variables:

| Variable | Description | Default |
|---|---|---|
| GOBBLER_WHISPER_MODEL | Default Whisper model | small |
| GOBBLER_DOCLING_URL | Docling service URL | http://localhost:5001 |
| GOBBLER_CRAWL4AI_URL | Crawl4AI service URL | http://localhost:11235 |
| OPENAI_API_KEY | OpenAI API key (required for openai-whisper provider) | - |
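
As an illustration of how these variables and their defaults combine, the sketch below reads each one with a fallback. The function name `effective_settings` is hypothetical, and this page does not state whether environment variables take precedence over the configuration file, so the sketch covers the environment only.

```python
import os

# Variable names and defaults come from the table above.
ENV_DEFAULTS = {
    "GOBBLER_WHISPER_MODEL": "small",
    "GOBBLER_DOCLING_URL": "http://localhost:5001",
    "GOBBLER_CRAWL4AI_URL": "http://localhost:11235",
}


def effective_settings(env=None) -> dict:
    """Return each setting from the environment, falling back to its default."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in ENV_DEFAULTS.items()}
    # OPENAI_API_KEY has no default; None means openai-whisper cannot be used.
    settings["OPENAI_API_KEY"] = env.get("OPENAI_API_KEY")
    return settings


# A plain dict stands in for the real environment in this example.
print(effective_settings({"GOBBLER_WHISPER_MODEL": "medium"}))
```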

CLI Usage

List Providers

# List all providers
gobbler providers list

# Filter by category
gobbler providers list --category transcription
gobbler providers list -c document

# JSON output
gobbler providers list --format json

Example output:

All Providers
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category      ┃ Name           ┃ Description                           ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ document      │ docling        │ Docling document conversion provider  │
│ transcription │ whisper-local  │ Local transcription using faster-wh...│
│ webpage       │ crawl4ai       │ Crawl4AI web page conversion provider │
└───────────────┴────────────────┴───────────────────────────────────────┘

Get Provider Info

# Get detailed provider information
gobbler providers info transcription whisper-local
gobbler providers info document docling

# JSON output
gobbler providers info transcription whisper-local --format json

Example output:

Provider: whisper-local

  Category:    transcription
  Name:        whisper-local
  Class:       WhisperLocalProvider
  Module:      gobbler_core.providers.transcription.whisper

Description:

  Local transcription provider using faster-whisper.

  Uses the faster-whisper library with automatic CoreML acceleration
  on M-series Macs. Models are cached globally to avoid reloading.
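
For scripting, the `--format json` variant of the command can be consumed from Python. The sketch below shells out to the CLI and parses whatever it prints. The JSON schema is not documented on this page, so the helper (hypothetically named `provider_info`) returns the parsed value without assuming any field names. The `run` parameter exists only so the call can be swapped out in tests.

```python
import json
import subprocess


def provider_info(category: str, name: str, run=subprocess.run):
    """Run `gobbler providers info <category> <name> --format json` and parse stdout."""
    completed = run(
        ["gobbler", "providers", "info", category, name, "--format", "json"],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Requires the gobbler CLI to be installed and on PATH.
    print(provider_info("transcription", "whisper-local"))
```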

Using --provider Flag

Override the default provider for any conversion command:

# Audio transcription with local Whisper
gobbler audio recording.mp3 --provider whisper-local

# Audio transcription with OpenAI API (requires OPENAI_API_KEY)
gobbler audio recording.mp3 --provider openai-whisper

# Document conversion (provider override)
gobbler document report.pdf --provider docling

# Webpage conversion (provider override)
gobbler webpage "https://example.com" --provider crawl4ai

Provider availability

The --provider flag is still in development. For now, the provider for each category is selected by its default setting in the configuration file.

Provider Comparison

Local vs API-Based Providers

| Aspect | Local Providers | API-Based Providers |
|---|---|---|
| Privacy | Data stays local | Data sent to external service |
| Cost | Hardware cost only | Per-request pricing |
| Speed | Depends on hardware | Generally consistent |
| Reliability | No network dependency | Requires internet |
| Setup | Install dependencies | API key only |
| Model updates | Manual updates | Automatic |

When to Use Each

Use local providers (whisper-local, docling, crawl4ai) when:

- Data privacy is critical
- Processing large volumes (cost savings)
- Working offline
- Consistent latency is needed

Use API-based providers when:

- No local hardware is available for processing
- Use is occasional (simpler setup)
- You need the latest model improvements automatically
- You don't want to manage Docker services

Creating Custom Providers

Gobbler's provider system is extensible. To create a custom provider:

1. Implement the Base Class

from pathlib import Path

from gobbler_core.providers.transcription.base import (
    TranscriptionProvider,
    TranscriptionResult,
    TranscriptionSegment,
)

class MyTranscriptionProvider(TranscriptionProvider):
    """Custom transcription provider using MyService."""

    @property
    def name(self) -> str:
        return "my-provider"

    async def transcribe(
        self,
        audio_path: Path,
        language: str = "auto",
        **options,
    ) -> TranscriptionResult:
        # Your implementation here
        return TranscriptionResult(
            text="transcribed text",
            segments=[],
            language="en",
            duration=120.0,
        )

    def supports_format(self, file_extension: str) -> bool:
        return file_extension.lower() in {".mp3", ".wav"}

2. Register with the Registry

from gobbler_core.providers.registry import ProviderRegistry

# At module load time
ProviderRegistry.register("transcription", "my-provider", MyTranscriptionProvider)

3. Add Configuration

providers:
  transcription:
    default: my-provider
    my-provider:
      api_key: ${MY_API_KEY}
      endpoint: https://api.myservice.com

For detailed implementation guidance, see Contributing.

Architecture

The provider abstraction uses a registry pattern:

┌─────────────────────────────────────────────────────────────┐
│                     ProviderRegistry                         │
├─────────────────────────────────────────────────────────────┤
│  register(category, name, provider_class)                   │
│  create(category, name, **kwargs) -> Provider               │
│  list_providers(category) -> list[str]                      │
│  get_provider_info(category, name) -> dict                  │
└─────────────────────────────────────────────────────────────┘
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌────────────────┐    ┌───────────────┐     ┌───────────────┐
│  transcription │    │   document    │     │   webpage     │
├────────────────┤    ├───────────────┤     ├───────────────┤
│ whisper-local  │    │   docling     │     │   crawl4ai    │
│ openai-whisper │    │               │     │               │
└────────────────┘    └───────────────┘     └───────────────┘
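
The registry pattern in the diagram can be illustrated with a self-contained toy version. This is a sketch under stated assumptions, not the code in `gobbler_core.providers.registry`: the class `ToyProviderRegistry`, the `EchoProvider` stand-in, and the keys returned by `get_provider_info` are all hypothetical. Only the four method names and their parameters come from the diagram.

```python
from typing import Any, Dict, List, Type


class ToyProviderRegistry:
    """Toy registry mirroring the four methods shown in the diagram."""

    # Maps category -> provider name -> provider class.
    _providers: Dict[str, Dict[str, Type[Any]]] = {}

    @classmethod
    def register(cls, category: str, name: str, provider_class: Type[Any]) -> None:
        cls._providers.setdefault(category, {})[name] = provider_class

    @classmethod
    def create(cls, category: str, name: str, **kwargs: Any) -> Any:
        try:
            provider_class = cls._providers[category][name]
        except KeyError:
            raise KeyError(f"no provider {name!r} in category {category!r}") from None
        return provider_class(**kwargs)

    @classmethod
    def list_providers(cls, category: str) -> List[str]:
        return sorted(cls._providers.get(category, {}))

    @classmethod
    def get_provider_info(cls, category: str, name: str) -> dict:
        provider_class = cls._providers[category][name]
        return {
            "category": category,
            "name": name,
            "class": provider_class.__name__,
            "description": (provider_class.__doc__ or "").strip(),
        }


class EchoProvider:
    """Stand-in provider used only for this example."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix


ToyProviderRegistry.register("transcription", "echo", EchoProvider)
provider = ToyProviderRegistry.create("transcription", "echo", prefix="demo")
print(ToyProviderRegistry.list_providers("transcription"), provider.prefix)
```

Keeping the lookup keyed by category and name is what lets application code ask for a provider by string, so a new backend only needs a `register` call at import time.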

Base Classes

Each category has an abstract base class defining the interface:

| Category | Base Class | Key Method |
|---|---|---|
| transcription | TranscriptionProvider | transcribe(audio_path, language) |
| document | DocumentProvider | convert(file_path, ocr) |
| webpage | WebPageProvider | fetch(url, timeout) |

Result Types

Each category returns a standardized result type:

  • TranscriptionResult - text, segments, language, duration
  • DocumentResult - markdown, pages, metadata
  • WebPageResult - markdown, title, url, links
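
To show how a base class and its result type fit together, here is a standalone sketch for the transcription category. The classes below are stand-ins defined locally, not imports from `gobbler_core`. The four `TranscriptionResult` fields come from the list above, while their types, the `TranscriptionSegment` fields, and the `FixedTextProvider` class are assumptions.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class TranscriptionSegment:
    # Hypothetical fields; the real segment layout is not documented here.
    start: float
    end: float
    text: str


@dataclass
class TranscriptionResult:
    text: str
    language: str
    duration: float
    segments: List[TranscriptionSegment] = field(default_factory=list)


class TranscriptionProvider(ABC):
    """Local stand-in for the abstract interface of the transcription category."""

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @abstractmethod
    async def transcribe(self, audio_path: Path, language: str = "auto") -> TranscriptionResult:
        ...


class FixedTextProvider(TranscriptionProvider):
    """Returns canned text; it exists only to exercise the interface."""

    @property
    def name(self) -> str:
        return "fixed-text"

    async def transcribe(self, audio_path: Path, language: str = "auto") -> TranscriptionResult:
        return TranscriptionResult(text="hello", language="en", duration=1.0)


result = asyncio.run(FixedTextProvider().transcribe(Path("clip.wav")))
print(result.text, result.language, result.duration, result.segments)
```

Because every provider in a category returns the same result type, calling code can switch providers without changing how it reads the output.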

See Architecture for implementation details.