Providers
Gobbler uses a provider abstraction system that allows pluggable backends for content conversion. This enables swapping between local and cloud-based services, adding new backends without changing application code, and graceful fallback between multiple providers.
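As a rough illustration of graceful fallback between providers, the sketch below tries each backend in order and returns the first successful result. The names `convert_with_fallback`, `flaky_cloud`, and `local_backend` are hypothetical and are not part of Gobbler's API; Gobbler's own fallback logic may differ.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the fallback chain has failed."""


def convert_with_fallback(
    source: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, convert) pair in order; return (name, result) of the first success."""
    errors: list[str] = []
    for name, convert in providers:
        try:
            return name, convert(source)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_cloud(source: str) -> str:
    """Hypothetical backend that is always unreachable."""
    raise ConnectionError("service unreachable")


def local_backend(source: str) -> str:
    """Hypothetical backend that always succeeds."""
    return f"# Converted {source}"


used, markdown = convert_with_fallback(
    "report.pdf",
    [("cloud", flaky_cloud), ("local", local_backend)],
)
print(used, markdown)
```

Keeping the ordered list of backends outside the conversion function is what lets the application code stay unchanged when a backend is added or swapped.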
Overview
Providers are organized by category (the type of content they process):
| Category | Purpose | Available Providers |
|---|---|---|
| transcription | Audio/video to text | whisper-local, openai-whisper |
| document | PDF, DOCX, PPTX, XLSX to markdown | docling |
| webpage | Web pages to markdown | crawl4ai |
Each category has a default provider that can be overridden via configuration or CLI flags.
YouTube Transcripts Use a Separate Provider System
YouTube transcript providers (youtube-transcript-api, transcriptapi, auto) are not part of this generic provider registry. They have their own provider system optimized for handling IP blocking and fallback logic.
See YouTube Transcription for details on YouTube providers.
Available Providers
Transcription Providers
| Provider | Description | Requirements |
|---|---|---|
| whisper-local | Local transcription using faster-whisper with CoreML acceleration on M-series Macs | ffmpeg |
| openai-whisper | Cloud-based transcription using OpenAI Whisper API | OPENAI_API_KEY |
whisper-local
Uses the faster-whisper library for fast, local transcription. Automatically uses CoreML acceleration on Apple Silicon Macs.
Features:
- Fully offline (no API keys required)
- CoreML/Metal acceleration on M-series Macs
- Multiple model sizes (tiny, base, small, medium, large)
- Language auto-detection
Model sizes:
| Model | Speed | Accuracy | Memory | Use Case |
|---|---|---|---|---|
| tiny | ~32x realtime | Lower | ~1GB | Quick drafts |
| base | ~16x realtime | Moderate | ~1GB | General use |
| small | ~6x realtime | Good | ~2GB | Default |
| medium | ~2x realtime | Better | ~5GB | Important content |
| large | ~1x realtime | Best | ~10GB | Critical accuracy |
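To make the speed column concrete, the following sketch turns the approximate realtime factors from the table into a processing-time estimate. The function name is hypothetical, and the factors are rough figures that will vary with hardware.

```python
# Approximate speeds from the model table, as multiples of realtime.
REALTIME_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large": 1}


def estimate_processing_seconds(audio_seconds: float, model: str = "small") -> float:
    """Estimate wall-clock transcription time for a given audio length."""
    if model not in REALTIME_FACTOR:
        raise ValueError(f"unknown model size: {model!r}")
    return audio_seconds / REALTIME_FACTOR[model]


one_hour = 60 * 60
for model in REALTIME_FACTOR:
    minutes = estimate_processing_seconds(one_hour, model) / 60
    print(f"{model:>6}: ~{minutes:.1f} min for one hour of audio")
```

Under these figures, one hour of audio takes roughly 10 minutes with the default small model and roughly an hour with large.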
openai-whisper
Uses the OpenAI Whisper API for cloud-based transcription with high accuracy.
Features:
- High-quality cloud transcription
- Automatic language detection
- Word-level timestamps
- No local hardware requirements
- Automatic audio extraction and compression for large files
Configuration:
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | $OPENAI_API_KEY | OpenAI API key |
| model | string | whisper-1 | OpenAI Whisper model |
| timeout | float | 120.0 | Request timeout in seconds |
File size handling:
- The OpenAI API has a 25MB file limit
- Files exceeding 25MB are automatically compressed via ffmpeg
- Audio is extracted to mono 16kHz MP3 at 64kbps for optimal compression
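The file size handling described above might look roughly like the sketch below. The helper names are hypothetical and this is not Gobbler's actual code; the ffmpeg options shown (`-vn`, `-ac`, `-ar`, `-b:a`) are standard ffmpeg flags chosen to match the stated target of mono 16kHz MP3 at 64kbps. The 25MB limit is assumed here to mean 25 × 1024 × 1024 bytes.

```python
from pathlib import Path

OPENAI_MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as binary megabytes
TARGET_BITRATE_BPS = 64_000  # 64kbps, as stated in the list above


def needs_compression(size_bytes: int) -> bool:
    """Return True when a file is too large to upload as-is."""
    return size_bytes > OPENAI_MAX_BYTES


def build_ffmpeg_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg invocation that extracts mono 16kHz audio at 64kbps."""
    return [
        "ffmpeg",
        "-y",            # overwrite the target if it already exists
        "-i", str(source),
        "-vn",           # drop any video stream
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16kHz sample rate
        "-b:a", "64k",   # 64kbps audio bitrate
        str(target),     # the .mp3 extension selects the MP3 format
    ]


command = build_ffmpeg_command(Path("talk.mp4"), Path("talk.mp3"))
print(" ".join(command))

# Rough upper bound on audio length that still fits after compression.
max_seconds = OPENAI_MAX_BYTES * 8 / TARGET_BITRATE_BPS
print(f"~{max_seconds / 60:.0f} minutes fit under the limit at 64kbps")
```

The command list can be passed to `subprocess.run(command, check=True)` where ffmpeg is installed. Note that at 64kbps the limit corresponds to roughly 55 minutes of audio, so longer recordings would still exceed it after compression unless they are split or encoded at a lower bitrate.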
Document Providers
| Provider | Description | Requirements |
|---|---|---|
| docling | Docling Docker service for document conversion with OCR | Docling running |
docling
Uses the Docling service running in Docker for enterprise-grade document conversion.
Features:
- PDF, DOCX, PPTX, XLSX support
- Optional OCR for scanned documents
- Table extraction with structure preservation
- Markdown output with formatting
Supported formats:
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | With optional OCR for scanned pages |
| Word | .docx | Microsoft Word documents |
| PowerPoint | .pptx | Presentations with slide structure |
| Excel | .xlsx | Spreadsheets as markdown tables |
Webpage Providers
| Provider | Description | Requirements |
|---|---|---|
| crawl4ai | Crawl4AI Docker service with JavaScript rendering | Crawl4AI running |
crawl4ai
Uses the Crawl4AI service running in Docker for web scraping with full JavaScript support.
Features:
- JavaScript rendering via Playwright
- Clean markdown extraction
- CSS selector support
- Link and image extraction
Configuration
Configure providers in ~/.config/gobbler/config.yaml:
# Provider configuration
providers:
  transcription:
    default: whisper-local
    whisper-local:
      model: small
      device: auto
  document:
    default: docling
    docling:
      service_url: http://localhost:5001
      timeout: 120
  webpage:
    default: crawl4ai
    crawl4ai:
      service_url: http://localhost:11235
      api_token: gobbler-local-token
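To show how a configuration of this shape can be read, here is a minimal sketch that resolves the provider name and its options for a category. It assumes the YAML has already been parsed into a dictionary; `resolve_provider` and its `override` parameter are hypothetical and do not reflect Gobbler's internal lookup.

```python
from typing import Any, Optional

# The configuration above, as it would look after YAML parsing.
CONFIG: dict[str, Any] = {
    "providers": {
        "transcription": {
            "default": "whisper-local",
            "whisper-local": {"model": "small", "device": "auto"},
        },
        "document": {
            "default": "docling",
            "docling": {"service_url": "http://localhost:5001", "timeout": 120},
        },
        "webpage": {
            "default": "crawl4ai",
            "crawl4ai": {
                "service_url": "http://localhost:11235",
                "api_token": "gobbler-local-token",
            },
        },
    }
}


def resolve_provider(
    config: dict[str, Any],
    category: str,
    override: Optional[str] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (provider name, options) for a category, honoring an optional override."""
    section = config["providers"][category]
    name = override or section["default"]
    # Providers without their own block simply get empty options.
    options = dict(section.get(name, {}))
    return name, options


print(resolve_provider(CONFIG, "document"))
print(resolve_provider(CONFIG, "transcription", override="openai-whisper"))
```

Each category block holds one `default` key plus one options block per provider name, which is why a single lookup by name is enough.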
Provider-Specific Configuration
whisper-local
providers:
  transcription:
    default: whisper-local
    whisper-local:
      model: small   # tiny, base, small, medium, large
      device: auto   # auto, cpu, cuda, mps
openai-whisper
providers:
  transcription:
    default: openai-whisper
    openai-whisper:
      api_key: ${OPENAI_API_KEY}  # Or set directly
      model: whisper-1
      timeout: 120
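The `${OPENAI_API_KEY}` placeholder implies that values are substituted from the environment when the configuration is loaded. How Gobbler performs this substitution is not documented here; the sketch below shows one way to do it with the standard library's `os.path.expandvars`. The `expand_env` helper is hypothetical.

```python
import os
from typing import Any


def expand_env(value: Any) -> Any:
    """Recursively expand $VAR and ${VAR} references inside strings."""
    if isinstance(value, str):
        expanded = os.path.expandvars(value)
        if "${" in expanded:
            # expandvars leaves unknown variables untouched; fail loudly instead.
            raise KeyError(f"environment variable not set in {value!r}")
        return expanded
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged


os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder value for demonstration only
settings = expand_env(
    {"api_key": "${OPENAI_API_KEY}", "model": "whisper-1", "timeout": 120}
)
print(settings["api_key"])
```

Raising on an unset variable is a design choice in this sketch: it surfaces a missing key at load time rather than sending the literal placeholder to the API.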
docling
providers:
  document:
    default: docling
    docling:
      service_url: http://localhost:5001
      timeout: 120  # Request timeout in seconds
crawl4ai
providers:
  webpage:
    default: crawl4ai
    crawl4ai:
      service_url: http://localhost:11235
      api_token: gobbler-local-token
Environment Variables
Provider configuration can also be set via environment variables:
| Variable | Description | Default |
|---|---|---|
| GOBBLER_WHISPER_MODEL | Default Whisper model | small |
| GOBBLER_DOCLING_URL | Docling service URL | http://localhost:5001 |
| GOBBLER_CRAWL4AI_URL | Crawl4AI service URL | http://localhost:11235 |
| OPENAI_API_KEY | OpenAI API key (required for openai-whisper provider) | - |
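A small sketch of reading these variables with their documented defaults follows. The `read_env_settings` helper is hypothetical, and how Gobbler ranks environment variables against the configuration file is not stated in this section, so no precedence is implied.

```python
import os
from typing import Mapping

# Defaults copied from the table above.
ENV_DEFAULTS = {
    "GOBBLER_WHISPER_MODEL": "small",
    "GOBBLER_DOCLING_URL": "http://localhost:5001",
    "GOBBLER_CRAWL4AI_URL": "http://localhost:11235",
}


def read_env_settings(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each setting from the environment, falling back to its default."""
    return {name: environ.get(name, default) for name, default in ENV_DEFAULTS.items()}


for name, value in read_env_settings().items():
    print(f"{name}={value}")
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dictionary instead of the real process environment.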
CLI Usage
List Providers
# List all providers
gobbler providers list
# Filter by category
gobbler providers list --category transcription
gobbler providers list -c document
# JSON output
gobbler providers list --format json
Example output:
All Providers
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Category      ┃ Name           ┃ Description                           ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ document      │ docling        │ Docling document conversion provider  │
│ transcription │ whisper-local  │ Local transcription using faster-wh...│
│ webpage       │ crawl4ai       │ Crawl4AI web page conversion provider │
└───────────────┴────────────────┴───────────────────────────────────────┘
Get Provider Info
# Get detailed provider information
gobbler providers info transcription whisper-local
gobbler providers info document docling
# JSON output
gobbler providers info transcription whisper-local --format json
Example output:
Provider: whisper-local
Category: transcription
Name: whisper-local
Class: WhisperLocalProvider
Module: gobbler_core.providers.transcription.whisper
Description:
Local transcription provider using faster-whisper.
Uses the faster-whisper library with automatic CoreML acceleration
on M-series Macs. Models are cached globally to avoid reloading.
Using --provider Flag
Override the default provider for any conversion command:
# Audio transcription with local Whisper
gobbler audio recording.mp3 --provider whisper-local
# Audio transcription with OpenAI API (requires OPENAI_API_KEY)
gobbler audio recording.mp3 --provider openai-whisper
# Document conversion (provider override)
gobbler document report.pdf --provider docling
# Webpage conversion (provider override)
gobbler webpage "https://example.com" --provider crawl4ai
Provider availability
The --provider flag is still in development. Until it is available, provider selection is determined by the default set in the configuration file.
Provider Comparison
Local vs API-Based Providers
| Aspect | Local Providers | API-Based Providers |
|---|---|---|
| Privacy | Data stays local | Data sent to external service |
| Cost | Hardware cost only | Per-request pricing |
| Speed | Depends on hardware | Generally consistent |
| Reliability | No network dependency | Requires internet |
| Setup | Install dependencies | API key only |
| Model updates | Manual updates | Automatic |
When to Use Each
Use local providers (whisper-local, docling, crawl4ai) when:
- Data privacy is critical
- You process large volumes (cost savings)
- You work offline
- You need consistent latency
Use API-based providers when:
- No local hardware is available for processing
- Usage is occasional (simpler setup)
- You need the latest model improvements automatically
- You don't want to manage Docker services
Creating Custom Providers
Gobbler's provider system is extensible. To create a custom provider:
1. Implement the Base Class
from pathlib import Path

from gobbler_core.providers.transcription.base import (
    TranscriptionProvider,
    TranscriptionResult,
    TranscriptionSegment,
)


class MyTranscriptionProvider(TranscriptionProvider):
    """Custom transcription provider using MyService."""

    @property
    def name(self) -> str:
        # Name used to select this provider in configuration
        return "my-provider"

    async def transcribe(
        self,
        audio_path: Path,
        language: str = "auto",
        **options,
    ) -> TranscriptionResult:
        # Your implementation here
        return TranscriptionResult(
            text="transcribed text",
            segments=[],  # list of TranscriptionSegment items
            language="en",
            duration=120.0,
        )

    def supports_format(self, file_extension: str) -> bool:
        # Compare case-insensitively against the supported extensions
        return file_extension.lower() in {".mp3", ".wav"}
2. Register with the Registry
from gobbler_core.providers.registry import ProviderRegistry
# At module load time
ProviderRegistry.register("transcription", "my-provider", MyTranscriptionProvider)
3. Add Configuration
providers:
  transcription:
    default: my-provider
    my-provider:
      api_key: ${MY_API_KEY}
      endpoint: https://api.myservice.com
For detailed implementation guidance, see Contributing.
Architecture
The provider abstraction uses a registry pattern:
┌─────────────────────────────────────────────────────────────┐
│                      ProviderRegistry                       │
├─────────────────────────────────────────────────────────────┤
│  register(category, name, provider_class)                   │
│  create(category, name, **kwargs) -> Provider               │
│  list_providers(category) -> list[str]                      │
│  get_provider_info(category, name) -> dict                  │
└─────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         ▼                     ▼                     ▼
┌────────────────┐     ┌───────────────┐     ┌───────────────┐
│ transcription  │     │ document      │     │ webpage       │
├────────────────┤     ├───────────────┤     ├───────────────┤
│ whisper-local  │     │ docling       │     │ crawl4ai      │
│ openai-whisper │     │               │     │               │
└────────────────┘     └───────────────┘     └───────────────┘
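The registry pattern in the diagram can be illustrated with a toy implementation. `ToyRegistry` below mirrors the four method names from the diagram but is a stand-in written for this page, not the `gobbler_core` `ProviderRegistry`; its storage layout and the fields returned by `get_provider_info` are assumptions.

```python
from typing import Any


class ToyRegistry:
    """Illustrative stand-in; not the gobbler_core ProviderRegistry."""

    # Maps category -> provider name -> provider class.
    _providers: dict[str, dict[str, type]] = {}

    @classmethod
    def register(cls, category: str, name: str, provider_class: type) -> None:
        cls._providers.setdefault(category, {})[name] = provider_class

    @classmethod
    def create(cls, category: str, name: str, **kwargs: Any) -> Any:
        try:
            provider_class = cls._providers[category][name]
        except KeyError:
            raise KeyError(f"unknown provider {category}/{name}") from None
        return provider_class(**kwargs)

    @classmethod
    def list_providers(cls, category: str) -> list[str]:
        return sorted(cls._providers.get(category, {}))

    @classmethod
    def get_provider_info(cls, category: str, name: str) -> dict[str, str]:
        provider_class = cls._providers[category][name]
        return {
            "category": category,
            "name": name,
            "class": provider_class.__name__,
            "module": provider_class.__module__,
        }


class EchoProvider:
    """Hypothetical provider used only to exercise the registry."""

    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix


ToyRegistry.register("document", "echo", EchoProvider)
provider = ToyRegistry.create("document", "echo", prefix=">> ")
print(ToyRegistry.list_providers("document"))
print(ToyRegistry.get_provider_info("document", "echo")["class"])
```

Because callers ask the registry for a provider by category and name, new backends only need a `register` call at import time; the calling code does not change.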
Base Classes
Each category has an abstract base class defining the interface:
| Category | Base Class | Key Method |
|---|---|---|
| transcription | TranscriptionProvider | transcribe(audio_path, language) |
| document | DocumentProvider | convert(file_path, ocr) |
| webpage | WebPageProvider | fetch(url, timeout) |
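The idea of an abstract base class defining the interface can be sketched with Python's `abc` module. `DocumentProviderSketch` below is a simplified, hypothetical stand-in: the real `DocumentProvider` may be asynchronous, return a `DocumentResult`, and declare further methods.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class DocumentProviderSketch(ABC):
    """Simplified stand-in for a category base class."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Name used to select the provider."""

    @abstractmethod
    def convert(self, file_path: Path, ocr: bool = False) -> str:
        """Convert a document to markdown."""


class PlainTextProvider(DocumentProviderSketch):
    """Hypothetical provider that implements the full interface."""

    @property
    def name(self) -> str:
        return "plain-text"

    def convert(self, file_path: Path, ocr: bool = False) -> str:
        return f"# {file_path.name}\n"


class IncompleteProvider(DocumentProviderSketch):
    """Omits convert(), so Python refuses to instantiate it."""

    @property
    def name(self) -> str:
        return "incomplete"


print(PlainTextProvider().convert(Path("docs/notes.txt")))
try:
    IncompleteProvider()
except TypeError as exc:
    print("cannot instantiate:", exc)
```

The abstract methods act as a contract: a provider that forgets a required method fails at construction time instead of at the first conversion.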
Result Types
Each category returns a standardized result type:
- TranscriptionResult: text, segments, language, duration
- DocumentResult: markdown, pages, metadata
- WebPageResult: markdown, title, url, links
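As an illustration of what a standardized result type can look like, the sketch below models the transcription result with dataclasses. The class names carry a `Sketch` suffix because they are hypothetical; the field types and the segment fields (`start`, `end`, `text`) are assumptions, since only the top-level field names are listed above.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptionSegmentSketch:
    """Assumed segment shape: a time range in seconds plus its text."""

    start: float
    end: float
    text: str


@dataclass
class TranscriptionResultSketch:
    """Mirrors the listed fields: text, segments, language, duration."""

    text: str
    segments: list[TranscriptionSegmentSketch] = field(default_factory=list)
    language: str = "en"
    duration: float = 0.0


segments = [
    TranscriptionSegmentSketch(0.0, 2.5, "Hello there."),
    TranscriptionSegmentSketch(2.5, 5.0, "Welcome back."),
]
result = TranscriptionResultSketch(
    text=" ".join(segment.text for segment in segments),
    segments=segments,
    language="en",
    duration=segments[-1].end,
)
print(result.text)
```

Returning the same shape from every provider in a category is what allows callers to switch backends without changing how they consume the output.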
See Architecture for implementation details.