Gobbler Architecture¶
This document provides an in-depth explanation of Gobbler's architecture, design decisions, and integration strategies.
Overview¶
Gobbler converts content to markdown through three interfaces that share the same backend:
- CLI - Direct command-line usage (`gobbler youtube URL`)
- MCP Server - For Claude Desktop/Code via Model Context Protocol
- Skills - Markdown instructions that teach Claude how to use the CLI
All three interfaces use the same provider layer, which connects to:
- YouTube APIs - Transcript extraction
- Crawl4AI (Docker) - Web scraping with JavaScript rendering
- Docling (Docker) - Document conversion with OCR
- faster-whisper - Local audio transcription
- Browser Relay - WebSocket to browser extension
Component Architecture¶
MCP Server Layer¶
The MCP server coordinates all operations and manages service communication:
Responsibilities:

- MCP protocol handling (JSON-RPC over stdio)
- Tool routing and parameter validation
- Service health monitoring
- Auto-queue decision logic
- Configuration management
Implementation:

- Built on the FastMCP framework
- Runs as a stdio server for Claude Code/Desktop
- No HTTP server (except the relay for the browser extension)
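To make "JSON-RPC over stdio" concrete, here is a sketch of the shape of a tool-call request as it travels between client and server. The tool name `transcribe_audio` and its arguments are illustrative assumptions, not Gobbler's actual tool schema; only the JSON-RPC envelope and the MCP `tools/call` method are standard.

```python
import json

# Hypothetical MCP tool-call request; each message is one line of JSON
# written to the server's stdin (and responses come back on stdout).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "transcribe_audio",  # illustrative tool name
        "arguments": {"file_path": "/tmp/audio.mp3", "language": "auto"},
    },
}

wire = json.dumps(request)      # serialize for the stdio transport
decoded = json.loads(wire)      # what the server parses back out
print(decoded["method"])
```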
Provider Layer¶
The provider layer implements a pluggable backend abstraction that enables swapping between different implementations for the same functionality:
- Multiple backends: Different providers for the same category (e.g., local vs API-based transcription)
- Configuration-driven selection: Switch providers via config without code changes
- Graceful fallback: Automatic fallback between providers on failure
| Category | Provider | Description |
|---|---|---|
| Transcription | whisper-local | Local faster-whisper with CoreML acceleration |
| Document | docling | Docling Docker service for PDF, DOCX, PPTX, XLSX |
| Webpage | crawl4ai | Crawl4AI Docker service with JavaScript rendering |
| YouTube | Multiple | Auto-fallback between free and paid transcript APIs |
| Browser | Relay | WebSocket relay to browser extension |
For detailed provider documentation, configuration, and implementation patterns, see Providers.
Configuration Resolution:
from gobbler_core.providers import ProviderRegistry
from gobbler_mcp.config import get_config

config = get_config()

# Get default provider for category
default_name = config.providers["transcription"]["default"]  # "whisper-local"

# Get provider-specific config
provider_config = config.providers["transcription"]["whisper-local"]
# {"model": "small", "device": "auto", "compute_type": "float16"}

# Create provider with config
provider = ProviderRegistry.create(
    category="transcription",
    name=default_name,
    **provider_config,
)
CLI Override:
Users can override the default provider via CLI flags:
# Use default from config
gobbler audio transcribe audio.mp3
# Override with specific provider
gobbler audio transcribe audio.mp3 --provider whisper-local
# Override with provider + options
gobbler audio transcribe audio.mp3 --provider whisper-local --model large-v3
Available Providers¶
Transcription Providers:
| Provider | Description |
|---|---|
| whisper-local | Local faster-whisper with CoreML acceleration |
Document Providers:
| Provider | Description |
|---|---|
| docling | Docling Docker service for PDF, DOCX, PPTX, XLSX |
Webpage Providers:
| Provider | Description |
|---|---|
| crawl4ai | Crawl4AI Docker service with JavaScript rendering |
YouTube Provider:

- Multiple transcript APIs (youtube-transcript-api, TranscriptAPI.com)
- Auto-fallback strategy between providers
- Video metadata extraction
- Download capabilities via yt-dlp
Browser Provider:

- WebSocket relay to browser extension
- Tab group security model
- JavaScript execution interface
- Content extraction
For detailed provider documentation, see Providers.
Services Layer¶
Docker-based services provide specialized processing:
Crawl4AI (Port 11235):

- JavaScript rendering via Playwright
- Session persistence (cookies, localStorage)
- Content extraction with selectors
- Markdown conversion
Docling (Port 5001):

- Document structure analysis
- OCR via Tesseract
- Table extraction
- Markdown generation
Queue System¶
SQLite-based background processing for long-running operations:
Auto-Queue Logic:

- Tasks estimated to take longer than 1:45 are queued automatically
- Returns a job_id and ETA to the user
- Real-time progress tracking
- Retry with exponential backoff
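The auto-queue decision described above can be sketched as follows. The 1:45 threshold comes from this document; the function names and the job-handle shape (including `"job-123"`) are illustrative assumptions.

```python
AUTO_QUEUE_THRESHOLD_SECONDS = 105  # 1:45

def should_queue(estimated_seconds: float) -> bool:
    """Queue any task expected to exceed the threshold."""
    return estimated_seconds > AUTO_QUEUE_THRESHOLD_SECONDS

def submit(estimated_seconds: float) -> dict:
    if should_queue(estimated_seconds):
        # Long task: return a job handle with an ETA instead of blocking.
        return {"queued": True, "job_id": "job-123", "eta_seconds": estimated_seconds}
    # Short task: run inline and return the result directly.
    return {"queued": False}

print(submit(30))   # short task runs inline
print(submit(600))  # long task is queued with an ETA
```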
Queues:

- default - General background tasks
- transcription - Audio/video transcription
- download - YouTube video downloads
Worker:

- Executes via the same provider layer as the MCP server
- Updates progress in the SQLite database
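A minimal sketch of SQLite-backed job tracking, assuming a simple `jobs` table; the schema and helper names here are illustrative, not Gobbler's actual ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        queue TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        progress REAL NOT NULL DEFAULT 0.0
    )"""
)

def enqueue(queue: str) -> int:
    """Insert a pending job and return its id."""
    cur = conn.execute("INSERT INTO jobs (queue) VALUES (?)", (queue,))
    conn.commit()
    return cur.lastrowid

def update_progress(job_id: int, progress: float) -> None:
    # The worker writes progress back so the server can report it in real time.
    conn.execute("UPDATE jobs SET progress = ? WHERE id = ?", (progress, job_id))
    conn.commit()

job_id = enqueue("transcription")
update_progress(job_id, 0.5)
row = conn.execute("SELECT queue, progress FROM jobs WHERE id = ?", (job_id,)).fetchone()
print(row)  # ('transcription', 0.5)
```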
Design Decisions¶
Why Tab Group Security Model?¶
Problem: Browser automation could accidentally access sensitive tabs (banking, email, etc.).
Solution: Only tabs explicitly added to "Gobbler" group are accessible to Claude.
Benefits:

- User maintains explicit control
- Visual indicator (orange group color)
- Prevents accidental data leakage
- Easy to add/remove tabs
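The rule can be expressed as a simple filter: only tabs in the "Gobbler" group are visible to the rest of the system. The tab dicts and group-name matching below are assumptions about the extension's data, shown only to illustrate the access-control check.

```python
GOBBLER_GROUP = "Gobbler"

def accessible_tabs(tabs: list[dict]) -> list[dict]:
    """Only tabs explicitly placed in the Gobbler group are exposed."""
    return [t for t in tabs if t.get("group") == GOBBLER_GROUP]

tabs = [
    {"id": 1, "url": "https://bank.example.com", "group": None},       # invisible
    {"id": 2, "url": "https://docs.example.com", "group": "Gobbler"},  # visible
]
print([t["id"] for t in accessible_tabs(tabs)])  # [2]
```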
Integration Patterns¶
Provider Interface Pattern¶
Gobbler uses a registry-based provider pattern for extensible backend support. Each provider category has:
- Abstract base class defining the interface
- Registry for provider discovery and instantiation
- Concrete implementations for each backend
Transcription Provider Example¶
# Base class in gobbler_core/providers/transcription/base.py
from abc import ABC, abstractmethod
from pathlib import Path

class TranscriptionProvider(ABC):
    """Abstract base for transcription providers."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Provider identifier (e.g., 'whisper-local')."""

    @abstractmethod
    async def transcribe(
        self,
        audio_path: Path,
        language: str = "auto",
        **options,
    ) -> TranscriptionResult:
        """Transcribe audio to text."""

    @abstractmethod
    def supports_format(self, extension: str) -> bool:
        """Check if format is supported."""
Provider Registration¶
# In gobbler_core/providers/transcription/whisper.py
from gobbler_core.providers.registry import ProviderRegistry

class WhisperLocalProvider(TranscriptionProvider):
    @property
    def name(self) -> str:
        return "whisper-local"

    async def transcribe(self, audio_path, language="auto", **options):
        # Implementation using faster-whisper
        ...

# Self-register at import time
ProviderRegistry.register("transcription", "whisper-local", WhisperLocalProvider)
Provider Usage¶
from gobbler_core.providers import ProviderRegistry
# Create from registry
provider = ProviderRegistry.create("transcription", "whisper-local", model="small")
# Use the provider
result = await provider.transcribe(Path("audio.mp3"), language="en")
print(result.text)
YouTube Provider (Legacy Pattern)¶
The YouTube provider uses a similar but separate pattern with auto-fallback:
class TranscriptProvider:
    """Abstract base for transcript providers."""

    def fetch(self, video_id, language, include_timestamps):
        ...

class YouTubeTranscriptAPIProvider(TranscriptProvider):
    """Free API with IP blocking risk."""
    ...

class TranscriptAPIProvider(TranscriptProvider):
    """Paid API, no IP blocks."""
    ...

class AutoFallbackProvider(TranscriptProvider):
    """Try free → paid on failure."""
    ...
This pattern enables:

- Multiple backends for the same capability
- Easy addition of new providers
- Graceful fallback between providers
- User choice of cost/reliability tradeoffs
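The fallback strategy can be sketched end to end with stub providers. The classes below are illustrative stand-ins, not the real implementations; the point is the ordered try-then-fall-back loop.

```python
class TranscriptError(Exception):
    pass

class FreeProvider:
    def fetch(self, video_id):
        # Stand-in for the free API failing, e.g. due to an IP block.
        raise TranscriptError("IP blocked")

class PaidProvider:
    def fetch(self, video_id):
        return f"transcript for {video_id}"

class AutoFallback:
    def __init__(self, providers):
        self.providers = providers

    def fetch(self, video_id):
        last_error = None
        for provider in self.providers:
            try:
                return provider.fetch(video_id)
            except TranscriptError as exc:
                last_error = exc  # remember the failure, try the next backend
        raise last_error

result = AutoFallback([FreeProvider(), PaidProvider()]).fetch("abc123")
print(result)  # transcript for abc123
```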
For detailed provider documentation, see Providers.
Batch Processing Pattern¶
All batch operations follow this pattern:
1. Validate input items and limits
2. Check the auto_queue threshold
3. If queued: return a batch_id and start background processing
4. If immediate: process with concurrency control
5. Track progress in shared state
6. Generate a summary report
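The immediate path (concurrency control plus shared progress state) can be sketched with an `asyncio.Semaphore`. The concurrency limit, item handling, and result naming below are illustrative assumptions.

```python
import asyncio

async def process_batch(items, max_concurrency=3):
    semaphore = asyncio.Semaphore(max_concurrency)
    progress = {"done": 0, "total": len(items)}  # shared progress state

    async def worker(item):
        async with semaphore:  # cap how many conversions run at once
            await asyncio.sleep(0)  # stand-in for a real conversion
            progress["done"] += 1
            return f"{item}.md"

    results = await asyncio.gather(*(worker(i) for i in items))
    return results, progress

results, progress = asyncio.run(process_batch(["a", "b", "c"]))
print(results, progress)  # ['a.md', 'b.md', 'c.md'] {'done': 3, 'total': 3}
```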
Benefits:

- Consistent UX across batch operations
- Real-time progress tracking
- Automatic resource management
- Fail-fast validation
Health Check Pattern¶
All external services implement health checks:
class ServiceHealthChecker:
    def check_crawl4ai(self) -> HealthStatus: ...
    def check_docling(self) -> HealthStatus: ...
    def check_all(self) -> dict[str, HealthStatus]: ...
Benefits:

- Early failure detection
- Clear error messages
- Service status visibility
- Automated monitoring
Frontmatter Pattern¶
All converters generate YAML frontmatter:
from datetime import datetime, timezone

def generate_frontmatter(content_type, source, metadata):
    """Standardized frontmatter for all content types."""
    return {
        "source": source,
        "type": content_type,
        "converted_at": datetime.now(timezone.utc).isoformat(),
        **metadata,
    }
Benefits:

- Consistent metadata format
- Easy parsing and filtering
- Preserved provenance
- Rich context for AI
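A frontmatter dict like the one `generate_frontmatter` returns ultimately renders as a YAML block at the top of the markdown file. This stdlib-only serializer is a sketch for flat, scalar fields; real code would likely use a YAML library.

```python
def render_frontmatter(fields: dict) -> str:
    """Render simple scalar fields as a YAML frontmatter block."""
    lines = [f"{key}: {value}" for key, value in fields.items()]
    return "---\n" + "\n".join(lines) + "\n---\n"

doc = render_frontmatter({
    "source": "https://example.com/page",
    "type": "webpage",
    "converted_at": "2024-01-01T00:00:00Z",
})
print(doc)
```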