Lead Generation

Lead Generation Server Documentation


Table of Contents

  1. Overview
  2. Features
  3. Architecture
  4. Prerequisites
  5. Installation
  6. Configuration
  7. Running the Server
  8. API Documentation
  9. Examples
  10. Advanced Configuration
  11. Troubleshooting
  12. Contributing
  13. License
  14. Roadmap
  15. Support

Overview

A production-grade lead generation system built on:

  • MCP Python SDK for protocol-compliant AI services
  • Crawl4AI for intelligent web crawling
  • AsyncIO for high-concurrency operations

It implements the full lead lifecycle, from discovery to enrichment, with:

  • UUID-based lead tracking
  • Multi-source data aggregation
  • Smart caching strategies
  • Enterprise-grade error handling
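As a rough illustration of UUID-based tracking and multi-source aggregation, a lead record might look like the sketch below. The `Lead` class, its field names, and the `merge` method are hypothetical and not taken from the server's code; only the `lead_id` and `status` fields mirror the API responses documented later.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Lead:
    """Hypothetical lead record; the field names are illustrative only."""
    search_terms: str
    # A random (version 4) UUID identifies the lead across all services.
    lead_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "pending"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Data from each source is stored under the source's name.
    sources: dict = field(default_factory=dict)

    def merge(self, source: str, data: dict) -> None:
        """Attach the data returned by one source and mark the lead enriched."""
        self.sources[source] = data
        self.status = "enriched"


lead = Lead(search_terms="OpenAI")
lead.merge("hunter", {"emails": []})
```

Keeping each source's payload under its own key makes it easy to see which service contributed which field.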

Features

Feature            Tech Stack                            Throughput
Lead Generation    Google CSE, Crawl4AI                  120 req/min
Data Enrichment    Hunter.io, Clearbit [Hubspot Breeze]  80 req/min
LinkedIn Scraping  Playwright, Stealth Mode              40 req/min
Caching            aiocache, Redis                       10K ops/sec
Monitoring         Prometheus, Custom Metrics            Real-time

Architecture

graph TD
    A[Client] --> B[MCP Server]
    B --> C{Lead Manager}
    C --> D[Google CSE]
    C --> E[Crawl4AI]
    C --> F[Hunter.io]
    C --> G[Clearbit]
    C --> H[LinkedIn Scraper]
    C --> I[(Redis Cache)]
    C --> J[Lead Store]
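One way to picture the Lead Manager's fan-out in the diagram above is a concurrent call to every source with `asyncio.gather`. This is a minimal sketch under assumptions: `gather_sources`, `fetch_google_cse`, and `fetch_hunter` are hypothetical placeholders, not functions from the server.

```python
import asyncio


async def fetch_google_cse(terms: str) -> dict:
    # Placeholder standing in for a real Google CSE call.
    await asyncio.sleep(0)
    return {"urls": [f"https://example.com/?q={terms}"]}


async def fetch_hunter(terms: str) -> dict:
    # Placeholder that simulates a source being unavailable.
    raise RuntimeError("hunter unavailable")


async def gather_sources(terms: str, sources: dict) -> dict:
    """Query every source concurrently; keep results and errors separate."""
    names = list(sources)
    outcomes = await asyncio.gather(
        *(sources[name](terms) for name in names),
        return_exceptions=True,  # one failing source must not cancel the others
    )
    results, errors = {}, {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            errors[name] = str(outcome)
        else:
            results[name] = outcome
    return {"results": results, "errors": errors}


report = asyncio.run(
    gather_sources("OpenAI", {"google_cse": fetch_google_cse, "hunter": fetch_hunter})
)
print(report)
```

Using `return_exceptions=True` lets a lead be stored with partial data when a single provider fails.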

Prerequisites

  • Python 3.10+
  • API Keys:
    export HUNTER_API_KEY="your_key"
    export CLEARBIT_API_KEY="your_key"
    export GOOGLE_CSE_ID="your_id"
    export GOOGLE_API_KEY="your_key"
    
  • LinkedIn Session Cookie (for scraping)
  • 4GB+ RAM (8GB recommended for heavy scraping)
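Before starting the server, it can help to confirm that the four API key variables listed above are set. The helper below is a small sketch; `missing_keys` and `check_environment` are illustrative names, not part of the project.

```python
import os

# Variable names taken from the export lines in the prerequisites.
REQUIRED_KEYS = ("HUNTER_API_KEY", "CLEARBIT_API_KEY", "GOOGLE_CSE_ID", "GOOGLE_API_KEY")


def missing_keys(env=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def check_environment(env=None) -> str:
    """Summarise the state of the required variables in one line."""
    missing = missing_keys(env)
    if missing:
        return "Missing environment variables: " + ", ".join(missing)
    return "All required API keys are set."


print(check_environment())
```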

Installation

Production Setup

# Create virtual environment
python -m venv .venv && source .venv/bin/activate

# Install with production dependencies
pip install "mcp[cli]" "crawl4ai[all]" aiocache aiohttp uvloop

# Set up browser dependencies
python -m playwright install chromium

Docker Deployment

FROM python:3.10-slim

RUN apt-get update && apt-get install -y \
    gcc \
    libpython3-dev \
    chromium \
    && rm -rf /var/lib/apt/lists/*

COPY . /app
WORKDIR /app

RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "-m", "mcp", "run", "lead_server.py"]

Configuration

config.yaml

services:
  hunter:
    api_key: ${HUNTER_API_KEY}
    rate_limit: 50/60s
    
  clearbit:
    api_key: ${CLEARBIT_API_KEY}
    cache_ttl: 86400

scraping:
  stealth_mode: true
  headless: true
  timeout: 30
  max_retries: 3

cache:
  backend: redis://localhost:6379/0
  default_ttl: 3600
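The config above uses `${VAR}` placeholders and a compact rate-limit notation. The sketch below shows one way such values could be interpreted. It assumes that `50/60s` means "50 calls per 60 seconds" and that unset variables should raise an error; `expand_env` and `parse_rate_limit` are hypothetical helpers, not the server's actual loader.

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
_RATE = re.compile(r"^(\d+)/(\d+)([smh])$")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def expand_env(text: str, env=None) -> str:
    """Replace each ${VAR} with its value; fail loudly if VAR is unset."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)


def parse_rate_limit(spec: str) -> tuple:
    """Turn '50/60s' into (50, 60): maximum calls and window length in seconds."""
    match = _RATE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised rate limit: {spec!r}")
    count, amount, unit = match.groups()
    return int(count), int(amount) * _UNIT_SECONDS[unit]
```

Expanding placeholders on the raw text before parsing the YAML keeps secrets out of the file itself.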

Running the Server

Development Mode

mcp dev lead_server.py --reload --port 8080

Production

gunicorn -w 4 -k uvicorn.workers.UvicornWorker lead_server:app

Docker

docker build -t lead-server .
docker run -p 8080:8080 -e HUNTER_API_KEY=your_key lead-server

API Documentation

1. Generate Lead

POST /tools/lead_generation
Content-Type: application/json

{
  "search_terms": "OpenAI"
}

Response:

{
  "lead_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "estimated_time": 15
}

2. Enrich Lead

POST /tools/data_enrichment
Content-Type: application/json

{
  "lead_id": "550e8400-e29b-41d4-a716-446655440000"
}

3. Monitor Leads

GET /tools/lead_maintenance
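The three endpoints above share one URL pattern, so a client can build requests with a single helper. This sketch uses only the standard library and constructs the request without sending it; `build_tool_request` is an illustrative name, and the base URL assumes the server listens on port 8080 as in the Docker example.

```python
import json
import urllib.request


def build_tool_request(base_url: str, tool: str, payload=None) -> urllib.request.Request:
    """Build (but do not send) a request for /tools/<tool>.

    A payload produces a JSON POST; no payload produces a GET.
    """
    url = f"{base_url.rstrip('/')}/tools/{tool}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_tool_request(
    "http://localhost:8080", "lead_generation", {"search_terms": "OpenAI"}
)
```

Passing the result to `urllib.request.urlopen(request)` would send it to a running server.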

Examples

Python Client

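import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

def parse(result):
    # Assumes each tool returns its payload as JSON text in the first content item
    return json.loads(result.content[0].text)

async def main():
    # Assumes the server can be launched locally over stdio
    params = StdioServerParameters(command="python", args=["lead_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as client:
            await client.initialize()

            # Generate lead
            lead = parse(await client.call_tool(
                "lead_generation",
                {"search_terms": "Anthropic"}
            ))

            # Enrich with all services
            enriched = parse(await client.call_tool(
                "data_enrichment",
                {"lead_id": lead["lead_id"]}
            ))

            # Get full lead data
            status = parse(await client.call_tool(
                "lead_status",
                {"lead_id": lead["lead_id"]}
            ))

asyncio.run(main())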

cURL

# Generate lead
curl -X POST http://localhost:8080/tools/lead_generation \
  -H "Content-Type: application/json" \
  -d '{"search_terms": "Cohere AI"}'

Advanced Configuration

Caching Strategies

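from aiocache import Cache

# Configure the Redis cache backend.
# Cache.from_url accepts only a URL, so options are passed as keyword arguments.
# aiocache has no retry options; handle retries in the calling code.
cache = Cache(
    Cache.REDIS,
    endpoint="cluster-node1",
    port=6379,
    db=0,
    timeout=10
)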
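To show what a TTL-based strategy does, here is a small in-memory stand-in written with the standard library only. It is not aiocache and not the server's cache layer; `ttl_cache` and `enrich` are hypothetical names, and the 86400-second TTL echoes the Clearbit `cache_ttl` from the config.

```python
import asyncio
import functools
import time


def ttl_cache(ttl: float, clock=time.monotonic):
    """Cache an async function's results per argument tuple for `ttl` seconds."""
    def decorator(func):
        store = {}  # maps args -> (time stored, value)

        @functools.wraps(func)
        async def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # still fresh: skip the upstream call
            value = await func(*args)
            store[args] = (now, value)
            return value

        return wrapper
    return decorator


@ttl_cache(ttl=86400)
async def enrich(domain: str) -> dict:
    # Placeholder for a paid enrichment API call.
    return {"domain": domain}


first = asyncio.run(enrich("example.com"))
second = asyncio.run(enrich("example.com"))  # served from the cache
```

Caching enrichment results for a day avoids paying twice for the same lookup, at the cost of slightly stale data.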

Rate Limiting

from mcp.server.middleware import RateLimiter

mcp.add_middleware(
    RateLimiter(
        rules={
            "lead_generation": "100/1m",
            "data_enrichment": "50/1m"
        }
    )
)
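Independent of any middleware, per-tool rules such as "100/1m" can be enforced with a sliding window. The sketch below is a generic illustration, not the behaviour of the `RateLimiter` shown above; `SlidingWindowLimiter` is a hypothetical class, and its rules are given as `(limit, window_seconds)` tuples.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, rules: dict, clock=time.monotonic):
        self.rules = rules  # e.g. {"lead_generation": (100, 60)}
        self.clock = clock
        self.calls = defaultdict(deque)  # key -> timestamps of accepted calls

    def allow(self, key: str) -> bool:
        """Record the call and return True, or return False if over the limit."""
        if key not in self.rules:
            return True  # tools without a rule are not limited
        limit, window = self.rules[key]
        now = self.clock()
        history = self.calls[key]
        # Drop timestamps that have fallen out of the window.
        while history and now - history[0] >= window:
            history.popleft()
        if len(history) >= limit:
            return False
        history.append(now)
        return True


limiter = SlidingWindowLimiter({"lead_generation": (100, 60), "data_enrichment": (50, 60)})
allowed = limiter.allow("lead_generation")
```

A sliding window avoids the burst at window boundaries that a fixed per-minute counter allows.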

Troubleshooting

Error                      Solution
403 Forbidden from Google  Rotate IPs or use official CSE API
429 Too Many Requests      Implement exponential backoff
Playwright Timeout         Increase scraping.timeout in config
Cache Miss                 Verify Redis connection and TTL settings
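For the 429 case, exponential backoff can be sketched as below. The `retry_with_backoff` helper and its default values are assumptions for illustration; only the retry count of 3 echoes `max_retries` in the config. The sketch retries on any exception, whereas a real client would likely retry only on 429 or transient network errors.

```python
import asyncio
import random


async def retry_with_backoff(call, max_retries=3, base_delay=0.5, max_delay=30.0,
                             sleep=asyncio.sleep):
    """Await call(); on failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            await sleep(delay + random.uniform(0, delay / 2))
```

With the defaults, the base waits are 0.5 s, 1 s, and 2 s before the fourth and final attempt.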

Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/new-enrichment
  3. Commit changes: git commit -am 'Add Clearbit alternative'
  4. Push to branch: git push origin feature/new-enrichment
  5. Submit pull request

License

Apache 2.0 - See LICENSE for details.


Roadmap

  • Q2 2025: AI-powered lead scoring
  • Q3 2025: Distributed crawling cluster support

Support

For enterprise support and custom integrations:
📧 Email: hi@kobotai.co
🐦 Twitter: @KobotAIco


# Run benchmark tests
pytest tests/ --benchmark-json=results.json

Benchmark Results