Web Crawler MCP Server Deployment Guide

Prerequisites

  • Node.js (v18+)
  • npm (v9+)

Installation

  1. Clone the repository:

    git clone https://github.com/jitsmaster/web-crawler-mcp.git
    cd web-crawler-mcp
    
  2. Install dependencies:

    npm install
    
  3. Build the project:

    npm run build
    

Configuration

Create a .env file with the following environment variables:

CRAWL_LINKS=false
MAX_DEPTH=3
REQUEST_DELAY=1000
TIMEOUT=5000
MAX_CONCURRENT=5
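
How the server reads these values is not shown in this guide. As a rough sketch, assuming the variables come from the process environment and fall back to the documented defaults when missing or invalid, the parsing could look like the following. The names `CrawlerConfig`, `intOrDefault`, and `loadConfig` are hypothetical and are not taken from the project.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from the project.
interface CrawlerConfig {
  crawlLinks: boolean;
  maxDepth: number;
  requestDelayMs: number;
  timeoutMs: number;
  maxConcurrent: number;
}

// Parse a non-negative integer, falling back to the default
// when the value is missing, not a number, or negative.
function intOrDefault(raw: string | undefined, fallback: number): number {
  const parsed = parseInt(raw ?? "", 10);
  return isNaN(parsed) || parsed < 0 ? fallback : parsed;
}

// Build a config from an environment-like object (for example process.env).
// Defaults mirror the values documented in this guide.
function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  return {
    crawlLinks: (env.CRAWL_LINKS ?? "false").trim().toLowerCase() === "true",
    maxDepth: intOrDefault(env.MAX_DEPTH, 3),
    requestDelayMs: intOrDefault(env.REQUEST_DELAY, 1000),
    timeoutMs: intOrDefault(env.TIMEOUT, 5000),
    maxConcurrent: intOrDefault(env.MAX_CONCURRENT, 5),
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test with plain objects.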

Running the Server

Start the MCP server:

npm start

MCP Configuration

Add the following to your MCP client's settings file, replacing the path with the absolute location of your clone:

{
  "mcpServers": {
    "web-crawler": {
      "command": "node",
      "args": ["/path/to/web-crawler-mcp/build/index.js"],
      "env": {
        "CRAWL_LINKS": "false",
        "MAX_DEPTH": "3",
        "REQUEST_DELAY": "1000",
        "TIMEOUT": "5000",
        "MAX_CONCURRENT": "5"
      }
    }
  }
}

Usage

The server exposes a crawl tool through MCP. Example arguments for the tool:

{
  "url": "https://example.com",
  "depth": 1
}
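
An MCP client normally wraps these arguments for you. For orientation, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, and the sketch below builds such a message for the `crawl` tool. `buildCrawlRequest` is a hypothetical helper, and the assumption that `url` and `depth` are the only arguments comes from the example above, not from the server's actual schema.

```typescript
// Arguments from the usage example; the real tool schema may accept more fields.
interface CrawlArguments {
  url: string;
  depth?: number;
}

// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: CrawlArguments };
}

// Hypothetical helper: checks the URL scheme and wraps the
// arguments in a tools/call request for the "crawl" tool.
function buildCrawlRequest(id: number, args: CrawlArguments): ToolCallRequest {
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error("url must start with http:// or https://");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "crawl", arguments: args },
  };
}

// Same arguments as the usage example.
const request = buildCrawlRequest(1, { url: "https://example.com", depth: 1 });
```

Serializing `request` with `JSON.stringify` gives the message a client would send over the MCP transport.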

Configuration Options

Environment Variable   Default   Description
CRAWL_LINKS            false     Whether to follow links
MAX_DEPTH              3         Maximum crawl depth
REQUEST_DELAY          1000      Delay between requests (ms)
TIMEOUT                5000      Request timeout (ms)
MAX_CONCURRENT         5         Maximum concurrent requests
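
To illustrate how CRAWL_LINKS and MAX_DEPTH might interact, here is a minimal sketch of a depth-limited breadth-first traversal over an in-memory link graph. It rests on two assumptions that this guide does not confirm: the start page counts as depth 0, and CRAWL_LINKS=false means only the start page is visited. `planCrawl` and `LinkGraph` are hypothetical names, and the server's real traversal may differ.

```typescript
// In-memory stand-in for the web: page URL -> URLs it links to (made-up data).
type LinkGraph = Record<string, string[]>;

// Breadth-first traversal that stops following links once maxDepth is reached.
// Assumption: the start page is depth 0; crawlLinks=false visits only the start page.
function planCrawl(
  start: string,
  graph: LinkGraph,
  crawlLinks: boolean,
  maxDepth: number
): string[] {
  const seen: Record<string, boolean> = { [start]: true };
  const order: string[] = [];
  let frontier: string[] = [start];

  for (let depth = 0; frontier.length > 0; depth++) {
    order.push(...frontier);
    if (!crawlLinks || depth >= maxDepth) break;

    // Collect unseen links from the current level to form the next level.
    const next: string[] = [];
    for (const page of frontier) {
      for (const link of graph[page] ?? []) {
        if (!seen[link]) {
          seen[link] = true;
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

Under these assumptions, a start page that links to two other pages yields three visited pages with `crawlLinks` set to true and `maxDepth` set to 1, and only the start page with `crawlLinks` set to false.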