Cannabis Strain Data Scraper
Model-Context-Protocol Servers
A collection of specialized MCP (Model Context Protocol) servers for different use cases.
1. Leafly Cannabis Strain Data Scraper
This MCP server implements a specialized scraper for collecting structured cannabis strain data from Leafly.com, following a standardized schema and methodology.
Installation
- Ensure you have Node.js 18+ installed
- Clone the repository
- Install dependencies:
cd firecrawl-mcp-server
npm install
- Set up environment variables:
# Copy the example .env file
cp .env.example .env
# Edit the .env file and add your Firecrawl API key
# You can obtain an API key from https://mendable.ai/firecrawl
nano .env  # or use any text editor
- Build the project:
npm run build
API Key Requirement
Important: This scraper requires a valid Firecrawl API key to function. If you try to run the scraper without a valid API key, you will receive a 401 Unauthorized error.
You can set the API key in one of two ways:
- Environment variable:
export FIRECRAWL_API_KEY=your_api_key_here
- In a .env file:
FIRECRAWL_API_KEY=your_api_key_here
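A missing key only surfaces as a 401 once the first request is sent, so it can help to check for it at startup. The following is a minimal sketch, assuming the key is read from an environment-style object such as process.env after the .env file has been loaded. requireFirecrawlApiKey is a hypothetical helper, not part of the repository or the Firecrawl SDK. It only checks that a value is present and is not the placeholder; whether the key is actually valid is still decided by Firecrawl.

```typescript
// Hypothetical helper: fail fast if FIRECRAWL_API_KEY is missing, blank,
// or still set to the placeholder from the examples above.
// It cannot tell whether the key is valid; Firecrawl still answers an
// invalid key with 401 Unauthorized.
type Env = Record<string, string | undefined>;

function requireFirecrawlApiKey(env: Env): string {
  const key = env["FIRECRAWL_API_KEY"]?.trim();
  if (!key || key === "your_api_key_here") {
    throw new Error(
      "FIRECRAWL_API_KEY is not set. Export it or add it to .env before running the scraper."
    );
  }
  return key;
}
```

Called as requireFirecrawlApiKey(process.env) before the Firecrawl client is created, it replaces an opaque 401 with an explicit message.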
Features
- Scrapes 66 standardized data points for each cannabis strain from Leafly.com
- Follows a consistent methodology for data extraction and normalization
- Handles cases where data is missing or inconsistent
- Exports data in CSV or JSON format
- Built-in fallback mechanisms for strains that aren't directly accessible
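In the CSV export, any value containing a comma, quote, or line break must be quoted, and missing data needs a consistent representation. The following is a minimal sketch of that serialization, assuming one row per strain keyed by column name. toCsv and escapeCsvField are hypothetical names. Quoting follows RFC 4180 (wrap the field in quotes, double any embedded quotes), though the sketch joins rows with \n rather than CRLF.

```typescript
// Hypothetical CSV serialization: one header row, then one row per strain.
// Missing values (null/undefined) become empty cells.
type CsvValue = string | number | null | undefined;
type Row = Record<string, CsvValue>;

function escapeCsvField(value: CsvValue): string {
  if (value === null || value === undefined) return "";
  const text = String(value);
  // Quote only when needed; double any embedded quotes (RFC 4180 style).
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(columns: string[], rows: Row[]): string {
  const lines = [columns.map(escapeCsvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escapeCsvField(row[column])).join(","));
  }
  return lines.join("\n");
}
```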
Implementation Approaches
Regex-Based Extraction
The repository includes a regex-based scraper that extracts data using pattern matching:
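// Extract cannabinoids using regex.
// StrainData here is an assumed shape for illustration; the repository defines its own type.
type StrainData = Record<string, number | string | null>;

function extractCannabinoids(content: string, strainData: StrainData): void {
  // THC: matches "THC 18%" or a range such as "THC 18-24%" (hyphen or en dash, optional spaces)
  const thcMatch = content.match(/THC\s*(\d+(?:\.\d+)?)(?:\s*[-\u2013]\s*(\d+(?:\.\d+)?))?\s*%/i);
  if (thcMatch) {
    // Take the higher end of the range, per the methodology
    const thcValue = parseFloat(thcMatch[2] ?? thcMatch[1]);
    // Store as a fraction (24% -> 0.24)
    strainData["cannabinoids.THC"] = thcValue / 100;
  }
  // Similar patterns for other cannabinoids
}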
LLM-Powered Extraction
The repository also includes an LLM-powered extraction method. Instead of relying on fixed text patterns, it sends a JSON schema and prompts to Firecrawl's extract endpoint, and a language model fills in the schema fields:
// Using the extract tool for LLM-powered extraction
const strainSchema = {
type: 'object',
properties: {
"strain_name": { type: 'string' },
"aliases": { type: 'string' },
"strain_classification": { type: 'string' },
"thc_percentage": { type: 'number' },
"cbd_percentage": { type: 'number' },
"cbg_percentage": { type: 'number' },
"terpenes": {
type: 'object',
properties: {
"myrcene": { type: 'string' },
"caryophyllene": { type: 'string' }
// Other terpenes...
}
},
// Other properties...
}
};
// Extract data using LLM
const extractedData = await client.extract([strainUrl], {
schema: strainSchema,
systemPrompt: "Extract precise cannabis strain data. Use exact numbers when available.",
prompt: `Extract all available data for the cannabis strain "${strain}" according to the schema.`
});
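The schema above requests percentages (thc_percentage, cbd_percentage, cbg_percentage), while the regex path stores fractions under dotted keys such as cannabinoids.THC. The following is a minimal sketch of reconciling the two, assuming the extracted object uses the field names from strainSchema. toStrainData and the strain_name output key are hypothetical. Values outside 0 to 100 are dropped rather than trusted, since a model can return malformed numbers.

```typescript
// Hypothetical mapping from LLM-extracted fields (percent values, names as in
// strainSchema) to the dotted, fraction-valued keys used by the regex path.
interface ExtractedStrain {
  strain_name?: string;
  thc_percentage?: number;
  cbd_percentage?: number;
  cbg_percentage?: number;
}

type StrainRecord = Record<string, number | string>;

const PERCENT_FIELDS: Array<[keyof ExtractedStrain, string]> = [
  ["thc_percentage", "cannabinoids.THC"],
  ["cbd_percentage", "cannabinoids.CBD"],
  ["cbg_percentage", "cannabinoids.CBG"],
];

function toStrainData(extracted: ExtractedStrain): StrainRecord {
  const data: StrainRecord = {};
  if (extracted.strain_name) data["strain_name"] = extracted.strain_name;
  for (const [field, key] of PERCENT_FIELDS) {
    const value = extracted[field];
    // Keep only plausible percentages, then convert 0-100 to a 0-1 fraction.
    if (typeof value === "number" && Number.isFinite(value) && value >= 0 && value <= 100) {
      data[key] = value / 100;
    }
  }
  return data;
}
```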
Benefits of LLM extraction:
- Better handling of unstructured text and variations in formatting
- More resilient to website changes
- Can infer missing values based on context
- Extracts relationships between data points
Data Structure
The scraper collects the following categories of data for each strain:
- Basic Information: Strain name
- Terpenes: myrcene, pinene, caryophyllene, limonene, linalool, terpinolene, ocimene, humulene, other
- Cannabinoids: THC, CBD, CBG, CBN, other
- Medical Effects: Stress, Anxiety, Depression, Pain, Insomnia, Lack of Appetite, Nausea, other
- User Effects: Happy, Euphoric, Creative, Relaxed, Uplifted, Energetic, Focused, Sleepy, Hungry, Talkative, Tingly, Giggly, DryMouth, DryEyes, Dizzy, Paranoid, Anxious, other
- Onset and Duration: onset_minutes, duration_hours
- Interactions: Sedatives, Anti-anxiety (benzodiazepines), Antidepressants (SSRIs), Opioid analgesics, Anticonvulsants, Anticoagulants, other
- Flavors: Berry, Sweet, Earthy, Pungent, Pine, Vanilla, Minty, Skunky, Citrus, Spicy, Herbal, Diesel, Tropical, Fruity, Grape, other
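Adding up the fields listed above gives 1 + 9 + 5 + 8 + 18 + 2 + 7 + 16 = 66, which matches the 66 data points listed under Features. The following is a minimal sketch of a flat column layout that reproduces that count. It assumes category.Field keys like the cannabinoids.THC key in the regex example; the section prefixes (terpenes, medical, effects, and so on) are hypothetical names not confirmed by the repository.

```typescript
// Hypothetical column layout for one strain row. Prefixed sections become
// "prefix.Field" keys, mirroring "cannabinoids.THC" in the regex example;
// sections with a null prefix are plain top-level columns.
const SECTIONS: Array<[string | null, string[]]> = [
  [null, ["strain_name"]],
  ["terpenes", ["myrcene", "pinene", "caryophyllene", "limonene", "linalool", "terpinolene", "ocimene", "humulene", "other"]],
  ["cannabinoids", ["THC", "CBD", "CBG", "CBN", "other"]],
  ["medical", ["Stress", "Anxiety", "Depression", "Pain", "Insomnia", "Lack of Appetite", "Nausea", "other"]],
  ["effects", ["Happy", "Euphoric", "Creative", "Relaxed", "Uplifted", "Energetic", "Focused", "Sleepy", "Hungry",
    "Talkative", "Tingly", "Giggly", "DryMouth", "DryEyes", "Dizzy", "Paranoid", "Anxious", "other"]],
  [null, ["onset_minutes", "duration_hours"]],
  ["interactions", ["Sedatives", "Anti-anxiety (benzodiazepines)", "Antidepressants (SSRIs)", "Opioid analgesics",
    "Anticonvulsants", "Anticoagulants", "other"]],
  ["flavors", ["Berry", "Sweet", "Earthy", "Pungent", "Pine", "Vanilla", "Minty", "Skunky", "Citrus", "Spicy",
    "Herbal", "Diesel", "Tropical", "Fruity", "Grape", "other"]],
];

function buildColumns(): string[] {
  const columns: string[] = [];
  for (const [prefix, fields] of SECTIONS) {
    for (const field of fields) {
      columns.push(prefix === null ? field : `${prefix}.${field}`);
    }
  }
  return columns;
}
```

Generating the header this way also serves as a check: buildColumns() should return exactly 66 distinct names.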
Usage
As a Firecrawl MCP Tool
Once integrated with the Firecrawl MCP server, the tool can be called with the following parameters:
{
"name": "firecrawl_leafly_strain",
"arguments": {
"strains": ["Blue Dream", "OG Kush", "Sour Diesel"],
"exportFormat": "csv"
}
}
Set "exportFormat" to "csv" or "json".
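The tool takes strain names, while Leafly strain pages live at URLs such as https://www.leafly.com/strains/blue-dream. The following is a minimal sketch of deriving that URL. It assumes a Leafly slug is the lowercased name with each run of non-alphanumeric characters collapsed to a hyphen. strainSlug and strainUrl are hypothetical helpers. Some strains may use slugs that this rule does not reproduce, and those are the cases the fallback mechanisms would need to cover.

```typescript
// Hypothetical: build a Leafly strain URL from a display name.
// Assumes slug = lowercase ASCII name, non-alphanumeric runs collapsed to "-",
// leading/trailing hyphens trimmed. Not every Leafly slug follows this rule.
function strainSlug(name: string): string {
  return name
    .normalize("NFKD")               // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function strainUrl(name: string): string {
  return `https://www.leafly.com/strains/${strainSlug(name)}`;
}
```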
Using the Extract Tool Directly
For more advanced extraction with the LLM-powered approach:
{
"name": "firecrawl_extract",
"arguments": {
"urls": ["https://www.leafly.com/strains/blue-dream"],
"schema": {
"type": "object",
"properties": {
"strain_name": { "type": "string" },
"thc_percentage": { "type": "number" },
"cbd_percentage": { "type": "number" },
"effects": { "type": "string" },
"flavors": { "type": "string" },
"medical": { "type": "string" },
"terpenes": { "type": "object" }
}
},
"prompt": "Extract comprehensive cannabis strain data from this Leafly page."
}
}
Using the CLI Script
You can also use the included CLI script to run the scraper directly:
# Set the Firecrawl API key (required)
export FIRECRAWL_API_KEY=your_api_key_here
# Using npm scripts
npm run scrape-leafly -- output.csv "Blue Dream,OG Kush,Sour Diesel"
# Or running directly
node dist/leafly-scraper-cli.js output.csv "Blue Dream,OG Kush,Sour Diesel"
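Both commands take the output path first and a single comma-separated strain list second. The following is a minimal sketch of parsing those arguments. parseCliArgs is hypothetical; the repository's leafly-scraper-cli.js may handle arguments differently. Inferring the export format from the file extension is also an assumption.

```typescript
// Hypothetical parsing of: <output file> "<strain1>,<strain2>,..."
// argv holds only the user arguments (process.argv.slice(2) under Node).
interface CliOptions {
  outputPath: string;
  strains: string[];
  exportFormat: "csv" | "json";
}

function parseCliArgs(argv: string[]): CliOptions {
  const [outputPath, strainList] = argv;
  if (!outputPath || !strainList) {
    throw new Error('Usage: leafly-scraper-cli <output.csv|output.json> "Strain A,Strain B"');
  }
  const strains = strainList
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0); // ignore empty entries, e.g. from a trailing comma
  if (strains.length === 0) throw new Error("No strain names given.");
  // Assumption: pick the export format from the output file extension.
  const exportFormat = outputPath.toLowerCase().endsWith(".json") ? "json" : "csv";
  return { outputPath, strains, exportFormat };
}
```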
Methodology
The scraper applies the following rules when extracting and normalizing data:
- Lab Data Priority: Lab-tested cannabinoid and terpene data is prioritized when available
- Consistent Normalization: When exact values aren't available, standardized normalization is applied:
- For terpenes: dominant = 0.008, second = 0.005, third = 0.003, others = 0.001 (fractions, like the cannabinoid values, so 0.008 = 0.8%)
- For effects and flavors: Values are normalized to a 0.0-1.0 scale
- Default Values: Standard defaults are applied for commonly missing fields
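Taken together, the lab-priority and terpene rules say: use a lab value when one exists, otherwise assign a fixed value by rank. The following is a minimal sketch, assuming the page lists terpenes in dominance order and that "others" means every terpene after the third. terpeneValues is a hypothetical helper.

```typescript
// Fallback values by rank, from the methodology: dominant, second, third.
const RANK_DEFAULTS = [0.008, 0.005, 0.003];
const OTHER_DEFAULT = 0.001; // fourth-ranked and below

// Hypothetical helper.
// rankedNames: terpenes as listed on the page, most dominant first.
// labValues: lab-tested values (as fractions), which take priority when present.
function terpeneValues(
  rankedNames: string[],
  labValues: Record<string, number> = {}
): Record<string, number> {
  const result: Record<string, number> = {};
  rankedNames.forEach((name, rank) => {
    const lab = labValues[name];
    result[name] = typeof lab === "number" ? lab : (RANK_DEFAULTS[rank] ?? OTHER_DEFAULT);
  });
  // Lab-tested terpenes that the page ranking did not mention are kept as well.
  for (const [name, value] of Object.entries(labValues)) {
    if (!(name in result)) result[name] = value;
  }
  return result;
}
```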
Troubleshooting
TypeScript Errors
If you encounter TypeScript compilation errors:
- Ensure you have all dependencies installed:
npm install
- Make sure TypeScript is installed:
npm install -g typescript
- Errors about missing type declarations for Node.js modules (for example, "Cannot find name 'process'") can typically be fixed by installing the @types packages:
npm install --save-dev @types/node
2. Python Codebase MCP Server
This MCP server provides code analysis capabilities and file system access for codebase navigation.
Installation
- Ensure you have Python 3.7+ installed
- Install dependencies:
pip install mcp-python-sdk watchdog
Features
- File system navigation and file reading
- Code search functionality
- Project structure analysis
- Real-time file change monitoring
- Function and component discovery
- Dependency analysis
Usage
Start the server:
python mcp_server.py
The server provides tools for code analysis:
- search_function: Find function definitions in code files
- search_code: Search for text across all code files
- get_project_structure: Generate a tree-like structure of the project
- analyze_dependencies: Analyze project dependencies
- find_components: Discover React/React Native components
Resources
- /file/list/{directory}: List files in a directory
- /file/read/{filepath}: Read file contents
- /file/info/{filepath}: Get file metadata
- /file/changes/{directory}: Get recently modified files
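Each resource path above is a template whose {...} placeholder is filled in by the client. The following sketch matches a request path against those templates and extracts the parameter. It is written in TypeScript like the other examples here, so it illustrates the idea rather than the Python server's actual routing code. It assumes a placeholder may span several path segments, so that a filepath can contain "/"; that is a guess about how nested paths are passed. compileTemplate and matchResource are hypothetical.

```typescript
// Hypothetical matcher for path-style resource templates such as
// "/file/read/{filepath}". Each {name} placeholder captures one or more
// characters, "/" included; the templates above each have a single
// trailing placeholder.
interface CompiledTemplate {
  template: string;
  regex: RegExp;
  names: string[];
}

function compileTemplate(template: string): CompiledTemplate {
  const names: string[] = [];
  const pattern = template
    .split(/(\{[^}]+\})/)
    .map((part) => {
      const placeholder = /^\{([^}]+)\}$/.exec(part);
      if (placeholder) {
        names.push(placeholder[1]);
        return "(.+)";
      }
      return part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // escape literal text
    })
    .join("");
  return { template, regex: new RegExp(`^${pattern}$`), names };
}

function matchResource(
  templates: CompiledTemplate[],
  path: string
): { template: string; params: Record<string, string> } | null {
  for (const { template, regex, names } of templates) {
    const match = regex.exec(path);
    if (match === null) continue;
    const params: Record<string, string> = {};
    for (let i = 0; i < names.length; i++) {
      params[names[i]] = decodeURIComponent(match[i + 1]);
    }
    return { template, params };
  }
  return null;
}
```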
3. DeepSeek R1 Extended MCP Server
This MCP server provides access to DeepSeek AI models for text generation, summarization, and document processing.
Installation
- Ensure you have Node.js 18+ installed
- Install dependencies:
npm install @modelcontextprotocol/sdk openai dotenv
- Set up environment variables:
# Create a .env file
echo "DEEPSEEK_API_KEY=your_api_key_here" > .env
Features
- Text generation using the DeepSeek R1 model
- Text summarization
- Streaming text generation
- Multi-model support
- Document processing (summarize, extract entities, analyze sentiment)
- File operations for saving outputs
Usage
Start the server:
node deepseek_mcp.js
The server provides the following tools:
- deepseek_r1: Generate text using the DeepSeek R1 model
- deepseek_summarize: Summarize text
- deepseek_stream: Stream text generation
- deepseek_multi: Generate text using different DeepSeek models
- deepseek_document: Process documents (summarize, extract entities, analyze sentiment)
Resources
- /model/info: Get information about supported models
- /server/status: Check server status
- /file/save/{filename}: Save content to a file
- /file/list: List saved files
- /file/read/{filename}: Read saved file contents
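Because /file/save/{filename} and /file/read/{filename} accept a client-supplied name, a server like this usually restricts that name before touching the disk. The following is a minimal sketch, assuming outputs are stored flat in a single directory. validateFilename and the allowed character set are hypothetical choices, not the server's documented behavior.

```typescript
// Hypothetical guard for client-supplied {filename} values: allow only a
// plain name (letters, digits, ".", "_", "-") that starts with a letter or
// digit, so "../", absolute paths, sub-directories, and dotfiles are refused.
const SAFE_FILENAME = /^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$/;

function validateFilename(filename: string): string {
  if (!SAFE_FILENAME.test(filename) || filename.includes("..")) {
    throw new Error(`Rejected file name: ${JSON.stringify(filename)}`);
  }
  return filename;
}
```

An allowlist like this is simpler to reason about than trying to strip dangerous sequences after the fact.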
4. JSON Manager MCP Server
This MCP server provides advanced JSON querying and manipulation capabilities.
Installation
- Ensure you have Node.js 18+ installed
- Install dependencies:
npm install @modelcontextprotocol/sdk node-fetch jsonpath
Features
- Query JSON data using JSONPath
- Advanced filtering
- String operations
- Numeric operations
- Date operations
- Array transformations
- Complex data comparisons
- Result caching
- Save and manage query results
Usage
Start the server:
node json_mcp.js
The server provides the following tools:
- query: Query JSON data using JSONPath expressions
- filter: Filter JSON data based on conditions
- save_query: Save query results to a file
- compare_json: Compare two JSON datasets
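The filter tool is described only as "filter JSON data based on conditions." The following is a minimal sketch of one way such conditions could work, assuming a { field, op, value } condition format with dotted field paths and AND semantics. The Condition shape, the operator names, and filterItems are all hypothetical and are not the server's documented input format.

```typescript
// Hypothetical condition format: { field, op, value }, combined with AND.
type Operator = "eq" | "ne" | "gt" | "gte" | "lt" | "lte" | "contains";

interface Condition {
  field: string; // dotted path into each item, e.g. "address.city"
  op: Operator;
  value: unknown;
}

// Walk a dotted path; returns undefined if any step is missing.
function getPath(item: unknown, path: string): unknown {
  let current: unknown = item;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Numeric operators only match when both sides are numbers.
function numeric(a: unknown, b: unknown, test: (x: number, y: number) => boolean): boolean {
  return typeof a === "number" && typeof b === "number" && test(a, b);
}

function matches(item: unknown, { field, op, value }: Condition): boolean {
  const actual = getPath(item, field);
  switch (op) {
    case "eq": return actual === value;
    case "ne": return actual !== value;
    case "gt": return numeric(actual, value, (x, y) => x > y);
    case "gte": return numeric(actual, value, (x, y) => x >= y);
    case "lt": return numeric(actual, value, (x, y) => x < y);
    case "lte": return numeric(actual, value, (x, y) => x <= y);
    case "contains":
      // Substring test for strings, membership test for arrays.
      if (typeof actual === "string") return typeof value === "string" && actual.includes(value);
      return Array.isArray(actual) && actual.includes(value);
    default: return false;
  }
}

function filterItems<T>(items: T[], conditions: Condition[]): T[] {
  return items.filter((item) => conditions.every((c) => matches(item, c)));
}
```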
Resources
- /saved_queries/list: List saved queries
- /saved_queries/get/{filename}: Retrieve a saved query
- /cache/status: Check cache status
- /cache/clear: Clear the cache
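The cache resources imply that query results are kept between calls and can be inspected and cleared. The following is a minimal sketch of such a cache, assuming entries are keyed by data source plus JSONPath expression and expire after a fixed time-to-live. QueryCache, its TTL policy, and the fields reported by status() are hypothetical.

```typescript
// Hypothetical in-memory cache for query results, keyed by data source and
// JSONPath expression. Entries expire after ttlMs so changed data is refetched.
class QueryCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private hits = 0;
  private misses = 0;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  private key(source: string, path: string): string {
    return JSON.stringify([source, path]); // unambiguous composite key
  }

  get(source: string, path: string): V | undefined {
    const k = this.key(source, path);
    const entry = this.entries.get(k);
    if (entry !== undefined && entry.expiresAt > this.now()) {
      this.hits++;
      return entry.value;
    }
    if (entry !== undefined) this.entries.delete(k); // drop the expired entry
    this.misses++;
    return undefined;
  }

  set(source: string, path: string, value: V): void {
    this.entries.set(this.key(source, path), { value, expiresAt: this.now() + this.ttlMs });
  }

  // Roughly what a /cache/status resource could report.
  status(): { size: number; hits: number; misses: number } {
    return { size: this.entries.size, hits: this.hits, misses: this.misses };
  }

  // Roughly what a /cache/clear resource could do.
  clear(): void {
    this.entries.clear();
    this.hits = 0;
    this.misses = 0;
  }
}
```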
License
MIT License