# MCP Prompt Tester
A simple MCP server that allows agents to test LLM prompts with different providers.
## Features
- Test prompts with OpenAI and Anthropic models
- Configure system prompts, user prompts, and other parameters
- Get formatted responses or error messages
- Easy environment setup with .env file support
## Installation
```bash
# Install with pip
pip install -e .

# Or with uv
uv pip install -e .
```
## API Key Setup
The server requires API keys for the providers you want to use. You can set these up in two ways:
### Option 1: Environment Variables
Set the following environment variables:
- `OPENAI_API_KEY` - Your OpenAI API key
- `ANTHROPIC_API_KEY` - Your Anthropic API key
### Option 2: .env File (Recommended)

- Create a file named `.env` in your project directory or home directory
- Add your API keys in the following format:

```
OPENAI_API_KEY=your-openai-api-key-here
ANTHROPIC_API_KEY=your-anthropic-api-key-here
```

- The server will automatically detect and load these keys

For convenience, a sample template is included as `.env.example`.
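The loading behavior is equivalent to the following sketch, shown here with the `python-dotenv` package; the server's actual mechanism may differ:

```python
from pathlib import Path

from dotenv import load_dotenv  # pip install python-dotenv

# Mirror the lookup order described above: project directory first,
# then the home directory.
for candidate in (Path.cwd() / ".env", Path.home() / ".env"):
    if candidate.exists():
        load_dotenv(candidate)
        break
```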
## Usage
Start the server using stdio (default) or SSE transport:
```bash
# Using stdio transport (default)
prompt-tester

# Using SSE transport on custom port
prompt-tester --transport sse --port 8000
```
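With the SSE transport, any MCP client can connect over HTTP. Below is a minimal sketch using the MCP Python SDK, assuming the server from the command above is listening on port 8000 and exposes the SDK's default `/sse` endpoint:

```python
import asyncio

from mcp.client.session import ClientSession
from mcp.client.sse import sse_client


async def main():
    # Connects to: prompt-tester --transport sse --port 8000
    # The /sse path is the SDK's default and may differ per deployment.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_providers", {})
            print(result)


asyncio.run(main())
```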
## Available Tools
The server exposes the following tools for MCP-empowered agents:
### 1. list_providers
Retrieves available LLM providers and their default models.
Parameters:
- None required
Example Response:
```json
{
  "providers": {
    "openai": [
      {
        "type": "gpt-4",
        "name": "gpt-4",
        "input_cost": 0.03,
        "output_cost": 0.06,
        "description": "Most capable GPT-4 model"
      },
      // ... other models ...
    ],
    "anthropic": [
      // ... models ...
    ]
  }
}
```
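The `input_cost` and `output_cost` fields are handy for rough budgeting. Here is a small helper, assuming (but not guaranteed by this README) that the figures are USD per 1,000 tokens:

```python
def estimate_cost(model_info: dict, input_tokens: int, output_tokens: int) -> float:
    # Assumes input_cost/output_cost are USD per 1,000 tokens; verify the
    # unit against your live list_providers output before relying on this.
    return (input_tokens / 1000) * model_info["input_cost"] + (
        output_tokens / 1000
    ) * model_info["output_cost"]


# Using the gpt-4 entry shown above: 1.2 * 0.03 + 0.4 * 0.06 ≈ 0.06 USD
print(estimate_cost({"input_cost": 0.03, "output_cost": 0.06}, 1200, 400))
```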
### 2. test_comparison
Compares multiple prompts side-by-side, allowing you to test different providers, models, and parameters simultaneously.
Parameters:
- `comparisons` (array): A list of 1-4 comparison configurations, each containing:
  - `provider` (string): The LLM provider to use ("openai" or "anthropic")
  - `model` (string): The model name
  - `system_prompt` (string): The system prompt (instructions for the model)
  - `user_prompt` (string): The user's message/prompt
  - `temperature` (number, optional): Controls randomness
  - `max_tokens` (integer, optional): Maximum number of tokens to generate
  - `top_p` (number, optional): Controls diversity via nucleus sampling
Example Usage:
```json
{
  "comparisons": [
    {
      "provider": "openai",
      "model": "gpt-4",
      "system_prompt": "You are a helpful assistant.",
      "user_prompt": "Explain quantum computing in simple terms.",
      "temperature": 0.7
    },
    {
      "provider": "anthropic",
      "model": "claude-3-opus-20240229",
      "system_prompt": "You are a helpful assistant.",
      "user_prompt": "Explain quantum computing in simple terms.",
      "temperature": 0.7
    }
  ]
}
```
### 3. test_multiturn_conversation
Manages multi-turn conversations with LLM providers, allowing you to create and maintain stateful conversations.
Modes:
- `start`: Begins a new conversation
- `continue`: Continues an existing conversation
- `get`: Retrieves conversation history
- `list`: Lists all active conversations
- `close`: Closes a conversation
Parameters:
- `mode` (string): Operation mode ("start", "continue", "get", "list", or "close")
- `conversation_id` (string): Unique ID for the conversation (required for continue, get, and close modes)
- `provider` (string): The LLM provider (required for start mode)
- `model` (string): The model name (required for start mode)
- `system_prompt` (string): The system prompt (required for start mode)
- `user_prompt` (string): The user message (used in start and continue modes)
- `temperature` (number, optional): Temperature parameter for the model
- `max_tokens` (integer, optional): Maximum tokens to generate
- `top_p` (number, optional): Top-p sampling parameter
Example Usage (Starting a Conversation):
```json
{
  "mode": "start",
  "provider": "openai",
  "model": "gpt-4",
  "system_prompt": "You are a helpful assistant specializing in physics.",
  "user_prompt": "Can you explain what dark matter is?"
}
```
Example Usage (Continuing a Conversation):
```json
{
  "mode": "continue",
  "conversation_id": "conv_12345",
  "user_prompt": "How does that relate to dark energy?"
}
```
## Example Usage for Agents
Using the MCP Python client, an agent can call the tools like this:
```python
import asyncio
import json

from mcp.client.session import ClientSession
from mcp.client.stdio import StdioServerParameters, stdio_client


async def main():
    async with stdio_client(
        StdioServerParameters(command="prompt-tester")
    ) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. List available providers and models
            providers_result = await session.call_tool("list_providers", {})
            print("Available providers and models:", providers_result)

            # 2. Run a basic test with a single model and prompt
            comparison_result = await session.call_tool("test_comparison", {
                "comparisons": [
                    {
                        "provider": "openai",
                        "model": "gpt-4",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7,
                        "max_tokens": 500
                    }
                ]
            })
            print("Single model test result:", comparison_result)

            # 3. Compare multiple prompts/models side by side
            comparison_result = await session.call_tool("test_comparison", {
                "comparisons": [
                    {
                        "provider": "openai",
                        "model": "gpt-4",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7
                    },
                    {
                        "provider": "anthropic",
                        "model": "claude-3-opus-20240229",
                        "system_prompt": "You are a helpful assistant.",
                        "user_prompt": "Explain quantum computing in simple terms.",
                        "temperature": 0.7
                    }
                ]
            })
            print("Comparison result:", comparison_result)

            # 4. Start a multi-turn conversation
            conversation_start = await session.call_tool("test_multiturn_conversation", {
                "mode": "start",
                "provider": "openai",
                "model": "gpt-4",
                "system_prompt": "You are a helpful assistant specializing in physics.",
                "user_prompt": "Can you explain what dark matter is?"
            })
            print("Conversation started:", conversation_start)

            # Get the conversation ID from the response (tool results carry a
            # list of content blocks; the first one holds the JSON payload)
            response_data = json.loads(conversation_start.content[0].text)
            conversation_id = response_data.get("conversation_id")

            # Continue the conversation
            if conversation_id:
                conversation_continue = await session.call_tool("test_multiturn_conversation", {
                    "mode": "continue",
                    "conversation_id": conversation_id,
                    "user_prompt": "How does that relate to dark energy?"
                })
                print("Conversation continued:", conversation_continue)

                # Get the conversation history
                conversation_history = await session.call_tool("test_multiturn_conversation", {
                    "mode": "get",
                    "conversation_id": conversation_id
                })
                print("Conversation history:", conversation_history)


asyncio.run(main())
```
## MCP Agent Integration
For MCP-empowered agents, integration is straightforward. When your agent needs to test LLM prompts:
- Discovery: The agent can use `list_providers` to discover available models and their capabilities
- Simple Testing: For quick tests, use the `test_comparison` tool with a single configuration
- Comparison: When the agent needs to evaluate different prompts or models, it can use `test_comparison` with multiple configurations
- Stateful Interactions: For multi-turn conversations, the agent can manage a conversation using the `test_multiturn_conversation` tool
This allows agents to:
- Test prompt variants to find the most effective phrasing
- Compare different models for specific tasks
- Maintain context in multi-turn conversations
- Optimize parameters like temperature and max_tokens
- Track token usage and costs during development
## Configuration
You can set API keys and optional tracing configurations using environment variables:
### Required API Keys
- `OPENAI_API_KEY` - Your OpenAI API key
- `ANTHROPIC_API_KEY` - Your Anthropic API key
### Optional Langfuse Tracing
The server supports Langfuse for tracing and observability of LLM calls. These settings are optional:
- `LANGFUSE_SECRET_KEY` - Your Langfuse secret key
- `LANGFUSE_PUBLIC_KEY` - Your Langfuse public key
- `LANGFUSE_HOST` - URL of your Langfuse instance
If you don't want to use Langfuse tracing, simply leave these settings empty.
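To enable tracing, add the keys to the same `.env` file used for the API keys. For example (the host shown is Langfuse's hosted endpoint; substitute your own URL for a self-hosted instance):

```
LANGFUSE_SECRET_KEY=your-langfuse-secret-key
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
LANGFUSE_HOST=https://cloud.langfuse.com
```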