
# Markov Databricks MCP

A Model Context Protocol (MCP) server for Databricks that exposes Databricks functionality over MCP. This allows LLM-powered tools to interact with Databricks clusters, jobs, notebooks, and more.

This project is maintained by Olivier Debeuf De Rijcker <olivier@markov.bot>.

Credit for the initial version goes to [@JustTryAI](https://github.com/JustTryAI/databricks-mcp-server).

## Features

- **MCP Support**: Implements MCP so that LLMs can interact with Databricks
- **Databricks API Integration**: Provides access to Databricks REST API functionality
- **Tool Registration**: Exposes Databricks functionality as MCP tools
- **Async Support**: Built with asyncio for efficient operation
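
As a rough illustration of the last two points, the sketch below registers async handlers under tool names and dispatches a call by name. It uses only the standard library. `ToolRegistry`, `tool`, and `call` are hypothetical names chosen for this example; they are not this project's API or the MCP SDK's.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical handler type: an async function that accepts keyword parameters.
ToolHandler = Callable[..., Awaitable[Any]]


class ToolRegistry:
    """Maps tool names to async handlers (illustrative only)."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolHandler] = {}

    def tool(self, name: str) -> Callable[[ToolHandler], ToolHandler]:
        """Returns a decorator that registers a handler under `name`."""

        def decorator(func: ToolHandler) -> ToolHandler:
            self._tools[name] = func
            return func

        return decorator

    async def call(self, name: str, **params: Any) -> Any:
        """Looks up a tool by name and awaits its handler."""
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return await self._tools[name](**params)


registry = ToolRegistry()


@registry.tool("list_clusters")
async def list_clusters() -> Dict[str, Any]:
    # Placeholder result; a real handler would call the Databricks REST API.
    return {"clusters": []}


if __name__ == "__main__":
    print(asyncio.run(registry.call("list_clusters")))
```

Keeping handlers as coroutines means a slow Databricks call does not block other requests handled on the same event loop.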

## Available Tools

The Databricks MCP Server exposes the following tools:

- **list_clusters**: List all Databricks clusters
- **create_cluster**: Create a new Databricks cluster
- **terminate_cluster**: Terminate a Databricks cluster
- **get_cluster**: Get information about a specific Databricks cluster
- **start_cluster**: Start a terminated Databricks cluster
- **list_jobs**: List all Databricks jobs
- **run_job**: Run a Databricks job
- **list_notebooks**: List notebooks in a workspace directory
- **export_notebook**: Export a notebook from the workspace
- **list_files**: List files and directories in a DBFS path
- **execute_sql**: Execute a SQL statement
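
For orientation, a tool such as `list_clusters` ultimately corresponds to a Databricks REST call. The sketch below builds (but does not send) a request for the `GET /api/2.0/clusters/list` endpoint with a bearer token, using only the standard library. The helper name `build_list_clusters_request` is hypothetical, and this is not necessarily how the server issues the call internally.

```python
import urllib.request


def build_list_clusters_request(host: str, token: str) -> urllib.request.Request:
    """Builds an authenticated GET request for the clusters list endpoint."""
    url = f"{host.rstrip('/')}/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values; substitute your own workspace URL and token.
request = build_list_clusters_request(
    "https://your-databricks-instance.azuredatabricks.net/",
    "your-personal-access-token",
)
print(request.get_method(), request.full_url)
```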

## Installation

### Prerequisites

- Python 3.10 or higher
- `uv` package manager (recommended for MCP servers)

### Setup

1. Install `uv` if you don't have it already:

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (in PowerShell)
irm https://astral.sh/uv/install.ps1 | iex
```

Restart your terminal after installation.

2. Clone the repository:
```bash
git clone https://github.com/markov-kernel/databricks-mcp.git
cd databricks-mcp
```

3. Set up the project with `uv`:
```bash
# Create and activate virtual environment
uv venv

# On Windows
.\.venv\Scripts\activate

# On Linux/Mac
source .venv/bin/activate

# Install dependencies in development mode
uv pip install -e .

# Install development dependencies
uv pip install -e ".[dev]"
```

4. Set up environment variables:
```bash
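# Windows (Command Prompt)
set DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
set DATABRICKS_TOKEN=your-personal-access-token

# Windows (PowerShell)
$env:DATABRICKS_HOST = "https://your-databricks-instance.azuredatabricks.net"
$env:DATABRICKS_TOKEN = "your-personal-access-token"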

# Linux/Mac
export DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
export DATABRICKS_TOKEN=your-personal-access-token
```

You can also create an `.env` file based on the `.env.example` template.
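
To check that both variables are visible to Python before starting the server, a small validation helper along these lines can be useful. This is a sketch under assumptions: `load_databricks_config` is a hypothetical name, and the server's own configuration loading may differ.

```python
import os
from typing import Mapping, Tuple


def load_databricks_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Returns (host, token), or raises if either variable is missing or empty."""
    missing = [
        name for name in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not env.get(name)
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Strip a trailing slash so API paths can be appended consistently.
    return env["DATABRICKS_HOST"].rstrip("/"), env["DATABRICKS_TOKEN"]


# Demo with an explicit mapping; call load_databricks_config() to read os.environ.
host, _token = load_databricks_config(
    {
        "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net/",
        "DATABRICKS_TOKEN": "your-personal-access-token",
    }
)
print(host)
```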

## Running the MCP Server

### Standalone

To start the MCP server directly for testing or development, run:

```bash
# Activate your virtual environment if not already active
source .venv/bin/activate

# Run the start script (handles finding env vars from .env if needed)
./scripts/start_mcp_server.sh
```

This is useful for seeing direct output and logs.

### Integrating with AI Clients

To use this server with AI clients like Cursor or Claude CLI, you need to register it.

#### Cursor Setup

1. Open your global MCP configuration file located at `~/.cursor/mcp.json` (create it if it doesn't exist).
2. Add the following entry within the `mcpServers` object, replacing placeholders with your actual values and ensuring the path to `start_mcp_server.sh` is correct:

```json
{
  "mcpServers": {
    "databricks-mcp-local": {
      "command": "/absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh",
      "args": [],
      "env": {
        "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
        "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
        "RUNNING_VIA_CURSOR_MCP": "true"
      }
    }
  }
}
```

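3. **Important:** Replace `/absolute/path/to/your/project/databricks-mcp-server/` with the actual absolute path to this project directory on your machine, and confirm that the resulting `command` value points at the location of `start_mcp_server.sh` in your checkout (the standalone instructions run it from `scripts/`).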
4. Replace the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` values with your credentials.
5. Save the file and **restart Cursor**.

6. You can now invoke tools using `databricks-mcp-local:<tool_name>` (e.g., `databricks-mcp-local:list_jobs`).
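
If you prefer to script the edit rather than change the file by hand, the sketch below merges one server entry into an MCP configuration file while leaving any other servers untouched. `add_mcp_server` is a hypothetical helper, not part of this project, and it assumes the existing file is strict JSON (no comments). The demo writes to a temporary directory; point `config_path` at `~/.cursor/mcp.json` only once you are happy with the result.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def add_mcp_server(config_path: Path, name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Adds or replaces one server entry, keeping other servers intact."""
    config: Dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo against a temporary directory so no real configuration is touched.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp) / "mcp.json"
        add_mcp_server(
            demo_path,
            "databricks-mcp-local",
            {
                "command": "/absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh",
                "args": [],
                "env": {
                    "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
                    "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
                    "RUNNING_VIA_CURSOR_MCP": "true",
                },
            },
        )
        print(demo_path.read_text(encoding="utf-8"))
```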

#### Claude CLI Setup

1. Use the `claude mcp add` command to register the server. Provide your credentials using the `-e` flag for environment variables and point the command to the `start_mcp_server.sh` script using `--` followed by the absolute path:

```bash
claude mcp add databricks-mcp-local \
  -s user \
  -e DATABRICKS_HOST="https://your-databricks-instance.azuredatabricks.net" \
  -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
  -- /absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh
```

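2. **Important:** Replace `/absolute/path/to/your/project/databricks-mcp-server/` with the actual absolute path to this project directory on your machine, and confirm that the final argument points at the location of `start_mcp_server.sh` in your checkout (the standalone instructions run it from `scripts/`).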
3. Replace the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` values with your credentials.

4. You can now invoke tools using `databricks-mcp-local:<tool_name>` in your Claude interactions.

## Querying Databricks Resources

The repository includes utility scripts to quickly view Databricks resources:

```bash
# View all clusters
uv run scripts/show_clusters.py

# View all notebooks
uv run scripts/show_notebooks.py
```
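
The contents of these scripts are not reproduced here. As a rough idea of what such a viewer involves, the sketch below turns a clusters-list style response into one line per cluster. It assumes the response has a `clusters` array whose items carry `cluster_id`, `cluster_name`, and `state`; `format_clusters` is a hypothetical helper and is not taken from `show_clusters.py`.

```python
from typing import Any, Dict, List


def format_clusters(response: Dict[str, Any]) -> List[str]:
    """Formats each cluster as 'id  state  name'; missing fields show as '?'."""
    lines = []
    for cluster in response.get("clusters", []):
        lines.append(
            f"{cluster.get('cluster_id', '?')}  "
            f"{cluster.get('state', '?'):<12}"
            f"{cluster.get('cluster_name', '?')}"
        )
    return lines


# Hand-written sample data for illustration, not real API output.
sample = {
    "clusters": [
        {"cluster_id": "0101-abc", "cluster_name": "dev", "state": "RUNNING"},
        {"cluster_id": "0202-def", "cluster_name": "etl", "state": "TERMINATED"},
    ]
}
for line in format_clusters(sample):
    print(line)
```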

## Project Structure

```
databricks-mcp-server/
├── src/                             # Source code
│   ├── __init__.py                  # Makes src a package
│   ├── __main__.py                  # Main entry point for the package
│   ├── main.py                      # Entry point for the MCP server
│   ├── api/                         # Databricks API clients
│   ├── core/                        # Core functionality
│   ├── server/                      # Server implementation
│   │   ├── databricks_mcp_server.py # Main MCP server
│   │   └── app.py                   # FastAPI app for tests
│   └── cli/                         # Command-line interface
├── tests/                           # Test directory
├── scripts/                         # Helper scripts
│   ├── start_mcp_server.ps1         # Server startup script (Windows)
│   ├── run_tests.ps1                # Test runner script
│   ├── show_clusters.py             # Script to show clusters
│   └── show_notebooks.py            # Script to show notebooks
├── examples/                        # Example usage
├── docs/                            # Documentation
└── pyproject.toml                   # Project configuration
```

See `project_structure.md` for a more detailed view of the project structure.

## Development

### Code Standards

- Python code follows the PEP 8 style guide, with a maximum line length of 100 characters
- Use 4 spaces for indentation (no tabs)
- Use double quotes for strings
- All classes, methods, and functions should have Google-style docstrings
- Type hints are required for all code except tests
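
A function written to these standards might look like the following. `normalize_host` is a made-up example, not a function from this codebase; it is here only to show the docstring layout, type hints, quoting, and indentation together.

```python
from typing import Optional


def normalize_host(host: str, default_scheme: Optional[str] = "https") -> str:
    """Normalizes a Databricks workspace URL.

    Args:
        host: Workspace host, with or without a scheme or trailing slash.
        default_scheme: Scheme to prepend when `host` has none. Pass None to
            leave scheme-less hosts unchanged.

    Returns:
        The host without a trailing slash, prefixed with a scheme if one was added.
    """
    cleaned = host.strip().rstrip("/")
    if default_scheme and "://" not in cleaned:
        cleaned = f"{default_scheme}://{cleaned}"
    return cleaned


print(normalize_host("your-databricks-instance.azuredatabricks.net/"))
```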

### Linting

The project uses the following linting tools:

```bash
# Run all linters
uv run pylint src/ tests/
uv run flake8 src/ tests/
uv run mypy src/
```

## Testing

The project uses pytest for testing. On Windows, run the tests with the PowerShell helper script:

```powershell
# Run all tests with our convenient script
.\scripts\run_tests.ps1

# Run with coverage report
.\scripts\run_tests.ps1 -Coverage

# Run specific tests with verbose output
.\scripts\run_tests.ps1 -Verbose -Coverage tests/test_clusters.py
```

You can also run the tests directly with pytest:

```bash
# Run all tests
uv run pytest tests/

# Run with coverage report
uv run pytest --cov=src tests/ --cov-report=term-missing
```
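
For new functionality, a test can stub out the Databricks client so that no workspace is needed. The sketch below shows one possible shape using `unittest.mock.AsyncMock` from the standard library. Both the `list_clusters` handler and the injected `client` with an async `get` method are hypothetical and may not match this project's real interfaces.

```python
import asyncio
from typing import Any, Dict
from unittest.mock import AsyncMock


async def list_clusters(client: Any) -> Dict[str, Any]:
    """Hypothetical tool handler that delegates to an injected API client."""
    return await client.get("/api/2.0/clusters/list")


def test_list_clusters_returns_client_payload() -> None:
    # The mock stands in for the API client; awaiting client.get returns the payload.
    client = AsyncMock()
    client.get.return_value = {"clusters": [{"cluster_id": "abc"}]}

    result = asyncio.run(list_clusters(client))

    assert result == {"clusters": [{"cluster_id": "abc"}]}
    client.get.assert_awaited_once_with("/api/2.0/clusters/list")
```

Saved under `tests/` in a file whose name starts with `test_`, a function like this is collected by `uv run pytest tests/` without extra plugins, because it drives the coroutine with `asyncio.run` itself.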

The project targets a minimum of 80% code coverage.

## Documentation

- API documentation is generated using Sphinx and can be found in the `docs/api` directory
- All code includes Google-style docstrings
- See the `examples/` directory for usage examples

## Examples

Check the `examples/` directory for usage examples. To run examples:

```bash
# Run example scripts with uv
uv run examples/direct_usage.py
uv run examples/mcp_client_usage.py
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Ensure your code follows the project's coding standards
2. Add tests for any new functionality
3. Update documentation as necessary
4. Verify all tests pass before submitting

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## About

A Model Context Protocol (MCP) server for interacting with Databricks services. Maintained by markov.bot.