
ArXiv

by github.com/jasonleinart · Search


The ArXiv MCP Server provides a comprehensive bridge between AI assistants and arXiv's research repository through the Model Context Protocol (MCP). Features:

• Search arXiv papers with advanced filtering
• Download and store papers locally as markdown
• Read and analyze paper content
• Deep research analysis prompts
• Local paper management and storage
• Enhanced tool descriptions optimized for local AI models
• Docker MCP Gateway compatible with detailed context

Perfect for researchers, academics, and AI assistants conducting literature reviews and research analysis. **Recent Update**: Enhanced tool descriptions specifically designed to resolve local AI model confusion and improve Docker MCP Gateway compatibility.

ArXiv MCP Server - Docker Implementation

Production-Ready Containerized Research Assistant

🐳 DOCKER-FIRST: Production-ready containerized ArXiv research capabilities for AI assistants

🔬 RESEARCH-FOCUSED: Complete academic workflow - search, download, analyze papers seamlessly

Why This Docker Implementation?

- ✅ Container Isolation: Secure, reproducible research environment
- ✅ Volume Persistence: Papers survive container restarts
- ✅ Production Grade: Multi-stage builds, optimized for performance
- ✅ Cross-Platform: Works on any Docker-enabled system
- ✅ MCP Compliant: Full Model Context Protocol 2024-11-05 support

🚀 Docker vs Traditional MCP: Why Containers Matter

Feature	Traditional MCP	This Docker Implementation
Deployment	Local Python install	Single `docker run` command
Dependencies	Manual environment setup	All dependencies included
Isolation	Host system dependencies	Complete container isolation
Portability	Platform-specific setup	Works anywhere Docker runs
Storage	Local filesystem only	Persistent volume mounting
Scaling	Single instance	Easy multi-container deployment
Security	Host system access	Sandboxed execution

🎯 Key Docker Advantages

Zero Setup Friction: No Python environment conflicts or dependency issues
Reproducible Research: Same environment across different machines/platforms
Storage Persistence: Downloaded papers persist outside container lifecycle
Security Isolation: Research tools run in contained environment
Production Ready: Battle-tested Docker deployment patterns

The ArXiv MCP Server provides a bridge between AI assistants and arXiv's research repository through the Model Context Protocol (MCP). It allows AI models to search for papers and access their content in a programmatic way.

🤝 **[Contribute](https://github.com/blazickjp/arxiv-mcp-server/blob/main/CONTRIBUTING.md)** • 📝 **[Report Bug](https://github.com/blazickjp/arxiv-mcp-server/issues)** • 🐳 **[Docker Registry](https://github.com/docker/mcp-registry/pull/66)** ✅

✨ Core Features

🔎 Paper Search: Query arXiv papers with filters for date ranges and categories
📄 Paper Access: Download and read paper content
📋 Paper Listing: View all downloaded papers
🗃️ Local Storage: Papers are saved locally for faster access
📝 Prompts: A set of research prompts
🐳 Docker Ready: Official Docker MCP Registry integration with volume mounting

🚀 Quick Start with Docker

Option 1: Pre-built Docker Image (Recommended)

# Pull and run the latest image
docker run -i --rm \
  -v "$(pwd)/papers:/app/papers" \
  jasonleinart/arxiv-mcp-server:latest

Option 2: Build from Source

# Clone this Docker-optimized repository
git clone https://github.com/jasonleinart/arxiv-mcp-server.git
cd arxiv-mcp-server

# Build the Docker image
docker build -t arxiv-mcp-server:local .

# Run your local build
docker run -i --rm \
  -v "$(pwd)/papers:/app/papers" \
  arxiv-mcp-server:local

🔌 Claude Code Integration

Configure your MCP client to use the Docker MCP server. For Claude Desktop, add this to claude_desktop_config.json; Claude Code and other MCP clients accept an equivalent `mcpServers` entry:

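{
  "mcpServers": {
    "arxiv-mcp-server-docker": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "--name", "arxiv-mcp-server",
        "-v", "/path/to/your/papers:/app/papers",
        "-e", "ARXIV_STORAGE_PATH=/app/papers",
        "jasonleinart/arxiv-mcp-server:latest"
      ]
    }
  }
}

Important: Replace /path/to/your/papers with an absolute path on your machine; clients typically launch the command without a shell, so relative paths and ~ are not expanded. Variables meant for the server are passed into the container with -e, because an "env" block in the client configuration only applies to the docker process itself.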
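Whatever the client, the container speaks MCP over stdio: the client writes newline-delimited JSON-RPC 2.0 messages to the container's stdin and reads responses from its stdout. The sketch below builds the opening messages of a session (an `initialize` request for protocol version 2024-11-05, the `notifications/initialized` notification, and a `tools/call` for `search_papers`) so the framing is visible. It is a minimal illustration, not a full client: the client name is a placeholder, and a real client must wait for the `initialize` response before sending the notification.

```python
import json

PROTOCOL_VERSION = "2024-11-05"


def encode(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line (stdio framing)."""
    # json.dumps escapes any newline inside strings, so each message stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id: int) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Placeholder client identity
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }


def initialized_notification() -> dict:
    # Notifications carry no "id" and receive no response.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tool_call_request(request_id: int, name: str, arguments: dict) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


session = [
    initialize_request(1),
    initialized_notification(),
    tool_call_request(2, "search_papers", {"query": "transformer architecture", "max_results": 5}),
]
wire = b"".join(encode(m) for m in session)
```

Writing these bytes to the stdin of a process started with the `docker run -i ...` arguments above, and reading one JSON line per response from its stdout, is all the transport involves.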

🔧 Docker Deployment Options

Development Mode

# Mount source code for development
docker run -i --rm \
  -v "$(pwd):/app" \
  -v "$(pwd)/papers:/app/papers" \
  python:3.11-slim \
  bash -c "cd /app && pip install -e . && python -m arxiv_mcp_server"

Production Mode with Custom Storage

# Run with specific storage location
docker run -i --rm \
  -v /your/research/papers:/app/papers \
  -e ARXIV_STORAGE_PATH=/app/papers \
  jasonleinart/arxiv-mcp-server:latest

Background Service Mode

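# Run as a detached background container (-i keeps stdin open; without it the stdio server reads EOF and exits)
docker run -d -i \
  --name arxiv-mcp-service \
  -v "$(pwd)/papers:/app/papers" \
  --restart unless-stopped \
  jasonleinart/arxiv-mcp-server:latest

Note: MCP clients talk to the server over stdio, so they normally start their own container (as in the configuration above) rather than connecting to a detached one.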

🐳 Docker Architecture & Technical Details

Container Specifications

Base Image: Multi-stage build with python:3.11-slim-bookworm
Package Manager: UV for fast dependency resolution
Build Optimization: Bytecode compilation enabled for performance
Security: Non-root execution with minimal attack surface
Size: Optimized layers for efficient image distribution

Volume Mounting Requirements

Critical Path: Papers MUST be mounted to /app/papers inside container

# ✅ Correct - papers persist on host
docker run -v /host/papers:/app/papers jasonleinart/arxiv-mcp-server:latest

# ❌ Wrong - papers are lost when the container is removed
docker run jasonleinart/arxiv-mcp-server:latest
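Because the mount target is fixed at /app/papers, a launcher script only needs to turn the host directory into an absolute path. A minimal Python sketch, assuming you start the container yourself (for example with `subprocess`); `docker_run_args` is a hypothetical helper, not part of this project:

```python
from pathlib import Path

IMAGE = "jasonleinart/arxiv-mcp-server:latest"
CONTAINER_PAPERS = "/app/papers"  # mount point the image expects


def docker_run_args(host_papers: str, image: str = IMAGE) -> list[str]:
    """Hypothetical helper: argv for `docker run` with papers persisted on the host."""
    # Expand ~ and make the path absolute so the bind mount is unambiguous.
    host = Path(host_papers).expanduser().resolve()
    return [
        "docker", "run", "-i", "--rm",
        "-v", f"{host}:{CONTAINER_PAPERS}",
        "-e", f"ARXIV_STORAGE_PATH={CONTAINER_PAPERS}",
        image,
    ]
```

The returned list can be passed to `subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)`; resolving the path up front avoids depending on how a given Docker version treats relative `-v` paths.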

Environment Variables

Variable	Default	Purpose
`ARXIV_STORAGE_PATH`	`/app/papers`	Container storage location
`PYTHONUNBUFFERED`	`1`	Real-time logging output

Docker Compose Example

services:
  arxiv-mcp:
    image: jasonleinart/arxiv-mcp-server:latest
    volumes:
      - ./research-papers:/app/papers
    environment:
      - ARXIV_STORAGE_PATH=/app/papers
    restart: unless-stopped
    # Keep stdin open so the stdio server does not exit; no tty, since a
    # pseudo-terminal rewrites line endings on the JSON-RPC stream.
    stdin_open: true

Multi-Platform Support

x86_64: Intel/AMD processors
ARM64: Apple Silicon (M1/M2/M3), AWS Graviton
Linux: Ubuntu, Debian, CentOS, Alpine
macOS: Docker Desktop integration
Windows: WSL2 backend support

Performance Characteristics

Startup Time: < 2 seconds cold start
Memory Usage: ~150MB baseline + paper storage
Network: Efficient arXiv API usage with caching
Storage: Papers stored as both PDF and optimized markdown

Production Deployment Tested

✅ Agent Validation Complete: Full tool functionality verified

- Search operations: ✅ Successful arXiv queries
- Download pipeline: ✅ PDF→Markdown conversion working
- Volume persistence: ✅ Papers survive container restarts
- MCP protocol: ✅ Full 2024-11-05 compliance
- Claude Code integration: ✅ Seamless AI assistant connectivity

💡 Available Tools

The server provides four main tools designed to work together in research workflows:

1. Paper Search (`search_papers`)

🔍 Purpose: Find relevant research papers by topic, author, or category

When to use: Starting research, finding recent papers, exploring a field

# Basic search
result = await call_tool("search_papers", {
    "query": "transformer architecture"
})

# Advanced search with filters
result = await call_tool("search_papers", {
    "query": "attention mechanism neural networks",
    "max_results": 20,
    "date_from": "2023-01-01",
    "date_to": "2024-12-31",
    "categories": ["cs.AI", "cs.LG", "cs.CL"]
})

# Search by author
result = await call_tool("search_papers", {
    "query": "au:\"Vaswani, A\"",
    "max_results": 10
})
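The advanced search above passes `categories`, `date_from`, and `date_to` as separate arguments. This README does not say how the server turns them into an arXiv query, so the sketch below shows one plausible mapping onto the arXiv API's own syntax (`cat:` terms and a `submittedDate:[start TO end]` range with YYYYMMDDHHMM timestamps). Treat it as an assumption: the server may filter results after fetching instead, and `arxiv_search_query` is a hypothetical name.

```python
from datetime import date

ARXIV_START = date(1991, 1, 1)  # arXiv launched in 1991, so nothing is older


def _stamp(d: date, end_of_day: bool) -> str:
    """Format a date as the arXiv API's YYYYMMDDHHMM timestamp."""
    return f"{d.year:04d}{d.month:02d}{d.day:02d}" + ("2359" if end_of_day else "0000")


def arxiv_search_query(query: str, categories=(), date_from=None, date_to=None) -> str:
    """Hypothetical mapping of search_papers arguments onto one arXiv API search_query."""
    clauses = [f"({query})"]
    if categories:
        # Any of the requested categories may match.
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if date_from or date_to:
        start = date.fromisoformat(date_from) if date_from else ARXIV_START
        end = date.fromisoformat(date_to) if date_to else date.today()
        if start > end:
            raise ValueError("date_from is after date_to")
        clauses.append(f"submittedDate:[{_stamp(start, False)} TO {_stamp(end, True)}]")
    return " AND ".join(clauses)
```

The resulting string would still need URL-encoding before being sent as the `search_query` parameter of the arXiv API.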

2. Paper Download (`download_paper`)

📥 Purpose: Download and convert papers to readable markdown format

When to use: After finding interesting papers, before reading full content

# Download a specific paper
result = await call_tool("download_paper", {
    "paper_id": "1706.03762"  # "Attention Is All You Need"
})

# Check download status
result = await call_tool("download_paper", {
    "paper_id": "1706.03762",
    "check_status": True
})

3. List Papers (`list_papers`)

📋 Purpose: View your local paper library

When to use: Check what papers you have, avoid re-downloading, browse collection

# See all downloaded papers
result = await call_tool("list_papers", {})

4. Read Paper (`read_paper`)

📖 Purpose: Access full text content of downloaded papers

When to use: Deep analysis, quotation, detailed study of methodology/results

# Read full paper content
result = await call_tool("read_paper", {
    "paper_id": "1706.03762"
})

🔄 Research Workflows

Complete Research Workflow

Here's how the tools work together in real research scenarios:

Scenario 1: Exploring a New Research Area

# Step 1: Search for recent papers in the field
search_result = await call_tool("search_papers", {
    "query": "large language model reasoning",
    "max_results": 15,
    "date_from": "2024-01-01",
    "categories": ["cs.AI", "cs.CL"]
})

# Step 2: Download promising papers
await call_tool("download_paper", {"paper_id": "2401.12345"})
await call_tool("download_paper", {"paper_id": "2402.67890"})

# Step 3: List your collection to confirm downloads
library = await call_tool("list_papers", {})

# Step 4: Read papers for detailed analysis
paper_content = await call_tool("read_paper", {"paper_id": "2401.12345"})

Scenario 2: Following Up on Specific Authors

# Find papers by specific researchers (au: matches author names, not organizations)
result = await call_tool("search_papers", {
    "query": "au:\"Hinton\" OR au:\"LeCun\"",
    "max_results": 10,
    "date_from": "2023-06-01"
})

# Download the most relevant papers
for paper in result['papers'][:3]:
    await call_tool("download_paper", {"paper_id": paper['id']})

Scenario 3: Building a Literature Review

# Search multiple related topics
topics = [
    "transformer interpretability",
    "attention visualization",
    "neural network explainability"
]

for topic in topics:
    results = await call_tool("search_papers", {
        "query": topic,
        "max_results": 8,
        "date_from": "2022-01-01"
    })

    # Download top papers from each topic
    for paper in results['papers'][:2]:
        await call_tool("download_paper", {"paper_id": paper['id']})

# Review your complete collection
library = await call_tool("list_papers", {})
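The loop above can download the same paper twice when topics overlap, and it never consults the existing library, which the best practices below recommend. The sketch below runs the searches concurrently and skips IDs that are already local or already fetched. It assumes, as the examples above do, that `search_papers` returns `{"papers": [{"id": ...}]}`, and it additionally assumes (hypothetically) that `list_papers` returns the same shape; `call_tool` is injected so the logic can be exercised without a server.

```python
import asyncio
from typing import Awaitable, Callable

# call_tool(name, arguments) -> parsed result; injected so any client can be used.
CallTool = Callable[[str, dict], Awaitable[dict]]


async def build_review(call_tool: CallTool, topics: list[str], per_topic: int = 2) -> list[str]:
    """Download the top papers per topic, skipping IDs that are already local or already fetched."""
    library = await call_tool("list_papers", {})
    have = {paper["id"] for paper in library.get("papers", [])}  # assumed result shape

    # The searches are independent, so run them concurrently.
    searches = await asyncio.gather(*(
        call_tool("search_papers", {"query": topic, "max_results": 8, "date_from": "2022-01-01"})
        for topic in topics
    ))

    downloaded: list[str] = []
    for result in searches:
        for paper in result["papers"][:per_topic]:
            paper_id = paper["id"]
            if paper_id in have:
                continue  # duplicates are skipped, not replaced by the next result
            await call_tool("download_paper", {"paper_id": paper_id})
            have.add(paper_id)
            downloaded.append(paper_id)
    return downloaded
```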

📝 Research Prompts

The server offers specialized prompts to help analyze academic papers:

Paper Analysis Prompt

A comprehensive workflow for analyzing academic papers that only requires a paper ID:

result = await call_prompt("deep-paper-analysis", {
    "paper_id": "2401.12345"
})

This prompt includes:

- Detailed instructions for using available tools (list_papers, download_paper, read_paper, search_papers)
- A systematic workflow for paper analysis
- Comprehensive analysis structure covering:
  - Executive summary
  - Research context
  - Methodology analysis
  - Results evaluation
  - Practical and theoretical implications
  - Future research directions
  - Broader impacts
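`call_prompt` is not defined in this README; in MCP terms, fetching a prompt is a `prompts/get` request whose `arguments` map names to string values. A minimal sketch of that request for `deep-paper-analysis`, assuming your client sends raw JSON-RPC over stdio (`prompt_request` is a hypothetical helper):

```python
import json


def prompt_request(request_id: int, name: str, arguments: dict) -> str:
    """Hypothetical helper: one newline-terminated MCP prompts/get request."""
    # MCP prompt arguments are string-to-string, so coerce values to str.
    params = {"name": name, "arguments": {key: str(value) for key, value in arguments.items()}}
    message = {"jsonrpc": "2.0", "id": request_id, "method": "prompts/get", "params": params}
    return json.dumps(message) + "\n"


line = prompt_request(3, "deep-paper-analysis", {"paper_id": "2401.12345"})
```

The server's response carries the prompt as a list of `messages`, which the client then hands to the model.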

⚙️ Configuration

Configure through environment variables. The default below applies to a local (non-Docker) install; the container image uses /app/papers:

Variable	Purpose	Default
`ARXIV_STORAGE_PATH`	Paper storage location	~/.arxiv-mcp-server/papers
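The precedence implied by this table and the container's Environment Variables table is: use ARXIV_STORAGE_PATH when it is set, otherwise fall back to the home-directory default. A sketch of that rule, not the server's own code; treating an empty value as unset is an assumption:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def storage_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the paper directory: ARXIV_STORAGE_PATH if set, else the documented default."""
    env = os.environ if env is None else env
    value = env.get("ARXIV_STORAGE_PATH")
    if value:  # an empty value is treated as unset (assumption)
        return Path(value).expanduser()
    return Path.home() / ".arxiv-mcp-server" / "papers"
```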

📖 Advanced Usage Reference

Common ArXiv Categories

Category	Description	Use Cases
`cs.AI`	Artificial Intelligence	General AI research, reasoning, planning
`cs.LG`	Machine Learning	Neural networks, deep learning, training
`cs.CL`	Computation and Language	NLP, language models, text processing
`cs.CV`	Computer Vision	Image processing, visual recognition
`cs.RO`	Robotics	Autonomous systems, control theory
`stat.ML`	Machine Learning (Statistics)	Statistical learning theory, methods

Search Query Examples

Topic searches: "transformer architecture", "reinforcement learning"
Author searches: "au:\"Hinton, Geoffrey\"", "au:Hinton OR au:LeCun" (au: matches author names, not organizations)
Title searches: "ti:\"Attention Is All You Need\"", "ti:BERT OR ti:GPT"
Combined searches: "ti:transformer AND au:Vaswani", "abs:\"few-shot learning\" AND cat:cs.LG"
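Hand-writing these strings makes it easy to forget quotes around multi-word terms or parentheses around OR groups. A small sketch of helpers that compose them, assuming only the field prefixes and AND/OR/ANDNOT operators listed in this section; the helper names are hypothetical:

```python
def field(prefix: str, term: str) -> str:
    """Apply an arXiv field prefix (ti, au, abs, cat), quoting terms with spaces or commas."""
    term = term.strip()
    if " " in term or "," in term:
        term = f'"{term}"'
    return f"{prefix}:{term}"


def any_of(*terms: str) -> str:
    """OR the terms together, parenthesized so the group binds as a unit."""
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    return " AND ".join(terms)


def excluding(base: str, *unwanted: str) -> str:
    return f"{base} ANDNOT {any_of(*unwanted)}"
```

For example, `all_of(field("abs", "few-shot learning"), field("cat", "cs.LG"))` reproduces the last combined search above. Unlike the bare "ti:BERT OR ti:GPT" example, `any_of` always parenthesizes OR groups, so they stay intact when combined with AND.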

Local Model Best Practices

Use explicit workflows: Guide your model through Search → Download → List → Read → Analyze
Reference tool purposes: Mention why you're using each tool in your prompts
Check library first: Always use list_papers before downloading to avoid duplicates
Be specific with parameters: Use the exact formats shown in tool examples

🧪 Testing

Run the test suite:

python -m pytest

🤔 Docker vs Traditional MCP: When to Choose

Choose Docker Implementation When:

✅ Production deployment - Need reliable, consistent environments
✅ Team collaboration - Multiple developers need identical setups
✅ CI/CD integration - Automated testing and deployment pipelines
✅ Security isolation - Research tools need sandboxed execution
✅ Cross-platform - Supporting Windows, macOS, Linux users
✅ Scaling requirements - Multiple instances or load balancing
✅ Zero setup friction - Users want single-command deployment

Choose Traditional MCP When:

🔧 Development workflow - Active code modification and debugging
🔧 Custom integrations - Need to modify source code extensively
🔧 Resource constraints - Minimal overhead requirements
🔧 Direct filesystem - Need native host filesystem access patterns

Migration Path

Already using traditional MCP? Easy migration:

# Traditional MCP
uv tool run arxiv-mcp-server

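# Equivalent Docker command
docker run -i --rm -v "$(pwd)/papers:/app/papers" jasonleinart/arxiv-mcp-server:latest

Your workflows remain compatible. To keep using papers you already downloaded, mount your existing storage directory (by default ~/.arxiv-mcp-server/papers) at /app/papers instead of a new papers folder.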

🤖 Enhanced for Local Models & Docker MCP Gateway

Addressing Community Feedback: This Docker implementation specifically resolves issues with sparse tool descriptions that confuse local AI models.

🔍 Rich Tool Descriptions

Unlike minimal descriptions that cause local model confusion, each tool includes:

Purpose Statement: Clear explanation of what the tool does
Usage Context: When and why to use this tool
Parameter Guidance: Detailed input specifications with examples
Query Patterns: Built-in examples for search syntax and formatting
Integration Flow: How tools work together in research workflows

🎯 Local LLM Optimization Features

Docker MCP Gateway Ready: Seamless integration with local model deployments
Llama/Mistral/Local Model Tested: Verified compatibility with popular local LLMs
Context-Rich Responses: Tools provide detailed feedback to help models understand results
Error Handling: Clear error messages that local models can interpret and act on
Workflow Guidance: Tools suggest logical next steps in research processes

📋 Example: Enhanced Tool Descriptions

Before (Sparse): "search_papers": "Search arXiv papers"

After (Rich): "Search for academic research papers on arXiv.org using advanced filtering capabilities. This tool allows you to find papers by keywords, authors, categories, and date ranges. Use this when you need to discover relevant research papers on a specific topic, find papers by a particular author, or explore recent publications in a field..."

Impact: Local models now understand tool context and usage patterns, dramatically improving research workflow success rates.

🧪 Testing & Validation

This Docker implementation has been extensively tested:

Agent Testing: Validated with Claude Code using real research workflows
Multi-platform: Tested on macOS (Apple Silicon), Linux (x86_64)
Volume Persistence: Papers verified to survive container restarts
Performance: Sub-2-second startup, efficient memory usage
MCP Compliance: Full protocol 2024-11-05 compatibility

📄 License

Released under the Apache 2.0 License. See the LICENSE file for details.

🤝 Contributing

This is a Docker-focused fork optimizing ArXiv MCP for containerized deployment.

Original MCP Server: blazickjp/arxiv-mcp-server
This Docker Implementation: Focus on production container deployment

**🐳 Containerized Research Excellence** Made for researchers, by developers who understand deployment complexity. [Get Started: Docker Implementation](https://github.com/jasonleinart/arxiv-mcp-server)

search_papers Search for papers on arXiv with advanced filtering and query optimization.

QUERY CONSTRUCTION GUIDELINES:
- Use QUOTED PHRASES for exact matches: "multi-agent systems", "neural networks", "machine learning"
- Combine related concepts with OR: "AI agents" OR "software agents" OR "intelligent agents"
- Use field-specific searches for precision:
  - ti:"exact title phrase" - search in titles only
  - au:"author name" - search by author
  - abs:"keyword" - search in abstracts only
- Use ANDNOT to exclude unwanted results: "machine learning" ANDNOT "survey"
- For best results, use 2-4 core concepts rather than long keyword lists

ADVANCED SEARCH PATTERNS:
- Field + phrase: ti:"transformer architecture" for papers with the exact title phrase
- Multiple fields: au:"Smith" AND ti:"quantum" for author Smith's quantum papers
- Exclusions: "deep learning" ANDNOT ("survey" OR "review") to exclude survey papers
- Broad + narrow: "artificial intelligence" AND (robotics OR "computer vision")

CATEGORY FILTERING (highly recommended for relevance):
- cs.AI: Artificial Intelligence
- cs.MA: Multi-Agent Systems
- cs.LG: Machine Learning
- cs.CL: Computation and Language (NLP)
- cs.CV: Computer Vision
- cs.RO: Robotics
- cs.HC: Human-Computer Interaction
- cs.CR: Cryptography and Security
- cs.DB: Databases

EXAMPLES OF EFFECTIVE QUERIES:
- ti:"reinforcement learning" with categories: ["cs.LG", "cs.AI"] - for RL papers by title
- au:"Hinton" AND "deep learning" with categories: ["cs.LG"] - for Hinton's deep learning work
- "multi-agent" ANDNOT "survey" with categories: ["cs.MA"] - excludes survey papers
- abs:"transformer" AND ti:"attention" with categories: ["cs.CL"] - attention papers with transformer abstracts

DATE FILTERING: Use YYYY-MM-DD format for historical research:
- date_to: "2015-12-31" - for foundational/classic work (pre-2016)
- date_from: "2020-01-01" - for recent developments (2020 onward)
- Both together for specific time periods

RESULT QUALITY: By default, results are sorted by RELEVANCE (most relevant papers first), not just newest papers. This ensures you get the most pertinent results regardless of publication date.

TIPS FOR FOUNDATIONAL RESEARCH:
- Use date_to: "2010-12-31" to find classic papers on BDI, SOAR, ACT-R
- Combine with field searches: ti:"BDI" AND abs:"belief desire intention"
- Try author searches: au:"Rao" AND "BDI" for Anand Rao's foundational BDI work

Parameters

query Search query using quoted phrases for exact matches (e.g., '"machine learning" OR "deep learning"') or specific technical terms. Avoid overly broad or generic terms. required

max_results Maximum number of results to return (default: 10, max: 50). Use 15-20 for comprehensive searches.

date_from Start date for papers (YYYY-MM-DD format). Use to find recent work, e.g., '2023-01-01' for last 2 years.

date_to End date for papers (YYYY-MM-DD format). Use with date_from to find historical work, e.g., '2020-12-31' for older research.

categories Strongly recommended: arXiv categories to focus search (e.g., ['cs.AI', 'cs.MA'] for agent research, ['cs.LG'] for ML, ['cs.CL'] for NLP, ['cs.CV'] for vision). Greatly improves relevance.

sort_by Sort results by 'relevance' (most relevant first, default) or 'date' (newest first). Use 'relevance' for focused searches, 'date' for recent developments.
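Local models in particular can produce malformed arguments, so a client may want to check them before calling the tool. The sketch below encodes the constraints stated in the parameter list above (required query, max_results up to 50, YYYY-MM-DD dates, sort_by of 'relevance' or 'date'). The lower bound of 1 for max_results and the date_from/date_to ordering check are assumptions, and `validate_search_args` is a hypothetical helper.

```python
import re
from datetime import date

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _parse_date(value):
    """Return a date for a valid YYYY-MM-DD string, else None."""
    if not isinstance(value, str) or not DATE_RE.fullmatch(value):
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:  # well-formed but impossible, e.g. 2023-02-30
        return None


def validate_search_args(args: dict) -> list[str]:
    """Hypothetical pre-flight check for search_papers arguments; [] means no problems found."""
    problems = []
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query is required")
    n = args.get("max_results", 10)
    if isinstance(n, bool) or not isinstance(n, int) or not 1 <= n <= 50:
        problems.append("max_results must be an integer from 1 to 50")
    dates = {}
    for key in ("date_from", "date_to"):
        if key in args:
            parsed = _parse_date(args[key])
            if parsed is None:
                problems.append(f"{key} must be a valid YYYY-MM-DD date")
            else:
                dates[key] = parsed
    if len(dates) == 2 and dates["date_from"] > dates["date_to"]:
        problems.append("date_from is after date_to")
    if args.get("sort_by", "relevance") not in ("relevance", "date"):
        problems.append("sort_by must be 'relevance' or 'date'")
    categories = args.get("categories", [])
    if not isinstance(categories, list) or not all(isinstance(c, str) for c in categories):
        problems.append("categories must be a list of strings")
    return problems
```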

download_paper Download and convert an arXiv paper to readable markdown format for analysis and reading. This tool fetches the PDF from arXiv, converts it to markdown using advanced text extraction, and stores it locally for immediate access. Use this tool when you need to read, analyze, or work with the full text content of a specific paper. The conversion process extracts text, preserves formatting, and handles mathematical equations. Returns the full paper content directly upon successful completion.

Parameters

paper_id The arXiv identifier of the paper to download (e.g., '2301.07041', '1706.03762', 'cs.AI/0301001'). This can be found in search results or arXiv URLs. The paper must exist on arXiv. required

check_status Set to true to only check the status of an ongoing or completed conversion without starting a new download. Use this to monitor long-running conversions or verify if a paper is already available.
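Since paper_id can be copied from search results or arXiv URLs, a client-side normalizer can strip an `https://arxiv.org/abs/` or `arXiv:` prefix and reject strings that are not arXiv identifiers before calling download_paper or read_paper. The sketch accepts new-style IDs (1706.03762) and old-style IDs with an archive and optional subject class (cs.AI/0301001). Whether the server also accepts version suffixes such as v2 is an assumption, and `normalize_paper_id` is hypothetical.

```python
import re

# New-style IDs: YYMM.NNNN or YYMM.NNNNN, optional version suffix (e.g. 1706.03762, 2301.07041v2)
NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
# Old-style IDs: archive[.subject-class]/YYMMNNN (e.g. cs.AI/0301001, hep-th/9901001)
OLD_STYLE = re.compile(r"[a-z]+(-[a-z]+)?(\.[A-Za-z-]+)?/\d{7}(v\d+)?")

PREFIXES = ("https://arxiv.org/abs/", "http://arxiv.org/abs/", "arXiv:", "arxiv:")


def normalize_paper_id(raw: str) -> str:
    """Hypothetical helper: strip a known prefix and return a bare arXiv ID, or raise ValueError."""
    paper_id = raw.strip()
    for prefix in PREFIXES:
        if paper_id.startswith(prefix):
            paper_id = paper_id[len(prefix):]
            break
    if NEW_STYLE.fullmatch(paper_id) or OLD_STYLE.fullmatch(paper_id):
        return paper_id
    raise ValueError(f"not an arXiv identifier: {raw!r}")
```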

list_papers List all previously downloaded and converted papers that are available in local storage for immediate reading and analysis. This tool shows you what papers you already have access to without needing to download them again. Each paper in the list includes metadata like title, authors, abstract, and direct links. Use this tool to see your paper library, check if a specific paper is already downloaded, or browse previously acquired research papers before downloading new ones.

read_paper Read the full text content of a previously downloaded and converted research paper in clean markdown format. This tool retrieves the complete paper content including abstract, introduction, methodology, results, conclusions, and references. The content is formatted for easy reading and analysis, with preserved mathematical equations and structured sections. Use this tool when you need to access the full text of a paper for detailed study, quotation, analysis, or research. The paper must have been previously downloaded using the download_paper tool.

Parameters

paper_id The arXiv identifier of the paper to read (e.g., '2301.07041', '1706.03762'). This must be a paper that has been previously downloaded and converted to markdown format. Use list_papers to see available papers. required

🥉 Security Tier: Bronze (scanned by orcorus-marketplace-automation, Mar 13, 2026)



Configuration

ARXIV_STORAGE_PATH (string): default /app/papers; example value /Users/local-test/papers

Docker Image (Docker Hub): mcp/arxiv-mcp-server

Published by github.com/jasonleinart
