Wikipedia MCP Server
A Model Context Protocol (MCP) server that retrieves information from Wikipedia to provide context to Large Language Models (LLMs). This tool helps AI assistants access factual information from Wikipedia to ground their responses in reliable sources.
Overview
The Wikipedia MCP server provides real-time access to Wikipedia information through a standardized Model Context Protocol interface. This allows LLMs to retrieve accurate and up-to-date information directly from Wikipedia to enhance their responses.
Features
- Search Wikipedia: Find articles matching specific queries with enhanced diagnostics
- Retrieve Article Content: Get full article text with all information
- Article Summaries: Get concise summaries of articles
- Section Extraction: Retrieve specific sections from articles
- Link Discovery: Find links within articles to related topics
- Related Topics: Discover topics related to a specific article
- Multi-language Support: Access Wikipedia in different languages by passing the `--language` (or `-l`) argument when running the server (e.g., `wikipedia-mcp --language ta` for Tamil)
- Country/Locale Support: Use intuitive country codes like `--country US`, `--country China`, or `--country TW` instead of language codes; these are automatically mapped to the appropriate Wikipedia language variants
- Language Variant Support: Language variants such as simplified/traditional Chinese (e.g., `zh-hans` for Simplified Chinese, `zh-tw` for Traditional Chinese), Serbian scripts (`sr-latn`, `sr-cyrl`), and other regional variants
- Optional Caching: Cache API responses for improved performance with `--enable-cache`
- Modern MCP Transport Support: Supports `stdio`, `http`, and `streamable-http` (with legacy `sse` compatibility)
- Optional MCP Transport Auth: Secure network transports with `--auth-mode static` or `--auth-mode jwt`
- Google ADK Compatibility: Fully compatible with Google ADK agents and other AI frameworks that use strict function-calling schemas
Installation
Using pipx (Recommended for Claude Desktop)
The best way to install for Claude Desktop usage is with pipx, which installs the command globally:
# Install pipx if you don't have it
pip install pipx
pipx ensurepath
# Install the Wikipedia MCP server
pipx install wikipedia-mcp
This ensures the wikipedia-mcp command is available in Claude Desktop's PATH.
Installing via Smithery
To install wikipedia-mcp for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @Rudra-ravi/wikipedia-mcp --client claude
From PyPI (Alternative)
You can also install directly from PyPI:
pip install wikipedia-mcp
Note: If you use this method and encounter connection issues with Claude Desktop, you may need to use the full path to the command in your configuration. See the Configuration section for details.
Using a virtual environment
# Create a virtual environment
python3 -m venv venv
# Activate the virtual environment
source venv/bin/activate
# Install the package
pip install git+https://github.com/rudra-ravi/wikipedia-mcp.git
From source
# Clone the repository
git clone https://github.com/rudra-ravi/wikipedia-mcp.git
cd wikipedia-mcp
# Create a virtual environment
python3 -m venv wikipedia-mcp-env
source wikipedia-mcp-env/bin/activate
# Install in development mode
pip install -e .
Usage
Running the server
# If installed with pipx
wikipedia-mcp
# If installed in a virtual environment
source venv/bin/activate
wikipedia-mcp
# Specify transport protocol (default: stdio)
wikipedia-mcp --transport stdio # For Claude Desktop
wikipedia-mcp --transport http --host 0.0.0.0 --port 8080 --path /mcp
wikipedia-mcp --transport streamable-http --host 0.0.0.0 --port 8080 --path /mcp
wikipedia-mcp --transport sse # Legacy compatibility transport
# Specify language (default: en for English)
wikipedia-mcp --language ja # Example for Japanese
wikipedia-mcp --language zh-hans # Example for Simplified Chinese
wikipedia-mcp --language zh-tw # Example for Traditional Chinese (Taiwan)
wikipedia-mcp --language sr-latn # Example for Serbian Latin script
# Specify country/locale (alternative to language codes)
wikipedia-mcp --country US # English (United States)
wikipedia-mcp --country China # Chinese Simplified
wikipedia-mcp --country Taiwan # Chinese Traditional (Taiwan)
wikipedia-mcp --country Japan # Japanese
wikipedia-mcp --country Germany # German
wikipedia-mcp --country france # French (case insensitive)
# List all supported countries
wikipedia-mcp --list-countries
# Optional: Specify host/port/path for network transport (use 0.0.0.0 for containers)
wikipedia-mcp --transport http --host 0.0.0.0 --port 8080 --path /mcp
# Optional: Enable caching
wikipedia-mcp --enable-cache
# Optional: Use Personal Access Token to avoid rate limiting (403 errors)
wikipedia-mcp --access-token your_wikipedia_token_here
# Or set via environment variable
export WIKIPEDIA_ACCESS_TOKEN=your_wikipedia_token_here
wikipedia-mcp
# Optional: Secure incoming MCP network requests with static bearer token
wikipedia-mcp --transport http --auth-mode static --auth-token your_mcp_token --host 0.0.0.0 --port 8080
# Optional: Secure incoming MCP network requests with JWT validation
wikipedia-mcp --transport http --auth-mode jwt --auth-jwks-uri https://issuer/.well-known/jwks.json --auth-issuer https://issuer
# Security note: prefer http/streamable-http + auth-mode for exposed network transport.
# Combine options
wikipedia-mcp --country Taiwan --enable-cache --access-token your_wikipedia_token --transport http --path /mcp --port 8080
Docker/Kubernetes
When running inside containers, bind the HTTP MCP server to all interfaces and map
the container port to the host or service:
# Build and run with Docker
docker build -t wikipedia-mcp .
docker run --rm -p 8080:8080 wikipedia-mcp --transport http --host 0.0.0.0 --port 8080 --path /mcp
Kubernetes example (minimal):
apiVersion: apps/v1
kind: Deployment
metadata:
name: wikipedia-mcp
spec:
replicas: 1
selector:
matchLabels:
app: wikipedia-mcp
template:
metadata:
labels:
app: wikipedia-mcp
spec:
containers:
- name: server
image: your-repo/wikipedia-mcp:latest
args: ["--transport", "http", "--host", "0.0.0.0", "--port", "8080", "--path", "/mcp"]
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: wikipedia-mcp
spec:
selector:
app: wikipedia-mcp
ports:
- name: http
port: 8080
targetPort: 8080
Configuration for Claude Desktop
Add the following to your Claude Desktop configuration file:
Option 1: Using command name (requires `wikipedia-mcp` to be in PATH)
{
"mcpServers": {
"wikipedia": {
"command": "wikipedia-mcp"
}
}
}
Option 2: Using full path (recommended if you get connection errors)
{
"mcpServers": {
"wikipedia": {
"command": "/full/path/to/wikipedia-mcp"
}
}
}
Option 3: With country/language specification
{
"mcpServers": {
"wikipedia-us": {
"command": "wikipedia-mcp",
"args": ["--country", "US"]
},
"wikipedia-taiwan": {
"command": "wikipedia-mcp",
"args": ["--country", "TW"]
},
"wikipedia-japan": {
"command": "wikipedia-mcp",
"args": ["--country", "Japan"]
}
}
}
To find the full path, run: which wikipedia-mcp
Configuration file locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%/Claude/claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Note: If you encounter connection errors, see the Troubleshooting section for solutions.
Documentation Index
- CLI usage and options: see docs/CLI.md
- API and MCP tools/resources: see docs/API.md
- Architecture overview: see docs/ARCHITECTURE.md
- User guide and troubleshooting: see docs/USER_GUIDE.md
- Development guide: see docs/DEVELOPMENT.md
- Testing guide: see docs/TESTING.md
Available MCP Tools
The Wikipedia MCP server provides the following tools for LLMs to interact with Wikipedia:
Each tool is also exposed with a `wikipedia_`-prefixed alias (for example, `wikipedia_get_article`) for improved cross-server discoverability.
search_wikipedia
Search Wikipedia for articles matching a query.
Parameters:
- query (string): The search term
- limit (integer, optional): Maximum number of results to return (default: 10)
Returns:
- A list of search results with titles, snippets, and metadata
get_article
Get the full content of a Wikipedia article.
Parameters:
- title (string): The title of the Wikipedia article
Returns:
- Article content including text, summary, sections, links, and categories
get_summary
Get a concise summary of a Wikipedia article.
Parameters:
- title (string): The title of the Wikipedia article
Returns:
- A text summary of the article
get_sections
Get the sections of a Wikipedia article.
Parameters:
- title (string): The title of the Wikipedia article
Returns:
- A structured list of article sections with their content
get_links
Get the links contained within a Wikipedia article.
Parameters:
- title (string): The title of the Wikipedia article
Returns:
- A list of links to other Wikipedia articles
get_coordinates
Get the coordinates of a Wikipedia article.
Parameters:
- title (string): The title of the Wikipedia article
Returns:
- A dictionary containing coordinate information including:
- title: The article title
- pageid: The page ID
- coordinates: List of coordinate objects with latitude, longitude, and metadata
- exists: Whether the article exists
- error: Any error message if retrieval failed
get_related_topics
Get topics related to a Wikipedia article based on links and categories.
Parameters:
- title (string): The title of the Wikipedia article
- limit (integer, optional): Maximum number of related topics (default: 10)
Returns:
- A list of related topics with relevance information
summarize_article_for_query
Get a summary of a Wikipedia article tailored to a specific query.
Parameters:
- title (string): The title of the Wikipedia article
- query (string): The query to focus the summary on
- max_length (integer, optional): Maximum length of the summary (default: 250)
Returns:
- A dictionary containing the title, query, and the focused summary
summarize_article_section
Get a summary of a specific section of a Wikipedia article.
Parameters:
- title (string): The title of the Wikipedia article
- section_title (string): The title of the section to summarize
- max_length (integer, optional): Maximum length of the summary (default: 150)
Returns:
- A dictionary containing the title, section title, and the section summary
extract_key_facts
Extract key facts from a Wikipedia article, optionally focused on a specific topic within the article.
Parameters:
- title (string): The title of the Wikipedia article
- topic_within_article (string, optional): A specific topic within the article to focus fact extraction
- count (integer, optional): Number of key facts to extract (default: 5)
Returns:
- A dictionary containing the title, topic, and a list of extracted facts
Country/Locale Support
The Wikipedia MCP server supports intuitive country and region codes as an alternative to language codes. This makes it easier to access region-specific Wikipedia content without needing to know language codes.
Supported Countries and Regions
Use --list-countries to see all supported countries:
wikipedia-mcp --list-countries
This will display countries organized by language, for example:
Supported Country/Locale Codes:
========================================
en: US, USA, United States, UK, GB, Canada, Australia, ...
zh-hans: CN, China
zh-tw: TW, Taiwan
ja: JP, Japan
de: DE, Germany
fr: FR, France
es: ES, Spain, MX, Mexico, AR, Argentina, ...
pt: PT, Portugal, BR, Brazil
ru: RU, Russia
ar: SA, Saudi Arabia, AE, UAE, EG, Egypt, ...
Usage Examples
# Major countries by code
wikipedia-mcp --country US # United States (English)
wikipedia-mcp --country CN # China (Simplified Chinese)
wikipedia-mcp --country TW # Taiwan (Traditional Chinese)
wikipedia-mcp --country JP # Japan (Japanese)
wikipedia-mcp --country DE # Germany (German)
wikipedia-mcp --country FR # France (French)
wikipedia-mcp --country BR # Brazil (Portuguese)
wikipedia-mcp --country RU # Russia (Russian)
# Countries by full name (case insensitive)
wikipedia-mcp --country "United States"
wikipedia-mcp --country China
wikipedia-mcp --country Taiwan
wikipedia-mcp --country Japan
wikipedia-mcp --country Germany
wikipedia-mcp --country france # Case insensitive
# Regional variants
wikipedia-mcp --country HK # Hong Kong (Traditional Chinese)
wikipedia-mcp --country SG # Singapore (Simplified Chinese)
wikipedia-mcp --country "Saudi Arabia" # Arabic
wikipedia-mcp --country Mexico # Spanish
Country-to-Language Mapping
The server automatically maps country codes to appropriate Wikipedia language editions:
- English-speaking: US, UK, Canada, Australia, New Zealand, Ireland, South Africa → en
- Chinese regions:
  - CN, China → zh-hans (Simplified Chinese)
  - TW, Taiwan → zh-tw (Traditional Chinese, Taiwan)
  - HK, Hong Kong → zh-hk (Traditional Chinese, Hong Kong)
  - SG, Singapore → zh-sg (Simplified Chinese, Singapore)
- Major languages: JP → ja, DE → de, FR → fr, ES → es, IT → it, RU → ru, etc.
- Regional variants: Supports 140+ countries and regions
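The country-to-language resolution described above can be sketched as a simple case-insensitive lookup. This is an illustration only: the names `COUNTRY_TO_LANGUAGE` and `resolve_country` are hypothetical, the table is truncated, and the real server supports 140+ regions.

```python
# Illustrative sketch of country-to-language resolution, not the server's
# actual code. The real mapping covers 140+ countries and regions.
COUNTRY_TO_LANGUAGE = {
    "us": "en", "usa": "en", "united states": "en",
    "uk": "en", "gb": "en",
    "cn": "zh-hans", "china": "zh-hans",
    "tw": "zh-tw", "taiwan": "zh-tw",
    "hk": "zh-hk", "hong kong": "zh-hk",
    "jp": "ja", "japan": "ja",
    "de": "de", "germany": "de",
    "fr": "fr", "france": "fr",
}

def resolve_country(country: str) -> str:
    """Map a country code or full name to a Wikipedia language code.

    Lookup is case-insensitive; unknown inputs raise ValueError, mirroring
    the documented error behavior.
    """
    key = country.strip().lower()
    try:
        return COUNTRY_TO_LANGUAGE[key]
    except KeyError:
        raise ValueError(f"Unsupported country/locale: {country!r}") from None

print(resolve_country("Taiwan"))  # zh-tw
print(resolve_country("france"))  # fr
```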
Error Handling
If you specify an unsupported country, you'll get a helpful error message:
$ wikipedia-mcp --country INVALID
Error: Unsupported country/locale: 'INVALID'.
Supported country codes include: US, USA, UK, GB, CA, AU, NZ, IE, ZA, CN.
Use --language parameter for direct language codes instead.
Use --list-countries to see supported country codes.
Language Variants
The Wikipedia MCP server supports language variants for languages that have multiple writing systems or regional variations. This feature is particularly useful for Chinese, Serbian, Kurdish, and other languages with multiple scripts or regional differences.
Supported Language Variants
Chinese Language Variants
- zh-hans: Simplified Chinese
- zh-hant: Traditional Chinese
- zh-tw: Traditional Chinese (Taiwan)
- zh-hk: Traditional Chinese (Hong Kong)
- zh-mo: Traditional Chinese (Macau)
- zh-cn: Simplified Chinese (China)
- zh-sg: Simplified Chinese (Singapore)
- zh-my: Simplified Chinese (Malaysia)
Serbian Language Variants
- sr-latn: Serbian Latin script
- sr-cyrl: Serbian Cyrillic script
Kurdish Language Variants
- ku-latn: Kurdish Latin script
- ku-arab: Kurdish Arabic script
Norwegian Language Variants
- no: Norwegian (automatically mapped to Bokmål)
Usage Examples
# Access Simplified Chinese Wikipedia
wikipedia-mcp --language zh-hans
# Access Traditional Chinese Wikipedia (Taiwan)
wikipedia-mcp --language zh-tw
# Access Serbian Wikipedia in Latin script
wikipedia-mcp --language sr-latn
# Access Serbian Wikipedia in Cyrillic script
wikipedia-mcp --language sr-cyrl
How Language Variants Work
When you specify a language variant like zh-hans, the server:
1. Maps the variant to the base Wikipedia language (e.g., zh for Chinese variants)
2. Uses the base language for API connections to the Wikipedia servers
3. Includes the variant parameter in API requests to get content in the specific variant
4. Returns content formatted according to the specified variant's conventions
This approach ensures optimal compatibility with Wikipedia's API while providing access to variant-specific content and formatting.
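The four steps above can be sketched in Python. This is illustrative only; the table and function names are assumptions, not the server's actual implementation.

```python
# Sketch of language-variant resolution: map a variant like "zh-tw" to its
# base Wikipedia language plus a "variant" API parameter. Illustrative names.
VARIANT_BASE = {
    "zh-hans": "zh", "zh-hant": "zh", "zh-tw": "zh", "zh-hk": "zh",
    "zh-mo": "zh", "zh-cn": "zh", "zh-sg": "zh", "zh-my": "zh",
    "sr-latn": "sr", "sr-cyrl": "sr",
    "ku-latn": "ku", "ku-arab": "ku",
}

def resolve_variant(language: str) -> tuple[str, dict]:
    """Return (base_language, extra_api_params) for a language or variant code."""
    base = VARIANT_BASE.get(language)
    if base is None:
        return language, {}              # plain language code, no variant param
    return base, {"variant": language}   # base wiki plus variant parameter

base, params = resolve_variant("zh-tw")
# The API host becomes https://zh.wikipedia.org and requests carry variant=zh-tw.
```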
Example Prompts
Once the server is running and configured with Claude Desktop, you can use prompts like:
General Wikipedia queries:
- "Tell me about quantum computing using the Wikipedia information."
- "Summarize the history of artificial intelligence based on Wikipedia."
- "What does Wikipedia say about climate change?"
- "Find Wikipedia articles related to machine learning."
- "Get me the introduction section of the article on neural networks from Wikipedia."
- "What are the coordinates of the Eiffel Tower?"
- "Find the latitude and longitude of Mount Everest from Wikipedia."
- "Get coordinate information for famous landmarks in Paris."
Using country-specific Wikipedia:
- "Search Wikipedia China for information about the Great Wall." (uses Chinese Wikipedia)
- "Tell me about Tokyo from Japanese Wikipedia sources."
- "What does German Wikipedia say about the Berlin Wall?"
- "Find information about the Eiffel Tower from French Wikipedia."
- "Get Taiwan Wikipedia's article about Taiwanese cuisine."
Language variant examples:
- "Search Traditional Chinese Wikipedia for information about Taiwan."
- "Find Simplified Chinese articles about modern China."
- "Get information from Serbian Latin Wikipedia about Belgrade."
MCP Resources
The server also provides MCP resources (similar to HTTP endpoints but for MCP):
- search/{query}: Search Wikipedia for articles matching the query
- article/{title}: Get the full content of a Wikipedia article
- summary/{title}: Get a summary of a Wikipedia article
- sections/{title}: Get the sections of a Wikipedia article
- links/{title}: Get the links in a Wikipedia article
- coordinates/{title}: Get the coordinates of a Wikipedia article
- summary/{title}/query/{query}/length/{max_length}: Get a query-focused summary of an article
- summary/{title}/section/{section_title}/length/{max_length}: Get a summary of a specific article section
- facts/{title}/topic/{topic_within_article}/count/{count}: Extract key facts from an article
Development
Local Development Setup
# Clone the repository
git clone https://github.com/rudra-ravi/wikipedia-mcp.git
cd wikipedia-mcp
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install the package in development mode
pip install -e .
# Install development and test dependencies
pip install -r requirements-dev.txt
# Run the server
wikipedia-mcp
Project Structure
- wikipedia_mcp/: Main package
  - __main__.py: Entry point for the package
  - server.py: MCP server implementation
  - wikipedia_client.py: Wikipedia API client
  - api/: API implementation
  - core/: Core functionality
  - utils/: Utility functions
- tests/: Test suite
  - test_basic.py: Basic package tests
  - test_cli.py: Command-line interface tests
  - test_server_tools.py: Comprehensive server and tool tests
Testing
The project includes a comprehensive test suite to ensure reliability and functionality.
Test Structure
The test suite is organized in the tests/ directory with the following test files:
- test_basic.py: Basic package functionality tests
- test_cli.py: Command-line interface and transport tests
- test_server_tools.py: Comprehensive tests for all MCP tools and Wikipedia client functionality
Running Tests
Run All Tests
# Install test dependencies
pip install -r requirements-dev.txt
# Run all tests
python -m pytest tests/ -v
# Run tests with coverage
python -m pytest tests/ --cov=wikipedia_mcp --cov-report=html
Run Specific Test Categories
# Run only unit tests (excludes integration tests)
python -m pytest tests/ -v -m "not integration"
# Run only integration tests (requires internet connection)
python -m pytest tests/ -v -m "integration"
# Run specific test file
python -m pytest tests/test_server_tools.py -v
Test Categories
Unit Tests
- WikipediaClient Tests: Mock-based tests for all client methods
- Search functionality
- Article retrieval
- Summary extraction
- Section parsing
- Link extraction
- Related topics discovery
- Server Tests: MCP server creation and tool registration
- CLI Tests: Command-line interface functionality
Integration Tests
- Real API Tests: Tests that make actual calls to Wikipedia API
- End-to-End Tests: Complete workflow testing
Test Configuration
The project uses pytest.ini for test configuration:
[pytest]
markers =
integration: marks tests as integration tests (may require network access)
slow: marks tests as slow running
testpaths = tests
addopts = -v --tb=short
Continuous Integration
All tests are designed to:
- Run reliably in CI/CD environments
- Handle network failures gracefully
- Provide clear error messages
- Cover edge cases and error conditions
Adding New Tests
When contributing new features:
- Add unit tests for new functionality
- Include both success and failure scenarios
- Mock external dependencies (Wikipedia API)
- Add integration tests for end-to-end validation
- Follow existing test patterns and naming conventions
Troubleshooting
Common Issues
Claude Desktop Connection Issues
Problem: Claude Desktop shows errors like spawn wikipedia-mcp ENOENT or cannot find the command.
Cause: This occurs when the wikipedia-mcp command is installed in a user-specific location (like ~/.local/bin/) that's not in Claude Desktop's PATH.
Solutions:
1. Use the full path to the command (recommended):

{ "mcpServers": { "wikipedia": { "command": "/home/username/.local/bin/wikipedia-mcp" } } }

To find your exact path, run: which wikipedia-mcp

2. Install with pipx for global access:

pipx install wikipedia-mcp

Then use the standard configuration:

{ "mcpServers": { "wikipedia": { "command": "wikipedia-mcp" } } }

3. Create a symlink to a global location:

sudo ln -s ~/.local/bin/wikipedia-mcp /usr/local/bin/wikipedia-mcp
Other Issues
- Article Not Found: Check the exact spelling of article titles
- Rate Limiting: Wikipedia API has rate limits; consider adding delays between requests
- Large Articles: Some Wikipedia articles are very large and may exceed token limits
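For the rate-limiting point above, a client-side retry with exponential backoff is one common mitigation. The sketch below is illustrative and not part of the server; `with_backoff` is a hypothetical helper, and RuntimeError stands in for whatever rate-limit error your client raises.

```python
# Illustrative retry-with-backoff helper for rate-limited Wikipedia requests.
# Not part of wikipedia-mcp; RuntimeError stands in for a rate-limit error.
import time

def with_backoff(fetch, retries: int = 3, base_delay: float = 0.5):
    """Call fetch(); on failure, sleep with exponentially increasing delays
    (base_delay, 2*base_delay, 4*base_delay, ...) before retrying."""
    for attempt in range(retries):
        try:
            return fetch()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries: propagate the error
            time.sleep(base_delay * (2 ** attempt))
```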
Troubleshooting Search Issues
If you're experiencing empty search results, use the new diagnostic tools:
1. Test Connectivity
Use the test_wikipedia_connectivity tool to check if you can reach Wikipedia's API:
{
"tool": "test_wikipedia_connectivity"
}
This returns diagnostics including:
- Connection status (success or failed)
- Response time in milliseconds
- Site/host information when successful
- Error details when connectivity fails
2. Enhanced Search Error Information
The search_wikipedia tool now returns detailed metadata:
{
"tool": "search_wikipedia",
"arguments": {
"query": "Ada Lovelace",
"limit": 10
}
}
Example response:
{
"query": "Ada Lovelace",
"results": [...],
"count": 5,
"status": "success",
"language": "en"
}
When no results are found, you receive:
{
"query": "nonexistent",
"results": [],
"status": "no_results",
"count": 0,
"language": "en",
"message": "No search results found. This could indicate connectivity issues, API errors, or simply no matching articles."
}
3. Common Search Issues and Solutions
- Empty results: Run the connectivity test, verify query spelling, try broader terms.
- Connection errors: Check firewall or proxy settings; ensure *.wikipedia.org is reachable.
- API limits: Requests with limit > 500 are automatically capped; negative values reset to the default (10).
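The documented limit handling can be sketched as a small clamp helper. This is an illustration of the behavior described above, not the server's actual code; whether zero counts as "negative" is an assumption here.

```python
# Sketch of the documented limit handling: values above 500 are capped and
# negative values fall back to the default of 10. Treating zero like a
# negative value is an assumption for this illustration.
DEFAULT_LIMIT = 10
MAX_LIMIT = 500

def clamp_limit(limit: int) -> int:
    if limit <= 0:
        return DEFAULT_LIMIT      # invalid values reset to the default
    return min(limit, MAX_LIMIT)  # oversized requests are capped
```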
4. Debugging with Verbose Logging
Launch the server with debug logging for deeper insight:
wikipedia-mcp --log-level DEBUG
This emits the request parameters, response status codes, and any warnings returned by the API.
Understanding the Model Context Protocol (MCP)
The Model Context Protocol (MCP) is not a traditional HTTP API but a specialized protocol for communication between LLMs and external tools. Key characteristics:
- Uses stdio for local integrations and streamable HTTP for network integrations (sse retained for legacy compatibility)
- Designed specifically for AI model interaction
- Provides standardized formats for tools, resources, and prompts
- Integrates directly with Claude and other MCP-compatible AI systems
Claude Desktop acts as the MCP client, while this server provides the tools and resources that Claude can use to access Wikipedia information.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Connect with the Author
- Portfolio: ravikumar-dev.me
- Blog: Medium
- LinkedIn: in/ravi-kumar-e
- Twitter: @Ravikumar_d3v
Security Review
Integration: Wikipedia
Repository: https://github.com/Rudra-ravi/wikipedia-mcp
Commit: latest
Scan Date: 2026-03-13 16:45 UTC
Security Score
55 / 100
Tier Classification
Reject
OWASP Alignment
OWASP Rubric
- Standard: OWASP Top 10 (2021) aligned review
- Core methodology: architecture context, trust boundaries, data-flow tracing, threat modeling, control verification, and evidence-backed validation
- Key characteristics considered: exploitability, impact, likelihood, attacker preconditions, and business context
OWASP Security Category Mapping
- A01 Broken Access Control: none
- A02 Cryptographic Failures: 7 finding(s)
- A03 Injection: none
- A04 Insecure Design: none
- A05 Security Misconfiguration: none
- A06 Vulnerable and Outdated Components: none
- A07 Identification and Authentication Failures: 28 finding(s)
- A08 Software and Data Integrity Failures: none
- A09 Security Logging and Monitoring Failures: none
- A10 Server-Side Request Forgery: none
Static Analysis Findings (Bandit)
High Severity
None
Medium Severity
- Possible binding to all interfaces. in tests/test_docker_compatibility.py:30 (confidence: MEDIUM)
- Possible binding to all interfaces. in tests/test_new_features.py:141 (confidence: MEDIUM)
Low Severity
- Consider possible security implications associated with the subprocess module. in test_build.py:6 (confidence: HIGH)
- subprocess call - check for execution of untrusted input. in test_build.py:17 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:18 (confidence: HIGH)
- Possible hardcoded password: 'test_token_123' in tests/test_access_token.py:22 (confidence: MEDIUM)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:24 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:31 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:32 (confidence: HIGH)
- Possible hardcoded password: 'test_token_123' in tests/test_access_token.py:36 (confidence: MEDIUM)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:40 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:41 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:42 (confidence: HIGH)
- Possible hardcoded password: 'test_token_123' in tests/test_access_token.py:47 (confidence: MEDIUM)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:77 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:78 (confidence: HIGH)
- Possible hardcoded password: 'test_token_123' in tests/test_access_token.py:83 (confidence: MEDIUM)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:118 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:119 (confidence: HIGH)
- Possible hardcoded password: 'secret_token_123' in tests/test_access_token.py:127 (confidence: MEDIUM)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:132 (confidence: HIGH)
- Use of assert detected. The enclosed code will be removed when compiling to optimised byte code. in tests/test_access_token.py:141 (confidence: HIGH)
Build Status
SKIPPED
Build step was skipped to avoid running untrusted build commands by default.
Tests
Detected (pytest)
Documentation
README: Present
Dependency file: Present
AI Security Review
OWASP-Aligned Security Review Report for repository: Wikipedia (wikipedia-mcp)
1) OWASP Review Methodology Applied
- Orientation: Examined project structure and prioritized files (wikipedia_mcp/server.py, wikipedia_mcp/wikipedia_client.py, wikipedia_mcp/auth_config.py, wikipedia_mcp/main.py, wikipedia_mcp/schemas.py, requirements.txt, Dockerfile, tests).
- Entry points: Read CLI entrypoint (wikipedia_mcp/main.py) and server factory (wikipedia_mcp/server.py) to understand exposed transports and auth.
- Data flows: Traced untrusted inputs (CLI args, HTTP path parameters, MCP tool arguments) into WikipediaClient and network calls (requests.get) and ASGI middleware.
- Attack surface: Identified network endpoints, auth handling (static bearer & JWT provider creation), request-building and header handling, caching, and CLI/ENV handling of secrets.
- Threat modeling / controls verification: Mapped findings to OWASP Top 10 categories, assessed exploitability and impact, and verified defensive controls (e.g., token logging safeguards, input trimming, parameter caps).
- Validation: Verified findings against concrete code locations and test coverage where present.
2) OWASP Top-10 (2021) Mapping
- A01 Broken Access Control: StaticBearerAuthMiddleware design and lack of brute-force protections, token handling.
- A02 Cryptographic Failures: observations around JWT configuration and potential misconfiguration; no direct crypto ops in the repo, but reliant on a third-party provider.
- A03 Injection: No injection (SQL/OS/command) found in production code. Tests invoke subprocess (expected).
- A04 Insecure Design: Missing rate limiting, lack of transport-level TLS enforcement guidance, lack of auth brute-force protections, potential resource-exhaustion via API abuse.
- A05 Security Misconfiguration: CLI token via argv, Dockerfile/port mismatch, default network binding behavior and deployment docs.
- A06 Vulnerable and Outdated Components: External dependencies (fastmcp, requests, wikipedia-api) need SCA/patching; requirements.txt uses permissive version ranges.
- A07 Identification and Authentication Failures: Use of static tokens, comparison method, possible exposure of tokens via CLI or environment.
- A08 Software and Data Integrity Failures: No signing/verification for code artifacts; dependency integrity not enforced.
- A09 Security Logging and Monitoring Failures: No rate-limiting or monitoring hooks; limited audit logging of auth failures.
- A10 Server-Side Request Forgery (SSRF): No direct SSRF; api_url is constructed as "https://{base_language}.wikipedia.org/..." which prevents arbitrary host control.
3) Critical Vulnerabilities (RCE/SQLi/unsafe deserialization): NONE found in production code
- No use of eval/exec/compile/pickle or subprocess in production code. All network calls are via requests.get to controlled API endpoints.
4) High Severity Issues
4.1 Credential exposure via CLI arguments (medium-high)
- Files/locations: wikipedia_mcp/main.py (the parser.add_argument call for --access-token) and usage at assignment: access_token = args.access_token or os.getenv("WIKIPEDIA_ACCESS_TOKEN").
- Issue: Passing secrets on the command line (e.g., --access-token, --auth-token) exposes them in process listings (ps) and shell history on many systems. The code supports and tests CLI token usage (tests/test_access_token.py).
- OWASP mapping: A07 Identification and Authentication Failures, A05 Security Misconfiguration.
- Exploitability: Easy (local attacker or co-tenant in shared hosting) to read process args.
- Impact: Exposure of Wikipedia access token or MCP static token, which could be used to increase rate limits or access protected resources.
- Remediation: Document and strongly recommend using environment variables instead of CLI args for secrets. Add an explicit warning in --help and docs. Prefer reading secrets from stdin or a file with restrictive permissions. Example change: in main.py, add a deprecation/warning and avoid allowing --auth-token/--access-token by default (or accept but warn). References: main.py (parser.add_argument --access-token block ~lines 118-132) and auth_config.build_auth_config handling of auth_token (auth_config.py lines ~69-90).
4.2 Static bearer token comparison is not constant-time (low-medium)
- File/line: wikipedia_mcp/server.py (StaticBearerAuthMiddleware.__init__ and __call__, approx. lines 47-66).
- Issue: Authorization header compared with expected string using direct equality (authorization != self._expected). This may allow timing attacks to guess tokens in high-value deployments. Also no lockout or throttling for repeated invalid attempts.
- OWASP mapping: A07, A04.
- Exploitability: Low in most deployments (requires network access to auth-protected endpoint and precise timing measurement). Higher if deployed on same host/fast network.
- Impact: Disclosure of static token leading to unauthorized access to MCP API.
- Remediation: Use hmac.compare_digest to compare tokens to mitigate timing leakage and add throttling/lockout or rate-limiting middleware for network transports. E.g., replace authorization != self._expected with not hmac.compare_digest(authorization or "", self._expected).
4.3 Lack of rate limiting / brute force protections (medium)
- Files: server.py (the ASGI server creation and middleware insertion points) and create_server where middleware is set via build_http_middleware.
- Issue: No rate limiting, per-IP authentication failure logging, or integration point for WAF. This permits credential guessing and API abuse (amplified by absence of transport-level TLS enforcement guidance).
- OWASP mapping: A04, A09.
- Exploitability: High (internet-exposed service) if deployed publicly.
- Impact: Denial of service to downstream Wikipedia API keys, rate limit exhaustion of the integration, potential service disruption.
- Remediation: Add an optional rate-limiting middleware (per-IP, per-token) for network transports; add exponential backoff and logging on authentication failures. Document the recommended deployment behind a TLS-terminating reverse proxy with client IP preservation and a WAF.
5) Medium Severity Issues
5.1 Permissive dependency spec and supply-chain risk (medium)
- File: requirements.txt (fastmcp>=2.3.0, wikipedia-api>=0.8.0, requests>=2.31.0, python-dotenv>=1.0.0).
- Issue: Broad >= ranges may allow pulling vulnerable versions; no pinned hashes (no pip-compile/poetry lock). No SBOM provided.
- OWASP mapping: A06, A08.
- Remediation: Pin exact vetted versions in production deployments, publish an SBOM, and run SCA checks regularly. Add CI SCA scanning.
5.2 Dockerfile / deployment minor misconfigurations
- File: Dockerfile
- Observations: EXPOSE 8080, but the MCP server's default port is 8000; the ENTRYPOINT uses the wikipedia-mcp CLI, which defaults to the stdio transport and is not appropriate inside containers by default; there is no USER instruction, so the container runs as root by default.
- OWASP mapping: A05.
- Impact: Operational confusion and potential insecure container defaults.
- Remediation: Align EXPOSE with runtime, document intended container usage, and add a non-root USER and minimal filesystem permissions.
5.3 Lack of TLS guidance / enforcement for network transport (medium)
- Files: main.py server.run for http transport; no TLS handling inside the server.
- Issue: The server accepts network transport without explicit guidance to terminate TLS at reverse proxy. Running the Python process directly on 0.0.0.0:8000 would be plaintext.
- OWASP mapping: A02, A05.
- Remediation: Document that the server must be deployed behind HTTPS termination (nginx, cloud load balancer). Consider providing an option or middleware to enforce TLS headers or reject non-TLS when run with a publicly bound host.
6) Low Severity Issues / Best-practice gaps
6.1 Timing comparison (see 4.2): low severity in many setups.
6.2 Logging content safety: Tests assert tokens are not logged; code appears to avoid logging Authorization headers. Continue to audit future logging additions.
6.3 Input length checks are present in some places but not everywhere: titles passed to wikipediaapi.page are trimmed in some functions but not always; consider enforcing length caps or validating characters. (File: wikipedia_mcp/wikipedia_client.py; trimming is done for search queries, but title inputs are often passed through unchanged.)
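Such a cap could be enforced with a small helper; a sketch under the assumption that the existing 300-character search-query limit is reused (normalize_title is hypothetical, not existing code):

```python
MAX_TITLE_LEN = 300  # assumed cap, mirroring the existing search-query limit


def normalize_title(title: str) -> str:
    """Hypothetical helper: trim and length-check an article title
    before passing it to wikipediaapi.page()."""
    title = title.strip()
    if not title:
        raise ValueError("title must be non-empty")
    if len(title) > MAX_TITLE_LEN:
        raise ValueError(f"title exceeds {MAX_TITLE_LEN} characters")
    return title
```

Applying the same policy everywhere keeps validation behavior consistent across tools.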
6.4 Potential information leakage via the error body_preview returned by _request_json on HTTP errors, which contains up to 200 characters of the remote response. If upstream Wikipedia responses contain sensitive headers/content, they may appear in diagnostics. This is low risk for Wikipedia but worth noting for other upstreams.
7) Key Risk Characteristics (for prioritized findings)
- Credential exposure via CLI args
- Exploitability: High locally (ps/equivalent), low remote. Requires access to same host/process table.
- Impact: Medium-High (access token compromise).
- Preconditions: Attacker with ability to enumerate process table or access to container metadata.
- Static bearer token timing comparison
- Exploitability: Low to medium (requires repeated measurements and network access).
- Impact: Medium (token disclosure leads to unauthorized access).
- Preconditions: Attacker network access to service, ability to measure timings.
- No rate limiting
- Exploitability: High if service is internet-accessible.
- Impact: High (API abuse/DoS, cost/rate-limit exhaustion).
- Preconditions: Service exposed to untrusted networks.
- Dependency & supply-chain risk
- Exploitability: Medium (depends on vulnerabilities in third-party packages; SCA needed).
- Impact: High if any dependency contains severe vulnerabilities.
- Preconditions: Using outdated/vulnerable dependency versions.
8) Positive Security Practices Observed
- Tests verify that Authorization headers are not logged; the code avoids printing tokens in standard informational logs.
- Input validation exists for search queries (trimming, length cap 300) and limits for 'limit' parameters with warnings.
- API calls have bounded retries and timeout handling in _request_json with backoff logic, protecting against some transient failures.
- Many operations catch exceptions and return safe structured error objects (avoids stacktrace leakage to clients).
- Tests cover auth modes, token handling, and logging behaviors demonstrating security awareness.
9) Recommendations (concrete fixes with file:line references)
Note: line numbers are approximate and refer to the files listed.
9.1 Protect secrets from CLI exposure (HIGH / A07)
- Files: wikipedia_mcp/main.py (parser.add_argument --access-token at ~lines 118-130; assignment at ~line 244).
- Change: Deprecate or strongly warn about supplying secrets via CLI. Modify CLI help text to explicitly recommend environment variables or files; optionally disable --access-token and read from WIKIPEDIA_ACCESS_TOKEN only.
- Example remediation: In main.py update help: "(Sensitive: prefer WIKIPEDIA_ACCESS_TOKEN env var or read from file/stdin)" and add a runtime warning if --access-token is used (log at WARNING). Also document in README and PUBLISHING_GUIDE.md.
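A sketch of that warning path (the argument name follows the existing CLI; the logger name and demo invocation are assumptions):

```python
import argparse
import logging
import os

logger = logging.getLogger("wikipedia_mcp")  # assumed logger name

parser = argparse.ArgumentParser(prog="wikipedia-mcp")
parser.add_argument(
    "--access-token",
    default=None,
    help="Wikipedia access token "
         "(Sensitive: prefer the WIKIPEDIA_ACCESS_TOKEN env var)",
)
args = parser.parse_args(["--access-token", "demo-token"])  # demo invocation

if args.access_token:
    # argv is visible in `ps` output and shell history on many systems.
    logger.warning(
        "--access-token supplied on the command line; prefer the "
        "WIKIPEDIA_ACCESS_TOKEN environment variable or a secrets file."
    )
access_token = args.access_token or os.getenv("WIKIPEDIA_ACCESS_TOKEN")
```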
9.2 Use constant-time comparison for bearer tokens (MEDIUM / A07)
- File: wikipedia_mcp/server.py (StaticBearerAuthMiddleware.__call__, approx. lines 47-66)
- Change: Replace equality check with hmac.compare_digest to prevent timing attacks. Also ensure authorization header is normalized and defaulted to empty string when missing.
- Code sample: import hmac; then use hmac.compare_digest(authorization or "", self._expected).
9.3 Add rate-limiting and lockout for network transports (MEDIUM / A04)
- Files: wikipedia_mcp/server.py (integration point, build_http_middleware); optionally provide an optional param to create_server to accept middleware or integrate a simple in-process rate limiter.
- Change: Add an optional middleware (token/IP-based) that limits requests per minute and throttles authentication failures. For production, recommend deploying behind reverse proxy rate limiting (nginx/Cloud LB) and WAF.
9.4 Enforce/advise TLS and secure deployment defaults (MEDIUM / A02, A05)
- Files: README.md, Dockerfile, and main.py
- Change: Document that public deployments must be behind TLS termination. In Dockerfile, set USER to non-root, expose default port used by server (8000) or clarify intended port mapping. Consider adding environment variables to force TLS-only operation or to require server to be run behind a proxy.
9.5 Harden dependency management and supply-chain (MEDIUM / A06)
- Files: requirements.txt, package metadata
- Change: Pin exact dependency versions (pip freeze or use poetry/poetry.lock), add a requirements.lock or constraints file with hashes (pip-compile --generate-hashes), add CI SCA checks (dependabot, snyk), and publish an SBOM.
9.6 Sanitize and minimize error previews (LOW / A09)
- File: wikipedia_mcp/wikipedia_client.py _request_json (preview returned in body_preview up to 200 chars)
- Change: Remove or reduce returned upstream body previews in error responses, or ensure they are sanitized to avoid leaking sensitive upstream data in diagnostics.
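A minimal sketch of such sanitization (the helper name and the tighter cap are assumptions):

```python
import re

MAX_PREVIEW = 120  # assumed tighter cap than the current 200 characters


def sanitize_preview(body: str) -> str:
    """Hypothetical helper: strip control characters, collapse
    whitespace, and truncate an upstream error body before it is
    echoed back in a diagnostic payload."""
    # Replace non-printable/control characters with spaces.
    cleaned = re.sub(r"[\x00-\x1f\x7f]", " ", body)
    # Collapse runs of whitespace, then truncate.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned[:MAX_PREVIEW]
```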
9.7 Improve logging and monitoring for auth/failed requests (LOW / A09)
- Files: server.py and wikipedia_mcp/main.py
- Change: Add structured logging for auth failures (without sensitive data), count/frequency metrics, and integration points for alerting/log aggregation.
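For instance (function, logger, and field names are illustrative, not existing code):

```python
import json
import logging

logger = logging.getLogger("wikipedia_mcp.auth")  # assumed logger name


def log_auth_failure(client_ip: str, reason: str) -> dict:
    """Hypothetical hook: emit a structured, credential-free record of
    an authentication failure for aggregation and alerting."""
    record = {
        "event": "auth_failure",
        "client_ip": client_ip,
        "reason": reason,  # e.g. "missing_header" or "invalid_token"
    }
    # The presented token itself is deliberately never included.
    logger.warning(json.dumps(record))
    return record
```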
9.8 Use secure default user in Dockerfile (LOW / A05)
- File: Dockerfile
- Change: Add a non-root USER with minimal permissions and avoid running as root. Ensure container uses the port the server listens on or adjust CMD/ENTRYPOINT to run network transport with proper args when containerized.
10) Next Tier Upgrade Plan (Bronze / Silver / Gold / Reject)
- Current tier assessment: Silver-leaning Bronze.
- Rationale: The integration has solid input validation, error handling, test coverage including auth behavior, and deliberate design around not logging tokens. However, it lacks operational hardening (rate limiting, TLS enforcement guidance), has CLI token exposure, permissive dependency specs, and some middleware hygiene items.
- Target next tier: Gold (production-ready). Prioritized actions to attain Gold:
- High Priority (must do):
- Stop recommending CLI tokens: update CLI docs/help and prefer env vars or files (fix main.py).
- Use constant-time comparison for static tokens in StaticBearerAuthMiddleware (server.py).
- Implement or document mandatory TLS termination and deployment guidance; update Dockerfile to non-root user.
- Medium Priority:
- Add rate limiting middleware and authentication failure throttling for network transports.
- Pin dependency versions and add SCA/CI scanning and SBOM.
- Low Priority:
- Improve auth/failed request logging (structured, non-sensitive), add metrics hooks.
- Harden error body_preview sanitization in _request_json.
- Example prioritized TODOs with estimated effort:
- Immediate (1-3 days): Replace string compare with hmac.compare_digest; update CLI help to warn against passing secrets on argv; document TLS requirement.
- Short term (1-2 weeks): Add a simple in-process rate limiter middleware and auth-failure logging, update Dockerfile to use non-root user and align EXPOSE.
- Medium term (2-4 weeks): Pin dependencies, add CI SCA scans and an SBOM, introduce optional stricter configuration flags (e.g., an enforce-jwt-only mode), and exercise the hardening options under CI.
11) Concrete findings summary (file, approximate line, severity, remediation)
- wikipedia_mcp/main.py: ~lines 118-132 (parser.add_argument --access-token) and ~244 (access_token assignment). Severity: HIGH (credential exposure via CLI). Remediation: Remove or deprecate CLI secret args, warn the user, prefer env vars, document best practice. (A07 / A05)
- wikipedia_mcp/server.py: ~lines 47-66 (StaticBearerAuthMiddleware). Severity: MEDIUM (non-constant-time token compare; no throttling). Remediation: Use hmac.compare_digest and add rate limiting / throttling for auth failures. (A07 / A04)
- wikipedia_mcp/wikipedia_client.py: _request_json returns a body_preview of up to 200 chars on HTTP errors. Severity: LOW. Remediation: Sanitize or truncate the preview further. (A09)
- requirements.txt. Severity: MEDIUM (supply chain). Remediation: Pin exact versions, provide hashes, add SCA.
- Dockerfile. Severity: LOW (EXPOSE mismatch and run-as-root). Remediation: Align EXPOSE with the server port; set a non-root USER and document container usage. (A05)
- create_server / server.run usage. Severity: MEDIUM (no rate limiting or authentication-failure logging). Remediation: Add middleware for rate limiting and structured auth logs. (A04 / A09)
12) Final Assessment & Recommendation
- Overall security posture: The integration is well-structured and does not contain high-risk coding vulnerabilities (no RCE, injection, unsafe deserialization). The primary risks are operational/configuration (credential exposure via CLI, missing deployment/TLS guidance, lack of rate limiting) and dependency management. Fixing the highlighted items will significantly reduce attack surface and raise the integration to a Gold-ready posture for production use.
If you want, I can prepare a short patch (diff) implementing the most critical fixes: use hmac.compare_digest in StaticBearerAuthMiddleware and add CLI warnings about using --access-token. I can also draft a rate-limiting ASGI middleware sample for inclusion.
-- End of security review --
Summary
Security Score: 55/100 (Reject)
Static analysis found 0 high, 2 medium, and 460 low severity issues.
Build step skipped for safety.
Tests detected.
Configuration
Docker Image
Docker Hub. Published by github.com/Rudra-ravi
