Understanding Your Security Score

MCP Security Score provides multiple metrics to help you understand your MCP server's security posture. This guide explains each metric and how to interpret it.

The Security Score

Your primary security score is a number from 0 to 100, representing the overall security health of your MCP server.

How It's Calculated

The score starts at 100, and points are deducted for each security finding:

| Severity | Points Deducted |
|----------|-----------------|
| Critical | -20 |
| High | -12 |
| Medium | -6 |
| Low | -3 |
| Info | -1 |

Diminishing Returns

To prevent a single category from tanking your entire score, we apply diminishing returns:

  • First 50 points deducted in a category: Full deduction
  • After 50 points: Further deductions are reduced by 50%

This ensures that even a repository with many low-severity findings won't automatically get an F.

Score Floor

The minimum score is 0. You cannot go negative.
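
Put together, the rules above look roughly like this in code. This is a minimal sketch, not the actual implementation: the finding format, the category names, and the assumption that the 50-point cap tracks raw (pre-reduction) deductions are all illustrative.

```python
DEDUCTIONS = {"critical": 20, "high": 12, "medium": 6, "low": 3, "info": 1}

def security_score(findings: list[tuple[str, str]]) -> int:
    """findings: (category, severity) pairs, e.g. ("input_handling", "high")."""
    raw_per_category: dict[str, int] = {}
    total_deduction = 0.0
    for category, severity in findings:
        points = DEDUCTIONS[severity]
        already = raw_per_category.get(category, 0)
        if already >= 50:
            effective = points * 0.5                   # fully past the 50-point cap
        elif already + points > 50:
            over = already + points - 50               # portion beyond the cap
            effective = (points - over) + over * 0.5   # split across the threshold
        else:
            effective = points
        raw_per_category[category] = already + points
        total_deduction += effective
    return max(0, round(100 - total_deduction))

# Two criticals and a high in one category: 20 + 20 + (10 + 2 * 0.5) = 51 deducted
security_score([("tool_permissions", "critical"),
                ("tool_permissions", "critical"),
                ("tool_permissions", "high")])  # -> 49
```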

Letter Grades

For quick assessment, scores map to letter grades:

| Score Range | Grade | Meaning |
|-------------|-------|---------|
| 90-100 | A | Excellent security posture |
| 80-89 | B | Good, minor improvements possible |
| 70-79 | C | Fair, some issues to address |
| 60-69 | D | Poor, significant issues found |
| 0-59 | F | Critical vulnerabilities present |
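
The boundaries translate into a simple lookup; a sketch for reference:

```python
def letter_grade(score: int) -> str:
    # Boundaries match the table above.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```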

What Each Grade Means

Grade A (90-100)

  • No critical or high severity findings
  • Few medium or low findings
  • Well-structured, secure code
  • Ready for production use

Grade B (80-89)

  • No critical findings
  • May have a few high severity issues
  • Some medium/low findings to address
  • Generally safe, but improvements recommended

Grade C (70-79)

  • May have high severity findings
  • Several medium severity issues
  • Requires attention before production
  • Consider a security review

Grade D (60-69)

  • High severity vulnerabilities present
  • Many security issues across categories
  • Not recommended for production
  • Needs significant remediation

Grade F (0-59)

  • Critical vulnerabilities found
  • Serious security risks present
  • Do not deploy to production
  • Immediate action required

Safety Score

When AI analysis is enabled, you'll also see a Safety Score. This composite score combines two inputs (worked example below):

  • 60% - Static analysis score (the main security score)
  • 40% - AI trust score (Claude's assessment of behavioral safety)
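
As a worked example (assuming a simple linear blend of the two weights):

```python
def safety_score(static_score: float, ai_trust: float) -> float:
    # 60/40 weighting as described above.
    return 0.6 * static_score + 0.4 * ai_trust

safety_score(90, 60)  # -> 78.0: a strong static score pulled down by AI concerns
```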

Why Two Scores?

Static analysis excels at finding known patterns but can miss:

  • Novel attack vectors
  • Semantic issues in code logic
  • Context-dependent vulnerabilities

AI analysis provides:

  • Behavioral understanding
  • Intent analysis
  • Risk assessment based on capabilities

Together, they give a more complete picture.

When They Diverge

If your static score and AI trust score differ significantly:

High static, low AI trust:

  • Code follows best practices but has risky capabilities
  • May have legitimate but dangerous operations
  • Review AI's specific concerns

Low static, high AI trust:

  • Many pattern matches but code is contextually safe
  • Could be false positives in test files
  • Review static findings for accuracy

Category Breakdown

Your score is broken down across security categories:

Categories and Weights

| Category | Weight | What It Covers |
|----------|--------|----------------|
| Tool Permissions | 25% | RCE risks, authentication |
| Input Handling | 25% | Filesystem, data validation |
| Dependencies | 20% | Supply chain, CVEs |
| Code Patterns | 15% | Network, secrets |
| Transparency | 15% | MCP-specific best practices |
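
For illustration, here is one way the weights might combine per-category scores into a single figure. This roll-up is an assumption, as are the category identifiers; how these weights interact with the deduction-based overall score is not specified here.

```python
WEIGHTS = {
    "tool_permissions": 0.25,
    "input_handling": 0.25,
    "dependencies": 0.20,
    "code_patterns": 0.15,
    "transparency": 0.15,
}

def weighted_overall(category_scores: dict[str, float]) -> float:
    # Hypothetical weighted average of per-category scores (0-100 each).
    return sum(WEIGHTS[name] * score for name, score in category_scores.items())
```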

Per-Category Scores

Each category has its own 0-100 score:

  • 90-100: Category is secure
  • 70-89: Minor issues in this area
  • 50-69: Significant concerns
  • Below 50: Critical problems in this category

Which Categories Matter Most?

Focus on categories with the highest weights first:

  1. Tool Permissions (25%) - RCE vulnerabilities are the most dangerous
  2. Input Handling (25%) - Path traversal and injection attacks
  3. Dependencies (20%) - Supply chain attacks are increasingly common
  4. Code Patterns (15%) - Exposed secrets are easily exploitable
  5. Transparency (15%) - MCP best practices improve overall safety

Severity Levels

Individual findings are tagged with severity levels:

Critical

  • Color: Red
  • Impact: Immediate security risk
  • Examples: Remote code execution, exposed secrets, known CVEs
  • Action: Fix immediately before any deployment

High

  • Color: Orange
  • Impact: Significant vulnerability
  • Examples: Path traversal, disabled TLS, SQL injection
  • Action: Address before production deployment

Medium

  • Color: Yellow
  • Impact: Moderate risk
  • Examples: Missing validation, excessive dependencies
  • Action: Plan to fix in the near term

Low

  • Color: Blue
  • Impact: Minor issue
  • Examples: Missing descriptions, code style concerns
  • Action: Good to fix, but not urgent

Info

  • Color: Gray
  • Impact: Informational only
  • Examples: Hardcoded URLs (for review), debug statements
  • Action: Review and decide if intentional

Improving Your Score

To improve your security score:

Quick Wins (Big Impact)

  1. Remove hardcoded secrets and use environment variables (see the sketch after this list)
  2. Fix any eval() or exec() usage
  3. Add timeouts to network requests
  4. Update dependencies with known CVEs
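
Before/after sketches for the first three items, shown in Python for illustration (names like MY_API_KEY are placeholders; the same ideas apply in any language):

```python
import ast
import os

import requests  # assumes the requests package is available

user_input = '{"retries": 3}'        # stand-in for untrusted input
url = "https://example.com/api"      # stand-in URL

# 1. Secrets: read from the environment instead of hardcoding.
# API_KEY = "sk-live-..."                # before: hardcoded secret
API_KEY = os.environ["MY_API_KEY"]       # after: injected at runtime

# 2. eval()/exec(): prefer a safe parser for untrusted input.
# config = eval(user_input)              # before: arbitrary code execution
config = ast.literal_eval(user_input)    # after: Python literals only, no code runs

# 3. Network requests: always set a timeout.
# resp = requests.get(url)               # before: can hang indefinitely
resp = requests.get(url, timeout=10)     # after: fails fast instead of hanging
```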

Medium Effort

  1. Add input validation to tool handlers
  2. Implement path sanitization (see the sketch after this list)
  3. Add tool descriptions for transparency
  4. Review and limit tool permissions
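
For path sanitization (item 2), a common pattern is to resolve the requested path and confirm it stays inside an allowed base directory. A minimal sketch, with an assumed base directory:

```python
from pathlib import Path

BASE_DIR = Path("/srv/mcp-data").resolve()  # illustrative allowed root

def safe_path(user_supplied: str) -> Path:
    """Resolve a user-supplied path and reject anything outside BASE_DIR."""
    candidate = (BASE_DIR / user_supplied).resolve()
    if not candidate.is_relative_to(BASE_DIR):  # Python 3.9+
        raise ValueError(f"path escapes allowed directory: {user_supplied}")
    return candidate
```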

Long-term

  1. Reduce dependency count where possible
  2. Add security-focused code review process
  3. Set up automated scanning in CI/CD
  4. Regular security audits

Score Caching

Scan results are cached for 24 hours by default:

  • Re-scanning the same repository returns cached results
  • Use the Rescan button to force a fresh analysis
  • Cache helps with rate limits and performance

Comparing Scores

When comparing MCP servers:

  • Same category scores are directly comparable
  • Different repository sizes may skew raw scores
  • Focus on the severity distribution, not just the total score
  • Consider the specific findings, not just their count

Next Steps