Create custom Semgrep rules for detecting security vulnerabilities, bug patterns, and code quality issues using a rigorous test-driven workflow.

Overview

The Semgrep Rule Creator plugin guides you through creating production-quality Semgrep rules with proper testing and validation. It enforces a strict test-first methodology so that rules are accurate, maintainable, and checked against false positives.

Key capabilities:
  • Test-driven rule development (write tests first, then iterate)
  • AST analysis to craft precise patterns
  • Support for both taint mode (data flow) and pattern matching
  • Comprehensive reference documentation from Semgrep docs
  • Common vulnerability patterns by language

Installation

/plugin install trailofbits/skills/plugins/semgrep-rule-creator

Prerequisites

  • Semgrep installed (pip install semgrep or brew install semgrep)

When to Use

Use this plugin when you need to:
  • Create custom Semgrep rules for detecting specific bug patterns
  • Write rules for security vulnerability detection
  • Build taint mode rules for data flow analysis
  • Develop pattern matching rules for code quality checks
  • Enforce coding standards with custom detections

When NOT to Use

Do NOT use this plugin for:
  • Running existing Semgrep rulesets (use semgrep scan instead)
  • General static analysis without custom rules (use the static-analysis plugin)

Core Workflow

The plugin enforces a strict 7-step workflow:
  1. Analyze the Problem: Understand the bug pattern and the target language, then decide whether to use taint mode or pattern matching.
  2. Write Tests First: Create a test file with vulnerable cases (ruleid:) and safe cases (ok:) before writing any rule code.
  3. Analyze AST Structure: Run semgrep --dump-ast to see how Semgrep parses the code.
  4. Write the Rule: Create the YAML rule file using appropriate pattern operators.
  5. Iterate Until Tests Pass: Run semgrep --test and fix issues until all tests pass.
  6. Optimize the Rule: Remove redundancies and simplify patterns while keeping all tests passing.
  7. Final Validation: Run a final validation pass to confirm the finished rule works correctly.

Taint Mode vs Pattern Matching

When to Use Taint Mode (Prioritize)

Use taint mode for data flow issues where untrusted input reaches dangerous sinks:
rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink.
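
A test file for the insecure-eval rule above might look like the following minimal sketch. It pairs one vulnerable case with one safe case, which is exactly the distinction that a plain eval($X) pattern cannot make. The request object is a hypothetical stub standing in for Flask's request so that the file runs without Flask installed; Semgrep matches request.args.get(...) syntactically, so the stub is enough for the source pattern. The function names are illustrative and not part of the plugin.

```python
# insecure-eval.py: test target for the insecure-eval rule (illustrative sketch).


class _StubRequest:
    """Hypothetical stand-in for flask.request; an assumption of this sketch."""

    args = {"expr": "1 + 1"}


request = _StubRequest()


def eval_user_input():
    # Untrusted data flows from the source into the eval() sink.
    expr = request.args.get("expr")
    # ruleid: insecure-eval
    return eval(expr)


def eval_literal():
    # A constant string never touches the source, so taint mode stays silent.
    # ok: insecure-eval
    return eval("1 + 1")
```

Each annotation sits on the line directly above the code it describes: ruleid: marks a line the rule must flag, and ok: marks a line it must not flag.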

When to Use Pattern Matching

Use pattern matching for simple syntactic patterns without data flow requirements:
rules:
  - id: deprecated-function
    languages: [javascript]
    severity: WARNING
    message: Deprecated function md5() should not be used
    pattern: md5(...)

Output Structure

Each rule produces exactly 2 files in a directory named after the rule ID:
<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file with ruleid/ok annotations
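
The layout above can be checked mechanically. The helper below is a minimal sketch, not part of the plugin: check_rule_layout is a hypothetical name, and it assumes the directory name is the rule ID.

```python
from pathlib import Path


def check_rule_layout(rule_dir: Path) -> list[str]:
    """Return layout problems for a rule directory; an empty list means it looks right."""
    rule_id = rule_dir.name  # assumption: the directory is named after the rule ID
    files = sorted(p.name for p in rule_dir.iterdir() if p.is_file())
    problems = []

    # The rule itself must live in <rule-id>.yaml.
    if f"{rule_id}.yaml" not in files:
        problems.append(f"missing {rule_id}.yaml")

    # Exactly one test file named <rule-id>.<ext> is expected next to it.
    tests = [f for f in files if f.startswith(f"{rule_id}.") and not f.endswith(".yaml")]
    if len(tests) != 1:
        problems.append(f"expected one test file named {rule_id}.<ext>, found {len(tests)}")

    # Nothing else should be in the directory.
    if len(files) != 2:
        problems.append(f"expected exactly 2 files, found {len(files)}")
    return problems
```

For example, check_rule_layout(Path("sql-injection")) returns an empty list when the directory contains only sql-injection.yaml and sql-injection.py.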

Example Rule

Here’s a complete example for detecting SQL injection in Python:
rules:
  - id: sql-injection
    languages: [python]
    severity: ERROR
    message: User input concatenated into SQL query enables SQL injection
    mode: taint
    pattern-sources:
      - pattern: request.GET.get(...)
      - pattern: request.POST.get(...)
    pattern-sinks:
      - pattern: cursor.execute($SQL, ...)
    pattern-sanitizers:
      - pattern: escape_sql(...)
Run tests:
cd sql-injection/
semgrep --test --config sql-injection.yaml sql-injection.py
Expected output:
1/1: ✓ All tests passed
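
The page does not show the matching test file, so the following is one possible sql-injection.py, written as a sketch under stated assumptions. FakeRequest is a hypothetical stand-in for a Django-style request, the function names are illustrative, and escape_sql is only a stub for the sanitizer named in the rule (its real implementation is not given in this document). An in-memory sqlite3 database is used purely so that the file runs on its own.

```python
# sql-injection.py: test target for the sql-injection rule (illustrative sketch).
import sqlite3


class FakeRequest:
    """Hypothetical stand-in for a Django-style request; an assumption of this sketch."""

    def __init__(self, params):
        self.GET = dict(params)
        self.POST = {}


def escape_sql(value):
    """Stub for the sanitizer named in the rule; it only doubles single quotes."""
    return value.replace("'", "''")


def find_user_vulnerable(request, cursor):
    name = request.GET.get("name")
    # ruleid: sql-injection
    cursor.execute("SELECT name FROM users WHERE name = '" + name + "'")
    return cursor.fetchall()


def find_user_sanitized(request, cursor):
    # The sanitizer pattern removes taint before the value reaches the sink.
    name = escape_sql(request.GET.get("name"))
    # ok: sql-injection
    cursor.execute("SELECT name FROM users WHERE name = '" + name + "'")
    return cursor.fetchall()


def list_users(cursor):
    # A constant query has no tainted input at all.
    # ok: sql-injection
    cursor.execute("SELECT name FROM users")
    return cursor.fetchall()
```

The file covers the three situations the rule describes: a tainted value reaching cursor.execute, a value that passes through the sanitizer first, and a query with no user input.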

Key Commands

Command | Purpose
semgrep --dump-ast -l <lang> <file> | View AST structure
semgrep --validate --config <rule>.yaml | Validate YAML syntax
semgrep --test --config <rule>.yaml <test-file> | Run tests
semgrep --dataflow-traces -f <rule>.yaml <file> | Debug taint flow
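
When these commands are run repeatedly during iteration, it can help to build them from the rule directory. The sketch below only assembles the argument lists shown above; semgrep_commands is a hypothetical helper name, and it assumes the <rule-id>/<rule-id>.yaml layout described under Output Structure.

```python
from pathlib import Path


def semgrep_commands(rule_dir: Path, lang: str, ext: str) -> dict[str, list[str]]:
    """Build the four key Semgrep invocations for one rule directory."""
    rule_id = rule_dir.name  # assumption: the directory is named after the rule ID
    rule = str(rule_dir / f"{rule_id}.yaml")
    target = str(rule_dir / f"{rule_id}.{ext}")
    return {
        "dump_ast": ["semgrep", "--dump-ast", "-l", lang, target],
        "validate": ["semgrep", "--validate", "--config", rule],
        "test": ["semgrep", "--test", "--config", rule, target],
        "trace": ["semgrep", "--dataflow-traces", "-f", rule, target],
    }
```

Each list can be passed to subprocess.run, for example subprocess.run(semgrep_commands(Path("sql-injection"), "python", "py")["test"]), provided Semgrep is installed.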

Strictness Principles

The plugin enforces strict quality standards:
Non-negotiable requirements:
  • Test-first is mandatory: Never write a rule without tests
  • 100% test pass required: “Most tests pass” is not acceptable
  • One YAML file = one Semgrep rule: Don’t combine multiple rules
  • No generic rules: Avoid generic pattern matching (languages: generic)
  • Forbidden annotations: todoruleid: and todook: are not allowed
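
Some of these requirements can be checked before Semgrep is run at all. The following is a minimal sketch, not the plugin's own enforcement: check_annotations is a hypothetical helper, and it assumes annotations appear in # or // comments.

```python
import re

# Matches an annotation keyword at the start of a "#" or "//" comment.
ANNOTATION = re.compile(r"(?:#|//)\s*(todoruleid|todook|ruleid|ok):")


def check_annotations(test_source: str) -> list[str]:
    """Return annotation problems found in a test file; an empty list means none."""
    found = [m.group(1) for m in ANNOTATION.finditer(test_source)]
    problems = []

    # todoruleid: and todook: are not allowed.
    for forbidden in ("todoruleid", "todook"):
        if forbidden in found:
            problems.append(f"forbidden annotation: {forbidden}:")

    # A useful test file needs at least one positive and one negative case.
    if "ruleid" not in found:
        problems.append("no ruleid: case")
    if "ok" not in found:
        problems.append("no ok: case")
    return problems
```

A check like this complements semgrep --test rather than replacing it: it confirms that the test file has the right shape, while only the test run confirms that the rule behaves as annotated.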

Anti-Patterns to Avoid

Too Broad

Matches everything, useless for detection:
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)

Missing Safe Cases

Leads to undetected false positives:
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")

Overly Specific

Misses variations:
# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
  - pattern: os.system(...)

Rationalizations to Reject

When writing Semgrep rules, reject these common shortcuts:
Rationalization | Why It Fails
“The pattern looks complete” | Still run semgrep --test to verify. Untested rules have hidden false positives/negatives.
“It matches the vulnerable case” | Matching vulnerabilities is half the job. Verify safe cases don’t match.
“Taint mode is overkill for this” | If data flows from user input to a dangerous sink, taint mode gives better precision.
“One test is enough” | Include edge cases: different coding styles, sanitized inputs, safe alternatives.
“I’ll optimize the patterns first” | Write correct patterns first, optimize after all tests pass.
“The AST dump is too complex” | The AST reveals exactly how Semgrep sees code. Skipping it leads to missed variations.

Required Documentation

Before writing any rule, the plugin requires reading these Semgrep resources using WebFetch:
  1. Rule Syntax
  2. Pattern Syntax
  3. ToB Testing Handbook - Semgrep
  4. Constant Propagation
  5. Writing Rules Index

Related Plugins

  • semgrep-rule-variant-creator - Port existing Semgrep rules to new target languages
  • static-analysis - General static analysis toolkit with Semgrep, CodeQL, and SARIF parsing
  • variant-analysis - Find similar vulnerabilities across codebases

Author

Maciej Domanski