Semgrep Rule Creator

Create custom Semgrep rules for detecting security vulnerabilities, bug patterns, and code quality issues using a rigorous test-driven workflow.

Overview

The Semgrep Rule Creator plugin guides you through creating production-quality Semgrep rules with proper testing and validation. It enforces a strict test-first methodology to ensure rules are accurate, maintainable, and free from false positives. Key capabilities:

Test-driven rule development (write tests first, then iterate)
AST analysis to craft precise patterns
Support for both taint mode (data flow) and pattern matching
Comprehensive reference documentation from Semgrep docs
Common vulnerability patterns by language

Installation

/plugin install trailofbits/skills/plugins/semgrep-rule-creator

Prerequisites

Semgrep installed (pip install semgrep or brew install semgrep)

When to Use

Use this plugin when you need to:

Create custom Semgrep rules for detecting specific bug patterns
Write rules for security vulnerability detection
Build taint mode rules for data flow analysis
Develop pattern matching rules for code quality checks
Enforce coding standards with custom detections

When NOT to Use

Do NOT use this plugin for:

Running existing Semgrep rulesets (use semgrep scan instead)
General static analysis without custom rules (use the static-analysis plugin)

Core Workflow

The plugin enforces a strict 7-step workflow:

1. Analyze the Problem

Understand the bug pattern and the target language, then decide whether to use taint mode or pattern matching.

2. Write Tests First

Create a test file with vulnerable cases (ruleid:) and safe cases (ok:) before writing any rule code.

3. Analyze AST Structure

Run semgrep --dump-ast to understand how Semgrep parses the code.

4. Write the Rule

Create the YAML rule file using appropriate pattern operators.

5. Iterate Until Tests Pass

Run semgrep --test and fix issues until all tests pass.

6. Optimize the Rule

Remove redundancies and simplify patterns while keeping all tests passing.

7. Final Validation

Run final validation to confirm the rule works correctly.

Taint Mode vs Pattern Matching

When to Use Taint Mode (Prioritize)

Use taint mode for data flow issues where untrusted input reaches dangerous sinks:

rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)

Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink.
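
To make the contrast concrete, here is a minimal sketch of a test file for the insecure-eval rule above. The request object is a hypothetical stand-in built with SimpleNamespace so the file runs without a web framework; in a real test file it would come from the framework itself. With the taint rule, only the first call would be expected to produce a finding, whereas a plain eval($X) pattern would match both.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a framework request object, so this file runs on its own.
request = SimpleNamespace(args={"expr": "6 * 7"})


def from_user_input():
    # Untrusted data flows from the source into the sink.
    expr = request.args.get("expr")
    # ruleid: insecure-eval
    return eval(expr)


def from_literal():
    # A constant string never passes through a taint source.
    # ok: insecure-eval
    return eval("6 * 7")
```

Each annotation sits on the line directly above the code it describes, which is how semgrep --test associates expectations with findings.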

When to Use Pattern Matching

Use pattern matching for simple syntactic patterns without data flow requirements:

rules:
  - id: deprecated-function
    languages: [javascript]
    severity: WARNING
    message: Deprecated function md5() should not be used
    pattern: md5(...)

Output Structure

Each rule produces exactly 2 files in a directory named after the rule ID:

<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file with ruleid/ok annotations
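
As an illustration of this layout, the sketch below creates the directory and its two empty files. The helper name scaffold_rule and its parameters are assumptions made for this example; they are not part of the plugin.

```python
import tempfile
from pathlib import Path


def scaffold_rule(base_dir, rule_id, ext):
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext> as empty files."""
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_file = rule_dir / f"{rule_id}.yaml"
    test_file = rule_dir / f"{rule_id}.{ext}"
    for path in (rule_file, test_file):
        path.touch()
    return rule_file, test_file


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        rule_file, test_file = scaffold_rule(tmp, "sql-injection", "py")
        print(rule_file.name, test_file.name)
```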

Example Rule

Here’s a complete example for detecting SQL injection in Python:

rules:
  - id: sql-injection
    languages: [python]
    severity: ERROR
    message: User input concatenated into SQL query enables SQL injection
    mode: taint
    pattern-sources:
      - pattern: request.GET.get(...)
      - pattern: request.POST.get(...)
    pattern-sinks:
      - pattern: cursor.execute($SQL, ...)
    pattern-sanitizers:
      - pattern: escape_sql(...)

Run tests:

cd sql-injection/
semgrep --test --config sql-injection.yaml sql-injection.py

Expected output:

1/1: ✓ All tests passed
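
The test file that the command above runs is not shown, so here is a sketch of what sql-injection.py might contain. Everything outside the annotations is an assumption made to keep the file runnable on its own: request is a SimpleNamespace stand-in for a framework request, cursor is an in-memory sqlite3 cursor, and the body of escape_sql is hypothetical (only its name comes from the rule's sanitizer pattern).

```python
import sqlite3
from types import SimpleNamespace

# Hypothetical stand-ins so the file runs without a web framework or a real database.
request = SimpleNamespace(GET={"name": "alice"}, POST={"name": "bob"})
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (name TEXT)")
cursor.execute("INSERT INTO users VALUES ('alice')")


def escape_sql(value):
    """Hypothetical sanitizer named in the rule; this body only doubles single quotes."""
    return value.replace("'", "''")


def vulnerable_get():
    name = request.GET.get("name")
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    # ruleid: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


def sanitized_post():
    # The sanitizer call removes taint before the value reaches the query.
    name = escape_sql(request.POST.get("name"))
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    # ok: sql-injection
    cursor.execute(query)
    return cursor.fetchall()


def constant_query():
    # ok: sql-injection
    cursor.execute("SELECT name FROM users")
    return cursor.fetchall()
```

The file covers one vulnerable case and two safe cases, so the rule is checked for false positives as well as detection.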

Key Commands

Command	Purpose
`semgrep --dump-ast -l <lang> <file>`	View AST structure
`semgrep --validate --config <rule>.yaml`	Validate YAML syntax
`semgrep --test --config <rule>.yaml <test-file>`	Run tests
`semgrep --dataflow-traces -f <rule>.yaml <file>`	Debug taint flow
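
When these commands are run repeatedly during iteration, it can help to build them in one place. The sketch below assembles the four commands from the table as argument lists for a single rule directory; the function names semgrep_commands and render are assumptions made for this example, and nothing is executed.

```python
import shlex


def semgrep_commands(rule_id, lang, ext):
    """Return the Key Commands as argv lists for one rule directory."""
    rule_file = f"{rule_id}.yaml"
    test_file = f"{rule_id}.{ext}"
    return {
        "dump-ast": ["semgrep", "--dump-ast", "-l", lang, test_file],
        "validate": ["semgrep", "--validate", "--config", rule_file],
        "test": ["semgrep", "--test", "--config", rule_file, test_file],
        "trace": ["semgrep", "--dataflow-traces", "-f", rule_file, test_file],
    }


def render(commands):
    """Format argv lists as shell-safe command lines for printing or logging."""
    return [shlex.join(argv) for argv in commands.values()]
```

Each argument list can be passed to subprocess.run from inside the rule directory; keeping the commands as lists avoids shell quoting issues.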

Strictness Principles

The plugin enforces strict quality standards:

Non-negotiable requirements:

Test-first is mandatory: Never write a rule without tests
100% test pass required: “Most tests pass” is not acceptable
One YAML file = one Semgrep rule: Don’t combine multiple rules
No generic rules: Avoid generic pattern matching (languages: generic)
Forbidden annotations: todoruleid: and todook: are not allowed
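
Some of these requirements can be checked mechanically before running Semgrep. The sketch below scans a test file's text for the forbidden annotations and confirms that at least one vulnerable and one safe case exist. It is an illustration only, not part of the plugin; the name check_annotations and the regular expression are assumptions.

```python
import re

# Matches Semgrep test annotations inside a comment, e.g. "# ruleid: my-rule".
ANNOTATION = re.compile(r"\b(todoruleid|todook|ruleid|ok):\s*([\w.-]+)")


def check_annotations(source, rule_id):
    """Return a list of problems found in a test file's annotations."""
    found = [kind for kind, rid in ANNOTATION.findall(source) if rid == rule_id]
    problems = []
    for forbidden in ("todoruleid", "todook"):
        if forbidden in found:
            problems.append(f"forbidden annotation: {forbidden}:")
    if "ruleid" not in found:
        problems.append("no vulnerable case (ruleid:)")
    if "ok" not in found:
        problems.append("no safe case (ok:)")
    return problems
```

An empty list means the annotations satisfy these checks; it says nothing about whether the rule itself passes semgrep --test.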

Anti-Patterns to Avoid

Too Broad

Matches everything, useless for detection:

# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)

Missing Safe Cases

Leads to undetected false positives:

# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")

Overly Specific

Misses variations:

# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
  - pattern: os.system(...)

Rationalizations to Reject

When writing Semgrep rules, reject these common shortcuts:

Rationalization	Why It Fails
“The pattern looks complete”	Still run `semgrep --test` to verify. Untested rules have hidden false positives/negatives.
“It matches the vulnerable case”	Matching vulnerabilities is half the job. Verify safe cases don’t match.
“Taint mode is overkill for this”	If data flows from user input to a dangerous sink, taint mode gives better precision.
“One test is enough”	Include edge cases: different coding styles, sanitized inputs, safe alternatives.
“I’ll optimize the patterns first”	Write correct patterns first, optimize after all tests pass.
“The AST dump is too complex”	The AST reveals exactly how Semgrep sees code. Skipping it leads to missed variations.

Required Documentation

Before writing any rule, the plugin requires reading Semgrep reference documentation using WebFetch.

Related Plugins

semgrep-rule-variant-creator - Port existing Semgrep rules to new target languages
static-analysis - General static analysis toolkit with Semgrep, CodeQL, and SARIF parsing
variant-analysis - Find similar vulnerabilities across codebases

Additional Resources

Author

Maciej Domanski
