Variant Analysis - Trail of Bits Skills

The Variant Analysis plugin provides a systematic methodology for finding similar vulnerabilities and bugs across codebases after discovering an initial pattern.

Author: Axel Mierczuk

Once you’ve found one vulnerability, there are likely more. This plugin helps you hunt them down systematically.

Installation

/plugin install trailofbits/skills/plugins/variant-analysis

Overview

Variant analysis is the process of taking a known vulnerability and searching for similar instances across a codebase. The plugin provides:

Five-step systematic process from understanding to reporting
Tool selection guidance (ripgrep, Semgrep, CodeQL)
Critical pitfall catalog to avoid missing variants
Ready-to-use templates for CodeQL and Semgrep in multiple languages
Detailed methodology for abstraction and generalization

When to Use

Use this plugin when:

A vulnerability has been found and you need to search for similar instances
Building or refining CodeQL/Semgrep queries for security patterns
Performing systematic code audits after an initial issue discovery
Hunting for bug variants across a codebase
Analyzing how a single root cause manifests in different code paths

When NOT to Use

Do NOT use this plugin for:

Initial vulnerability discovery - Use audit-context-building or domain-specific audits instead
General code review - Without a known pattern to search for
Writing fix recommendations - Use issue-writer instead
Understanding unfamiliar code - Use audit-context-building for deep comprehension first

The Five-Step Process

Understand the Original Issue

Before searching, deeply understand the known bug:

What is the root cause? Not the symptom, but WHY it’s vulnerable
What conditions are required? Control flow, data flow, state
What makes it exploitable? User control, missing validation, etc.

The Root Cause Statement

Formulate: “This vulnerability exists because [UNTRUSTED DATA] reaches [DANGEROUS OPERATION] without [REQUIRED PROTECTION].”

Example: “User input reaches eval() without sanitization”

This statement IS your search pattern.
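The three slots of the root cause statement can be kept as a small record so that later searches reuse the same wording. This is a minimal sketch; the `RootCause` class and its field names are hypothetical and not part of the plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootCause:
    """Hypothetical record for the three slots of a root cause statement."""

    untrusted_data: str
    dangerous_operation: str
    required_protection: str

    def statement(self) -> str:
        # Fill the template from the text above with the three slots.
        return (
            f"This vulnerability exists because {self.untrusted_data} "
            f"reaches {self.dangerous_operation} "
            f"without {self.required_protection}."
        )


cause = RootCause("user input", "eval()", "sanitization")
print(cause.statement())
```

Each slot then maps onto a search element: the untrusted data becomes the source, the dangerous operation becomes the sink, and the missing protection is what triage looks for.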

Create an Exact Match

Start with a pattern that matches ONLY the known instance:

rg -n "exact_vulnerable_code_here"

Verify: Does it match exactly ONE location (the original)?

This establishes your baseline before generalization.
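The baseline check can also be scripted when ripgrep is not available. The sketch below assumes UTF-8 source files and a plain substring match; `find_literal` and `is_valid_baseline` are hypothetical helper names, not plugin functions.

```python
from pathlib import Path


def find_literal(root: str, needle: str) -> list[tuple[str, int]]:
    """Return (path, line_number) for every line containing needle verbatim."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno))
    return hits


def is_valid_baseline(hits: list[tuple[str, int]]) -> bool:
    """A baseline is valid only if the pattern matches exactly one location."""
    return len(hits) == 1
```

If `is_valid_baseline` returns False, the pattern is either already too general (several hits) or does not reproduce the original code (no hits).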

Identify Abstraction Points

Determine what can be generalized:

Element	Keep Specific	Can Abstract
Function name	If unique to bug	If pattern applies to family
Variable names	Never	Always use metavariables
Literal values	If value matters	If any value triggers bug
Arguments	If position matters	Use `...` wildcards

Iteratively Generalize

Change ONE element at a time:

Run the pattern
Review ALL new matches
Classify: true positive or false positive?
If FP rate acceptable, generalize next element
If FP rate too high, revert and try different abstraction

Stop when false positive rate exceeds ~50%

High FP rates waste analysis time and obscure real vulnerabilities.
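The stop rule reduces to a small calculation over the triaged matches of the current pattern. A minimal sketch, assuming the roughly 50% threshold stated above; the function names are hypothetical.

```python
def false_positive_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of reviewed matches that were false positives."""
    total = true_pos + false_pos
    if total == 0:
        return 0.0  # nothing matched, so there is nothing to reject
    return false_pos / total


def next_action(true_pos: int, false_pos: int, threshold: float = 0.5) -> str:
    """Return 'revert' once the FP rate exceeds the threshold, else 'generalize'."""
    if false_positive_rate(true_pos, false_pos) > threshold:
        return "revert"
    return "generalize"
```

For example, 3 true positives and 1 false positive give a rate of 0.25, so the next element can be generalized; 2 true positives and 6 false positives give 0.75, so the last abstraction should be reverted.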

Analyze and Triage Results

For each match, document:

Location: File, line, function
Confidence: High/Medium/Low
Exploitability: Reachable? Controllable inputs?
Priority: Based on impact and exploitability
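The four fields above fit a simple record type. The sketch below is one possible shape; the `Finding` class and the scoring in `priority` are assumptions made for illustration, since the plugin does not prescribe a formula.

```python
from dataclasses import dataclass

RANK = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Finding:
    file: str
    line: int
    function: str
    confidence: str     # "High", "Medium", or "Low"
    reachable: bool     # can execution reach this code?
    controllable: bool  # does an attacker control the inputs?
    impact: str         # "High", "Medium", or "Low"

    def priority(self) -> int:
        # Assumed scoring: impact rank, doubled when the match is both
        # reachable and attacker-controllable.
        exploitable = self.reachable and self.controllable
        return RANK[self.impact] * (2 if exploitable else 1)


def triage(findings: list[Finding]) -> list[Finding]:
    """Order findings from highest to lowest priority."""
    return sorted(findings, key=lambda f: f.priority(), reverse=True)
```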

Tool Selection

Choose the right tool for your analysis:

Scenario	Tool	Why
Quick surface search	ripgrep	Fast, zero setup, good for reconnaissance
Simple pattern matching	Semgrep	Easy syntax, no build needed, works on incomplete code
Data flow tracking	Semgrep taint / CodeQL	Follows values across functions and files
Cross-function analysis	CodeQL	Best interprocedural analysis, deep data flow
Non-building code	Semgrep	Works without compilation

Progressive Tool Use

Start with ripgrep for quick reconnaissance, use Semgrep for pattern iteration, and finish with CodeQL for deep interprocedural analysis.

The Abstraction Ladder

Patterns exist at different abstraction levels. Climb the ladder systematically:

Level 0: Exact Match

Match the literal vulnerable code:

# Original vulnerable code
query = "SELECT * FROM users WHERE id=" + request.args.get('id')

# Level 0 pattern
rg 'SELECT \* FROM users WHERE id=" \+ request\.args\.get'

Matches: 1 (the original)
False positives: 0
Value: Confirms bug exists, baseline for generalization

Level 1: Variable Abstraction

Replace variable names with metavariables:

# Level 1 pattern
pattern: $QUERY = "SELECT * FROM users WHERE id=" + $INPUT

Matches: 3-5 (same query pattern, different variables)
False positives: Low
Value: Find copy-paste variants
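The jump from Level 0 to Level 1 can be approximated with regular expressions to see how the match count grows. This is only a sketch: the sample lines are invented, and the `LEVEL_1` regex is a rough stand-in for the Semgrep metavariable pattern, not an equivalent of it.

```python
import re

# Invented sample lines standing in for a codebase.
SAMPLE = [
    'query = "SELECT * FROM users WHERE id=" + request.args.get(\'id\')',
    'sql = "SELECT * FROM users WHERE id=" + user_id',
    'q = "SELECT * FROM users WHERE id=" + form["id"]',
    'query = "SELECT name FROM orders WHERE id=%s"',
]

# Level 0: the literal vulnerable expression.
LEVEL_0 = re.compile(r'SELECT \* FROM users WHERE id=" \+ request\.args\.get')
# Level 1: any assigned name and any right-hand operand.
LEVEL_1 = re.compile(r'\w+ = "SELECT \* FROM users WHERE id=" \+ \S+')


def count_matches(pattern: re.Pattern, lines: list[str]) -> int:
    """Count lines matched by the pattern."""
    return sum(1 for line in lines if pattern.search(line))


print(count_matches(LEVEL_0, SAMPLE), count_matches(LEVEL_1, SAMPLE))
```

On this sample, Level 0 matches only the original line while Level 1 also picks up the two copy-paste variants; the parameterized query on the last line is matched by neither.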

Level 2: Structural Abstraction

Generalize the structure:

# Level 2 pattern
patterns:
  - pattern: $Q = "..." + $INPUT
  - pattern-inside: |
      def $FUNC(...):
        ...
        cursor.execute($Q)

Matches: 10-30 (any string concat used in query)
False positives: Medium
Value: Find pattern variants

Level 3: Semantic Abstraction

Abstract to the security property:

# Level 3 pattern (taint mode)
mode: taint
pattern-sources:
  - pattern: request.args.get(...)
  - pattern: request.form.get(...)
pattern-sinks:
  - pattern: cursor.execute(...)

Matches: 50-100+ (any user input to any query)
False positives: High (many have proper parameterization)
Value: Comprehensive coverage, requires triage

Choosing Your Level

Goal	Recommended Level
Verify a specific fix	Level 0
Find copy-paste bugs	Level 1
Audit a component	Level 2
Full security assessment	Level 3

Critical Pitfalls to Avoid

These common mistakes cause analysts to miss real vulnerabilities:

1. Narrow Search Scope

Problem: Searching only the module where the original bug was found misses variants in other locations.

Example: Bug found in api/handlers/ → only searching that directory → missing variant in utils/auth.py

Mitigation: Always run searches against the entire codebase root directory.

2. Pattern Too Specific

Problem: Using only the exact attribute/function from the original bug misses variants using related constructs.

Example: Bug uses isAuthenticated check → only searching for that exact term → missing bugs using isActive, isAdmin, isVerified

Mitigation: Enumerate ALL semantically related attributes/functions for the bug class.
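Once the related attributes are enumerated, they can be folded into a single search expression instead of being searched one by one. A minimal sketch using a regular expression; the attribute list comes from the example above, and `build_attr_pattern` is a hypothetical helper.

```python
import re

# Attributes taken from the example above; extend per bug class.
RELATED_ATTRS = ["isAuthenticated", "isActive", "isAdmin", "isVerified"]


def build_attr_pattern(attrs: list[str]) -> re.Pattern:
    """Build one regex that matches an access to any of the listed attributes."""
    alternation = "|".join(re.escape(a) for a in attrs)
    # Leading dot anchors on attribute access; \b avoids prefix matches.
    return re.compile(rf"\.(?:{alternation})\b")


pattern = build_attr_pattern(RELATED_ATTRS)
```

The same list can drive a Semgrep `pattern-either` block, with one entry per attribute.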

3. Single Vulnerability Class

Problem: Focusing on only one manifestation of the root cause misses other ways the same logic error appears.

Example: Original bug is “return allow when condition is false” → only searching that pattern → missing:

Null equality bypasses (null == null evaluates to true)
Documentation/code mismatches (function does opposite of docs)
Inverted conditional logic (wrong branch taken)

Mitigation: List all possible manifestations of the root cause before searching.

4. Missing Edge Cases

Problem: Testing patterns only with “normal” scenarios misses vulnerabilities triggered by edge cases.

Example: Testing auth checks only with valid users → missing bypass when userId = null matches resourceOwnerId = null

Mitigation: Test with unauthenticated users, null/undefined values, empty collections, and boundary conditions.
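The null-matches-null bypass is easy to reproduce with a table of edge-case inputs. The sketch below uses two hypothetical ownership checks, one equality-only and one with an explicit guard, and reports the inputs for which access is granted even though an id is missing.

```python
from typing import Callable, Optional

Check = Callable[[Optional[int], Optional[int]], bool]


def can_access_buggy(user_id: Optional[int], owner_id: Optional[int]) -> bool:
    # Equality only: None == None evaluates to True.
    return user_id == owner_id


def can_access_safe(user_id: Optional[int], owner_id: Optional[int]) -> bool:
    # Reject missing ids before comparing.
    if user_id is None or owner_id is None:
        return False
    return user_id == owner_id


EDGE_CASES = [(1, 1), (1, 2), (None, 2), (1, None), (None, None)]


def find_bypasses(check: Check) -> list[tuple[Optional[int], Optional[int]]]:
    """Return edge cases where access is granted although an id is missing."""
    return [
        (u, o) for u, o in EDGE_CASES
        if check(u, o) and (u is None or o is None)
    ]
```

Running `find_bypasses` against each candidate check found during variant search shows quickly which ones need a null guard.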

Expanding Vulnerability Classes

A single root cause can manifest in multiple ways. Before concluding your search, expand to related vulnerability classes:

Semantically Related Functions

Boolean Logic Errors

Inverted conditions (if not x vs if x)
Wrong default return value (return true vs return false)
Short-circuit evaluation errors

Edge Cases by Data Type

Null/None/undefined comparisons
Empty string vs null
Zero vs null
Empty array/collection

Documentation Mismatches

Function does opposite of docstring:

def check_restricted_permission(user, perm):
    """Returns True if access should be DENIED."""
    if user.has_perm(perm):
        return True  # BUG: This GRANTS access
    return False

Detection: Search for functions with “deny”, “restrict”, “block”, “forbid” and verify return semantics.
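For Python code, the first half of this detection step can be automated with the standard `ast` module. This is a sketch that only lists candidate functions by name; checking the return semantics against the docstring remains a manual step. The keyword list mirrors the text above, and `find_deny_style_functions` is a hypothetical helper.

```python
import ast

KEYWORDS = ("deny", "restrict", "block", "forbid")


def find_deny_style_functions(source: str) -> list[tuple[str, int]]:
    """Return (name, line) for functions whose name suggests deny semantics."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            name = node.name.lower()
            if any(keyword in name for keyword in KEYWORDS):
                hits.append((node.name, node.lineno))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(hits, key=lambda hit: hit[1])
```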

Null Equality Bypasses

# If both are None, None == None is True
if order.owner_id == current_user.id:
    return True  # Allows access!

Detection: Find equality-based permission checks, trace if both sides can be null.

Ready-to-Use Templates

The plugin includes language-specific templates:

CodeQL templates:

Python (resources/codeql/python.ql)
JavaScript (resources/codeql/javascript.ql)
Java (resources/codeql/java.ql)
Go (resources/codeql/go.ql)
C++ (resources/codeql/cpp.ql)

Basic structure:

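import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.RemoteFlowSources

// Sketch: flags calls to a placeholder function whose first argument
// is reached by local data flow from a remote (untrusted) source.
from DataFlow::CallCfgNode call, RemoteFlowSource source
where
  // "dangerous_func" is a placeholder; substitute the real sink name
  call.getFunction().asExpr().(Name).getId() = "dangerous_func" and
  DataFlow::localFlow(source, call.getArg(0))
select call, "Untrusted input to dangerous function"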

Semgrep templates:

Python (resources/semgrep/python.yaml)
JavaScript (resources/semgrep/javascript.yaml)
Java (resources/semgrep/java.yaml)
Go (resources/semgrep/go.yaml)
C++ (resources/semgrep/cpp.yaml)

Basic structure:

rules:
  - id: variant-search
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: dangerous_func(...)
    message: Variant of CVE-XXXX
    severity: ERROR
    languages: [python]

Commands

The plugin includes a /variants command for quick invocation:

/variants

This command is context-driven and uses conversation context to understand:

The original bug/vulnerability that was found
The codebase to search

Variant Report Template

Use the provided template for documenting findings:

## Variant Analysis: [Original Bug ID]

### Root Cause
[Statement of the vulnerability pattern]

### Patterns Tried
| Pattern | Level | Matches | True Pos | False Pos | Notes |
|---------|-------|---------|----------|-----------|-------|
| exact   | 0     | 1       | 1        | 0         | Baseline |

### Confirmed Variants
| Location | Severity | Status | Notes |
|----------|----------|--------|-------|
| file:line| High     | Fixed  | ...   |

### False Positive Patterns
- Pattern X: Always FP because [reason]
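Rows of the "Patterns Tried" table can be generated from the counts recorded at each generalization step, which keeps the Matches column consistent with the true and false positive counts. A minimal sketch; `patterns_tried_row` is a hypothetical helper and the row is not column-padded.

```python
def patterns_tried_row(
    pattern: str, level: int, true_pos: int, false_pos: int, notes: str = ""
) -> str:
    """Format one row of the 'Patterns Tried' table."""
    matches = true_pos + false_pos  # every reviewed match is either TP or FP
    return f"| {pattern} | {level} | {matches} | {true_pos} | {false_pos} | {notes} |"


print(patterns_tried_row("exact", 0, 1, 0, "Baseline"))
```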

Key Principles

Root Cause First

Understand WHY before searching for WHERE

Start Specific

First pattern should match exactly the known bug

One Change at a Time

Generalize incrementally, verify after each change

Know When to Stop

50%+ FP rate means you’ve gone too generic

Search Everywhere

Always search the ENTIRE codebase

Expand Classes

One root cause has multiple manifestations

Related Skills

CodeQL

Primary tool for deep interprocedural variant analysis

Semgrep

Fast pattern matching for simpler variants

SARIF Parsing

Process variant analysis results

Example Workflow

Original Bug Found

# Found in api/auth.py:42
if user.isAuthenticated == request.token:
    return allow_access()

Root cause: Comparing boolean to string always fails; logic inverted

Exact Match (Level 0)

rg -F "user.isAuthenticated == request.token"
# Result: 1 match (the original)

Variable Abstraction (Level 1)

pattern: $USER.isAuthenticated == $INPUT
# Result: 3 matches (2 new variants found)

Expand to Related Properties

patterns:
  - pattern-either:
      - pattern: $USER.isAuthenticated == $INPUT
      - pattern: $USER.isActive == $INPUT
      - pattern: $USER.isVerified == $INPUT
# Result: 7 matches (4 more variants found)

Document and Triage

All 7 instances confirmed as true positives with varying severity based on impact and exploitability.

The Expert Mindset

Understand before searching - Root cause analysis is non-negotiable
Start specific - First pattern should match exactly one thing
Climb the ladder - Generalize one step at a time
Measure as you go - Track matches and FP rates at each step
Know when to stop - High FP rate means you’ve gone too far
Iterate ruthlessly - Refine patterns based on what you learn
Document everything - Your tracking doc is as valuable as your patterns
Expand vulnerability classes - One root cause has many manifestations
Check semantics - Verify code matches documentation intent
Test edge cases - Null values and boundary conditions reveal hidden bugs
