The Static Analysis plugin provides a complete toolkit for security vulnerability detection using industry-leading static analysis tools.
Based on the Trail of Bits Testing Handbook. The skills follow its CodeQL and Semgrep guides.
Authors: Axel Mierczuk & Paweł Płatek

Installation

/plugin install trailofbits/skills/plugins/static-analysis

Skills Included

CodeQL

Deep security analysis with taint tracking and data flow

Semgrep

Fast pattern-based security scanning

SARIF Parsing

Parse and process results from static analysis tools

CodeQL Skill

Interprocedural security analysis with taint tracking and data flow for Python, JavaScript, Go, Java, C/C++, C#, Ruby, and Swift.

Key Features

  • Create databases for Python, JavaScript, Go, Java, C/C++, and more
  • Automatic build method selection with fallbacks
  • Quality assessment and validation
  • macOS Apple Silicon workarounds
  • SARIF/CSV output formats
  • Multiple query pack support (security-extended, Trail of Bits, Community)
  • Two scan modes: “run all” and “important only”
  • Interprocedural taint tracking
  • Generate project-specific source/sink models
  • Detect custom API patterns
  • Extend CodeQL’s built-in library knowledge
  • YAML-based model definitions
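
A data extension (last two bullets) is a small YAML file that teaches CodeQL about project-specific APIs. The sketch below is illustrative only: the `mypkg` module and `run_raw` sink are hypothetical, and the exact column layout of `sinkModel` rows differs by language (this three-column shape is for Python).

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      # Treat the first argument of mypkg.run_raw as a SQL injection sink.
      # Python columns: (type, path, kind).
      - ["mypkg", "Member[run_raw].Argument[0]", "sql-injection"]
```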

Essential Principles

Critical CodeQL Principles
  1. Database quality is non-negotiable - A database that builds is not automatically good. Always run quality assessment.
  2. Data extensions catch what CodeQL misses - Even projects using standard frameworks have custom wrappers.
  3. Explicit suite references prevent silent query dropping - Never pass pack names directly to codeql database analyze.
  4. Zero findings needs investigation - Zero results can indicate poor database quality or missing models.
  5. macOS Apple Silicon requires workarounds - Exit code 137 is arm64e/arm64 mismatch, not a build failure.
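
Principle 3 in practice: reference the `.qls` suite inside the pack explicitly rather than passing a bare pack name. A minimal sketch of the command to build; the database and output paths are placeholders, and actually running it requires the codeql CLI on PATH:

```python
# Explicit suite reference: "<pack-name>:<path-to-suite.qls>" rather than a
# bare pack name, which can silently drop queries from the run.
suite = "codeql/python-queries:codeql-suites/python-security-extended.qls"

cmd = [
    "codeql", "database", "analyze", "codeql.db", suite,
    "--format=sarif-latest",
    "--output=results/results.sarif",
]
# Execute with subprocess.run(cmd, check=True) once the codeql CLI is installed.
print(" ".join(cmd))
```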

CodeQL Workflow

The skill provides three workflows:
| Workflow | Purpose |
| --- | --- |
| Build Database | Create a CodeQL database, trying build methods in sequence |
| Create Data Extensions | Detect or generate data extension models for project APIs |
| Run Analysis | Select rulesets, execute queries, process results |

Output Structure

All generated files are stored in a single output directory:
static_analysis_codeql_1/
├── rulesets.txt                 # Selected query packs
├── codeql.db/                   # CodeQL database
├── build.log                    # Build log
├── diagnostics/                 # Diagnostic queries and CSVs
├── extensions/                  # Data extension YAMLs
├── raw/                         # Unfiltered analysis output
│   ├── results.sarif
│   └── important-only.qls
└── results/                     # Final filtered results
    └── results.sarif

Supported Languages

Python

Django, Flask, FastAPI support

JavaScript/TypeScript

Node.js, React, Vue

Go

Standard library coverage

Java/Kotlin

Spring, Android

C/C++

Build tracing required

C#

.NET, ASP.NET

Ruby

Rails support

Swift

iOS/macOS

Semgrep Skill

Fast pattern-based security scanning with parallel execution and automatic language detection.

Key Features

  • Spawns parallel scanner subagents for each language
  • Automatic language detection from file extensions
  • Merged SARIF output
  • Support for GitHub repositories (auto-clones)
  • Two scan modes:
    Run All: complete coverage, all severity levels
    Important Only: high-confidence security vulnerabilities only
  • Pre-filter: --severity MEDIUM/HIGH/CRITICAL
  • Post-filter: category=security, confidence∈{MEDIUM,HIGH}
  • Official Semgrep registry (OWASP, CWE)
  • Trail of Bits custom rules
  • Third-party rules (0xdea, Decurity)
  • Custom YAML rules with pattern matching
  • Taint mode for data flow tracking
  • Automatic detection of Pro license
  • Cross-file taint tracking
  • Interprocedural analysis
  • Additional languages (Apex, C#, Elixir)
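
A custom taint-mode rule can be as small as the sketch below. The rule id, source, and sink are illustrative examples, not rules shipped with the plugin:

```yaml
rules:
  - id: request-arg-to-exec
    languages: [python]
    severity: ERROR
    message: Untrusted request parameter reaches exec()
    mode: taint
    pattern-sources:
      - pattern: flask.request.args.get(...)
    pattern-sinks:
      - pattern: exec(...)
```

Run it against a project with `semgrep scan --config rule.yaml --metrics=off .`.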

Essential Principles

Critical Semgrep Principles
  1. Always use --metrics=off - Semgrep sends telemetry by default; prevent data leakage during audits.
  2. User must approve the scan plan - Approval is a hard gate; present the exact rulesets before scanning.
  3. Third-party rulesets are required - Trail of Bits, 0xdea, and Decurity rules catch vulnerabilities absent from official registry.
  4. Spawn all scan Tasks in parallel - Never spawn Tasks sequentially; emit all in one response.
  5. Always check for Semgrep Pro - Pro enables cross-file taint tracking and catches ~250% more true positives.

Semgrep Workflow

  1. Resolve output directory - Auto-increment static_analysis_semgrep_1, _2, etc.
  2. Detect languages and Pro availability - Use Glob to find file types
  3. Select scan mode and rulesets - Present plan to user
  4. Get explicit approval - Hard gate, must approve before scanning
  5. Spawn parallel scanner Tasks - One Task per language category
  6. Merge results and report - Combine SARIF files, provide summary
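
Step 1 (directory resolution) amounts to finding the first unused numeric suffix. A small sketch, assuming the `static_analysis_semgrep` prefix shown in the output structure:

```python
from pathlib import Path

def resolve_output_dir(prefix: str = "static_analysis_semgrep",
                       root: Path = Path(".")) -> Path:
    """Create and return the first <prefix>_N directory that does not exist yet."""
    n = 1
    while (root / f"{prefix}_{n}").exists():
        n += 1
    out = root / f"{prefix}_{n}"
    out.mkdir(parents=True)
    return out
```

Calling it twice in the same root yields `static_analysis_semgrep_1`, then `static_analysis_semgrep_2`.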

Agents

| Agent | Tools | Purpose |
| --- | --- | --- |
| semgrep-scanner | Bash | Executes parallel semgrep scans for a language category |
| semgrep-triager | Read, Grep, Glob, Write | Classifies findings as true/false positives by reading source |

Output Structure

static_analysis_semgrep_1/
├── rulesets.txt                 # Approved rulesets
├── raw/                         # Per-scan raw output
│   ├── python-python.json
│   ├── python-python.sarif
│   ├── python-django.sarif
│   └── ...
└── results/                     # Final merged output
    └── results.sarif

SARIF Parsing Skill

Parse, analyze, and process SARIF files from static analysis tools.

Key Features

Query with jq:

# Count total findings
jq '[.runs[].results[]] | length' results.sarif

# Extract errors only
jq '.runs[].results[] | select(.level == "error")' results.sarif

# Get findings with locations
jq '.runs[].results[] | {
  rule: .ruleId,
  file: .locations[0].physicalLocation.artifactLocation.uri,
  line: .locations[0].physicalLocation.region.startLine
}' results.sarif

Parse with pysarif:

from pysarif import load_from_file

sarif = load_from_file("results.sarif")
for run in sarif.runs:
    for result in run.results:
        print(f"[{result.level}] {result.rule_id}")

Process with sarif-tools:

# Summary of findings
sarif summary results.sarif

# Diff two SARIF files
sarif diff baseline.sarif current.sarif

# Convert to CSV/HTML
sarif csv results.sarif > results.csv
sarif html results.sarif > report.html
  • Stable fingerprints for tracking findings across runs
  • Path normalization (handles different environments)
  • Baseline comparison for regression detection
  • Suppression of known false positives
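
Stable fingerprints and path normalization (first two bullets) can be approximated by hashing the rule id, a normalized path, and a code snippet rather than a line number, so findings survive unrelated edits. A sketch, not the exact scheme the skill uses; the prefix list is an assumption:

```python
import hashlib
from pathlib import PurePosixPath

def normalize_path(uri: str, strip_prefixes=("file://", "/src/", "/home/")) -> str:
    """Reduce tool-specific absolute paths or URIs to a repo-relative POSIX path."""
    for prefix in strip_prefixes:
        if uri.startswith(prefix):
            uri = uri[len(prefix):]
    return str(PurePosixPath(uri))

def fingerprint(rule_id: str, uri: str, snippet: str) -> str:
    """Hash content, not line numbers, so the id is stable across runs."""
    payload = f"{rule_id}|{normalize_path(uri)}|{snippet.strip()}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

The same finding reported with an absolute path in one run and a relative path in another then hashes to the same id.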

SARIF Structure

sarifLog
├── version: "2.1.0"
└── runs[]
    ├── tool.driver (name, version, rules)
    ├── results[] (findings)
    │   ├── ruleId
    │   ├── level (error/warning/note)
    │   ├── message.text
    │   ├── locations[]
    │   └── fingerprints
    └── artifacts[] (scanned files)

Tool Selection Guide

Use CaseToolInstallation
Quick CLI queriesjqbrew install jq
Python scripting (simple)pysarifpip install pysarif
Python scripting (advanced)sarif-toolspip install sarif-tools
.NET applicationsSARIF SDKNuGet package

Common Pitfalls

Avoid These Issues
  1. Path normalization - Different tools report paths differently (absolute, relative, URI-encoded)
  2. Fingerprint mismatch - Fingerprints may not match if file paths differ or code is reformatted
  3. Missing data - Many SARIF fields are optional; use defensive access
  4. Large files - For 100MB+ files, use streaming with ijson
  5. Schema validation - Validate before processing to catch malformed files
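
Pitfall 3 in practice: nearly every location field is optional, so access them defensively instead of indexing blindly. A minimal sketch:

```python
def first_location(result: dict) -> tuple[str, int]:
    """Extract (file, line) from a SARIF result without assuming optional fields exist."""
    locs = result.get("locations") or [{}]
    phys = locs[0].get("physicalLocation", {})
    uri = phys.get("artifactLocation", {}).get("uri", "<unknown>")
    line = phys.get("region", {}).get("startLine", 0)
    return uri, line
```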

When to Use Each Tool

Use CodeQL When

  • Need deep interprocedural analysis
  • Tracking data flow across files/functions
  • Complex taint tracking required
  • Building a database is feasible

Use Semgrep When

  • Need fast feedback
  • Pattern-based detection sufficient
  • No build capability for compiled languages
  • Quick first-pass security scan

Variant Analysis

Use CodeQL/Semgrep patterns to find bug variants

Supply Chain Risk Auditor

Audit dependencies for exploitation risk

Success Criteria

CodeQL Analysis Complete

  • Database built with quality assessment passed
  • Data extensions evaluated or created
  • Analysis run with explicit suite reference
  • All available query packs used
  • Results preserved in output directory
  • Zero findings investigated

Semgrep Scan Complete

  • Languages detected with Pro status checked
  • User approved scan plan
  • Third-party rulesets included
  • All Tasks spawned in parallel
  • --metrics=off used everywhere
  • Results merged and summarized