The Static Analysis plugin provides a complete toolkit for security vulnerability detection using industry-leading static analysis tools.
Based on the Trail of Bits Testing Handbook. The skills follow its CodeQL and Semgrep guides.
Authors: Axel Mierczuk & Paweł Płatek

Installation

/plugin install trailofbits/skills/plugins/static-analysis

Skills Included

CodeQL

Deep security analysis with taint tracking and data flow

Semgrep

Fast pattern-based security scanning

SARIF Parsing

Parse and process results from static analysis tools

CodeQL Skill

Interprocedural security analysis with taint tracking and data flow for Python, JavaScript, Go, Java, C/C++, C#, Ruby, and Swift.

Key Features

  • Create databases for Python, JavaScript, Go, Java, C/C++, and more
  • Automatic build method selection with fallbacks
  • Quality assessment and validation
  • macOS Apple Silicon workarounds
  • SARIF/CSV output formats
  • Multiple query pack support (security-extended, Trail of Bits, Community)
  • Two scan modes: “run all” and “important only”
  • Interprocedural taint tracking
  • Generate project-specific source/sink models
  • Detect custom API patterns
  • Extend CodeQL’s built-in library knowledge
  • YAML-based model definitions
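
A data extension (last two bullets) is a small YAML file that teaches CodeQL about project-specific APIs. The sketch below is illustrative only: the `mypkg` module and `run_raw` sink are hypothetical, and the exact column layout of `sinkModel` rows differs by language (this three-column shape is for Python).

```yaml
extensions:
  - addsTo:
      pack: codeql/python-all
      extensible: sinkModel
    data:
      # Treat the first argument of mypkg.run_raw as a SQL injection sink.
      # Python columns: (type, path, kind).
      - ["mypkg", "Member[run_raw].Argument[0]", "sql-injection"]
```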

Essential Principles

Critical CodeQL Principles
  1. Database quality is non-negotiable - A database that builds is not automatically good. Always run quality assessment.
  2. Data extensions catch what CodeQL misses - Even projects using standard frameworks have custom wrappers.
  3. Explicit suite references prevent silent query dropping - Never pass pack names directly to codeql database analyze.
  4. Zero findings needs investigation - Zero results can indicate poor database quality or missing models.
  5. macOS Apple Silicon requires workarounds - Exit code 137 is arm64e/arm64 mismatch, not a build failure.
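
Principle 3 in practice: reference the `.qls` suite inside the pack explicitly rather than passing a bare pack name. A minimal sketch of the command to build; the database and output paths are placeholders, and actually running it requires the codeql CLI on PATH:

```python
# Explicit suite reference: "<pack-name>:<path-to-suite.qls>" rather than a
# bare pack name, which can silently drop queries from the run.
suite = "codeql/python-queries:codeql-suites/python-security-extended.qls"

cmd = [
    "codeql", "database", "analyze", "codeql.db", suite,
    "--format=sarif-latest",
    "--output=results/results.sarif",
]
# Execute with subprocess.run(cmd, check=True) once the codeql CLI is installed.
print(" ".join(cmd))
```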

CodeQL Workflow

The skill provides three workflows:
| Workflow | Purpose |
| --- | --- |
| Build Database | Create a CodeQL database, trying build methods in sequence |
| Create Data Extensions | Detect or generate data extension models for project APIs |
| Run Analysis | Select rulesets, execute queries, process results |

Output Structure

All generated files are stored in a single output directory:
static_analysis_codeql_1/
├── rulesets.txt                 # Selected query packs
├── codeql.db/                   # CodeQL database
├── build.log                    # Build log
├── diagnostics/                 # Diagnostic queries and CSVs
├── extensions/                  # Data extension YAMLs
├── raw/                         # Unfiltered analysis output
│   ├── results.sarif
│   └── important-only.qls
└── results/                     # Final filtered results
    └── results.sarif

Supported Languages

Python

Django, Flask, FastAPI support

JavaScript/TypeScript

Node.js, React, Vue

Go

Standard library coverage

Java/Kotlin

Spring, Android

C/C++

Build tracing required

C#

.NET, ASP.NET

Ruby

Rails support

Swift

iOS/macOS

Semgrep Skill

Fast pattern-based security scanning with parallel execution and automatic language detection.

Key Features

  • Spawns parallel scanner subagents for each language
  • Automatic language detection from file extensions
  • Merged SARIF output
  • Support for GitHub repositories (auto-clones)
  • Two scan modes:
    Run All: complete coverage, all severity levels
    Important Only: high-confidence security vulnerabilities only
  • Pre-filter: --severity MEDIUM/HIGH/CRITICAL
  • Post-filter: category=security, confidence∈{MEDIUM,HIGH}
  • Official Semgrep registry (OWASP, CWE)
  • Trail of Bits custom rules
  • Third-party rules (0xdea, Decurity)
  • Custom YAML rules with pattern matching
  • Taint mode for data flow tracking
  • Automatic detection of Pro license
  • Cross-file taint tracking
  • Interprocedural analysis
  • Additional languages (Apex, C#, Elixir)
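
A custom taint-mode rule can be as small as the sketch below. The rule id, source, and sink are illustrative examples, not rules shipped with the plugin:

```yaml
rules:
  - id: request-arg-to-exec
    languages: [python]
    severity: ERROR
    message: Untrusted request parameter reaches exec()
    mode: taint
    pattern-sources:
      - pattern: flask.request.args.get(...)
    pattern-sinks:
      - pattern: exec(...)
```

Run it against a project with `semgrep scan --config rule.yaml --metrics=off .`.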

Essential Principles

Critical Semgrep Principles
  1. Always use --metrics=off - Semgrep sends telemetry by default; prevent data leakage during audits.
  2. User must approve the scan plan - Approval is a hard gate; present the exact rulesets before scanning.
  3. Third-party rulesets are required - Trail of Bits, 0xdea, and Decurity rules catch vulnerabilities absent from official registry.
  4. Spawn all scan Tasks in parallel - Never spawn Tasks sequentially; emit all in one response.
  5. Always check for Semgrep Pro - Pro enables cross-file taint tracking and catches ~250% more true positives.

Semgrep Workflow

  1. Resolve output directory - Auto-increment static_analysis_semgrep_1, _2, etc.
  2. Detect languages and Pro availability - Use Glob to find file types
  3. Select scan mode and rulesets - Present plan to user
  4. Get explicit approval - Hard gate, must approve before scanning
  5. Spawn parallel scanner Tasks - One Task per language category
  6. Merge results and report - Combine SARIF files, provide summary
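
Step 1 (directory resolution) amounts to finding the first unused numeric suffix. A small sketch, assuming the `static_analysis_semgrep` prefix shown in the output structure:

```python
from pathlib import Path

def resolve_output_dir(prefix: str = "static_analysis_semgrep",
                       root: Path = Path(".")) -> Path:
    """Create and return the first <prefix>_N directory that does not exist yet."""
    n = 1
    while (root / f"{prefix}_{n}").exists():
        n += 1
    out = root / f"{prefix}_{n}"
    out.mkdir(parents=True)
    return out
```

Calling it twice in the same root yields `static_analysis_semgrep_1`, then `static_analysis_semgrep_2`.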

Agents

| Agent | Tools | Purpose |
| --- | --- | --- |
| semgrep-scanner | Bash | Executes parallel semgrep scans for a language category |
| semgrep-triager | Read, Grep, Glob, Write | Classifies findings as true/false positives by reading source |

Output Structure

static_analysis_semgrep_1/
├── rulesets.txt                 # Approved rulesets
├── raw/                         # Per-scan raw output
│   ├── python-python.json
│   ├── python-python.sarif
│   ├── python-django.sarif
│   └── ...
└── results/                     # Final merged output
    └── results.sarif

SARIF Parsing Skill

Parse, analyze, and process SARIF files from static analysis tools.

Key Features

Query with jq:

# Count total findings
jq '[.runs[].results[]] | length' results.sarif

# Extract errors only
jq '.runs[].results[] | select(.level == "error")' results.sarif

# Get findings with locations
jq '.runs[].results[] | {
  rule: .ruleId,
  file: .locations[0].physicalLocation.artifactLocation.uri,
  line: .locations[0].physicalLocation.region.startLine
}' results.sarif

Parse with pysarif:

from pysarif import load_from_file

sarif = load_from_file("results.sarif")
for run in sarif.runs:
    for result in run.results:
        print(f"[{result.level}] {result.rule_id}")

Process with sarif-tools:

# Summary of findings
sarif summary results.sarif

# Diff two SARIF files
sarif diff baseline.sarif current.sarif

# Convert to CSV/HTML
sarif csv results.sarif > results.csv
sarif html results.sarif > report.html
  • Stable fingerprints for tracking findings across runs
  • Path normalization (handles different environments)
  • Baseline comparison for regression detection
  • Suppression of known false positives
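
Stable fingerprints and path normalization (first two bullets) can be approximated by hashing the rule id, a normalized path, and a code snippet rather than a line number, so findings survive unrelated edits. A sketch, not the exact scheme the skill uses; the prefix list is an assumption:

```python
import hashlib
from pathlib import PurePosixPath

def normalize_path(uri: str, strip_prefixes=("file://", "/src/", "/home/")) -> str:
    """Reduce tool-specific absolute paths or URIs to a repo-relative POSIX path."""
    for prefix in strip_prefixes:
        if uri.startswith(prefix):
            uri = uri[len(prefix):]
    return str(PurePosixPath(uri))

def fingerprint(rule_id: str, uri: str, snippet: str) -> str:
    """Hash content, not line numbers, so the id is stable across runs."""
    payload = f"{rule_id}|{normalize_path(uri)}|{snippet.strip()}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

The same finding reported with an absolute path in one run and a relative path in another then hashes to the same id.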

SARIF Structure

sarifLog
├── version: "2.1.0"
└── runs[]
    ├── tool.driver (name, version, rules)
    ├── results[] (findings)
    │   ├── ruleId
    │   ├── level (error/warning/note)
    │   ├── message.text
    │   ├── locations[]
    │   └── fingerprints
    └── artifacts[] (scanned files)

Tool Selection Guide

Use CaseToolInstallation
Quick CLI queriesjqbrew install jq
Python scripting (simple)pysarifpip install pysarif
Python scripting (advanced)sarif-toolspip install sarif-tools
.NET applicationsSARIF SDKNuGet package

Common Pitfalls

Avoid These Issues
  1. Path normalization - Different tools report paths differently (absolute, relative, URI-encoded)
  2. Fingerprint mismatch - Fingerprints may not match if file paths differ or code is reformatted
  3. Missing data - Many SARIF fields are optional; use defensive access
  4. Large files - For 100MB+ files, use streaming with ijson
  5. Schema validation - Validate before processing to catch malformed files
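
Pitfall 3 in practice: nearly every location field is optional, so access them defensively instead of indexing blindly. A minimal sketch:

```python
def first_location(result: dict) -> tuple[str, int]:
    """Extract (file, line) from a SARIF result without assuming optional fields exist."""
    locs = result.get("locations") or [{}]
    phys = locs[0].get("physicalLocation", {})
    uri = phys.get("artifactLocation", {}).get("uri", "<unknown>")
    line = phys.get("region", {}).get("startLine", 0)
    return uri, line
```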

When to Use Each Tool

Use CodeQL When

  • Need deep interprocedural analysis
  • Tracking data flow across files/functions
  • Complex taint tracking required
  • Building a database is feasible

Use Semgrep When

  • Need fast feedback
  • Pattern-based detection sufficient
  • No build capability for compiled languages
  • Quick first-pass security scan

Variant Analysis

Use CodeQL/Semgrep patterns to find bug variants

Supply Chain Risk Auditor

Audit dependencies for exploitation risk

Success Criteria

CodeQL Analysis Complete

  • Database built with quality assessment passed
  • Data extensions evaluated or created
  • Analysis run with explicit suite reference
  • All available query packs used
  • Results preserved in output directory
  • Zero findings investigated

Semgrep Scan Complete

  • Languages detected with Pro status checked
  • User approved scan plan
  • Third-party rulesets included
  • All Tasks spawned in parallel
  • --metrics=off used everywhere
  • Results merged and summarized