Overview
The YARA Authoring plugin provides expert-level guidance for writing YARA-X detection rules that catch malware without drowning in false positives. It focuses on decision trees, expert heuristics, and production-tested patterns rather than dumping YARA syntax documentation.
YARA-X Focus: This plugin targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal’s Livehunt/Retrohunt production systems and is 5-10x faster for regex-heavy rules. Legacy YARA (C implementation) is in maintenance mode.
- Decision trees for common judgment calls
- Expert heuristics from experienced YARA authors
- Naming conventions (CATEGORY_PLATFORM_FAMILY_DATE format)
- Performance optimization (atom quality, short-circuit conditions)
- Testing workflow with goodware corpus validation
- YARA-X migration guide for converting legacy rules
- Chrome extension analysis with the crx module
- Android DEX analysis with the dex module
Installation
YARA-X CLI
Python Package (for scripts)
Plugin
When to Use
Use this plugin when:
- Writing new YARA-X rules for malware detection
- Reviewing existing rules for quality or performance issues
- Optimizing slow-running rulesets
- Converting IOCs or threat intel into detection signatures
- Debugging false positive issues
- Preparing rules for production deployment
- Migrating legacy YARA rules to YARA-X
- Analyzing Chrome extensions (crx module)
- Analyzing Android apps (dex module)
When NOT to Use
Do NOT use this plugin for:
- Static analysis requiring disassembly → use Ghidra/IDA skills
- Dynamic malware analysis → use sandbox analysis skills
- Network-based detection → use Suricata/Snort skills
- Memory forensics with Volatility → use memory forensics skills
- Simple hash-based detection → just use hash lists
Core Principles
Good Atoms
Strings must generate good atoms. YARA extracts 4-byte subsequences for fast matching. Strings with repeated bytes or under 4 bytes force slow verification.
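To make the atom idea concrete, here is a minimal Python sketch that scores a candidate string by the most varied 4-byte window it contains. The helper names (best_atom, atom_quality) and the distinct-byte scoring are assumptions made for illustration; YARA-X's real atom selection is more involved, and this is not the plugin's atom_analyzer.py.

```python
def best_atom(data: bytes, size: int = 4) -> tuple[bytes, int]:
    """Return the most varied `size`-byte window of `data` and its distinct-byte count.

    Strings shorter than `size` cannot supply a full window and get a score of 0.
    """
    if len(data) < size:
        return data, 0
    windows = [data[i:i + size] for i in range(len(data) - size + 1)]
    # max() keeps the first window that reaches the highest score.
    best = max(windows, key=lambda w: len(set(w)))
    return best, len(set(best))


def atom_quality(data: bytes) -> str:
    """Rough label (hypothetical thresholds): 'too short', 'repetitive', or 'ok'."""
    _, score = best_atom(data)
    if score == 0:
        return "too short"
    if score <= 2:
        return "repetitive"
    return "ok"


for candidate in (b"MZ", b"AAAAAAAA", b"checkethereumw"):
    print(candidate, best_atom(candidate), atom_quality(candidate))
```

Strings labelled "too short" or "repetitive" are the ones most likely to need replacing with a longer or more distinctive indicator.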
Specific Families
Target specific families, not categories. “Detects ransomware” catches everything and nothing. “Detects LockBit 3.0 config extraction” is precise.
Test Against Goodware
A rule that fires on Windows system files is useless. Validate against VirusTotal’s goodware corpus or your own clean file set.
Short-Circuit First
Put cheap checks first: filesize < 10MB and uint16(0) == 0x5A4D before expensive string searches or module calls.
Essential Toolkit
An expert uses 5 tools. Everything else is noise.
| Tool | Purpose | Usage |
|---|---|---|
| yarGen | Extract candidate strings | yarGen.py -m samples/ --excludegood → validate with yr check |
| FLOSS | Extract obfuscated/stack strings | floss sample.exe (when yarGen fails) |
| yr CLI | Validate, scan, inspect | yr check, yr scan -s, yr dump -m pe |
| signature-base | Study quality examples | Learn from 17,000+ production rules |
| YARA-CI | Goodware corpus testing | Test before deployment |
Rule Structure
Every YARA-X rule follows this format:
Naming Convention
Category prefixes:
- MAL_ - Malware
- HKTL_ - Hacking tool
- WEBSHELL_ - Web shell
- EXPL_ - Exploit
- SUSP_ - Suspicious (not definitively malicious)
- GEN_ - Generic detection
Platform prefixes: Win_, Lnx_, Mac_, Android_, CRX_
Example: MAL_Win_Emotet_Loader_Jan25
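A naming convention like this is easy to check mechanically. The sketch below validates rule names against the prefixes listed above; the regular expression, the MonYY date shape (inferred from Jan25 in the example), and the function name is_valid_rule_name are assumptions for illustration, not the plugin's own linter logic.

```python
import re

# Prefixes taken from the lists above.
CATEGORIES = ("MAL", "HKTL", "WEBSHELL", "EXPL", "SUSP", "GEN")
PLATFORMS = ("Win", "Lnx", "Mac", "Android", "CRX")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# CATEGORY_PLATFORM_<one or more family/variant parts>_MonYY
# The MonYY date shape is an assumption inferred from "Jan25".
RULE_NAME = re.compile(
    "^(?:" + "|".join(CATEGORIES) + ")"
    + "_(?:" + "|".join(PLATFORMS) + ")"
    + "_[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*"
    + "_(?:" + "|".join(MONTHS) + r")\d{2}$"
)


def is_valid_rule_name(name: str) -> bool:
    """Return True when `name` follows the assumed naming pattern."""
    return RULE_NAME.match(name) is not None


print(is_valid_rule_name("MAL_Win_Emotet_Loader_Jan25"))
print(is_valid_rule_name("Detects_ransomware"))
```

Names without a family part or without a trailing date are rejected, which catches the most common slips early.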
Required Metadata
Every rule needs these fields:
Platform-Specific Patterns
YARA works on any file type. Adapt patterns to your target:
Windows PE
macOS Mach-O
- Keylogger: CGEventTapCreate, kCGEventKeyDown
- SSH tunneling: ssh -D, tunnel, socks
- Persistence: ~/Library/LaunchAgents, /Library/LaunchDaemons
- Credentials: security find-generic-password, keychain
npm Supply Chain Attacks
- Ethereum selectors: { 70 a0 82 31 } (balanceOf; the transfer selector is { a9 05 9c bb })
- Zero-width steganography: { E2 80 8B E2 80 8C }
- Obfuscator signatures: _0x, var _0x
- C2 patterns: domain names, webhook URLs
Avoid as standalone indicators:
- require, fetch, axios - too common
- Buffer, crypto - legitimate uses everywhere
- process.env alone - need specific env var names
Chrome Extensions (crx module)
nativeMessaging + downloads, debugger permission, content scripts on <all_urls>
Android DEX
DexClassLoader reflection, encrypted assets
Decision Trees
Is This String Good Enough?
When to Use “all of” vs “any of”
When to Abandon a Rule Approach
Stop and pivot when:
- yarGen returns only API names and paths → Pivot to PE structure, entropy, or imphash
- Can’t find 3 unique strings → Probably packed. Target the unpacked version or detect the packer
- Rule matches goodware files →
  - 1-2 matches = investigate and tighten
  - 3-5 matches = find different indicators
  - 6+ matches = start over
- Performance is terrible → Split into multiple focused rules or add strict pre-filters
- Description is hard to write → Rule is too vague. If you can’t explain what it catches, it catches too much
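The goodware thresholds in the list above can be written down as a small triage helper. This is a sketch: the function name triage_goodware_matches and the returned wording are illustrative, and the zero-match case reflects the checklist requirement that a deployable rule produces no goodware matches.

```python
def triage_goodware_matches(match_count: int) -> str:
    """Map a goodware match count to the next step, using the thresholds above."""
    if match_count < 0:
        raise ValueError("match_count cannot be negative")
    if match_count == 0:
        return "no goodware matches: continue with the remaining checks"
    if match_count <= 2:
        return "investigate and tighten"
    if match_count <= 5:
        return "find different indicators"
    return "start over"


for count in (0, 1, 4, 12):
    print(count, "->", triage_goodware_matches(count))
```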
Real-World Example
Here’s a production-quality rule detecting npm supply chain attacks:
- Function names (runmask, checkethereumw) are unique to the attack
- Ethereum function selector adds context
- all of them prevents false positives
- Small filesize pre-filter improves performance
Expert Heuristics
String Selection Priority
- Gold tier: Mutex names, PDB paths, stack strings (almost always unique)
- Silver tier: C2 paths, configuration markers, error messages
- Bronze tier: API sequences, unusual imports
- Garbage tier: Single API names, common paths, format specifiers
If you need >6 strings, you’re over-fitting.
Modifier Discipline
Never use nocase or wide speculatively; use them only when you have confirmed evidence the case/encoding varies in samples.
- nocase doubles atom generation
- wide doubles string matching
- Both have real performance costs
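To see why these modifiers are not free, it helps to look at what they ask the scanner to match. The sketch below is an illustration only and does not model YARA-X internals: wide_form shows the two-bytes-per-character form that wide adds for ASCII text, and case_variant_count shows how many concrete spellings a nocase string stands for. Both helper names are hypothetical.

```python
def wide_form(text: str) -> bytes:
    """Two-bytes-per-character form searched for by `wide` (UTF-16LE for ASCII text)."""
    return text.encode("utf-16-le")


def case_variant_count(text: str) -> int:
    """Number of distinct upper/lower spellings covered by `nocase`."""
    letters = sum(1 for ch in text if ch.lower() != ch.upper())
    return 2 ** letters


print(wide_form("cmd"))
print(case_variant_count("cmd.exe"))
```

Every extra form is more work for the matcher, which is the reason to add a modifier only when samples show the variation exists.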
Regex Anchoring
Regex without a 4+ byte literal substring evaluates at every file offset, which is catastrophic for performance.
If you can’t anchor, consider a hex pattern with wildcards instead.
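A quick way to screen a regex for an anchor is to measure its longest run of guaranteed literal characters. The following is a deliberately conservative sketch: the scanner and its names (longest_literal_run, has_anchor) are assumptions for illustration, it treats every alphanumeric escape such as \d or \x41 as a non-literal, and it only treats | as a run break rather than checking each alternative. It is not how YARA-X extracts atoms.

```python
def longest_literal_run(pattern: str) -> int:
    """Length of the longest run of plain literal characters in a regex (rough heuristic)."""
    best = run = 0
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            if pattern[i + 1].isalnum():
                # \d, \w, \x41, ...: treated conservatively as a non-literal.
                best, run = max(best, run), 0
            else:
                # Escaped punctuation such as \. or \{ is a literal character.
                run += 1
            i += 2
            continue
        if ch == "[":
            # Character class: ends the run, then skip to the closing bracket.
            best, run = max(best, run), 0
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1
            continue
        if ch in "*?{":
            # The preceding character is optional or repeated: drop it from the run.
            best, run = max(best, run - 1), 0
            if ch == "{":
                while i < n and pattern[i] != "}":
                    i += 1
            i += 1
            continue
        if ch in "+.()|^$":
            best, run = max(best, run), 0
            i += 1
            continue
        run += 1
        i += 1
    return max(best, run)


def has_anchor(pattern: str, min_len: int = 4) -> bool:
    """True when the pattern contains a literal run of at least `min_len` characters."""
    return longest_literal_run(pattern) >= min_len


for rx in (r"fetch.*token", r"\w+@\w+\.\w+", r"\d{1,3}\.\d{1,3}"):
    print(rx, longest_literal_run(rx), has_anchor(rx))
```

Patterns that fail this screen are candidates for rewriting around a fixed literal or for conversion to a hex pattern with wildcards.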
Loop Discipline
Always bound loops with filesize:
Unbounded #a can be thousands in large files, causing exponential slowdown.
Rationalizations to Reject
When you catch yourself thinking these, stop and reconsider:
| Rationalization | Expert Response |
|---|---|
| “This generic string is unique enough” | Test against goodware first. Your intuition is wrong. |
| “yarGen gave me these strings” | yarGen suggests, you validate. Check each one manually. |
| “It works on my 10 samples” | 10 samples ≠ production. Use VirusTotal goodware corpus. |
| “One rule to catch all variants” | Causes FP floods. Target specific families. |
| “I’ll make it more specific if we get FPs” | Write tight rules upfront. FPs burn trust. |
| “Performance doesn’t matter” | One slow rule slows entire ruleset. Optimize atoms. |
| “any of them is fine for these common strings” | Common strings + any = FP flood. Use any of only for individually unique strings. |
| “This regex is specific enough” | /fetch.*token/ matches all auth code. Add exfil destination requirement. |
| “I’ll use .* for flexibility” | Unbounded regex = performance disaster. Use .{0,30}. |
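The difference between .* and .{0,30} is easy to demonstrate outside YARA. The sketch below uses Python's re module as a stand-in (YARA-X has its own regex engine, so treat this as an illustration of matching behavior only); the sample strings and the matches helper are made up for the demonstration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True when `pattern` is found in `text`; DOTALL lets '.' cross newlines."""
    return re.search(pattern, text, re.DOTALL) is not None


UNBOUNDED = r"fetch.*token"
BOUNDED = r"fetch.{0,30}token"

# "fetch" and "token" belong together here.
related = "fetch(url, {headers: {token: t}})"
# Here they are hundreds of characters apart and unrelated.
unrelated = "fetch('/api/items')" + "\n// filler" * 50 + "\nconst token = 'abc'"

print(matches(UNBOUNDED, related), matches(BOUNDED, related))
print(matches(UNBOUNDED, unrelated), matches(BOUNDED, unrelated))
```

The unbounded pattern links the two words across any distance, while the bounded pattern only fires when they sit close together.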
Performance Optimization
Quick Wins
- Put filesize first - instant check
- Avoid nocase - doubles atom generation
- Bound regex - use {1,100} not .*
- Prefer hex over regex - faster matching
Red Flags
- Strings less than 4 bytes
- Unbounded regex (.*)
- Modules without file-type filter
- any of with common strings
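Two of these red flags can be spotted with a quick text scan before running a real linter. The sketch below is a rough illustration: the regular expressions, the red_flags function, and the sample rule are all assumptions, string length is measured on the source text without decoding escapes, and this is not the plugin's yara_lint.py.

```python
import re

# $name = "text" and $name = /regex/ definitions (simplified, single-line only).
TEXT_STRING = re.compile(r'\$(\w*)\s*=\s*"((?:[^"\\]|\\.)*)"')
REGEX_STRING = re.compile(r'\$(\w*)\s*=\s*/((?:[^/\\\n]|\\.)+)/')
# An unescaped dot followed by * or +.
UNBOUNDED = re.compile(r'(?<!\\)\.[*+]')


def red_flags(rule_source: str) -> list[str]:
    """Report short text strings and unbounded regexes found in YARA rule source."""
    flags = []
    for name, value in TEXT_STRING.findall(rule_source):
        # Length of the source text; escape sequences are not decoded.
        if len(value) < 4:
            flags.append(f"${name}: text string shorter than 4 bytes")
    for name, value in REGEX_STRING.findall(rule_source):
        if UNBOUNDED.search(value):
            flags.append(f"${name}: unbounded regex (.* or .+)")
    return flags


# Hypothetical rule used only to exercise the scan.
SAMPLE_RULE = r'''
rule SUSP_Demo_Rule {
    strings:
        $a = "MZ"
        $b = "checkethereumw"
        $re = /fetch.*token/
    condition:
        any of them
}
'''

for flag in red_flags(SAMPLE_RULE):
    print(flag)
```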
Condition Ordering
Order conditions for short-circuit evaluation:
Migrating from Legacy YARA
YARA-X has 99% rule compatibility, but enforces stricter validation. Quick migration:
| Issue | Legacy | YARA-X Fix |
|---|---|---|
| Literal { in regex | /{/ | /\{/ |
| Invalid escapes | \R silently literal | \\R or R |
| Base64 strings | Any length | 3+ chars required |
| Negative indexing | @a[-1] | @a[#a - 1] |
| Duplicate modifiers | Allowed | Remove duplicates |
Included Scripts
The plugin includes two Python scripts with PEP 723 inline metadata (dependencies auto-resolved by uv run):
yara_lint.py
Validates YARA-X rules for style, metadata, compatibility issues, and anti-patterns:
atom_analyzer.py
Evaluates string quality for efficient atom extraction:
Workflow
Quality Checklist
Before deploying any rule:
- Name follows {CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE} format
- Description starts with “Detects” and explains what/how
- All required metadata present (author, reference, date)
- Strings are unique (not API names, common paths, or format strings)
- All strings have 4+ bytes with good atom potential
- Base64 modifier only on strings with 3+ characters
- Regex patterns have escaped { and valid escape sequences
- Condition starts with cheap checks (filesize, magic bytes)
- Rule matches all target samples
- Rule produces zero matches on goodware corpus
- yr check passes with no errors
- yr fmt --check passes (consistent formatting)
- Linter passes with no errors
- Peer review completed
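The metadata items on this checklist lend themselves to an automated pre-check. The sketch below looks for the description, author, reference, and date fields and verifies that the description starts with "Detects". The parsing is a simplified assumption (quoted values only, meta block located by keyword), the function name metadata_problems and the sample rule are hypothetical, and it does not replace yr check or the plugin's linter.

```python
import re

REQUIRED_META = ("description", "author", "reference", "date")

# name = "value" pairs inside the meta block (quoted values only).
META_FIELD = re.compile(r'^\s*(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"', re.MULTILINE)
# Text between "meta:" and the next "strings:" or "condition:" section.
META_BLOCK = re.compile(r'\bmeta\s*:(.*?)(?=\b(?:strings|condition)\s*:)', re.DOTALL)


def metadata_problems(rule_source: str) -> list[str]:
    """Return a list of checklist violations found in the rule's meta section."""
    block = META_BLOCK.search(rule_source)
    fields = dict(META_FIELD.findall(block.group(1))) if block else {}
    problems = [f"missing meta field: {key}" for key in REQUIRED_META if key not in fields]
    description = fields.get("description")
    if description is not None and not description.startswith("Detects"):
        problems.append('description should start with "Detects"')
    return problems


# Hypothetical rule used only to exercise the check; it lacks a reference field.
SAMPLE_RULE = r'''
rule MAL_Win_Demo_Loader_Jan25 {
    meta:
        description = "Detects the hypothetical Demo loader via its mutex name"
        author = "Example Author"
        date = "2025-01-15"
    strings:
        $m = "Global\\DemoLoaderMutex"
    condition:
        filesize < 10MB and uint16(0) == 0x5A4D and $m
}
'''

print(metadata_problems(SAMPLE_RULE))
```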
Additional Resources
Quality YARA Rule Repositories
| Repository | Focus | Maintainer |
|---|---|---|
| Neo23x0/signature-base | 17,000+ production rules, multi-platform | Florian Roth |
| Elastic/protections-artifacts | 1,000+ endpoint-tested rules | Elastic Security |
| reversinglabs/reversinglabs-yara-rules | Threat research rules | ReversingLabs |
| imp0rtp3/js-yara-rules | JavaScript/browser malware | imp0rtp3 |
Guides
| Guide | Purpose |
|---|---|
| YARA Style Guide | Naming conventions, metadata |
| YARA Performance Guidelines | Atom optimization, regex bounds |
| Kaspersky Applied YARA Training | Expert techniques |
Official Documentation
Author
Trail of Bits (opensource@trailofbits.com)