Coverage by failure family

The denominator page names the universe the scanner family is measured against (the Open Catalog of Test Smells, 517 documented smells) and the axis in scope (false-green only). This page goes one level finer: for each failure family (F1-F8), it lists the codes the four scanners actually ship, each with a link to the public code that proves it.

What this page counts, and what it does not

This is a public-code view, not an evaluation. It counts the codes the ecosystem ships per failure family, each linked to the catalog entry or the scanner that emits it. It does not report precision or recall against the catalog, and it carries no dataset evidence. Those numbers are measured against the false-green slice of the denominator and are released with the study, not here. The honest reading of the table below is "the scanner family ships these codes for this failure mode," not "the scanner family detects N% of the literature."

A code can appear under more than one scanner when the same id covers the same mechanism in a different language (for example, `C5` is the always-true assertion in Python, JS/TS, and Robot). Counting a failure family by distinct ids, not by per-scanner rows, avoids double counting.
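The difference between the two counting rules can be sketched in a few lines of Python. The rows below are illustrative, not an export from any scanner: only the fact that `C5` is reported by all three static scanners comes from the text above, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative (scanner, family, code) rows. Which scanner emits C6 and JS15
# is an assumption made for the example; C5 appears three times on purpose.
ROWS = [
    ("falsegreen", "F3", "C5"),
    ("falsegreen-js", "F3", "C5"),
    ("robotframework-falsegreen", "F3", "C5"),
    ("falsegreen", "F3", "C6"),
    ("falsegreen-js", "F3", "JS15"),
]


def count_by_rows(rows):
    """Count one per (scanner, code) row: the same id is counted repeatedly."""
    counts = defaultdict(int)
    for _scanner, family, _code in rows:
        counts[family] += 1
    return dict(counts)


def count_by_distinct_ids(rows):
    """Count each id once per family, however many scanners emit it."""
    ids = defaultdict(set)
    for _scanner, family, code in rows:
        ids[family].add(code)
    return {family: len(codes) for family, codes in ids.items()}


print(count_by_rows(ROWS))          # {'F3': 5}
print(count_by_distinct_ids(ROWS))  # {'F3': 3}
```

Counting by rows would credit F3 with five codes here, although only three distinct mechanisms are covered.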

The four scanners

| Scanner | Language | Public catalog (the code list) |
| --- | --- | --- |
| falsegreen | Python / pytest | README catalog · `scanner.py` |
| falsegreen-js | JS / TS | README catalog |
| robotframework-falsegreen | Robot Framework | README catalog |
| falsegreen-skill | semantic (LLM) | `reference.md` (the superset) |

The skill is the superset: its `reference.md` lists every structural code the three static scanners emit, plus the semantic-only codes (cases 10, 11, 12, 15, 18) that no parser can reach.
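The superset claim is a plain set relation, so it can be checked mechanically once the code lists are extracted. The sketch below assumes the lists have already been pulled from each README catalog and from `reference.md`; the helper name and the small code sets are hypothetical and stand in for the real catalogs.

```python
def missing_from_reference(static_codes_by_scanner, reference_codes):
    """Return, per scanner, the codes it emits that the reference does not list."""
    reference = set(reference_codes)
    gaps = {}
    for scanner, codes in static_codes_by_scanner.items():
        missing = set(codes) - reference
        if missing:
            gaps[scanner] = sorted(missing)
    return gaps


# Illustrative subsets only, not the full published catalogs.
static_codes = {
    "falsegreen": {"C1", "C2", "C5"},
    "falsegreen-js": {"JS1", "JS2", "C5"},
    "robotframework-falsegreen": {"R1", "R2", "C5"},
}
reference = {"C1", "C2", "C5", "JS1", "JS2", "R1", "R2", "case 10", "case 11"}

# An empty result means the reference is a superset of every scanner's list.
print(missing_from_reference(static_codes, reference))  # {}
```

A non-empty result would name the scanner and the ids that the reference has not caught up with.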

Codes per family

The taxonomy (F1-F8) is the conceptual axis: how a test goes green without protecting anything. The codes below are the public ids the ecosystem ships for each mode. Static scanners cover F1-F3, F5, the parser-reachable slice of F4, and the static proxies of F6; the skill adds the semantic core of F4 and almost all of F7; F8 is the diagnostic group, off by default.

| Family | Failure mode | Codes the ecosystem ships | Layer |
| --- | --- | --- | --- |
| F1 | Checks nothing (no oracle) | `C2`, `C2b`, `C2c`, `C27`, `C39`, `C50`, `C51`, `JS2`, `JS6`, `JS13`, `R2`, `R4`, `R7`, semantic cases 10/11 | static + skill |
| F2 | The check exists but never runs | `C1`, `C3`, `C20`, `C21`, `C22`, `C43`, `CC`, `JS5`, `JS7`, `JS9`, `JS11`, `JS25`, `JS26`, `JS29`, `JS31`, `R8`, `R8b` | static |
| F3 | The check is trivial (always passes) | `C5`, `C6`, `C6c`, `C7`, `C8`, `C8b`, `C11a`, `C18`, `C34`, `C42`, `C44`, `C52`, `JS15`, `JS21`, `JS30`, `R1`, `R6` | static |
| F4 | Checks the wrong thing | `C9`, `C9b`, `C19`, `C28`, `C49`, `C55`, `JS8`, `JS24`, `JS27`; semantic case 18, parts of `C6` / `C33` / the snapshot codes | static (partial) + skill |
| F5 | Drops out of the count (skip / not collected) | `C4`, `C4b`, `C25`, `C32`, `C38`, `C45`, `JS1`, `JS4`, `JS22`, `JS23`, `R3`, `R5`; project layer: `PL1`, `PL2`, `PL7`, `PL8`, `PL9`, `PL10` | static + project layer |
| F6 | Passes or fails by luck (non-determinism) | `C16`, `C23`, `C24`, `C29`, `C35` (static proxies) | static (proxy) + runtime |
| F7 | Circular or semantic oracle | semantic cases 10, 11, 12, 15; `C14` (the codable corner) | skill + mutation testing |
| F8 | Hygiene / readability (not false-green) | `D1`, `D3`, `D4`, `D5`, `D6`, `D7`, `D8`, `M2` (opt-in diagnostics) | diagnostic / linter |

The exact, current code list per scanner lives in each repository's README catalog and in the skill's `reference.md`. This table groups those published codes by failure mode; it does not invent new ones. Where a code maps to more than one family (a code can be both "never runs" and "weak"), it is listed under the family that names its primary mechanism.
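The one-family-per-code rule can be enforced with a small consistency check over the grouping. This is a sketch under assumptions: the function name is hypothetical, and the sample holds only a few ids copied from three rows of the table, not the full list.

```python
from collections import defaultdict


def codes_listed_twice(table):
    """Return every code that appears under more than one family."""
    seen = defaultdict(list)
    for family, codes in table.items():
        for code in codes:
            seen[code].append(family)
    return {code: fams for code, fams in seen.items() if len(fams) > 1}


# A few ids from the F2, F3, and F5 rows of the table.
SAMPLE = {
    "F2": ["C1", "C3", "JS5"],
    "F3": ["C5", "C6", "JS15"],
    "F5": ["C4", "C4b", "JS1"],
}
print(codes_listed_twice(SAMPLE))  # {}

# Hypothetical mistake: C1 also listed under F3.
broken = {**SAMPLE, "F3": SAMPLE["F3"] + ["C1"]}
print(codes_listed_twice(broken))  # {'C1': ['F2', 'F3']}
```

Partial mentions, such as "parts of `C6`" in the F4 row, are prose notes rather than full listings, so a check like this would have to leave them out of its input.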

What is and is not counted per family

- F1, F2, and F3 are static and saturated: a per-file parser proves them with no false negatives inside its rules, and the scanner READMEs list every id. The one exception is F1's semantic cases 10 and 11, which belong to the skill.
- F4 is counted only for the slice a parser can reach (a string-format comparison, a discarded metric). The contradicts-the-spec core is semantic and lives in the skill (case 18); it is not a static count.
- F5 has two slices: the per-file slice (a test not collected, a non-strict xfail), counted in the scanner codes, and the project slice (`PL1`, `PL2`, `PL7`, `PL8`, `PL9`, `PL10`, read by `--config-audit`), counted separately. The runtime slice (a collection error reported as "0 tests") is documented but has no code.
- F6 is counted only as static proxies (`C16` for uncontrolled time/randomness, `C23` for a hard-coded path). Whether a test is flaky in practice can only be established at runtime, which is out of band, so it is not counted here.
- F7 is the semantic family. Only `C14` (a snapshot generated from the code's own output) is a static code; the rest (mocking the unit under test, re-implementing the formula, borrowed state) are skill cases, confirmed with mutation testing, which the skill never runs itself. They are listed, not counted as static coverage.
- F8 is not false-green. The diagnostic codes are off by default, and dedicated linters (ruff, ESLint, Robocop) cover the same ground. They are surfaced on request, not promised as detection.

Why this is the honest framing

A coverage claim is only meaningful against a named denominator. The percentage of the catalog the scanner family detects is reported against the false-green slice, not the whole 517, and that number ships with the study. This page stops at what the public code proves: which codes exist per failure family, where the code is, and which layer owns each mode. The boundary in the scope page and the cross-walk in the denominator page say what stays out and why.