Coverage by failure family
The denominator page names the universe the scanner family is measured against (the Open Catalog of Test Smells, 517 documented smells) and the axis in scope (false-green only). This page goes one level finer: for each failure family (F1-F8), it lists the codes the four scanners actually ship, each with a link to the public code that proves it.
What this page counts, and what it does not
This is a public-code view, not an evaluation. It counts the codes the ecosystem ships per failure family, each linked to the catalog entry or the scanner that emits it. It does not report precision or recall against the catalog, and it carries no dataset evidence. Those numbers are measured against the false-green slice of the denominator and released with the study, not here. The honest reading of the table below is "the scanner family ships these codes for this failure mode," not "the scanner family detects N% of the literature."
A code can appear under more than one scanner when the same id covers the same mechanism in a different language (C5 is the always-true assertion in Python, JS/TS, and Robot). Counting each failure family by distinct ids, not by per-scanner rows, avoids double counting.
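The distinct-id rule can be sketched in a few lines. The rows below are illustrative assumptions, not the published catalogs: only the C5 example comes from this page, and the attribution of C6 to a single scanner is made up for the sake of the sketch.

```python
from collections import defaultdict

# Hypothetical per-scanner rows: (scanner, failure family, code id).
# C5 repeated across three scanners mirrors the example in the text;
# the C6 row is an assumed attribution used only for illustration.
ROWS = [
    ("falsegreen", "F3", "C5"),
    ("falsegreen-js", "F3", "C5"),
    ("robotframework-falsegreen", "F3", "C5"),
    ("falsegreen", "F3", "C6"),
]


def distinct_codes_per_family(rows):
    """Group code ids by failure family, collapsing repeats across scanners."""
    by_family = defaultdict(set)
    for _scanner, family, code in rows:
        by_family[family].add(code)
    return {family: sorted(codes) for family, codes in by_family.items()}


counts = distinct_codes_per_family(ROWS)
print(len(ROWS))          # 4 per-scanner rows
print(counts["F3"])       # ['C5', 'C6'] -> 2 distinct ids
```

Counting rows would report four F3 entries here; counting distinct ids reports two, which is the figure this page uses.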
The four scanners
| Scanner | Language | Public catalog (the code list) |
|---|---|---|
| falsegreen | Python / pytest | README catalog · scanner.py |
| falsegreen-js | JS / TS | README catalog |
| robotframework-falsegreen | Robot Framework | README catalog |
| falsegreen-skill | semantic (LLM) | reference.md (the superset) |
The skill is the superset: every structural code the three static scanners emit appears in its reference.md, plus the semantic-only codes (cases 10, 11, 12, 15, 18) that no parser can reach.
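The superset claim is a set relation, and a check for it might look like the sketch below. The code sets are small illustrative subsets, and which scanner emits which id is an assumption; the real lists live in each README catalog and in reference.md.

```python
# Illustrative subsets only (assumed attributions, not the published catalogs).
STATIC_CODES = {
    "falsegreen": {"C2", "C5"},
    "falsegreen-js": {"JS2", "C5"},
    "robotframework-falsegreen": {"R2", "C5"},
}
SKILL_CODES = {
    "C2", "C5", "JS2", "R2",
    "case 10", "case 11", "case 12", "case 15", "case 18",
}


def missing_from_skill(static_codes, skill_codes):
    """Static ids the skill reference does not list (empty if it is a superset)."""
    all_static = set().union(*static_codes.values())
    return sorted(all_static - skill_codes)


def semantic_only(static_codes, skill_codes):
    """Ids that appear in the skill reference but in no static scanner."""
    all_static = set().union(*static_codes.values())
    return sorted(skill_codes - all_static)


print(missing_from_skill(STATIC_CODES, SKILL_CODES))  # []
print(semantic_only(STATIC_CODES, SKILL_CODES))
```

An empty first list is what "superset" means in practice; the second list is the semantic-only remainder.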
Codes per family
The taxonomy (F1-F8) is the conceptual axis: how a test goes green without protecting anything. The codes below are the public ids the ecosystem ships for each mode. Static scanners cover F1-F3, F5, and the static proxies of F6; the skill adds F4 and F7; F8 is the diagnostic group, off by default.
| Family | Failure mode | Codes the ecosystem ships | Layer |
|---|---|---|---|
| F1 | Checks nothing (no oracle) | C2, C2b, C2c, C27, C39, C50, C51, JS2, JS6, JS13, R2, R4, R7, semantic cases 10/11 | static + skill |
| F2 | The check exists but never runs | C1, C3, C20, C21, C22, C43, CC, JS5, JS7, JS9, JS11, JS25, JS26, JS29, JS31, R8, R8b | static |
| F3 | The check is trivial (always passes) | C5, C6, C6c, C7, C8, C8b, C11a, C18, C34, C42, C44, C52, JS15, JS21, JS30, R1, R6 | static |
| F4 | Checks the wrong thing | C9, C9b, C19, C28, C49, C55, JS8, JS24, JS27; semantic case 18, parts of C6 / C33 / the snapshot codes | static (partial) + skill |
| F5 | Drops out of the count (skip / not collected) | C4, C4b, C25, C32, C38, C45, JS1, JS4, JS22, JS23, R3, R5; project layer: PL1, PL2, PL7, PL8, PL9, PL10 | static + project layer |
| F6 | Passes or fails by luck (non-determinism) | C16, C23, C24, C29, C35 (static proxies) | static (proxy) + runtime |
| F7 | Circular or semantic oracle | semantic cases 10, 11, 12, 15; C14 (the codable corner) | skill + mutation testing |
| F8 | Hygiene / readability (not false-green) | D1, D3, D4, D5, D6, D7, D8, M2 (opt-in diagnostics) | diagnostic / linter |
The exact, current code list per scanner lives in each repository's README catalog and in the skill's reference.md. This table groups those published codes by failure mode; it does not invent new ones. Where a code maps to more than one family (a code can be both "never runs" and "weak"), it is listed under the family that names its primary mechanism.
What is and is not counted per family
- F1, F2, F3 are fully static and saturated: a per-file parser proves them with no false negatives inside its rules. The scanner READMEs list every id.
- F4 is counted only for the slice a parser can reach (a string-format comparison, a discarded metric). The contradicts-the-spec core is semantic and lives in the skill (case 18); it is not a static count.
- F5 has two slices: the per-file slice (a test not collected, a non-strict xfail) counted in the scanner codes, and the project slice (PL1, PL2, PL7, PL8, PL9, PL10, read by --config-audit) counted separately. The runtime slice (a collection error reported as "0 tests") is documented, not a code.
- F6 is counted only as static proxies (C16 for uncontrolled time/randomness, C23 for a hard-coded path). Whether a test is flaky in practice needs runtime and is out of band, so it is not counted here.
- F7 is the semantic family. Only C14 (a snapshot generated from the code's own output) is a static code; the rest (mocking the unit under test, re-implementing the formula, borrowed state) are skill cases and are confirmed with mutation testing, which the skill never runs itself. They are listed, not counted as static coverage.
- F8 is not false-green. The diagnostic codes are off by default, and dedicated linters (ruff, ESLint, Robocop) cover the same ground. They are surfaced on request, not promised as detection.
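The "static proxy" idea behind F6 can be illustrated with uncontrolled time. The sketch below is an assumption about the kind of test the C16 description refers to, not the scanner's actual rule; `is_business_hours` and both check functions are hypothetical names.

```python
from datetime import datetime


def is_business_hours(now):
    """Hypothetical unit under test: true from 09:00 up to, not including, 17:00."""
    return 9 <= now.hour < 17


def uncontrolled_time_check():
    # Reads the wall clock: the outcome depends on when the suite runs.
    # The call to datetime.now() is visible in the source, which is what
    # makes a static proxy possible; actual flakiness only shows at runtime.
    assert is_business_hours(datetime.now())


def controlled_time_check():
    # Pins the input: same result on every run.
    assert is_business_hours(datetime(2024, 1, 15, 10, 0))
    assert not is_business_hours(datetime(2024, 1, 15, 20, 0))


controlled_time_check()
```

A parser can see the clock read in the first function, but it cannot tell whether the test ever fails in practice, which is why the count stops at the proxy.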
Why this is the honest framing
A coverage claim is only meaningful against a named denominator. The percentage of the catalog the scanner family detects is reported against the false-green slice, not the whole 517, and that number ships with the study. This page stops at what the public code proves: which codes exist per failure family, where the code is, and which layer owns each mode. The boundary in the scope page and the cross-walk in the denominator page say what stays out and why.