Coverage versus the literature
The family measures itself against the Open Catalog of Test Smells (UFAL / easy-software): 517 documented smells across 1621 references and 166 sources. Naming the denominator keeps the claims honest. It says what universe the tools were checked against, and, just as important, what stays out of scope on purpose.
One axis: false-green
Every tool in the family detects one thing: a test that passes green without protecting anything (the F1-F8 taxonomy). The overwhelming majority of the catalog addresses other concerns. Mixing those smells in would raise the false-positive rate, and a tool that cries wolf gets turned off.
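To make "passes green without protecting anything" concrete, here is a minimal sketch in Python. The discount functions and both checks are hypothetical, invented for illustration; they are not taken from the family's tools or from the F1-F8 taxonomy. The point is that a false-green check cannot tell a correct implementation from a broken one.

```python
from typing import Callable

Discount = Callable[[float, float], float]


def discount_correct(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return price - price * percent / 100


def discount_buggy(price: float, percent: float) -> float:
    """Same function with a deliberate bug: the discount is added."""
    return price + price * percent / 100


def false_green_check(fn: Discount) -> None:
    # Runs the code but asserts nothing about the value it returns.
    result = fn(100.0, 10.0)
    assert result is not None


def protective_check(fn: Discount) -> None:
    # Pins the expected value, so a wrong result fails the check.
    assert fn(100.0, 10.0) == 90.0


def passes(check: Callable[[Discount], None], fn: Discount) -> bool:
    """Return True if the check stays green for the given implementation."""
    try:
        check(fn)
    except AssertionError:
        return False
    return True
```

Here `passes(false_green_check, discount_buggy)` is `True` while `passes(protective_check, discount_buggy)` is `False`: only the second check protects anything.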
What stays out, and why
| Category | Why it is out | Examples (catalog id) |
|---|---|---|
| Brittleness / false-red | breaks without a real bug; the opposite axis | Sensitive Equality, Brittle Assertion, Fragile Fixture, Interface/Behavior Sensitivity, Overspecified Software |
| Hygiene / maintainability | the test still protects; it is just hard to read (linter territory: ruff, ESLint, Robocop) | Assertion Roulette, Magic Number Test, Long Test, Verbose Test, Overcommented Test, Missing Assertion Message |
| Slow / performance | execution time, invisible in the file | Slow Test, Slow Component Usage |
| Design | architecture, not the green of the test | Constructor Initialization, Refused Bequest, Hard-To-Test Code |
| Naming / docs | readability; a linter owns it | Anonymous Test, Bad Naming, Absence Of Why, Bad Comment Rate |
| Duplication | maintainability | Test Code Duplication, Duplicated Code, Duplicate Assert |
| Runtime / culture | needs the suite to run, or is a team practice | Test Run War, Erratic Test, Manual Intervention, Frequent Debugging |
The catalog uses its own id scheme (A16, S06, C30), which is not the same as the family's
codes (C1, C23). The ids above are the catalog's.
The boundary is deliberate, not accidental
Several smells look out of scope but have a false-green proxy that the family does catch. For each of these boundary smells, the tools take the statically provable, low-false-positive form and leave the rest:
| Catalog smell | The proxy the family catches |
|---|---|
| Nondeterministic / flaky test | C16 (uncontrolled time, randomness, or sleep) |
| Resource Optimism | C23 (hard-coded path or IP-address URL) |
| Interacting / order-dependent tests | C24 / C15 / S13 (shared state) |
| Conditional Test Logic | C1 / C21 (an assertion that may never run) |
| No Assertions / Empty / Assertionless Test | C2 / C2b |
| Rotten Green Test | the whole product: the false-green axis |
A few hygiene smells are surfaced as opt-in diagnostics (off by default, not false-green):
Assertion Roulette as D1, Magic Number as D8, Duplicate Assert as D3, Long Test as M2.
Where a dedicated linter covers a smell better, the scanners defer to it.
Why this matters for research
This page is the threats-to-validity statement in public form. Precision and recall are reported against the false-green slice of the catalog, not the whole 517, and the boundary table above shows exactly where the line sits. The full cross-walk (every catalog smell mapped to in-scope or out, with the reason) lives in the private research hub; only the denominator and the boundary are public, never the raw adjudication.
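The effect of scoring against the false-green slice only can be sketched with a few lines of Python. The `Finding` record, the evaluation protocol, and the labels below are all invented for illustration; they are not the project's adjudication data or its actual scoring method. The sketch shows that an out-of-scope test the tool correctly ignores never counts as a miss, while flagging one does count as a false positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    test_id: str
    false_green: bool  # ground truth: does the test sit in the false-green slice?
    flagged: bool      # did the tool report it?


def precision_recall(findings: list[Finding]) -> tuple[float, float]:
    """Precision and recall with the false-green slice as the positive class."""
    tp = sum(f.flagged and f.false_green for f in findings)
    fp = sum(f.flagged and not f.false_green for f in findings)
    fn = sum(f.false_green and not f.flagged for f in findings)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels, invented for illustration only.
TOY = [
    Finding("t1", false_green=True, flagged=True),
    Finding("t2", false_green=True, flagged=True),
    Finding("t3", false_green=True, flagged=False),   # a genuine miss
    Finding("t4", false_green=False, flagged=True),   # out of scope, wrongly flagged
    Finding("t5", false_green=False, flagged=False),  # out of scope, correctly ignored
]
```

With these toy labels both precision and recall come out at two thirds. Had `t5` been counted as a miss, recall would drop even though the tool behaved as designed.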