Coverage versus the literature

The family measures itself against the Open Catalog of Test Smells (UFAL / easy-software): 517 documented smells across 1621 references and 166 sources. Naming the denominator keeps the claims honest. It says what universe the tools were checked against, and, just as important, what stays out of scope on purpose.

One axis: false-green

Every tool in the family detects one thing: a test that passes green without protecting anything (the F1-F8 taxonomy). The overwhelming majority of the catalog addresses other concerns. Folding those smells in would raise the false-positive rate, and a tool that cries wolf gets turned off.

What stays out, and why

| Category | Why it is out | Examples (catalog id) |
| --- | --- | --- |
| Brittleness / false-red | breaks without a real bug; the opposite axis | Sensitive Equality, Brittle Assertion, Fragile Fixture, Interface/Behavior Sensitivity, Overspecified Software |
| Hygiene / maintainability | the test still protects; it is just hard to read (linter territory: ruff, ESLint, Robocop) | Assertion Roulette, Magic Number Test, Long Test, Verbose Test, Overcommented Test, Missing Assertion Message |
| Slow / performance | execution time, invisible in the file | Slow Test, Slow Component Usage |
| Design | architecture, not the green of the test | Constructor Initialization, Refused Bequest, Hard-To-Test Code |
| Naming / docs | readability; a linter owns it | Anonymous Test, Bad Naming, Absence Of Why, Bad Comment Rate |
| Duplication | maintainability | Test Code Duplication, Duplicated Code, Duplicate Assert |
| Runtime / culture | needs the suite to run, or is a team practice | Test Run War, Erratic Test, Manual Intervention, Frequent Debugging |

The catalog uses its own id scheme (A16, S06, C30), which is not the same as the family's codes (C1, C23). The ids above are the catalog's.
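For contrast with the false-green axis, the sketch below shows what the opposite axis (brittleness, or false-red) can look like, using Sensitive Equality as the example. The `Money` class and both tests are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Money:
    amount: int
    currency: str


def test_brittle_string_form() -> None:
    # Sensitive Equality: compares the printed form of the object, so
    # adding or renaming a field turns the test red even when the
    # behavior under test has not changed.
    assert str(Money(5, "EUR")) == "Money(amount=5, currency='EUR')"


def test_value_equality() -> None:
    # Compares values instead; it survives harmless changes to the repr.
    assert Money(5, "EUR") == Money(5, "EUR")
```

Both tests pass today. The first is a liability because it can fail without a real bug, which is a different problem from a test that passes without protecting anything.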

The boundary is deliberate, not accidental

Several smells look out of scope but have a false-green proxy that the family does catch. For each of these boundary smells, the tools take the statically provable, low-false-positive form and leave the rest:

| Catalog smell | The proxy the family catches |
| --- | --- |
| Nondeterministic / flaky test | C16 (uncontrolled time, randomness, or sleep) |
| Resource Optimism | C23 (hard-coded path or IP URL) |
| Interacting / order-dependent tests | C24 / C15 / S13 (shared state) |
| Conditional Test Logic | C1 / C21 (an assertion that may never run) |
| No Assertions / Empty / Assertionless Test | C2 / C2b |
| Rotten Green Test | the whole product: the false-green axis |
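A rough sketch of what a "statically provable" proxy can look like, using Python's standard `ast` module. This is not the family's implementation: the labels, the `classify` and `scan` helpers, and the sample source are all assumptions made for illustration. It only sees plain `assert` statements and ignores things like `unittest` assertion methods or `pytest.raises`.

```python
import ast

# Hypothetical test file contents, parsed but never executed.
SOURCE = '''
def test_no_assert():
    compute()

def test_guarded(result):
    if result:
        assert result.ok

def test_plain(result):
    assert result.ok
'''


def classify(func: ast.FunctionDef) -> str:
    """Return a rough label for one test function (illustrative only)."""
    asserts = [n for n in ast.walk(func) if isinstance(n, ast.Assert)]
    if not asserts:
        # Nothing is ever checked: the assertionless-test shape.
        return "no-assertion"
    # Treat an assert as unconditional only if it sits directly in the
    # function body, not nested under if/for/while/try/with.
    top_level = [s for s in func.body if isinstance(s, ast.Assert)]
    if not top_level:
        # Every assert is nested, so it may never run.
        return "conditional-only"
    return "ok"


def scan(source: str) -> dict[str, str]:
    """Label every top-level test_* function in the given source text."""
    tree = ast.parse(source)
    return {
        node.name: classify(node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_")
    }


print(scan(SOURCE))
```

The check is deliberately conservative in what it claims: it never runs the tests, and it reports only shapes that can be read directly off the syntax tree. That is the trade the table describes: a narrow proxy with few false positives rather than the full smell.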

A few hygiene smells are surfaced as opt-in diagnostics (off by default, not false-green): Assertion Roulette as D1, Magic Number as D8, Duplicate Assert as D3, Long Test as M2. Where a dedicated linter covers a smell better, the scanners defer to it.

Why this matters for research

This page is the threats-to-validity statement in public form. Precision and recall are reported against the false-green slice of the catalog, not the whole 517, and the boundary table above shows exactly where the line sits. The full cross-walk (every catalog smell mapped to in-scope or out, with the reason) lives in the private research hub; only the denominator and the boundary are public, never the raw adjudication.
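One way to read "reported against the false-green slice" is sketched below. The `Sample` record, the scoring rule, and all of the data are invented for illustration; the actual adjudication is not public, and counting a flag on an out-of-scope instance as a false positive is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    smell: str      # catalog smell the instance was adjudicated as
    in_scope: bool  # True if that smell falls in the false-green slice
    flagged: bool   # True if the tool reported the instance


def precision_recall(samples: list[Sample]) -> tuple[float, float]:
    """Precision and recall with the in-scope slice as the denominator."""
    flagged = [s for s in samples if s.flagged]
    in_scope = [s for s in samples if s.in_scope]
    true_pos = sum(1 for s in flagged if s.in_scope)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(in_scope) if in_scope else 0.0
    return precision, recall


# Invented data, for illustration only.
SAMPLES = [
    Sample("Assertionless Test", True, True),
    Sample("Conditional Test Logic", True, True),
    Sample("Rotten Green Test", True, True),
    Sample("Empty Test", True, False),         # in scope, missed
    Sample("Slow Test", False, False),         # out of scope, not a miss
    Sample("Long Test", False, False),         # out of scope, not a miss
    Sample("Magic Number Test", False, True),  # flagged outside the slice
]

precision, recall = precision_recall(SAMPLES)
print(precision, recall)
```

With this invented data, both values come out to 0.75. The two unflagged out-of-scope samples do not lower recall, because they were never part of the denominator.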