falsegreen (Python)
The deterministic Python/pytest scanner. A zero-dependency AST pass that validates each test against the false-positive codes a parser can prove. HIGH findings block the commit, LOW ones warn, and a diagnostic/coupling group is opt-in.
- Repository: github.com/vinicq/falsegreen
- Catalog: Python codes
Install
pip install falsegreen
Use
falsegreen path/to/tests # scan
falsegreen --staged # only staged files (pre-commit)
falsegreen --json # machine-readable report
falsegreen --diagnostics # include the opt-in F8 group
falsegreen --config-audit # read pytest/coverage config for project-level false-green
HIGH findings exit non-zero, so the tool drops into CI and pre-commit unchanged. The report numbers each finding with its code, judgment, pyramid level, location, evidence, and a fix hint.
First finding
Save a test that always passes:
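One minimal file that fits, assuming the name test_demo.py taken from the report and a constant assertion on line 3 (the body is illustrative, not the project's own listing):

```python
def test_demo():
    total = sum([1, 2, 3])  # computed, but never checked
    assert True  # line 3: passes no matter what total is
```

pytest reports this test as passed, which is exactly the false-green the scanner is meant to catch.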
Run the scanner over it:
falsegreen test_demo.py
It reports:
test_demo.py:3 [C5] always-true check (assert True / tuple / or True)
level: unit fix: assert the real behaviour, not a constant or tautology
Summary: 1 high, 0 low.
Reading a finding
Each line carries the same fields:
- test_demo.py:3 - the file and the line that triggered it.
- [C5] - the catalog code. C5 is the always-true check. Every code is explained in the Python catalog.
- level: unit - which level of the test pyramid the file sits at; it changes what counts as a real check.
- fix: - a one-line hint. Here: assert the real behaviour, not a constant.
Exit codes wire it into CI: 0 clean, 10 low-confidence only, 20 at least
one high-confidence finding. Block the build on 20.
Complete usage and configuration
The getting-started above is the five-minute path. This section is the full reference: every install channel, every output format, every configuration knob, the exit-code contract, and the CI wiring. It mirrors what the project README documents.
Install channels
Python 3.8 or newer, no third-party runtime dependencies.
pip install falsegreen # project or virtualenv install
uvx falsegreen tests/ # run once without installing (uv)
pipx run falsegreen tests/ # run once without installing (pipx)
python -m falsegreen ... is equivalent to the falsegreen command when the entry point
is not on PATH.
Invocation
falsegreen # scan the current directory
falsegreen tests/ # scan a folder or a single file
falsegreen --staged # only the test files staged in git (pre-commit)
falsegreen --summary # one-line "N scanned, M flagged" to stderr
The scanner only reads the test files; it never imports or runs them, so a broken or hostile test cannot execute through it. Each finding carries its pyramid level (unit / integration / e2e, read from the file's imports) and a one-line fix hint; the text summary breaks findings down by level and lists the most common fixes.
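The read-only property follows from working on the parsed syntax tree. A minimal sketch of that idea, using only the standard ast module; the function name and the single rule it checks are illustrative assumptions, not falsegreen's actual implementation:

```python
import ast
from typing import List


def constant_true_asserts(source: str) -> List[int]:
    """Return line numbers of `assert <truthy constant>` statements."""
    tree = ast.parse(source)  # parses text only; nothing is imported or executed
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assert)
            and isinstance(node.test, ast.Constant)
            and bool(node.test.value)
        ):
            hits.append(node.lineno)
    return sorted(hits)


# The import below would fail if this source were ever imported or run.
SAMPLE = (
    "import module_that_does_not_exist\n"
    "\n"
    "def test_demo():\n"
    "    assert True\n"
)

print(constant_true_asserts(SAMPLE))  # [4]
```

The sample imports a module that does not exist, yet the pass still completes, because the source is never executed.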
Output formats
--format text|json|sarif|junit selects the report shape (default text). --json stays as
an alias for --format json.
falsegreen tests/ --json # machine-readable JSON
falsegreen tests/ --format sarif # SARIF 2.1.0
falsegreen tests/ --format junit # JUnit XML
falsegreen tests/ --output report.sarif # write to a file
falsegreen tests/ --output .falsegreen/ # write report.<ext> into a directory
- json carries the full envelope: tool, version, judgments, and the findings list.
- sarif emits SARIF 2.1.0, mapping HIGH to error and LOW to warning, for GitHub code scanning and inline pull-request annotations.
- junit emits JUnit XML, where HIGH findings become <failure> so a CI test reporter surfaces them as a failing suite.
--output takes a file or a directory: an extension-less or trailing-slash path (.falsegreen/)
receives report.<ext> for the chosen format. Reports are run artifacts, so keep the output
directory gitignored.
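To make the HIGH-to-error and LOW-to-warning mapping concrete, here is a sketch of how one finding could be shaped into a SARIF 2.1.0 result object. The function and its parameters are hypothetical; only the SARIF field names come from the standard, and the tool's real output may carry more fields:

```python
import json

# Mapping stated in the text: HIGH -> error, LOW -> warning.
SARIF_LEVEL = {"HIGH": "error", "LOW": "warning"}


def to_sarif_result(code: str, judgment: str, path: str, line: int, message: str) -> dict:
    """Build one SARIF 2.1.0 `result` object from a finding (hypothetical inputs)."""
    return {
        "ruleId": code,
        "level": SARIF_LEVEL[judgment],
        "message": {"text": message},
        "locations": [
            {
                "physicalLocation": {
                    "artifactLocation": {"uri": path},
                    "region": {"startLine": line},
                }
            }
        ],
    }


result = to_sarif_result("C5", "HIGH", "test_demo.py", 3, "always-true check")
print(json.dumps(result, indent=2))
```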
Configuration
Disable codes (CLI). --disable C6,C2b turns specific codes off for one run.
Inline suppression. A comment on the offending line silences a justified finding without disabling the code suite-wide:
assert user.id == user.id # falsegreen: ignore[C7] silence only C7 on this line
assert x # falsegreen: ignore silence every code on this line
Only the exact falsegreen: token suppresses; a plain # ignore does not.
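A sketch of how such a comment could be recognized, assuming a plain regular expression over the raw line. The helper name, the sentinel, and the comma-separated form for several codes are assumptions; a real scanner would more likely read comments through the tokenizer so that string literals cannot match:

```python
import re
from typing import FrozenSet, Optional

ALL_CODES = frozenset({"*"})  # sentinel meaning "every code on this line"

# Requires the literal "falsegreen:" token; a bare "# ignore" never matches.
_SUPPRESS = re.compile(r"#\s*falsegreen:\s*ignore(?:\[([^\]]*)\])?")


def suppressed_codes(line: str) -> Optional[FrozenSet[str]]:
    """Return the codes silenced on this line, ALL_CODES, or None."""
    match = _SUPPRESS.search(line)
    if match is None:
        return None
    if match.group(1) is None:
        return ALL_CODES
    # Assumption: several codes may be comma-separated inside the brackets.
    return frozenset(c.strip() for c in match.group(1).split(",") if c.strip())


print(suppressed_codes("assert user.id == user.id  # falsegreen: ignore[C7]"))
print(suppressed_codes("assert x  # ignore"))  # None
```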
Project config file. [tool.falsegreen] in pyproject.toml, or a flat .falsegreen.toml
at the repo root (.falsegreen.toml wins if both exist). Point at a specific file with
--config PATH.
[tool.falsegreen]
disable = ["C13b"] # turn these codes off everywhere
exclude = ["tests/legacy/*"] # skip files matching these globs
long_test_threshold = 30 # line-count limit for M2 (default: 50)
inline_setup_threshold = 3 # statement limit for D5 (default: 5)
[tool.falsegreen.severity]
C8 = "high" # promote: now blocks the commit (exit 20)
C6 = "off" # same as adding C6 to disable
C22 = "low" # enable the async-never-awaits check
D1 = "info" # enable Assertion Roulette (diagnostic, never blocks)
M2 = "info" # enable Long Test Method (diagnostic)
severity values: high, low, info, or off. info findings (the opt-in F8 group) appear in the
DIAGNOSTIC / COUPLING sections and do not affect the exit code. The
long_test_threshold and inline_setup_threshold keys live directly under
[tool.falsegreen], not inside [severity]. Precedence, highest first: --disable on the
CLI, inline # falsegreen: ignore, the config file, the built-in default.
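The precedence chain can be read as a first-match lookup. The sketch below is one way to express it; the function, its parameters, and the default severities in the example are hypothetical, and only the ordering comes from the text:

```python
from typing import Dict, Set


def effective_severity(
    code: str,
    cli_disabled: Set[str],
    inline_ignored: Set[str],
    config_severity: Dict[str, str],
    defaults: Dict[str, str],
) -> str:
    """Resolve one code's severity, highest-precedence source first."""
    if code in cli_disabled:          # 1. --disable on the CLI
        return "off"
    if code in inline_ignored:        # 2. inline "# falsegreen: ignore" on this line
        return "off"
    if code in config_severity:       # 3. the config file
        return config_severity[code]
    return defaults.get(code, "off")  # 4. the built-in default


config = {"C8": "high", "C6": "off"}   # as in the [tool.falsegreen.severity] example
defaults = {"C8": "low", "C6": "low"}  # hypothetical defaults, for illustration only

print(effective_severity("C8", set(), set(), config, defaults))   # high
print(effective_severity("C8", {"C8"}, set(), config, defaults))  # off
```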
Config audit. --config-audit is a separate mode. Instead of scanning test files, it
reads the project's pytest and coverage config (pyproject.toml, pytest.ini, tox.ini,
setup.cfg) and reports the project-layer ways a suite stays green by configuration:
- PL1 - python -O / PYTHONOPTIMIZE strips every assert at runtime.
- PL2 - filterwarnings does not promote warnings to errors.
- PL7 - no coverage gate (--cov-fail-under absent).
- PL8 - addopts stops the run early with -x / --maxfail, masking the count.
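As an illustration of what reading config instead of tests can look like, the sketch below checks a pytest.ini addopts line for the PL7 and PL8 conditions. It is a simplified assumption about how such an audit might work: the function name is hypothetical, and it ignores the other config files and codes the real mode covers.

```python
import configparser
import shlex
from typing import List


def audit_addopts(ini_text: str) -> List[str]:
    """Return the project-level codes suggested by a pytest.ini `addopts` line."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    addopts = parser.get("pytest", "addopts", fallback="")
    tokens = shlex.split(addopts)

    findings = []
    if not any(t.startswith("--cov-fail-under") for t in tokens):
        findings.append("PL7")  # no coverage gate
    if any(t == "-x" or t.startswith("--maxfail") for t in tokens):
        findings.append("PL8")  # run stops early, masking the failure count
    return findings


PYTEST_INI = """
[pytest]
addopts = -x --cov=src
"""

print(audit_addopts(PYTEST_INI))  # ['PL7', 'PL8']
```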
Baseline (adopt on a legacy repo). Record the findings you already have, then fail only on new ones:
falsegreen --write-baseline tests/ # write .falsegreen-baseline.json, exit 0
falsegreen --baseline tests/ # suppress recorded findings, fail on new ones
A finding is fingerprinted by relative path, code, detail, and normalized source line (not
line number), so prepending code does not re-trigger a baselined finding. Commit
.falsegreen-baseline.json and the ratchet only tightens.
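A sketch of a line-number-free fingerprint built from the four fields named above. The hash choice, the field separator, and whitespace collapsing as the normalization step are assumptions; the actual baseline format may differ:

```python
import hashlib


def fingerprint(rel_path: str, code: str, detail: str, source_line: str) -> str:
    """Hash the four fields; the line number is deliberately not an input."""
    normalized = " ".join(source_line.split())  # assumption: collapse whitespace
    payload = "\x00".join([rel_path, code, detail, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


before = fingerprint("tests/test_demo.py", "C5", "always-true check", "    assert True")
# Same statement after unrelated code was prepended and the block re-indented.
after = fingerprint("tests/test_demo.py", "C5", "always-true check", "        assert True")

print(before == after)  # True
```

Because the position in the file never enters the hash, moving a baselined statement does not create a new finding, while changing the statement itself does.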
Exit codes
| Code | Meaning |
|---|---|
| 0 | clean, no findings that affect the gate |
| 10 | low-confidence findings only |
| 20 | at least one high-confidence finding |
Block the build on 20. The pre-commit hook honors the same contract.
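A shell or CI step normally fails on any non-zero status, so a pipeline that should block only on 20 may need a thin wrapper. The sketch below is one way to do that; the function names are hypothetical, and treating unexpected statuses as blocking is a design assumption, not documented behaviour:

```python
import subprocess
import sys

EXIT_CLEAN, EXIT_LOW_ONLY, EXIT_HIGH = 0, 10, 20


def should_block(exit_code: int) -> bool:
    """Block on 20 or on any unexpected status; let 0 and 10 through."""
    return exit_code not in (EXIT_CLEAN, EXIT_LOW_ONLY)


def run_gate(paths):
    """Run the scanner and return the status this wrapper should exit with."""
    completed = subprocess.run([sys.executable, "-m", "falsegreen", *paths])
    return 1 if should_block(completed.returncode) else 0
```

A script would end with `sys.exit(run_gate(["tests/"]))`. Whether low-confidence findings should pass the gate is a project decision.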
CI integration
GitHub Actions. A failing job on exit 20:
name: falsegreen
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.x" }
      - run: pip install falsegreen
      - run: falsegreen tests/ # exit 20 fails the job
SARIF upload to GitHub code scanning. Emit SARIF and hand it to the CodeQL action so findings show inline on the pull request:
- run: falsegreen tests/ --format sarif --output falsegreen.sarif
  continue-on-error: true
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: falsegreen.sarif
Pre-commit hook. Add to .pre-commit-config.yaml:
Then pre-commit install. The hook entry is falsegreen --staged with pass_filenames: false,
so it reads the staged test files itself; do not add file arguments or re-enable pass_filenames,
or some files scan twice. Pin a tag (never a branch) so local runs and CI use the same scanner;
pre-commit autoupdate rewrites the rev. Bypass once with git commit --no-verify, or set
FALSEGREEN_BLOCK=0 to make the hook warn-only. To run on push instead of commit, set
stages: [pre-push] under the hook. A raw git hook without the framework:
python -m falsegreen.hook_install --repo . # install
python -m falsegreen.hook_install --uninstall # remove
For the semantic cases a parser cannot reach, pair the scanner with the falsegreen-skill LLM pass.
What it covers
The most complete scanner of the family - it is the reference the others mirror. The full per-code detail is in the Python catalog.
| Group | Codes | Effect |
|---|---|---|
| False-positive (F1-F6) | ~45 active C* codes + CC | HIGH blocks, LOW warns |
| Diagnostic / coupling (F8) | D1, D3, D4, D5, D6, M2 | opt-in, never blocks |
| Project / CI (F5, --config-audit) | PL2 (filterwarnings not error), PL7 (no --cov-fail-under), PL8 (-x/--maxfail masks the count) | reads config, reports |
What it does not cover, and why
Out of scope (the wrong axis)
Brittleness/false-red, hygiene, slowness, design, naming, duplication, and runtime/culture problems are not false-green. See coverage vs the literature for the full boundary.
Codes deliberately not implemented
These were evaluated against the consolidated catalog and left out, each for a reason. Leaving them out is the precision-first policy: a false positive is worse than a miss.
| Code | What it would flag | Why not |
|---|---|---|
| C40 | assert mock.attr with no spec (always truthy) | without spec analysis the false-positive rate is high; the concept lives in the skill (F7) |
| C46 | real network/DB with no double (requests, socket) | legitimate in an integration test; flagging it needs to know the level, so it routes to the skill / --config-audit |
| C47 | assertion depends on dict/set ordering | high false positive (most collections are deterministic in use); a skill note instead |
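For context on the C40 row, this sketch shows why an attribute assertion on an unspecced mock cannot fail, and how a spec changes that. Repo and was_saved are hypothetical names used only for illustration:

```python
from unittest.mock import Mock


class Repo:  # hypothetical production class
    def save(self, item):
        return True


loose = Mock()
# Any attribute of an unspecced Mock is another Mock, which is truthy.
assert loose.was_saved  # passes although nothing named was_saved exists

strict = Mock(spec=Repo)
try:
    strict.was_saved
except AttributeError:
    print("spec rejects the unknown attribute")
```

Telling the two cases apart statically means resolving what each mock was built from, which is the spec analysis the table refers to.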
Reserved for the semantic pass (F7)
Mocking the unit under test (case 10), asserting the value fed to the mock (case 11),
re-implementing the production formula (case 12), an expected value that contradicts intent
(case 18), borrowed shared state (case 15). No AST proves intent or inter-procedural flow. These
live in falsegreen-skill. C14 (snapshot of the code's own output) is the only
codifiable corner.
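To illustrate why these cases need intent, here is a hypothetical instance of case 12, re-implementing the production formula. The function, the rate, and the bug are invented for the example:

```python
def price_with_tax(net: float) -> float:
    """Hypothetical production code; the rate should have been 0.20."""
    return net * (1 + 0.02)


def test_price_with_tax():
    net = 100.0
    expected = net * (1 + 0.02)  # copies the production formula, bug included
    assert price_with_tax(net) == expected
```

The assertion compares a real call against a computed value, so syntactically it looks like a sound check. Only knowing that the expected price is 120.0 reveals the problem.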
Needs runtime (not promised statically)
python -O stripping assert, a collection error reported as "0 tests passed", a CI step that
runs a subset and reports green (PL1/PL4/PL6). PL1 now has a config-discoverable slice:
--config-audit flags python -O/-OO or PYTHONOPTIMIZE=1 set in tox.ini/pytest addopts
as a project-level warning. The rest only appear when the suite runs; they are documented, not
claimed. The honest path is mutation testing (mutmut, cosmic-ray), which is out of band.
See scope and honesty for the layer boundary.
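The python -O effect is easy to observe at the interpreter level. The sketch below runs the same failing assert with and without the flag in a plain interpreter, outside pytest; it assumes PYTHONOPTIMIZE is not already set in the environment:

```python
import subprocess
import sys

SNIPPET = "assert False, 'this should fail'"

normal = subprocess.run([sys.executable, "-c", SNIPPET], capture_output=True)
optimized = subprocess.run([sys.executable, "-O", "-c", SNIPPET], capture_output=True)

print(normal.returncode)     # 1: the AssertionError propagates
print(optimized.returncode)  # 0: the assert statement was compiled out
```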