Changelog
All notable changes to md-bridge are documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning. Each version section groups its entries under these headings:
- Added — new behaviour or files that did not exist before.
- Changed — behaviour or files that already existed and now work differently.
- Removed — behaviour or files that no longer exist.
- Fixed — defects that were corrected.
- Security — changes to the project's security posture.
If a heading has no entries in a release, it is omitted entirely.
Unreleased
Added
- Optional Tesseract OCR pre-pass for scanned PDFs by @0exec (first external contribution from this account). New `apps/api/app/services/ocr.py` runs when `MD_BRIDGE_OCR_ENABLED=1` and the inspect diagnostics report `needs_ocr: true`. PDFs that already carry a text layer skip OCR entirely. The response payload gained `ocr_applied: bool`. New Docker stage `runtime-ocr` (the lean default `runtime` stage is unchanged), `[ocr]` extras in `pyproject.toml`, and CI installs `tesseract-ocr` for the integration tests. (#86, closes #5)
- Descriptive iframe title on the PDF preview by @zhouzhou626 (third PR landed by this contributor). New `mdToPdf.previewIframeTitle` dictionary key across EN, PT-BR, and ES, bound at `apps/web/src/pages/MdToPdf.tsx:119`. Screen readers now announce "Generated PDF preview, frame" instead of inheriting the page heading. WCAG 4.1.2 (Name, Role, Value). (#87, closes #72)
- Heuristics documentation page at `docs/heuristics.md`: document profile, heading detection, TOC normalization, list recovery, table cleanup, inline formatting, paragraph stitching, header/footer suppression, front matter, and the deliberately out-of-scope items. Each section names the corresponding function in `packages/pdf-to-markdown/scripts/convert.py`. (#92)
- FAQ page at `docs/faq.md`: why OCR is opt-in, why the converter persists nothing, how to add a new locale, heuristics vs a language model, the Playwright + headless Chromium trade-off, server-side vs client-side conversion, and offline operation. (#93)
- API recipes page at `docs/api-recipes.md`: copy-paste recipes for the four HTTP endpoints from curl, Python `requests`, and JavaScript `fetch` (browser + Node), plus error-envelope reading, CORS configuration, and rate-limit guidance. (#97, closes #26)
- Deployment recipes page at `docs/deployment-other.md`: Render, Fly.io, and Railway walkthroughs with blueprints, free-tier caveats, and a consolidated common-gotchas section (`VITE_API_URL` is set at build time, CORS, cold-start timeouts, 500 MB upload cap). (#98, partial fix for #7)
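The OCR entry above says the pre-pass runs only when two conditions hold. The sketch below shows that gate. It assumes the environment flag comes from `os.environ` and the inspect diagnostics arrive as a dict. `should_run_ocr` is a hypothetical name; it is not the function in `apps/api/app/services/ocr.py`.

```python
import os
from typing import Mapping


def should_run_ocr(diagnostics: Mapping[str, object],
                   env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when OCR is opted in AND the PDF lacks a usable text layer.

    Hypothetical helper restating the rule in the changelog entry
    (MD_BRIDGE_OCR_ENABLED=1 and needs_ocr: true); not the project's code.
    """
    opted_in = env.get("MD_BRIDGE_OCR_ENABLED") == "1"
    needs_ocr = diagnostics.get("needs_ocr") is True
    return opted_in and needs_ocr


# A PDF that already has a text layer skips OCR even with the flag on.
print(should_run_ocr({"needs_ocr": False}, {"MD_BRIDGE_OCR_ENABLED": "1"}))  # False
print(should_run_ocr({"needs_ocr": True}, {"MD_BRIDGE_OCR_ENABLED": "1"}))   # True
print(should_run_ocr({"needs_ocr": True}, {}))                               # False
```

Requiring both conditions keeps OCR strictly opt-in. Enabling the flag never adds OCR cost to PDFs that already carry text.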
Changed
- Contribution-guide section in every open issue body. All 27 open issues now carry a standardized "How to contribute" block covering the claim workflow, branch naming, Conventional Commits, the test pyramid, the no-AI-co-authors rule, squash merge, and the reversibility declaration. The batch is idempotent: re-running it skips issues that already carry the section.
- Scorecard exceptions doc (`docs/scorecard-exceptions.md`) updated to cover the three new Pinned-Dependencies paths introduced by the OCR pipeline (`apps/api/Dockerfile:49,60` and `.github/workflows/ci.yml:28`), plus a new Maintained section documenting the time-based auto-resolution. (#90)
- Contributors recognition: @zhouzhou626 gains the `a11y` type (#88); @0exec joins the contributors block with the `code`, `doc`, `test`, and `infra` types (#89). Avatar URLs use the numeric-ID form so username changes do not break the links.
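The contribution-guide entry above describes an idempotent batch. The sketch below shows one way to get that property. It assumes the section starts with a fixed marker heading, and it modifies an issue body only when that marker is absent. `MARKER` and `ensure_section` are hypothetical; the actual batch script is not described in this changelog.

```python
# Hypothetical marker: the heading the batch looks for before touching an issue.
MARKER = "## How to contribute"


def ensure_section(body: str, section: str, marker: str = MARKER) -> tuple[str, bool]:
    """Append `section` to `body` unless `marker` already appears in it.

    Returns the resulting body and whether it changed, so a batch can skip
    the update call for issues that already carry the block.
    """
    if marker in body:
        return body, False
    if not body.strip():
        return section, True
    return body.rstrip() + "\n\n" + section, True


guide = MARKER + "\n\nComment on the issue to claim it."
once, changed_first = ensure_section("Conversion drops table borders.", guide)
twice, changed_second = ensure_section(once, guide)
print(changed_first, changed_second, once == twice)  # True False True
```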
Removed
- Orphan `useConvert.ts` hook at `apps/web/src/hooks/`. It declared `usePdfToMd()` and `useMdToPdf()` but had zero call sites. Pages call the API functions directly from `src/lib/api.ts`; the batch flow uses `useBatchConvert`. The hook had been superseded during early development. (#91)
- Unused `docs/brand/social-preview.png` asset. It was not referenced from `mkdocs.yml`, the README, any Markdown file, or any HTML meta tag. No `og:image` is wired in the docs site or the React app. (#91)
- Dead `mkdocs.yml` nav entry that pointed at `docs/deployment/oracle-cloud.md`, which never existed in repo history. Replaced with the new `Deploy: deployment-other.md` entry so MkDocs builds without the orphan-page warning. (#98)
0.2.3 — 2026-05-20
Minor release. Headline change is the first external contribution:
@ko4lax landed the WCAG 2.1 AA audit and remediation in PR #54. The
release also ships the new design system catalogue (PRs #58 + #65)
that future UI work tracks back to, plus the trilingual screenshots
refresh, the Spanish locale page tests, the all-contributors adoption,
and a handful of CI hygiene fixes accumulated since v0.2.2.
Added
- WCAG 2.1 AA accessibility audit and remediation by @ko4lax (first external contributor on the project). `DropZone.tsx` was refactored so the file input is a sibling of the `role="button"` wrapper instead of nested inside it (closes the axe nested-interactive-controls violation). A skip-to-content link was added in `App.tsx`, the nav landmark is labelled `Main navigation`, and batch progress carries an `aria-live="polite"` announcement. New `docs/accessibility-audit.md` documents the findings with WCAG identifiers (`wcag2a`, `wcag2aa`, `wcag21a`, `wcag21aa`) and reproduction steps. New `apps/web/e2e/audit.spec.ts` wires `@axe-core/playwright` into the existing E2E job, so CI now fails on critical or serious violations. (#54, closes #36)
- `docs/design/` design system catalogue. Self-contained HTML (`design-thinking.html`) with hi-fi mockups and paste-ready issue specs for eight features: F1 CSS theme picker, F2 theme library, F3 per-conversion options panel, F4 format hub (DOCX/EPUB/HTML/RTF), F5 language workshop, F6 conversion presets, F7 local history, and F8 preferences page. The HTML reuses `apps/web/src/styles/tokens.css` verbatim, so visual changes to the React app propagate to the catalogue. It is published with the docs site at `/design/`. The MkDocs nav now includes a Design system entry. `CONTRIBUTING.md` sends contributors who pick up `design-required` issues to the catalogue first. The README gains a dedicated Design system section above Limits. Seeds six new feature issues (#59 F3, #60 F4, #61 F5, #62 F6, #63 F7, #64 F8). (#58)
- `docs/design/screenshots/` retina captures (1440x900 at 2x device scale) of every catalogue section: hero, principles, foundations, roadmap, plus the F1..F8 mockup spreads. The design landing page and the GitHub-view README render the gallery, so contributors can preview the catalogue before opening the HTML. (#65)
- `docs/screenshots/home-es.png`, a Spanish home-page screenshot at the same 2880x1800 retina resolution as the EN and PT companions. The README's screenshot section now shows the trilingual UI in a 3-column table (EN / PT / ES). `docs/screenshots/demo.gif` was regenerated to open with the three locales, so the README hero shows the trilingual capability before the conversion flow. (#49)
- Spanish locale coverage added to the page-level integration tests. `About.test.tsx` asserts the ES title, `Home.test.tsx` asserts the ES hero headline, and `Navigation.test.tsx` flips the whole UI to ES via the language toggle and asserts that both the headline and the About link translate. The Portuguese tests stay in place. (#49)
- all-contributors specification adopted. `.all-contributorsrc` at the repo root tracks every contributor and their kind of contribution (code, doc, translation, design, review, test, infra, maintenance). The README gains a `## Contributors` section between `## License` and `## If md-bridge helped you` that renders the current list. `CONTRIBUTING.md` documents how to be credited: there is no bot to install and no extra PR to open, because the maintainer regenerates the README block during release prep. (#48)
- @ko4lax credited as a contributor with the categories `code`, `doc`, `test`, `infra`, and `translation`, per the diff classification in PR #54. The avatar URL uses the numeric-ID form per the maintainer credit rule. (#67)
- `.github/workflows/pr-linked-issue.yml` posts a single one-line comment naming the PR author on every issue closed via "Closes #N", so attribution stays visible to the casual reader. (#56, refined in #57)
Changed
- All `docs/screenshots/*.png` refreshed at the current 2880x1800 retina resolution from the post-v0.2.2 UI state (after the About rewrite, the extensibility positioning, and the warning i18n fix). The `pdf-to-md`, `pdf-to-md-batch`, `md-to-pdf`, `about`, and `swagger` captures now match the UI a contributor sees today. `demo.gif` was regenerated with eight frames covering the trilingual home pages and the full conversion flow. (#51)
- `Validate PR title` is now a required status check on `main`. Previously the Conventional Commits validation ran on every PR but did not block the merge if it failed. The gap surfaced when PR #49 merged with the malformed scope `feat(web,docs):` despite a red title check. Branch protection now requires every PR title to parse cleanly as `<type>(<scope>)<!>: <description>` before the merge button is enabled. The branch-protection list in `CONTRIBUTING.md` now includes this sixth required check. (#50)
- The regression-test guidance in `CONTRIBUTING.md` was promoted from a single paragraph to a step-by-step checklist under "Writing a good regression test". It covers the failing-diff format, tier choice, and fixture vs synthetic input. It also states the no-silent-skips rule and includes a worked example from PR #20. (#52)
- @zhouzhou626's entry in `.all-contributorsrc` switched to the numeric-ID avatar URL (the previous URL produced an identicon fallback) and gained the `doc` contribution credit for PR #52. `CONTRIBUTING.md` codifies the post-merge maintainer credit rule as a five-step mechanical checklist that any future maintainer (or AI assistant) can apply without judgement. (#53)
Fixed
- `MdToPdf.tsx` was passing `t.pdfToMd.ready` ("Ready" / "Pronto" / "Listo") to the `success` label slot of `ConvertButton`. It should have passed the page-owned `t.mdToPdf.success` ("PDF ready." and its translations). The branch was unreachable under the current `status={batch.running ? 'loading' : 'idle'}` state machine, so no user saw the wrong copy. The dead path would have surfaced the next time anyone wired up the success state. The prop now uses the correct key. (#66)
0.2.2 — 2026-05-19
Patch release. Headline change is the trilingual warning fix; the rest is
governance, infrastructure, and documentation polish accumulated since
v0.2.1.
Fixed
- `/api/pdf-to-md` warnings now follow the active UI locale. The backend used to emit hardcoded English strings ("Very little text was extracted…"), so PT and ES users saw English while the rest of the UI was in their locale. The backend now emits stable codes (`needs_ocr`, `images_not_persisted`), and the frontend dictionary translates them per locale. The lookup falls back to the raw string for unknown codes, so future warnings stay forward-compatible. (#40, PR #42)
- The Swagger metadata in `apps/api/app/main.py` pointed `contact.url` and the `API_DESCRIPTION` markdown link at the placeholder `https://github.com/your-org/md-bridge`. The Swagger UI at `/docs` surfaced both. Both now use the real repository URL. (PR #46)
- The `docker-publish.yml` smoke test for the Web image was running `nginx -t` against the bundled config, which contains `proxy_pass http://api:8000`. In an isolated container the `api` hostname does not resolve, so the parse failed. The workflow reported a red CI even though the publish itself succeeded. The smoke test now asserts that the Vite build stage produced `index.html` and copied it to the nginx web root. (PR #20)
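The warning fix above relies on a lookup keyed by stable codes, with a pass-through fallback for unknown codes. The frontend does this in TypeScript; the sketch below restates the rule in Python. `WARNINGS` and `translate_warning` are hypothetical names. The messages are placeholders, not the shipped copy. The fallback from an unknown locale to English is an assumption.

```python
# Hypothetical per-locale dictionaries keyed by the stable codes the backend emits.
# Messages are placeholders, not the project's shipped translations.
WARNINGS: dict[str, dict[str, str]] = {
    "en": {
        "needs_ocr": "Very little text was extracted; the PDF may be scanned.",
        "images_not_persisted": "Images were not saved with the Markdown.",
    },
    "es": {
        "needs_ocr": "Se extrajo muy poco texto; el PDF puede estar escaneado.",
        "images_not_persisted": "Las imágenes no se guardaron con el Markdown.",
    },
}


def translate_warning(code: str, locale: str, default_locale: str = "en") -> str:
    """Translate a warning code, returning the raw string when the code is unknown.

    Assumption: an unsupported locale falls back to `default_locale`.
    """
    table = WARNINGS.get(locale, WARNINGS[default_locale])
    return table.get(code, code)


print(translate_warning("some_future_code", "es"))  # some_future_code
```

Returning the raw code keeps the UI usable while a newer backend is ahead of the frontend dictionary.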
Added
- Conventional Commits 1.0.0 is now the project's commit and PR-title convention. The new CI workflow `semantic-pr.yml` rejects PR titles that do not match `<type>(<scope>)<!>: <description>`. `CONTRIBUTING.md` gains a full reference section with the recognised types, bump rules, and worked examples. The `release-drafter.yml` config gains an `autolabeler` block. (PR #19)
- `CONTRIBUTING.md` now documents the issue-claiming process. Contributors comment to claim, and the maintainer assigns via the native GitHub `assignee` field. A seven-day window applies before the issue returns to the pool. (PR #41)
- Issue templates (`bug_report.md`, `feature_request.md`) now require a test plan with explicit file paths and tiers. The feature template also gains Architect and Design notes sections, so the tri-disciplinary review pattern shows up before the issue is filed. (PR #43)
- `docs/screenshots/warning-i18n.png` visual proof for #40 (deterministic Pillow render, no AI image generation). (PR #45)
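The `semantic-pr.yml` entry above rejects titles that do not match `<type>(<scope>)<!>: <description>`. The sketch below shows that parse together with the version bump the Conventional Commits 1.0.0 specification implies: `!` → major, `feat` → minor, `fix` → patch, and no implicit bump for other types. Several parts are assumptions for illustration: the type list, the regular expression, and the rule that a scope is a single lowercase token (which rejects `feat(web,docs):`). The workflow's actual configuration may differ.

```python
import re
from typing import Optional

# Assumed type list; CONTRIBUTING.md holds the project's actual list.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

# <type>(<scope>)<!>: <description>, where scope and "!" are optional.
# Assumption: a scope is one lowercase token, so "feat(web,docs):" fails.
TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9-]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def bump_for_title(title: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None; raise ValueError if malformed."""
    match = TITLE_RE.match(title)
    if match is None:
        raise ValueError(f"not a Conventional Commits title: {title!r}")
    if match["breaking"]:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return None  # other types carry no implicit SemVer bump


print(bump_for_title("feat(api): add OCR pre-pass"))  # minor
print(bump_for_title("fix!: drop legacy endpoint"))   # major
print(bump_for_title("docs: fix typo"))               # None
```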
Changed
- Pre-commit hooks moved from a separate "Optional" section near Tests to Local setup in `CONTRIBUTING.md`, so new contributors see them at the same moment they install Python and Node. "Strongly recommended" framing replaces "Optional". A new paragraph explains that the hooks deliberately do not check branch staleness; branch protection on `main` ("require branches to be up to date before merging") handles that server-side. The PR template checklist gains the two matching items. (PR #44)
- Project descriptions across `package.json`, `apps/api/pyproject.toml`, `README.md`, and `docs/index.md` now state the extensibility intent explicitly: md-bridge is a document converter that ships PDF ↔ Markdown today and welcomes new format pairs as contributions land. The GitHub repo description and topics were updated to match. (PR #39)
- About page copy rewritten across `en`, `pt`, and `es` in an OSS-professional register. The new copy leads with positioning ("open source, self-hosted, deterministic, no model inference, no telemetry") and names the heuristic stack (PyMuPDF + headless Chromium) directly. "Built with" becomes "Open source", with explicit pointers to the MIT licence and `CONTRIBUTING.md`. (PR #21)
- Theme picker for Markdown → PDF (#14) reorganised as an umbrella issue with three sister sub-issues: design (#22, CSS templates), backend (#23, registry + `/api/themes`), and frontend (#24, picker dropdown). This pattern is now the project's reference for multi-discipline features.
0.2.1 — 2026-05-19
Fixed
- `docker-publish.yml` now builds multi-platform images (`linux/amd64` + `linux/arm64`). The Oracle Cloud Always Free deployment recipe targets ARM Ampere A1 VMs. The earlier images were published for amd64 only, so `docker pull` failed on ARM hosts with a manifest-mismatch error. Apple Silicon developers were affected by the same issue. A post-publish smoke job verifies both architectures by pulling the image and running a minimal probe. (#12)
0.2.0 — 2026-05-19
First minor release. Ships the new trilingual UI plus a wider set of visibility, distribution, and contributor-onboarding work.
Added
- Spanish (`es`) locale in the web UI. The header toggle now lists EN / PT / ES. Locale detection and the `<html lang>` attribute were generalised, so future locales drop in without further code changes. Translations are native-quality; the tests and the Playwright spec exercise all three locales. (#9 by @zhouzhou626, first external contributor.)
- Oracle Cloud Always Free deployment recipe under `deployment/oracle-cloud/`. It includes a step-by-step `README.md` and a reference `Caddyfile.example`. It also includes a `bootstrap.sh` that installs Docker, Caddy, and the stack on a fresh ARM Ampere A1 VM. Cost: zero. The docs site picks up the page under a new "Deploy" nav section.
- Release-drafter workflow that keeps a draft GitHub Release in sync with merged PRs on `main`. Categories are driven by PR labels (`enhancement`, `bug`, `security`, `documentation`, `chore`, ...), and the next semver bump (major / minor / patch) is resolved automatically. Config lives in `.github/release-drafter.yml`.
- `workflow_dispatch` trigger on `docker-publish.yml`, so a manual re-publish from the Actions UI no longer requires an unrelated commit.
- Documentation site at https://vinicq.github.io/md-bridge/. A MkDocs Material build is deployed to GitHub Pages on every doc change. `mkdocs.yml`, `docs/index.md`, and `docs/getting-started.md` provide a curated landing experience separate from the GitHub README.
- Docker images on GHCR. A release-triggered workflow publishes `ghcr.io/vinicq/md-bridge-api` and `ghcr.io/vinicq/md-bridge-web`, so users can `docker pull` instead of building locally. Tags follow the semver scheme; both images are public.
- OpenSSF Scorecard workflow that runs weekly and on push, surfaces the result in the Security tab, and exposes a public score at scorecard.dev. The README gains a Scorecard badge alongside CI and CodeQL.
- Brand assets under `docs/brand/` (logo, wordmark, social preview), drawn with programmatic Pillow geometry: deterministic, no AI generation.
- Demo GIF at `docs/screenshots/demo.gif`, used as the README hero.
- Star history chart and an "If md-bridge helped you" CTA at the bottom of the README.
0.1.1 — 2026-05-19
Maintenance and governance release. No behaviour changes in the converter; only infrastructure, security posture, and contributor ergonomics.
Added
- Optional `pre-commit` configuration that runs `ruff` and basic hygiene hooks (trailing whitespace, EOF newline, YAML/TOML syntax, merge conflict markers, large files) before every commit. Documented in `CONTRIBUTING.md`.
- `.github/workflows/dependabot-auto-merge.yml` enables `gh pr merge --auto` for Dependabot PRs that are patch bumps (X.Y.Z → X.Y.Z+1) or transitive (indirect) dependency updates. Branch protection still gates the actual merge on every required status check; minor and major bumps stay in the manual review queue.
- Branch protection on `main` documented in `CONTRIBUTING.md`. It requires status checks for Backend, Web, End-to-end, and the two CodeQL jobs, blocks force-push and deletion, and requires linear history.
- `SECURITY.md` now lists the GitHub-native defenses active on the repository, so contributors know what they get for free: secret scanning, push protection, CodeQL, Dependabot, and private vulnerability reporting.
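The auto-merge entry above admits only patch bumps and transitive updates. The sketch below shows that classification on plain `X.Y.Z` version strings. `parse_version`, `classify_bump`, and `auto_merge_allowed` are hypothetical names, and pre-release and build suffixes are not handled. The real workflow may take the update type from Dependabot's metadata rather than parsing versions.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain X.Y.Z string (optional leading 'v') into integers."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for an update from old to new."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        return "none"  # same version or a downgrade: not an upgrade
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


def auto_merge_allowed(old: str, new: str, indirect: bool = False) -> bool:
    """Hypothetical rule: auto-merge patch bumps or transitive (indirect) updates."""
    return indirect or classify_bump(old, new) == "patch"


# The two npm bumps listed under 0.1.1 Changed:
print(auto_merge_allowed("8.59.3", "8.59.4"))   # True  (typescript-eslint, patch)
print(auto_merge_allowed("24.12.4", "25.9.0"))  # False (@types/node, major)
```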
Changed
- GitHub Actions bumped to current majors: `actions/checkout` v4 → v6, `actions/setup-python` v5 → v6, `github/codeql-action` v3 → v4, `actions/setup-node` v4 → v6, and `actions/upload-artifact` v4 → v7. This clears the Node.js 20 deprecation warnings on the runner.
- Docker base images bumped: web build stage `node:22-alpine` → `node:26-alpine`, web runtime `nginx:1.27-alpine` → `nginx:1.31-alpine`.
- npm devDependencies bumped: `typescript-eslint` 8.59.3 → 8.59.4 (patch), `@types/node` 24.12.4 → 25.9.0.
Security
- Enabled GitHub-native repo features via API: secret scanning, push protection, private vulnerability reporting, vulnerability alerts, Dependabot security updates.
- Branch protection on `main` requires every CI and CodeQL status check to pass before a merge can land.
0.1.0 — 2026-05-19
First tagged release. md-bridge is a self-hosted PDF and Markdown converter with a FastAPI backend and a React frontend.
Added
- PDF to Markdown conversion with heading detection, list recovery, table extraction, and YAML front matter.
- Markdown to PDF rendering through headless Chromium with a bundled A4 stylesheet.
- Batch mode in the UI: drop one file or a whole folder; each file is converted sequentially and can be downloaded as it lands.
- `/api/inspect-pdf` endpoint returns diagnostics (fonts, sizes, tagged-PDF flag, OCR hint) so the UI can warn before conversion.
- Bilingual UI in English (default) and Portuguese, with the choice persisted to `localStorage`.
- Interactive API docs at `/docs` (Swagger UI) and `/redoc`, plus a walkthrough in `docs/API.md`.
- Docker Compose stack for one-command boot of API + Web with healthchecks.
- Test pyramid with 124 tests (92 unit, 26 integration, 6 end-to-end), all of which run on CI against the committed ISTQB CTAL-TA syllabus fixture. No silent CI skips.
- CI workflow for backend pytest, web build + lint + vitest, and Playwright end-to-end.
- CodeQL static security analysis on every push and pull request, with a weekly scheduled scan, covering both Python and TypeScript.
- Backend linting via `ruff` with the `E F W I UP B` rule set, enforced in CI.
- Frontend linting via ESLint, enforced in CI.
- Open source governance files: `LICENSE` (MIT), `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, `SECURITY.md`, `.github/dependabot.yml`, issue and PR templates, and `.editorconfig`.