Changelog

All notable changes to md-bridge are documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning. Each version section answers three questions for the reader:

  • Added — new behaviour or files that did not exist before.
  • Changed — behaviour or files that already existed and now work differently.
  • Removed — behaviour or files that no longer exist.

If a section is empty in a release, the section is omitted entirely.

Unreleased

Added

  • Optional Tesseract OCR pre-pass for scanned PDFs by @0exec (first external contribution from this account). New apps/api/app/services/ocr.py runs when MD_BRIDGE_OCR_ENABLED=1 and the inspect diagnostics report needs_ocr: true. PDFs that already carry a text layer skip OCR entirely. Response payload gained ocr_applied: bool. New Docker stage runtime-ocr (lean default runtime stage unchanged), [ocr] extras in pyproject.toml, and CI installs tesseract-ocr for the integration tests. (#86, closes #5)
  • Descriptive iframe title on the PDF preview by @zhouzhou626 (third PR landed by this contributor). New mdToPdf.previewIframeTitle dictionary key across EN, PT-BR, ES, bound at apps/web/src/pages/MdToPdf.tsx:119. Screen readers now announce "Generated PDF preview, frame" instead of inheriting the page heading. WCAG 4.1.2 (Name, Role, Value). (#87, closes #72)
  • Heuristics documentation page at docs/heuristics.md: document profile, heading detection, TOC normalization, list recovery, table cleanup, inline formatting, paragraph stitching, header/footer suppression, front matter, and the deliberately out-of-scope items. Each section names the function in packages/pdf-to-markdown/scripts/convert.py. (#92)
  • FAQ page at docs/faq.md: why OCR is opt-in, why the converter persists nothing, how to add a new locale, heuristics vs language model, Playwright + headless Chromium trade-off, server-side vs client-side conversion, offline operation. (#93)
  • API recipes page at docs/api-recipes.md: copy-paste recipes for the four HTTP endpoints from curl, Python requests, and JavaScript fetch (browser + Node), plus error-envelope reading, CORS configuration, and rate-limit guidance. (#97, closes #26)
  • Deployment recipes page at docs/deployment-other.md: Render, Fly.io, and Railway walkthroughs with blueprints, free-tier caveats, and a consolidated common-gotchas section (VITE_API_URL build-time, CORS, cold-start timeouts, 500 MB upload cap). (#98, partial fix for #7)
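
The OCR gating described in the first entry above can be sketched as a small predicate. This is a minimal illustration, not the code in apps/api/app/services/ocr.py: the function name `should_run_ocr` and the shape of the `diagnostics` mapping are assumptions; only the `MD_BRIDGE_OCR_ENABLED` variable and the `needs_ocr` flag come from the entry.

```python
import os
from typing import Mapping


def should_run_ocr(
    diagnostics: Mapping[str, object],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return True only when OCR is opted in AND the inspect step asks for it.

    Hypothetical helper: the real service may structure this differently.
    """
    # Opt-in switch named in the changelog entry.
    enabled = env.get("MD_BRIDGE_OCR_ENABLED") == "1"
    # PDFs that already carry a text layer report needs_ocr as false (or omit it).
    needs_ocr = diagnostics.get("needs_ocr") is True
    return enabled and needs_ocr
```

With this shape, a PDF that already has a text layer never reaches Tesseract, even when the environment switch is on.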

Changed

  • Contribution-guide section in every open issue body. All 27 open issues now carry a standardized "How to contribute" block covering claim workflow, branch naming, Conventional Commits, test pyramid, no AI co-authors, squash merge, and reversibility declaration. The section is idempotent: re-running the batch skips issues that already carry it.
  • Scorecard exceptions doc (docs/scorecard-exceptions.md) updated to cover the three new Pinned-Dependencies paths introduced by the OCR pipeline (apps/api/Dockerfile:49,60 and .github/workflows/ci.yml:28) plus a new Maintained section documenting the time-based auto-resolution. (#90)
  • Contributors recognition: @zhouzhou626 gains the a11y type (#88); @0exec joins the contributors block with code, doc, test, infra types (#89). Avatar URLs use numeric ID form so username changes do not break the links.
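
The idempotency property of the contribution-guide batch can be illustrated as follows. This is a sketch under assumptions: the marker heading and the function name `ensure_guide` are hypothetical, and the real batch may detect the block differently.

```python
# Hypothetical marker used to detect an existing block in an issue body.
MARKER = "## How to contribute"


def ensure_guide(body: str, guide: str) -> tuple[str, bool]:
    """Append the guide once; return (new_body, changed)."""
    if MARKER in body:
        # Block already present: re-running the batch leaves the issue untouched.
        return body, False
    return body.rstrip() + "\n\n" + guide, True
```

Running the function twice on the same issue body changes it only the first time, which is what makes a re-run of the batch safe.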

Removed

  • Orphan useConvert.ts hook at apps/web/src/hooks/. Declared usePdfToMd() and useMdToPdf() but had zero callsites. Pages call the API functions directly from src/lib/api.ts; the batch flow uses useBatchConvert. The hook was superseded during early development. (#91)
  • Unused docs/brand/social-preview.png asset. Not referenced from mkdocs.yml, README, any markdown, or any HTML meta tag. No og:image is wired in the docs site or the React app. (#91)
  • Dead mkdocs.yml nav entry that pointed at docs/deployment/oracle-cloud.md, which never existed in repo history. Replaced with the new Deploy: deployment-other.md entry so MkDocs builds without the orphan-page warning. (#98)

0.2.3 — 2026-05-20

Minor release. Headline change is the first external contribution: @ko4lax landed the WCAG 2.1 AA audit and remediation in PR #54. The release also ships the new design system catalogue (PRs #58 + #65) that future UI work tracks back to, plus the trilingual screenshots refresh, the Spanish locale page tests, the all-contributors adoption, and a handful of CI hygiene fixes accumulated since v0.2.2.

Added

  • WCAG 2.1 AA accessibility audit and remediation by @ko4lax (first external contributor on the project). DropZone.tsx refactored so the file input is a sibling of the role="button" wrapper instead of nested inside it (closes the axe nested-interactive-controls violation); skip-to-content link added in App.tsx; nav landmark labelled Main navigation; batch progress carries an aria-live="polite" announcement. New docs/accessibility-audit.md documents findings with WCAG identifiers (wcag2a, wcag2aa, wcag21a, wcag21aa) and reproduction steps. New apps/web/e2e/audit.spec.ts wires @axe-core/playwright into the existing E2E job so CI fails on critical or serious violations going forward. (#54, closes #36)
  • docs/design/ design system catalogue. Self-contained HTML (design-thinking.html) with hi-fi mockups and paste-ready issue specs for eight features: F1 CSS theme picker, F2 theme library, F3 per-conversion options panel, F4 format hub (DOCX/EPUB/HTML/RTF), F5 language workshop, F6 conversion presets, F7 local history, and F8 preferences page. The HTML reuses apps/web/src/styles/tokens.css verbatim so visual changes to the React app propagate to the catalogue. Published with the docs site at /design/; MkDocs nav now includes a Design system entry; CONTRIBUTING.md routes contributors who pick up design-required issues to the catalogue first; README gains a dedicated Design system section above Limits. Seeds six new feature issues (#59 F3, #60 F4, #61 F5, #62 F6, #63 F7, #64 F8). (#58)
  • docs/design/screenshots/ retina captures (1440x900 at 2x device scale) of every catalogue section: hero, principles, foundations, roadmap, plus F1..F8 mockup spreads. The design landing page and the GitHub-view README render the gallery so contributors preview the catalogue before opening the HTML. (#65)
  • docs/screenshots/home-es.png Spanish home-page screenshot at the same 2880x1800 retina resolution as the EN and PT companions. README's screenshot section now shows the trilingual UI in a 3-column table (EN / PT / ES). docs/screenshots/demo.gif was regenerated with the three locales at the start so the README hero shows the trilingual capability before the conversion flow. (#49)
  • Spanish locale coverage added to the page-level integration tests: About.test.tsx asserts the ES title, Home.test.tsx asserts the ES hero headline, and Navigation.test.tsx flips the whole UI to ES via the language toggle and asserts both the headline and the About link translate. The Portuguese tests stay in place. (#49)
  • all-contributors specification adopted. .all-contributorsrc at the repo root tracks every contributor and their kind of contribution (code, doc, translation, design, review, test, infra, maintenance). README gains a ## Contributors section between ## License and ## If md-bridge helped you that renders the current list. CONTRIBUTING.md documents how to be credited: no bot to install, no extra PR; the maintainer regenerates the README block during release prep. (#48)
  • @ko4lax credited as a contributor with categories code, doc, test, infra, translation per the diff classification in PR #54. Avatar URL uses the numeric-ID form per the maintainer credit rule. (#67)
  • .github/workflows/pr-linked-issue.yml posts a single one-line comment on every issue closed via "Closes #N" naming the PR author, so attribution survives in the casual reader's view. (#56, refined in #57)

Changed

  • All docs/screenshots/*.png refreshed at the current 2880x1800 retina resolution from the post-v0.2.2 UI state (after the About rewrite, the extensibility positioning, and the warning i18n fix). The pdf-to-md, pdf-to-md-batch, md-to-pdf, about, and swagger captures now match the UI a contributor sees today. demo.gif was regenerated with eight frames covering the trilingual home pages and the full conversion flow. (#51)
  • Validate PR title is now a required status check on main. Previously the Conventional Commits validation ran on every PR but did not block the merge if it failed (caught when PR #49 merged with the malformed scope feat(web,docs): despite a red title check). Branch protection now requires every PR title to parse cleanly as <type>(<scope>)<!>: <description> before the merge button enables. CONTRIBUTING.md's branch-protection list is updated to include the sixth required check. (#50)
  • CONTRIBUTING.md regression test guidance promoted from a single paragraph to a step-by-step checklist under "Writing a good regression test". Documents the failing-diff format, tier choice, fixture vs synthetic input, and the no-silent-skips rule with a worked example from PR #20. (#52)
  • @zhouzhou626's entry in .all-contributorsrc switched to the numeric-ID avatar URL (was producing identicon fallback) and gained the doc contribution credit for PR #52. CONTRIBUTING.md codifies the post-merge maintainer credit rule as a five-step mechanical checklist any future maintainer (or AI assistant) can apply without judgement. (#53)
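
The title shape enforced by the required check can be approximated with a regular expression. This is a sketch, not the configuration the workflow uses: the list of types and the single-token scope rule are assumptions, chosen so that the multi-scope title from PR #49 is rejected.

```python
import re

# Assumed type list; the project's recognised types are documented in CONTRIBUTING.md.
TYPES = ("feat", "fix", "docs", "chore", "refactor", "test", "ci", "build", "perf")

# <type>(<scope>)<!>: <description>, with scope and "!" both optional.
PR_TITLE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"
    r"(?:\((?P<scope>[a-z0-9][a-z0-9-]*)\))?"  # assumption: one scope, no commas
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_valid_pr_title(title: str) -> bool:
    """Return True when the title parses as a Conventional Commits header."""
    return PR_TITLE.match(title) is not None
```

Under these assumptions `feat(web): add Spanish locale` passes, while `feat(web,docs): refresh screenshots` fails because the scope contains a comma.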

Fixed

  • MdToPdf.tsx was passing t.pdfToMd.ready ("Ready" / "Pronto" / "Listo") to the ConvertButton's success label slot instead of the page-owned t.mdToPdf.success ("PDF ready." / equivalents). The branch was unreachable under the current status={batch.running ? 'loading' : 'idle'} state machine, so no user saw the wrong copy, but the dead path would have surfaced the next time anyone wired the success state. Aligned to the correct key. (#66)

0.2.2 — 2026-05-19

Patch release. Headline change is the trilingual warning fix; the rest is governance, infrastructure, and documentation polish accumulated since v0.2.1.

Fixed

  • /api/pdf-to-md warnings now follow the active UI locale. The backend used to emit hardcoded English strings ("Very little text was extracted…"); PT and ES users saw English while the rest of the UI was in their locale. Backend emits stable codes (needs_ocr, images_not_persisted); the frontend dictionary translates per locale. The lookup falls back to the raw string for unknown codes so future warnings stay forward-compatible. (#40, PR #42)
  • apps/api/app/main.py Swagger metadata pointed contact.url and the API_DESCRIPTION markdown link at the placeholder https://github.com/your-org/md-bridge. The Swagger UI at /docs surfaced both. Replaced with the real repository URL. (PR #46)
  • docker-publish.yml smoke test for the Web image was running nginx -t against the bundled config, which contains proxy_pass http://api:8000. In an isolated container the api hostname does not resolve, so the parse failed and the workflow reported a red CI even though the publish itself succeeded. The smoke test now asserts that the Vite build stage produced index.html and copied it to the nginx web root. (PR #20)
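
The code-plus-dictionary pattern behind the warning fix can be sketched as a lookup with a raw-string fallback. The real lookup lives in the frontend dictionary; this Python version only illustrates the logic. The function name and the translated strings are placeholders, and only the codes `needs_ocr` and `images_not_persisted` come from the entry above.

```python
# Placeholder copy for illustration; not the project's actual dictionary text.
WARNINGS = {
    "en": {"needs_ocr": "OCR recommended", "images_not_persisted": "Images are not kept"},
    "pt": {"needs_ocr": "OCR recomendado", "images_not_persisted": "Imagens não são mantidas"},
    "es": {"needs_ocr": "Se recomienda OCR", "images_not_persisted": "Las imágenes no se conservan"},
}


def translate_warning(code: str, locale: str) -> str:
    """Translate a stable warning code; fall back to the raw string if unknown."""
    # Unknown locales and unknown codes both fall through to the code itself,
    # so a warning added by a newer backend still renders something readable.
    return WARNINGS.get(locale, {}).get(code, code)
```

The fallback is what keeps the scheme forward-compatible: a code the dictionary has not learned yet is shown verbatim instead of being dropped.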

Added

  • Conventional Commits 1.0.0 is now the project's commit and PR-title convention. New CI workflow semantic-pr.yml rejects PR titles that do not match <type>(<scope>)<!>: <description>. CONTRIBUTING.md gains a full reference section with the recognised types, bump rules, and worked examples. The release-drafter.yml config gains an autolabeler block. (PR #19)
  • CONTRIBUTING.md now documents the issue-claiming process: contributors comment to claim, maintainer assigns via the native GitHub assignee field, seven-day window before the issue returns to the pool. (PR #41)
  • Issue templates (bug_report.md, feature_request.md) now require a test plan with explicit file paths and tiers. Feature template also gains Architect and Design notes sections so the tri-disciplinary review pattern shows up before the issue is filed. (PR #43)
  • docs/screenshots/warning-i18n.png visual proof for #40 (deterministic Pillow render, no AI image generation). (PR #45)

Changed

  • Pre-commit hooks moved from a separate "Optional" section near Tests to Local setup in CONTRIBUTING.md so new contributors see them at the same moment they install Python and Node. The "Strongly recommended" framing replaces "Optional". A new paragraph explains that the hooks deliberately do not check branch staleness; branch protection on main ("require branches to be up to date before merging") handles that server-side. PR template checklist gains the two matching items. (PR #44)
  • Project descriptions across package.json, apps/api/pyproject.toml, README.md, and docs/index.md now state the extensibility intent explicitly: md-bridge is a document converter that ships PDF ↔ Markdown today and welcomes new format pairs as contributions land. The GitHub repo description and topics were updated to match. (PR #39)
  • About page copy rewritten across en, pt, and es in an OSS-professional register. New copy leads with positioning ("open source, self-hosted, deterministic, no model inference, no telemetry") and names the heuristic stack (PyMuPDF + headless Chromium) directly. "Built with" becomes "Open source" with explicit MIT-licence and CONTRIBUTING.md pointers. (PR #21)
  • Theme picker for Markdown → PDF (#14) reorganised as an umbrella issue with three sister sub-issues: design (#22, CSS templates), backend (#23, registry + /api/themes), frontend (#24, picker dropdown). The pattern is now the project's reference for multi-discipline features.

0.2.1 — 2026-05-19

Fixed

  • docker-publish.yml now builds multi-platform images (linux/amd64 + linux/arm64). The Oracle Cloud Always Free deployment recipe targets ARM Ampere A1 VMs but the previous amd64-only publishes failed docker pull on ARM hosts with a manifest-mismatch error. Apple Silicon developers were affected by the same issue. A post-publish smoke job verifies both arches by pulling the image and running a minimal probe. (#12)

0.2.0 — 2026-05-19

First minor release. Ships the new trilingual UI plus a wider set of visibility, distribution, and contributor-onboarding work.

Added

  • Spanish (es) locale in the web UI. The header toggle now lists EN / PT / ES. Locale detection and the <html lang> attribute were generalised so future locales drop in without further code changes. Translations are native-quality; tests and the Playwright spec exercise all three locales. (#9 by @zhouzhou626 — first external contributor.)
  • Oracle Cloud Always Free deployment recipe under deployment/oracle-cloud/: step-by-step README.md, bootstrap.sh that installs Docker + Caddy + the stack on a fresh ARM Ampere A1 VM, and a reference Caddyfile.example. Cost: zero. The docs site picks up the page under a new "Deploy" nav section.
  • Release-drafter workflow that keeps a draft GitHub Release in sync with merged PRs on main. Categories are driven by PR labels (enhancement, bug, security, documentation, chore, ...) and the next semver bump is resolved automatically (major / minor / patch). Config in .github/release-drafter.yml.
  • workflow_dispatch trigger on docker-publish.yml so a manual re-publish from the Actions UI is now possible without an unrelated commit.
  • Documentation site at https://vinicq.github.io/md-bridge/. MkDocs Material build deployed to GitHub Pages on every doc change. mkdocs.yml plus docs/index.md and docs/getting-started.md provide a curated landing experience separate from the GitHub README.
  • Docker images on GHCR: a release-triggered workflow publishes ghcr.io/vinicq/md-bridge-api and ghcr.io/vinicq/md-bridge-web so users can docker pull instead of building locally. Tags follow the semver scheme; both images are public.
  • OpenSSF Scorecard workflow that runs weekly + on push, surfaces the result in the Security tab, and exposes a public score at scorecard.dev. README gains a Scorecard badge alongside CI and CodeQL.
  • Brand assets under docs/brand/ (logo, wordmark, social preview). Programmatic Pillow geometry, deterministic, no AI generation.
  • Demo GIF at docs/screenshots/demo.gif, used as the README hero.
  • Star history chart and an "If md-bridge helped you" CTA at the bottom of the README.

0.1.1 — 2026-05-19

Maintenance and governance release. No behaviour changes in the converter; only infrastructure, security posture, and contributor ergonomics.

Added

  • Optional pre-commit configuration that runs ruff and basic hygiene hooks (trailing whitespace, EOF newline, YAML/TOML syntax, merge conflict markers, large files) before every commit. Documented in CONTRIBUTING.md.
  • .github/workflows/dependabot-auto-merge.yml enables gh pr merge --auto for Dependabot PRs that are patch bumps (X.Y.Z → X.Y.Z+1) or transitive (indirect) dependency updates. Branch protection still gates the actual merge on every required status check; minor and major bumps stay in the manual review queue.
  • Branch protection on main documented in CONTRIBUTING.md: required status checks for Backend, Web, End-to-end, and the two CodeQL jobs; force-push and deletion blocked; linear history required.
  • SECURITY.md now lists the GitHub-native defenses that are active on the repository so contributors know what they get for free (secret scanning, push protection, CodeQL, Dependabot, private vulnerability reporting).
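
The auto-merge eligibility rule can be restated in a few lines of Python. The actual workflow is YAML; the function names here are hypothetical, and treating any patch increase within the same major.minor as a patch bump is an assumption.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain X.Y.Z version string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_patch_bump(old: str, new: str) -> bool:
    """True when only the patch component increases."""
    o, n = parse_version(old), parse_version(new)
    return o[:2] == n[:2] and n[2] > o[2]


def auto_merge_eligible(old: str, new: str, indirect: bool) -> bool:
    """Patch bumps and transitive (indirect) updates qualify; everything else waits for review."""
    return indirect or is_patch_bump(old, new)
```

Applied to the bumps listed in this release, typescript-eslint 8.59.3 → 8.59.4 would qualify as a patch bump, while @types/node 24.12.4 → 25.9.0 would stay in the manual review queue unless it were an indirect dependency.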

Changed

  • GitHub Actions bumped to current majors: actions/checkout v4 → v6, actions/setup-python v5 → v6, github/codeql-action v3 → v4, actions/setup-node v4 → v6, actions/upload-artifact v4 → v7. Clears the Node.js 20 deprecation warnings on the runner.
  • Docker base images bumped: web node:22-alpine → node:26-alpine, web runtime nginx:1.27-alpine → nginx:1.31-alpine.
  • npm devDependencies bumped: typescript-eslint 8.59.3 → 8.59.4 (patch), @types/node 24.12.4 → 25.9.0.

Security

  • Enabled GitHub-native repo features via API: secret scanning, push protection, private vulnerability reporting, vulnerability alerts, Dependabot security updates.
  • Branch protection on main requires every CI and CodeQL status check to pass before a merge can land.

0.1.0 — 2026-05-19

First tagged release. md-bridge is a self-hosted PDF and Markdown converter with a FastAPI backend and a React frontend.

Added

  • PDF to Markdown conversion with heading detection, list recovery, table extraction, and YAML front matter.
  • Markdown to PDF rendering through headless Chromium with a bundled A4 stylesheet.
  • Batch mode in the UI: drop one file or a whole folder; each file is converted sequentially and can be downloaded as it lands.
  • /api/inspect-pdf endpoint returns diagnostics (fonts, sizes, tagged-PDF flag, OCR hint) so the UI can warn before conversion.
  • Bilingual UI in English (default) and Portuguese, with the choice persisted to localStorage.
  • Interactive API docs at /docs (Swagger UI) and /redoc, plus a walkthrough in docs/API.md.
  • Docker Compose stack for one-command boot of API + Web with healthchecks.
  • Test pyramid with 124 tests (92 unit, 26 integration, 6 end-to-end), every one of which runs on CI against the committed ISTQB CTAL-TA syllabus fixture. No silent CI skips.
  • CI workflow for backend pytest, web build + lint + vitest, and Playwright end-to-end.
  • CodeQL static security analysis on every push and pull request, with a weekly scheduled scan, covering both Python and TypeScript.
  • Backend linting via ruff with the E F W I UP B rule set, enforced in CI.
  • Frontend linting via ESLint, enforced in CI.
  • Open source governance files: LICENSE (MIT), CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, .github/dependabot.yml, issue and PR templates, .editorconfig.