
fix(docx): preserve drawing shapes, list order/nesting, code breaks & highlighting (#176) #181

Merged
developer0hye merged 7 commits into main from fix/issue-176-drawings-lists-code on Jun 1, 2026

@developer0hye (Owner) commented May 31, 2026

What changed

Fixes the DOCX conversion fidelity issues reported in #176 for the attached LibreOffice-edited document.

Key changes:

  • Preserve DrawingML geometry-only shapes as floating shape IR instead of dropping them.
  • Render anchored rectangles, arrows, and floating text boxes at their flow-relative positions.
  • Group consecutive floating drawing anchors so related shapes share one baseline; this fixes the blue boxes being vertically offset from each other.
  • Preserve empty anchor paragraphs for floating drawings so following content, including the table, keeps the expected spacing.
  • Preserve ordered list continuation, nested list levels, hard line breaks in code blocks, and character-style syntax highlighting.
  • Avoid Typst's default table grid when the DOCX only defines explicit borders.
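The hard-line-break fix can be illustrated with a minimal sketch. The function name `escape_with_linebreaks` and the set of escaped characters are assumptions made for illustration; this is not the project's `escape_typst`, only the same idea of emitting an explicit `#linebreak()` wherever the IR text contains '\n'.

```rust
/// Hypothetical helper (not the project's `escape_typst`): escapes an
/// illustrative subset of Typst markup characters and turns every '\n'
/// into an explicit `#linebreak()` call so code lines stay separate.
fn escape_with_linebreaks(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            // A bare newline would be collapsed to a space by Typst markup.
            '\n' => out.push_str("#linebreak()"),
            // Assumed subset of characters that need a backslash escape.
            '\\' | '#' | '*' | '_' | '$' | '`' | '<' | '>' | '@' | '[' | ']' => {
                out.push('\\');
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    out
}
```

For example, the two-line input `"let a = 1;\nlet b = 2;"` comes back as one string with `#linebreak()` between the two statements.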

Why

The fixture combines pictures, LibreOffice-created shapes, text boxes, lists, source code, and a mostly borderless table. The previous conversion lost drawings and introduced layout shifts because shape-only drawing anchors were not represented in the IR and each floating anchor advanced flow independently.
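As a rough sketch of the grouping idea, consecutive floating anchors can be collected into one group so the renderer advances the flow once per group rather than once per anchor. The `Block` and `Grouped` types and the `group_floats` function below are hypothetical and far simpler than the project's IR.

```rust
/// Hypothetical, simplified IR; the real IR has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Paragraph(String),
    FloatingShape { x_pt: f64, y_pt: f64 },
}

/// Output of the grouping pass: ordinary flow blocks, or a run of
/// floating shapes that should share one baseline.
#[derive(Debug, PartialEq)]
enum Grouped {
    Flow(Block),
    FloatGroup(Vec<Block>),
}

/// Collects each run of consecutive floating shapes into one group.
fn group_floats(blocks: Vec<Block>) -> Vec<Grouped> {
    let mut out: Vec<Grouped> = Vec::new();
    for block in blocks {
        if matches!(block, Block::FloatingShape { .. }) {
            // Extend the current group if the previous item was one.
            if let Some(Grouped::FloatGroup(group)) = out.last_mut() {
                group.push(block);
                continue;
            }
            out.push(Grouped::FloatGroup(vec![block]));
        } else {
            out.push(Grouped::Flow(block));
        }
    }
    out
}
```

With this shape, a paragraph followed by two floating shapes and another paragraph yields three items, the middle one holding both shapes.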

Verification

  • Downloaded and extracted the issue attachment directly from GitHub.
  • Rendered the converted PDF to PNG and checked the result at image level.
  • Updated assets/issue-176-after.png with the verified output.
  • Ran cargo test -p office2pdf successfully.

Related: #176

… highlighting

Resolves four conversion-fidelity regressions for LibreOffice/pandoc-authored
DOCX files reported in issue #176.

What changed:
- Drawing shapes: docx-rs only models a <w:drawing> as a picture or text box,
  so geometry-only DrawingML word-processing shapes (wps:wsp rectangles, lines
  and arrows) parsed to `data == None` and were dropped entirely. Add a raw-XML
  side-channel (docx_context_shape.rs) that scans word/document.xml for such
  shapes — geometry, fill, stroke, arrowheads and anchor position — and a new
  `Block::FloatingShape` IR element rendered via the existing Typst shape
  renderer with #place() absolute positioning.
- List order & hierarchy: pandoc fragments one logical list across several
  numId values, so adjacent list paragraphs split into separate lists that each
  restarted ordered numbering at "1." and flattened ilvl nesting into bullets.
  group_into_lists/finalize_list now merge consecutive list paragraphs across
  differing numId, building a per-level style map and continuing the ordered
  counter so 1./2. and nested levels survive.
- Code block line breaks: a hard <w:br/> reached the IR as '\n' but Typst
  markup collapses a bare newline to a space, merging code lines. escape_typst
  now emits #linebreak() for '\n'.
- Syntax highlighting: build_style_map ignored character styles, so pandoc's
  per-token rStyle (BuiltInTok/StringTok/...) colors were lost. Character
  styles are now ingested and each run resolves its rStyle beneath explicit run
  formatting.
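
The list-merge behaviour described above can be sketched as follows. `ListPara` and `label_merged_list` are hypothetical names, and the per-paragraph `ordered` flag stands in for the per-level style map; the point is only that the counter is keyed by nesting level and deliberately ignores `num_id`, so numbering continues across pandoc's fragmented lists.

```rust
/// Hypothetical list paragraph; the field names are assumptions.
#[allow(dead_code)] // `num_id` is carried but intentionally not consulted.
#[derive(Debug, Clone, PartialEq)]
struct ListPara {
    num_id: u32,
    ilvl: usize,
    ordered: bool,
    text: String,
}

/// Labels one run of consecutive list paragraphs as a single merged list.
fn label_merged_list(paras: &[ListPara]) -> Vec<String> {
    // One counter per nesting level, indexed by `ilvl`.
    let mut counters: Vec<u32> = Vec::new();
    let mut labels = Vec::new();
    for p in paras {
        // Going deeper adds zeroed counters; coming back up drops the
        // deeper ones so a later sublist restarts at 1.
        counters.resize(p.ilvl + 1, 0);
        let marker = if p.ordered {
            counters[p.ilvl] += 1;
            format!("{}.", counters[p.ilvl])
        } else {
            "-".to_string()
        };
        labels.push(format!("{}{} {}", "  ".repeat(p.ilvl), marker, p.text));
    }
    labels
}
```

Three paragraphs with `num_id` 1, 2 and 3 at levels 0, 1 and 0 are labelled `1.`, an indented `-`, and `2.`, instead of restarting at `1.` for each `num_id`.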

Tests: new shape-scanner unit tests (docx_context_shape_tests.rs) and a
list-merge regression test; full suite green, clippy clean.

Related: #176
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
…page

The DrawingML anchor positions a floating shape/text box relative to its
paragraph and text column, not the page. The renderer emitted a bare
`#place(top + left, ...)` at the document top level, which Typst anchors to the
page — so every shape and text box piled at the top of page 1, overlapping the
title, instead of sitting with the content they belong to.

Wrap each `#place` in a zero-size `#box` so "top + left" resolves to the
current flow position. Applied to both the new FloatingShape path and the
existing FloatingTextBox wrapNone/behind path so the box outline, its inner
text and the connector arrow are laid out together.
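
A minimal sketch of the wrapping, assuming a hypothetical `place_in_flow` helper that takes offsets in points and already-rendered Typst markup for the shape. The project's renderer is not shown in this PR text, so the exact markup it emits may differ.

```rust
/// Hypothetical helper: wraps a `#place` in a zero-size `#box` so that
/// "top + left" is resolved against the box, which sits at the current
/// flow position, rather than against the page.
fn place_in_flow(dx_pt: f64, dy_pt: f64, body_markup: &str) -> String {
    format!(
        "#box(width: 0pt, height: 0pt)[#place(top + left, dx: {}pt, dy: {}pt)[{}]]",
        dx_pt, dy_pt, body_markup
    )
}
```

Because the box has no width or height, it should not itself take up space in the flow; the placed content is offset from wherever the anchor paragraph happens to fall.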

Related: #176
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
@developer0hye developer0hye merged commit b1d042e into main Jun 1, 2026
14 checks passed