fix(docx): preserve drawing shapes, list order/nesting, code breaks & highlighting (#176)#181
Merged
Merged
Conversation
… highlighting Resolves four conversion-fidelity regressions for LibreOffice/pandoc-authored DOCX files reported in issue #176. What changed: - Drawing shapes: docx-rs only models a <w:drawing> as a picture or text box, so geometry-only DrawingML word-processing shapes (wps:wsp rectangles, lines and arrows) parsed to `data == None` and were dropped entirely. Add a raw-XML side-channel (docx_context_shape.rs) that scans word/document.xml for such shapes — geometry, fill, stroke, arrowheads and anchor position — and a new `Block::FloatingShape` IR element rendered via the existing Typst shape renderer with #place() absolute positioning. - List order & hierarchy: pandoc fragments one logical list across several numId values, so adjacent list paragraphs split into separate lists that each restarted ordered numbering at "1." and flattened ilvl nesting into bullets. group_into_lists/finalize_list now merge consecutive list paragraphs across differing numId, building a per-level style map and continuing the ordered counter so 1./2. and nested levels survive. - Code block line breaks: a hard <w:br/> reached the IR as '\n' but Typst markup collapses a bare newline to a space, merging code lines. escape_typst now emits #linebreak() for '\n'. - Syntax highlighting: build_style_map ignored character styles, so pandoc's per-token rStyle (BuiltInTok/StringTok/...) colors were lost. Character styles are now ingested and each run resolves its rStyle beneath explicit run formatting. Tests: new shape-scanner unit tests (docx_context_shape_tests.rs) and a list-merge regression test; full suite green, clippy clean. Related: #176 Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
…page The DrawingML anchor positions a floating shape/text box relative to its paragraph and text column, not the page. The renderer emitted a bare `#place(top + left, ...)` at the document top level, which Typst anchors to the page — so every shape and text box piled at the top of page 1, overlapping the title, instead of sitting with the content they belong to. Wrap each `#place` in a zero-size `#box` so "top + left" resolves to the current flow position. Applied to both the new FloatingShape path and the existing FloatingTextBox wrapNone/behind path so the box outline, its inner text and the connector arrow are laid out together. Related: #176 Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changed
Fixes the DOCX conversion fidelity issues reported in #176 for the attached LibreOffice-edited document.
Key changes:
Why
The fixture combines pictures, LibreOffice-created shapes, text boxes, lists, source code, and a mostly borderless table. The previous conversion lost drawings and introduced layout shifts because shape-only drawing anchors were not represented in the IR and each floating anchor advanced flow independently.
Verification
assets/issue-176-after.pngwith the verified output.cargo test -p office2pdfsuccessfully.Related: #176