Improved Unparser #169

ibbem · 2025-08-11T12:28:15Z

This is a rebase of #152 with some additional fixes and refactorings.

During the rebase I applied the following changes on each commit:

format code
fix compilation errors
temporarily disable broken tests
remove licensed test cases
fix documentation (only typos and missing parameter annotations)
translate commit messages
consistent author name and email (all commits use eshul <eugen-shulimov@web.de> instead of DARIA-ACER\eshul <shulimov@mail.uni-paderborn.de> which was only used in the beginning)

Except for two commits, which became empty, all commits were kept.

The following was addressed after the rebase in separate commits:

refactor the tests
fix a bug in the variation diff endif parsing (a8065d9)
Note that all test cases, even the ones that were removed during the rebase, pass the test suite after this fix.
refactor the variation diff parser (0db0fdd)
improve some JavaDoc comments (9c91b08)
refactor VariationUnparser (fbd6f3c and 68d8131)
merge duplicated tree unparsing code (9de8a26)
fix a bug in Show.baddiff which I discovered while debugging the next item in this list (cbf504b)
fix a bug in BadVDiff which I discovered after implementing the next item in this list (c860a88)
store endifs in the label instead of the nodes directly (de10f75)

Notably, I didn't touch the experiment except for changing the package name and formatting the code.

There are two things that I'm still unsure about:

Should we actually test with ignoreEmptyLines = true?
I don't know how relevant this was to the goal of the thesis. Just judging from the code, it doesn't make sense to test. We could probably get rid of all the removeWhitespace business and make the tests much tighter for ignoreEmptyLines = false.
Should we keep printSourceCode or unparseTree?
It turns out that @pmbittner already implemented a variation tree unparser for the views paper. For now, I have removed the stack-based implementation because the recursive implementation is more intuitive to me. However, should we keep both interfaces or remove one of the two methods?
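
To make the second question more concrete, here is a minimal sketch of what a recursive tree unparser looks like. It is not the code from this PR: the `Node` record and its fields (`headerLines`, `children`, `trailingLines`) are hypothetical stand-ins for a variation tree node whose trailing lines hold something like `#endif`.

```java
import java.util.List;

final class RecursiveUnparseSketch {
    /** Hypothetical node: its own lines, its children, and the lines that close it (e.g. "#endif"). */
    record Node(List<String> headerLines, List<Node> children, List<String> trailingLines) {}

    /** Appends the node's own lines, then all children in order, then its trailing lines. */
    static void unparse(Node node, StringBuilder out) {
        for (String line : node.headerLines()) {
            out.append(line).append('\n');
        }
        for (Node child : node.children()) {
            unparse(child, out);
        }
        for (String line : node.trailingLines()) {
            out.append(line).append('\n');
        }
    }

    /** Convenience wrapper that returns the unparsed text of a whole tree. */
    static String unparse(Node root) {
        StringBuilder out = new StringBuilder();
        unparse(root, out);
        return out.toString();
    }
}
```

The recursion mirrors the tree shape directly, which is why it reads more naturally. An explicit stack mainly pays off for very deeply nested trees, where recursion depth could become a concern.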

This includes what amounts to a breaking change: trailing empty lines are now kept and interpreted as unchanged trailing empty lines. Although multiple empty lines should never occur in well-formatted diffs (because each line contains at least the diff symbol), discarding them was likely not intended and thus a bug. Note that suppressing the newline in the special case of an empty output seems unnecessary, but the old version did this explicitly, so the refactored version does so as well.
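
As general background, one common way for trailing empty lines to get lost in Java is `String.split` with the default limit, which discards trailing empty strings. Whether the old implementation lost them this way is not stated here, so the sketch below only illustrates the pitfall; the class and method names are made up for the example.

```java
import java.util.List;

final class TrailingLinesSketch {
    /** Default limit (0): trailing empty strings are discarded. */
    static List<String> splitDroppingTrailing(String text) {
        return List.of(text.split("\n"));
    }

    /**
     * Negative limit: trailing empty strings are kept.
     * Note that the text after the final newline shows up as one extra empty element.
     */
    static List<String> splitKeepingTrailing(String text) {
        return List.of(text.split("\n", -1));
    }
}
```

For the input `"line\n\n"`, the first variant yields a single element, whereas the second yields three: `"line"` followed by two empty strings.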

Given a `BadVDiff` with at least one node, `Show.baddiff` uses `VariationTree.toVariationDiff`, which previously required all created nodes to have `DiffType.NON`. However, `Show.baddiff` supplies nodes with `DiffType.ADD` and `DiffType.REM`, which triggered an assertion in `DiffNode.addChild` while `VariationTree.toVariationDiff` constructed the diff.
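
The exact assertion is not quoted here, so the following is only a simplified model of this kind of failure, not DiffDetective's actual code. It assumes that a child may be attached only at times at which its diff type exists, and that a conversion which treats every node as unchanged attaches each child at both times. All names (`Node`, `attachAtAllTimes`, `attachWhereItExists`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DiffTypeCheckSketch {
    enum Time { BEFORE, AFTER }

    enum DiffType {
        ADD, REM, NON;

        /** ADD exists only after the edit, REM only before it, NON at both times. */
        boolean existsAtTime(Time time) {
            return switch (this) {
                case ADD -> time == Time.AFTER;
                case REM -> time == Time.BEFORE;
                case NON -> true;
            };
        }
    }

    static final class Node {
        final DiffType diffType;
        final List<Node> beforeChildren = new ArrayList<>();
        final List<Node> afterChildren = new ArrayList<>();

        Node(DiffType diffType) {
            this.diffType = diffType;
        }

        /** Rejects children that do not exist at the requested time (assumed consistency rule). */
        void addChild(Node child, Time time) {
            if (!child.diffType.existsAtTime(time)) {
                throw new IllegalStateException(child.diffType + " node cannot be a child at time " + time);
            }
            List<Node> children = (time == Time.BEFORE) ? beforeChildren : afterChildren;
            children.add(child);
        }
    }

    /** Models a conversion that assumes every node is unchanged: fails for ADD and REM children. */
    static void attachAtAllTimes(Node parent, Node child) {
        for (Time time : Time.values()) {
            parent.addChild(child, time);
        }
    }

    /** Models a conversion that respects the child's diff type. */
    static void attachWhereItExists(Node parent, Node child) {
        for (Time time : Time.values()) {
            if (child.diffType.existsAtTime(time)) {
                parent.addChild(child, time);
            }
        }
    }
}
```

In this model, `attachAtAllTimes` throws as soon as it is handed an `ADD` or `REM` child, while `attachWhereItExists` accepts all three diff types.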

It seems that `VariationDiffParser` never generates a variation diff that triggers the `BadVDiff` bug of not preserving the child order. Note that it should be possible to generate such variation diffs with a tree matcher such as GumTree.

Storing the `endif`s in the label instead of in the nodes directly is necessary because fully integrating `endif`s (including them in deep copies, `assertConsistency`, and comparisons) breaks some assumptions (namely which fields `DiffNode`s compare), which is especially annoying in `BadVDiff`. With this approach, no user needs to be updated unless the label is actually modified (there seems to be no instance of that in this code base). This also generalizes the concept from `endif`s to trailing lines. It is also more or less necessary because not all labels know the node type. Depending on the user of `VariationDiffParser`, this might be a breaking change because the `#endif` line is now considered part of the label. Hence, if the `#endif` line changed (e.g., a comment was added), the node can no longer be `DiffType.NON`; instead, it is split into two nodes, one `DiffType.REM` and one `DiffType.ADD`. Artifacts below such a split node are thus classified as refactored instead of unchanged.
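
The effect on classification can be sketched as follows. This is a minimal illustration under assumptions, not the parser's implementation: `Label` and `Classified` are hypothetical records, and the rule shown is simply "equal labels give one `NON` node, differing labels give a `REM` node and an `ADD` node".

```java
import java.util.List;

final class LabelSplitSketch {
    enum DiffType { ADD, REM, NON }

    /** Hypothetical label: the node's own lines plus trailing lines such as "#endif". */
    record Label(List<String> lines, List<String> trailingLines) {}

    /** A label together with the diff type assigned to its node. */
    record Classified(DiffType diffType, Label label) {}

    /**
     * Compares the complete labels, trailing lines included.
     * Any difference, even one only in the "#endif" line, splits the node.
     */
    static List<Classified> classify(Label before, Label after) {
        if (before.equals(after)) {
            return List.of(new Classified(DiffType.NON, before));
        }
        return List.of(
                new Classified(DiffType.REM, before),
                new Classified(DiffType.ADD, after));
    }
}
```

With `#if A` unchanged but `#endif` edited to `#endif // A`, `classify` returns two entries (`REM`, then `ADD`), which is the split described above.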

src/main/java/org/variantsync/diffdetective/variation/VariationUnparser.java

src/main/java/org/variantsync/diffdetective/variation/diff/DiffNode.java

src/main/java/org/variantsync/diffdetective/variation/diff/DiffType.java

eugen-shulimov and others added 30 commits August 11, 2025 13:58

feat: prepare DiffNode for storing endif lines

afd99b4

feat: prepare VariationTreeNode for storing endif lines

4bed4b5

feat: add a getter for the endif line in VariationNode

22d5dea

fix: handle endif lines in deep copies of variation trees

b694ce5

feat: create an unparser for variation trees

02f7340

feat: store endif lines when parsing variation diffs

e9fb7c5

feat: create an unparser for variation diffs

92b9247

feat: create a method that projects text diffs

e2934fe

refactor: move some code for readability

2c18279

test: create tests for VariationUnparser

a8f5306

feat: create a dataset for my bachelor thesis

af4eb0f

feat: create an experiment for testing the unparser

5408c72

fix: fix the analysis in the unparser experiment

b5a2a8c

feat: change the dataset of the unparse experiment

870b76a

feat: change some things in the analysis

5bdb8f1

feat: add error reporting to UnparseAnalysis

efb4b2c

fix: store endifs depending on the time

dee62f8

test: add an example of failing to unparse

dd54859

fix: add new lines to the unparsed code

e312978

feat: rework the unparse experiment evaluation

f743e2c

test: add a test for comparing unparsed diffs semantically

53b82e1

fix: fix removeWhitespace for diffs

ff46999

test: fail tests if an exception is thrown

6ac3440

test: use asserts instead of manually checking stdout

d18386f

Note that some printed checks were expected to fail (those that didn't ignore whitespace or didn't compare diffs semantically) and were thus removed without replacement.

test: remove unnecessary wrappers around unparser

0a35f62

test: reuse directory constants in the unparsing tests

b9fa82b

test: improve the variable names in the unparse tests

bb59058

test: factor out duplicate code in the unparser tests

79b0a27

test: make all unparser test helper methods private

6bcde64

test: refactor the unparser test case sources

84e5eab

ibbem added 10 commits August 11, 2025 14:01

test: split different parse options into separate unparse tests

69b406f

fix: fix a bug in the variation diff endif parsing

a8065d9

refactor: directly pass the time to popIfChain in the diff parser

0db0fdd

docs: improve some JavaDoc comments related to unparsing

9c91b08

refactor: rename the unparse methods

68d8131

refactor: remove duplicated tree unparsing code

9de8a26

pmbittner approved these changes Aug 26, 2025

pmbittner assigned ibbem Aug 26, 2025

pmbittner added the enhancement (New feature or request) and bm_work (ibbem is paid for working on this) labels Aug 26, 2025

pmbittner mentioned this pull request Aug 26, 2025

Unparser #152

Closed

ibbem added 2 commits September 4, 2025 12:33

fix: preserve the projection when splitting a DiffNode

bf6f5fb

feat: let DiffNode take a VariationLabel on construction

f22b3cd

ibbem force-pushed the unparse branch from 036b9c1 to f22b3cd September 4, 2025 10:34

refactor: use char instead of String in DiffSymbol

fd60dfb

This makes the assumption of single character `DiffSymbol`s explicit.
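
As an illustration of why a `char` fits here, consider the sketch below. It is an assumption-laden example rather than the actual `DiffType`: the enum constants mirror the names used in this PR, but the `symbol` field and the `ofDiffLine` helper are hypothetical.

```java
import java.util.Optional;

final class DiffSymbolSketch {
    enum DiffType {
        ADD('+'), REM('-'), NON(' ');

        /** A single character by construction, so a multi-character symbol cannot be expressed. */
        final char symbol;

        DiffType(char symbol) {
            this.symbol = symbol;
        }

        /** Looks at the first character of a diff line only; empty lines carry no symbol. */
        static Optional<DiffType> ofDiffLine(String line) {
            if (line.isEmpty()) {
                return Optional.empty();
            }
            char first = line.charAt(0);
            for (DiffType type : values()) {
                if (type.symbol == first) {
                    return Optional.of(type);
                }
            }
            return Optional.empty();
        }
    }
}
```

With a `String` symbol, the one-character assumption would only live in documentation or runtime checks. With a `char`, the type itself enforces it.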

pmbittner merged commit 5e35c9c into develop Sep 4, 2025
2 checks passed

pmbittner mentioned this pull request Sep 19, 2025

Implement a proper unparse method for DiffTrees #70

Closed

pmbittner mentioned this pull request Nov 3, 2025

Release 2.4.0 #178

Merged

2 tasks

pmbittner deleted the unparse branch November 3, 2025 09:50
