Skip to content

Conversation

@ibbem
Copy link
Collaborator

@ibbem ibbem commented Aug 11, 2025

This is a rebase of #152 with some additional fixes and refactorings.

During the rebase I applied the following changes on each commit:

  • format code
  • fix compilation errors
  • temporarily disable broken tests
  • remove licensed test cases
  • fix documentation (only typos and missing parameter annotations)
  • translate commit messages
  • consistent author name and email (all commits use eshul <eugen-shulimov@web.de> instead of DARIA-ACER\eshul <shulimov@mail.uni-paderborn.de> which was only used in the beginning)

Except two commits, which became empty, all commits where kept.

The following was addressed after the rebase in separate commits:

  • refactor the tests
  • fix a bug in the variation diff endif parsing (a8065d9)
    Note that all tests cases, even the ones that where removed during the rebase, pass the test suit after this fix.
  • refactor the variation diff parser (0db0fdd)
  • improve some JavaDoc comments (9c91b08)
  • refactor VariationUnparser (fbd6f3c and 68d8131)
  • merge duplicated tree unparsing code (9de8a26)
  • fix a bug in Show.baddiff which I discovered while debugging the next item in this list (cbf504b)
  • fix a bug in BadVDiff which I discovered after implementing the next item in this list (c860a88)
  • store endifs in the label instead of the nodes directly (de10f75)

Notably, I didn't touch the experiment except changing the package name and formatting the code.

There are two things that I'm still unsure about:

  1. Should we actually test with ignoreEmptyLines = true?
    I don't know how relevant this was to the goal of the thesis. Just judging from the code, it doesn't make sense to test. We could probably get rid of all the removeWhitespace business and make the tests much tighter for ignoreEmptyLines = false.
  2. Should we keep printSourceCode or unparseTree?
    It turns out that @pmbittner already implemented a variation tree unparser for the views paper. Currently, I removed the stack based implementation because the recursive implementation is more intuitive to me. However, should we keep both interfaces or remove one of the two methods?

eugen-shulimov and others added 30 commits August 11, 2025 13:58
Note that some printed checks failed expectedly (those that didn't
ignore whitespace or didn't compare diffs semantically) and thus are
removed without replacement.
ibbem added 10 commits August 11, 2025 14:01
This kind of includes a breaking change: Keeping trailing empty lines
and interpreting them as unchanged trailing empty lines. Although
multiple empty lines should never occur in well formatted diffs (because
each line contains at least the diff symbol), this is likely not
intended and thus a bug.

Note that preventing a new line in the special case of an empty output
seems unnecessary but was explicitly done in the old version so the
refactored version does this as well.
Given a `BadVDiff` with at least one node, `Show.baddiff` uses
`VariationTree.toVariationDiff` which previously required all created
nodes to have `DiffType.NON`. However, `Show.baddiff` supplies nodes
with `DiffType.ADD` and `DiffType.REM` which triggered an assert in
`DiffNode.addChild` during the `VariationTree.toVariationDiff`
construction process.
It seems like `VariationDiffParser` never generates a variation diff
which triggers the bug in `BadVDiff` that doesn't preserve the child
order. Note that it should be possible to generate such variation diffs
with a tree matcher such as GumTree.
This is necessary because when `endif`s are fully integrated (included
in deep copies, assertConsistency and comparisons) some assumptions are
broken (which fields are compare by `DiffNode`s) which is especially
annoying in `BadVDiff`. By storing the `endif`s in the label instead of
the nodes directly, no user needs to be updated except if the label is
actually modified (there seems to be no instance of that in this code
base).

This also generalized the concept to trailing lines instead of limiting
the concept to endifs. This is also kind of necessary because not all
labels know the node type.

Depending on the user of `VariationDiffParser`, this might be a breaking
change because we now consider the `#endif` line as part of the label.
Hence, if the `#endif` line changed (e.g., a comment is added) the node
can no longer be `DiffType.NON` but we instead split the node into two
`DiffType.REM` and `DiffType.ADD` nodes. Artifacts which are below such
a split node are thus classified as refactored instead of unchanged.
@pmbittner pmbittner added enhancement New feature or request bm_work ibbem is paid for working on this labels Aug 26, 2025
@pmbittner pmbittner mentioned this pull request Aug 26, 2025
This makes the assumption of single character `DiffSymbol`s explicit.
@pmbittner pmbittner merged commit 5e35c9c into develop Sep 4, 2025
2 checks passed
@pmbittner pmbittner mentioned this pull request Nov 3, 2025
2 tasks
@pmbittner pmbittner deleted the unparse branch November 3, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bm_work ibbem is paid for working on this enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants