Make mining worker delete commit files that have no merge conflict #169

@zegabr

While doing my research on CSDiff, I had to compare many versions of it, meaning I had to run miningframework multiple times.

For a given time interval, the tool downloads every commit that is non-fast-forward, meaning that even if the commit has 1 file with a conflict and 100 files with no conflicts (fast-forward merge), all 101 files will be downloaded. This can easily take about 30GB of the device's storage if you run the tool with 10 projects for an interval of 1 month.

In my case, I needed only the files where the results of CSDiff and Diff3 were different. To obtain only the info I needed with the current implementation, I had to do some workarounds (see this branch).

In summary, I:

  1. created one CSV for each project here
  2. ran miningframework once for each project script
    2.1) deleted every unwanted file using this after each run
  3. created another CSV with the relevant data
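
The cleanup in step 2.1 could look roughly like the sketch below. It is not the script linked above: the file names `csdiff.out` and `diff3.out` and the one-directory-per-merge-scenario layout are assumptions made for illustration, and the real miningframework output may be organized differently.

```python
import filecmp
import shutil
from pathlib import Path

# Hypothetical file names; the actual miningframework output layout may differ.
CSDIFF_NAME = "csdiff.out"
DIFF3_NAME = "diff3.out"


def prune_identical_results(root: Path) -> list[Path]:
    """Delete every directory under root whose two merge results are identical.

    Returns the list of directories that were removed.
    """
    removed = []
    # Collect the matches up front so deleting directories does not disturb the walk.
    for csdiff_file in sorted(root.rglob(CSDIFF_NAME)):
        scenario_dir = csdiff_file.parent
        diff3_file = scenario_dir / DIFF3_NAME
        # Skip incomplete scenarios and files already removed with a parent directory.
        if not csdiff_file.is_file() or not diff3_file.is_file():
            continue
        # shallow=False forces a byte-by-byte comparison of the two results.
        if filecmp.cmp(csdiff_file, diff3_file, shallow=False):
            shutil.rmtree(scenario_dir)
            removed.append(scenario_dir)
    return removed
```

Running something like this after each project keeps only the scenarios where the two tools disagree, so disk usage stays bounded by one project's download at a time.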

I think this can be done directly in miningframework, probably around here. Since the tool already has filters for commits, it could probably have filters for files too.
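
To make the idea of a file filter concrete, here is a minimal sketch in Python. It does not use miningframework's actual filter API: `has_conflict` and `select_files` are hypothetical names, and the only thing taken as given is the standard `<<<<<<<` / `=======` / `>>>>>>>` conflict marker format produced by textual merge tools.

```python
from typing import Callable

# Standard conflict markers, in the order they appear inside a conflict block.
CONFLICT_MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def has_conflict(merged_text: str) -> bool:
    """Return True if the text contains a full set of conflict markers in order."""
    expected = 0
    for line in merged_text.splitlines():
        if line.startswith(CONFLICT_MARKERS[expected]):
            expected += 1
            if expected == len(CONFLICT_MARKERS):
                return True
    return False


def select_files(
    merged_files: dict[str, str],
    keep: Callable[[str], bool] = has_conflict,
) -> list[str]:
    """Return the paths whose merged content passes the (hypothetical) file filter."""
    return [path for path, text in merged_files.items() if keep(text)]
```

With a predicate like this applied before files are written, only the files that pass the filter would need to be stored. A different predicate, such as one comparing the CSDiff and Diff3 results, could be plugged in through the `keep` parameter.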

This would make it possible to collect more data for future research while using less disk space.
