Trouble With Locations Sharing A Common Prefix #110

@LensPlaysGames

Several of the sarif CLI commands that produce human-readable output appear to share a behaviour that is less than ideal. When the input SARIF file contains multiple results whose artifactLocation properties all have the same uri, the paths to those results are removed entirely from the output.

So, given a SARIF file like so:

{
  "runs": [
    {
      "originalUriBaseIds": {
        "PWD": {
          "uri": "file:///home/lens_r/Programming/play/LensorCompilerCollection/"
        }
      },
      "results": [
        { "locations": [ { "physicalLocation": { "artifactLocation": { "uri": "./foo.c", "uriBaseId": "PWD" } } } ] , ...},
        { "locations": [ { "physicalLocation": { "artifactLocation": { "uri": "./foo.c", "uriBaseId": "PWD" } } } ] , ...}
      ]
    }
  ]
}

Running sarif emacs above.sarif then produces output like so:

-*- compilation -*-

Sarif Summary: GNU C17
Document generated on: 2025-12-15 09:16:01.681255
Total number of distinct issues of all severities (error, warning, note): 1



Severity : error [1]
:2: error expected ‘;’ before ‘}’ token
:6: error expected ‘;’ before ‘}’ token



Severity : warning [0]


Severity : note [0]

What I'd expect to see is foo.c:2: and foo.c:6:; instead, as you can see, the path is completely blank. I suspect this comes from a QoL feature that automatically strips any prefix shared across results, so that a naive SARIF producer can emit absolute paths (rather than the recommended relative ones) and the user is still shown human-readable paths. That said, I have no evidence for this beyond a hunch.
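
To make the hunch concrete, here is a minimal Python sketch of what a character-wise shared-prefix strip would do to the two paths above. This is not taken from the sarif source; strip_common_prefix_chars is a hypothetical helper, and the only real library call is os.path.commonprefix, which compares strings character by character.

```python
import os.path


def strip_common_prefix_chars(paths):
    """Hypothetical helper: drop the longest leading string shared by all paths.

    os.path.commonprefix works on characters, not path segments.
    """
    prefix = os.path.commonprefix(paths)
    return [p[len(prefix):] for p in paths]


# Both results point at the same file, so the whole path is "shared prefix".
print(strip_common_prefix_chars(["./foo.c", "./foo.c"]))  # ['', '']
```

If something along these lines runs over the resolved paths, identical paths would come out empty, which matches the blank paths in the output above.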

NOTE: To produce a valid SARIF file that exhibits this problem, see the attached file, foo.c.sarif.json, or use GCC to make one for you from the following source.

gcc ./foo.c -fdiagnostics-format=sarif-file

// foo.c
int main(){
    return 0
}

int main1(){
    return 1
}

EDIT: The problem also occurs when the artifactLocation uses an index property to refer to an artifact within the artifacts array, not just when a string is given directly in uri. So I'd say the "resolution" of the artifact happens before the problem occurs.

EDIT: More insight: when the two locations differ in suffix but still share a common prefix, the path is split in the middle of a path segment. For example, ./foo1.c and ./foo2.c result in 1.c and 2.c in the output, rather than the expected foo1.c and foo2.c. It seems a rather rudimentary transformation is applied to the final paths, where any shared prefix is removed without regard for path segment boundaries. I propose that it should remove only whole shared leading path segments rather than all shared leading characters.
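
A rough Python sketch of the segment-based behaviour proposed here, for illustration only. strip_common_prefix_segments is a hypothetical name, not an existing sarif function, and it makes one extra assumption beyond the proposal: the final segment (the file name) is never stripped, so identical paths still print as foo.c.

```python
from pathlib import PurePosixPath


def strip_common_prefix_segments(paths):
    """Hypothetical helper: remove whole leading path segments shared by every path."""
    # PurePosixPath drops a leading "./", so "./foo.c" has parts ('foo.c',).
    split = [PurePosixPath(p).parts for p in paths]
    if not split:
        return []
    # Assumption: never consume the last segment, even when all paths are identical.
    limit = min(len(parts) for parts in split) - 1
    shared = 0
    while shared < limit and len({parts[shared] for parts in split}) == 1:
        shared += 1
    return ["/".join(parts[shared:]) for parts in split]


print(strip_common_prefix_segments(["./foo.c", "./foo.c"]))    # ['foo.c', 'foo.c']
print(strip_common_prefix_segments(["./foo1.c", "./foo2.c"]))  # ['foo1.c', 'foo2.c']
```

With absolute paths such as /home/u/proj/src/a.c and /home/u/proj/lib/b.c, the same function would yield src/a.c and lib/b.c, so the convenience of trimming long shared directories is kept.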
