-
Notifications
You must be signed in to change notification settings - Fork 6
SPECAM-91 -- Add at-code support to ADL2 grammar #36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
How does this relate to openEHR/archie#659 ? @MattijsK |
|
@joostholslag Maybe I can answer since it is Sunday. I'm using this repository as the "official" openEHR grammars in my tools. I believe the openEHR/archie#659 is for syncing these grammars into the Archie repository. |
|
So why are they different repo's? |
|
Not everyone is using Archie, but everyone is using the grammars. I suspect that the grammars in this repository weren’t always 100% of what was required by Archie so they made a copy of them. I remember that we discussed some of the required changes 3 years ago and Archie’s team and I were able to iterate faster by having our own copies of the grammars. From the last review we are now both using the exact copy of the grammars in this repository. |
|
Ok check. As long as it's clear there's a proper source of truth. It might be interesting to do git submodule in Archie to include the ITS repo. And potentially include a branch of that in case you want to go ahead of the state of the ITS repo. |
Thanks for the info. @MattijsK seemed unaware. Could you kindle point to the commit/pr that changes the source for Archie? |
|
From our point of view:
If we want to make this grammar up-to-date with the grammar in Archie, we should carefully look into every difference and decide if we want to add it to this grammar. The grammar in this repository should be the leading grammar, but making Archie point directly to this grammar will be very complicated, or even almost impossible to work with. |
|
I was asked to review this PR (as a non-SEC member): I'm using the same grammar as proposed in this PR and it works in my tools. This means the grammar works in at least 2 separate implementations. |
|
@wolandscat can you please review this PR, and perhaps also answer the question if this is the repo to modify for the upcoming AM 2.4 release |
wolandscat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change here allows 0-filling of id-codes and at-codes. Neither is allowed in standard ADL2; only the legacy at-codes from ADL1.4 archetypes have this.
In particular, the AT_CODE symbol in ADl2 specifically doesn't mean the old legacy at-codes, which are non-systematic.
I would reverse this change, and do the following:
ID_CODE : 'id' CODE_STR ;
AT_CODE : 'at' CODE_STR ;
AC_CODE : 'ac' CODE_STR ;
ADL14_AT_CODE: <put the old style regex here>
wolandscat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now here, we should have:
node_identifier : ID_CODE | ADL14_AT_CODE
rongchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, great job Mattijs!
|
As far as I can tell, the corrections I noted above have not been made. If they are not, the grammars will not function correctly. For example, they would allow nodes in top-level archetypes to have ids like |
|
@MattijsK and I have tried some options. This doesn't work: ROOT_ID_CODE : ('id1'|'at0000') '.1'* ;
ID_CODE : 'id' CODE_STR ;
AT_CODE : 'at' CODE_STR ;
AC_CODE : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;
ADL14_AT_CODE : 'at' ('0' | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;In this case the code A first alternative would be making the token rules more specific, but still allow both ADL 1.4 and ADL 2.3 codes to be matched with ID_CODE : 'id' CODE_STR ;
AT_CODE : 'at' ('0' | [1-9][0-9]* | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;
AC_CODE : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;This would solve some, but not all problems. A more complete solution would be to have a separate rule for overlapping at-codes: // Some at-codes are both valid ADL 1.4 and ADL 2.3 codes:
// Codes that start with at0 and codes at1000 to at9999.
OVERLAPPING_AT_CODE : 'at' ('0' | [1-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;
ROOT_ID_CODE : ('id1'|'at0000') '.1'* ;
ID_CODE : 'id' CODE_STR ;
AT_CODE : 'at' CODE_STR ;
AC_CODE : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;
ADL14_AT_CODE : 'at' ('0' | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;Then all places where an AT_CODE is expected, we would need to allow either AT_CODE or OVERLAPPING_AT_CODE. Or ADL14_AT_CODE and OVERLAPPING_AT_CODE where we expect ADL1.4 codes. This would make the parser rules more complex, but would solve all problems. @wolandscat Do you have a preference for one of the solutions, or maybe another suggestion? |
|
Jelte, Mattjis, you are right. This is the kind of problem that occurs because of sticking to the old coding system, which is not regular. The really correct way to handle it would be with modal lexing, which means knowing whether you are lexing/parsing:
The modes would use various lexer rules differently to correctly handle the codes. You may consider this too much work to be worth it, but in principle:
You probably already know how to do it, but an example of modal lexing can be seen here https://github.com/openEHR/openEHR-antlr4/blob/master/reader_adl2/src/main/antlr/Adl2Lexer.g4 Note, that Antlr4 repo is not the 'official' one, although in my view, openEHR should use it instead, because it uses modal lexing to solve various problems in the current grammars (regex handling and so on). But that is for another day ;) |
|
Could we create variants of the grammar? One with at coding for openehr RM users, and one for id coding for non-openehr? |
Personally, that is what I would do. To do such 'modes' is not modal lexing, but semantic predicates - see here (scroll down a bit to see an example with variant forms of Java language). The same could be done with 'adl2' and 'adl2-legacy' flags, or similar. |
|
For the sake of progress (this is one of the last blockers for adl 2.4), we decided to simply create two grammars: one id-coded and one at-coded. This way we can move forward with correct specification grammars. |
Allow at-codes in adl2 instead of id-codes. This is done by changing the
ID_CODEidentifier to anode_identifierwhich consist ofID_CODEandAT_CODE.For local testing of the grammar we used the following archetypes:
openEHR-EHR-OBSERVATION.body_weight.v2.1.6.adls.txt
openEHR-EHR-OBSERVATION.body_weight.v2.1.6-at-coded.adls.txt