@MattijsK

Allow at-codes in ADL2 where id-codes are currently required. This is done by replacing the ID_CODE identifier with a node_identifier rule, which accepts either ID_CODE or AT_CODE.

For local testing of the grammar we used the following archetypes:

@joostholslag

joostholslag commented Feb 16, 2025

How does this relate to openEHR/archie#659 ? @MattijsK

@borutjures

@joostholslag Maybe I can answer since it is Sunday. I'm using this repository as the source of the "official" openEHR grammars in my tools. I believe openEHR/archie#659 is for syncing these grammars into the Archie repository.

@joostholslag

So why are they different repos?

@borutjures

Not everyone is using Archie, but everyone is using the grammars.

I suspect that the grammars in this repository didn't always cover 100% of what Archie required, so they made a copy of them. I remember that we discussed some of the required changes 3 years ago, and Archie's team and I were able to iterate faster by having our own copies of the grammars. Since the last review, we are both using an exact copy of the grammars in this repository.

@joostholslag

OK, check. As long as it's clear there's a proper source of truth. It might be interesting to use a git submodule in Archie to include the ITS repo, and potentially to point it at a branch in case you want to get ahead of the state of the ITS repo.

@joostholslag

> From the last review

Thanks for the info. @MattijsK seemed unaware. Could you kindly point to the commit/PR that changes the source for Archie?

@MattijsK
Author

From our point of view:

  • The grammar in this repo should be the grammar exactly as it is specified in the specification. Any change to this grammar should be thought about carefully and also be reflected in the specification.
  • The grammar in Archie is an implementation grammar that should have (necessary) freedom to do (small) fixes and changes so that implementers can work with it.

If we want to make this grammar up-to-date with the grammar in Archie, we should carefully look into every difference and decide if we want to add it to this grammar.

The grammar in this repository should be the leading grammar, but making Archie point directly to this grammar will be very complicated, or even almost impossible to work with.

@borutjures

I was asked to review this PR (as a non-SEC member): I'm using the same grammar as proposed in this PR and it works in my tools. This means the grammar works in at least 2 separate implementations.

@MattijsK changed the title from "Add at-code support to ADL2 grammar" to "SPECAM-91 Add at-code support to ADL2 grammar" on Apr 1, 2025
@MattijsK changed the title from "SPECAM-91 Add at-code support to ADL2 grammar" to "SPECAM-91 -- Add at-code support to ADL2 grammar" on Apr 1, 2025
@sebastian-iancu
Member

@wolandscat can you please review this PR, and perhaps also answer the question of whether this is the repo to modify for the upcoming AM 2.4 release?

Member

@wolandscat wolandscat left a comment

This change here allows 0-filling of id-codes and at-codes. Neither is allowed in standard ADL2; only the legacy at-codes from ADL1.4 archetypes have this.

In particular, the AT_CODE symbol in ADL2 specifically doesn't mean the old legacy at-codes, which are non-systematic.

I would reverse this change, and do the following:

ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
ADL14_AT_CODE: <put the old style regex here>

Member

@wolandscat wolandscat left a comment

Now here, we should have:

node_identifier : ID_CODE | ADL14_AT_CODE ;


@rongchen rongchen left a comment

Looks good, great job Mattijs!

@wolandscat
Member

As far as I can tell, the corrections I noted above have not been made. Without them, the grammars will not function correctly. For example, they would allow nodes in top-level archetypes to have ids like CLUSTER[at3], when it can only be CLUSTER[id4] or CLUSTER[at0003]. The current changes don't distinguish badly formed (i.e. ADL1.4) at-codes from regular at-codes (i.e. ADL2 at-codes).

@J3173

J3173 commented Dec 2, 2025

@MattijsK and I have tried some options. This doesn't work:

ROOT_ID_CODE : ('id1'|'at0000') '.1'* ;
ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;
ADL14_AT_CODE      : 'at' ('0' | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;

In this case the code at1234 would match both AT_CODE and ADL14_AT_CODE with the same length. When two rules match equally long input, the ANTLR lexer picks the rule defined first, which is AT_CODE.
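
As a rough illustration of this tie-break, the sketch below imitates the lexer's behaviour in Python: the longest match wins, and on equal length the rule listed first wins. The regular expressions are hand translations of the rules above, `first_token` is a hypothetical helper and not part of ANTLR, and ROOT_ID_CODE and AC_CODE are left out for brevity.

```python
import re

# Hand-translated from the ANTLR rules quoted above (assumed equivalent).
TAIL = r"(?:\.(?:0|[1-9][0-9]*))*"
RULES = [  # same relative order as in the grammar listing
    ("ID_CODE", r"id(?:0|[1-9][0-9]*)" + TAIL),
    ("AT_CODE", r"at(?:0|[1-9][0-9]*)" + TAIL),
    ("ADL14_AT_CODE", r"at(?:0|[0-9]{4})" + TAIL),
]


def first_token(text):
    """Return (rule_name, lexeme) for the token at the start of text, or None.

    Imitates a longest-match lexer: each rule reports its longest matching
    prefix, and a later rule only replaces an earlier one if it is longer.
    """
    best = None
    for name, pattern in RULES:
        # Find the longest prefix of text that this rule matches completely.
        for end in range(len(text), 0, -1):
            if re.fullmatch(pattern, text[:end]):
                if best is None or end > len(best[1]):
                    best = (name, text[:end])
                break
    return best


print(first_token("at1234"))
print(first_token("at0004"))
```

Under these assumptions, at1234 comes out as AT_CODE even though ADL14_AT_CODE matches the same text, while at0004 goes to ADL14_AT_CODE because that rule matches more characters than AT_CODE does.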

A first alternative would be making the token rules more specific, but still allow both ADL 1.4 and ADL 2.3 codes to be matched with AT_CODE:

ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' ('0' | [1-9][0-9]* | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;

This would solve some, but not all problems. CLUSTER[id0009] wouldn't be allowed anymore, but CLUSTER[at3] would still be allowed.
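
The effect of this first alternative can be checked with a small Python sketch. The patterns are hand translations of the two rules above, and `is_id_code` and `is_at_code` are hypothetical helpers that test a complete code, not a piece of the real lexer.

```python
import re

# Hand-translated from the first alternative above (assumed equivalent).
TAIL = r"(?:\.(?:0|[1-9][0-9]*))*"
ID_CODE = re.compile(r"id(?:0|[1-9][0-9]*)" + TAIL)
AT_CODE = re.compile(r"at(?:0|[1-9][0-9]*|[0-9]{4})" + TAIL)


def is_id_code(code):
    """True if the whole string is a single ID_CODE token."""
    return ID_CODE.fullmatch(code) is not None


def is_at_code(code):
    """True if the whole string is a single AT_CODE token."""
    return AT_CODE.fullmatch(code) is not None


for code in ["id4", "id0009", "at0003", "at3"]:
    print(code, is_id_code(code), is_at_code(code))
```

In this sketch id0009 is rejected and at0003 is accepted, as intended, but at3 is accepted too, which is the remaining problem.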

A more complete solution would be to have a separate rule for overlapping at-codes:

// Some at-codes are both valid ADL 1.4 and ADL 2.3 codes:
// Codes that start with at0 and codes at1000 to at9999.
OVERLAPPING_AT_CODE : 'at' ('0' | [1-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;

ROOT_ID_CODE : ('id1'|'at0000') '.1'* ;
ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;
ADL14_AT_CODE      : 'at' ('0' | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;

Then, in all places where an AT_CODE is expected, we would need to allow either AT_CODE or OVERLAPPING_AT_CODE, and either ADL14_AT_CODE or OVERLAPPING_AT_CODE where we expect ADL 1.4 codes. This would make the parser rules more complex, but would solve all problems.
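
A minimal Python sketch of this idea follows. It classifies a complete code by the first rule that matches the whole string, which is how the tie-break works out when the code is lexed as one token. The patterns are hand translations of the rules above; `classify`, `accepted_as_adl2` and `accepted_as_adl14` are hypothetical names, and ROOT_ID_CODE, ID_CODE and AC_CODE are omitted.

```python
import re

# Hand-translated from the rules above (assumed equivalent), in listing order.
TAIL = r"(?:\.(?:0|[1-9][0-9]*))*"
RULES = [
    ("OVERLAPPING_AT_CODE", r"at(?:0|[1-9][0-9]{3})" + TAIL),
    ("AT_CODE", r"at(?:0|[1-9][0-9]*)" + TAIL),
    ("ADL14_AT_CODE", r"at(?:0|[0-9]{4})" + TAIL),
]

# Token types a parser rule would have to accept in each context.
ADL2_AT_TOKENS = {"AT_CODE", "OVERLAPPING_AT_CODE"}
ADL14_AT_TOKENS = {"ADL14_AT_CODE", "OVERLAPPING_AT_CODE"}


def classify(code):
    """Return the first rule that matches the whole code, or None."""
    for name, pattern in RULES:
        if re.fullmatch(pattern, code):
            return name
    return None


def accepted_as_adl2(code):
    return classify(code) in ADL2_AT_TOKENS


def accepted_as_adl14(code):
    return classify(code) in ADL14_AT_TOKENS


for code in ["at3", "at0004", "at1234"]:
    print(code, classify(code), accepted_as_adl2(code), accepted_as_adl14(code))
```

With these assumptions at3 is only an ADL 2 code, at0004 is only an ADL 1.4 code, and at1234 is accepted in both contexts through OVERLAPPING_AT_CODE.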

@wolandscat Do you have a preference for one of the solutions, or maybe another suggestion?

@wolandscat
Member

Jelte, Mattijs, you are right. This is the kind of problem that occurs because of sticking to the old coding system, which is not regular. The really correct way to handle it would be with modal lexing, which means knowing whether you are lexing/parsing:

  • 1a: a top-level (spec level = 0) legacy-coded archetype (apparently all ADL2 archetypes within openEHR at the moment), or
  • 1b: a specialised legacy-coded ADL2 archetype
  • 2: a proper ADL2 archetype (ADL2 archetypes outside of openEHR, and presumably inside openEHR one day in the future).

The modes would use various lexer rules differently to correctly handle the codes. You may consider this too much work to be worth it, but in principle:

  • Mode 1a: only ADL14_TL_AT_CODE can be a node id ('TL' = top-level, i.e. 0-filled, like 'at0004')
  • Mode 1b: ADL14_TL_AT_CODE can appear in override paths; new structures can only have node id = ADL2_AT_CODE (regular at-codes, like 'at33')
  • Mode 2: the original ADL2 spec - id-codes = ID_CODE, universally
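
To make the three modes concrete, here is a small Python sketch of a mode-dependent node id check. It is not modal lexing in ANTLR, only an illustration of which codes each mode would accept. The regular expressions for ADL14_TL_AT_CODE and ADL2_AT_CODE are assumptions (they are not defined in this thread), `Mode` and `is_valid_node_id` are hypothetical names, and mode 1b is read as "regular at-codes everywhere, 0-filled codes only in override paths".

```python
import re
from enum import Enum

# Assumed patterns; none of these are taken from the official grammar.
TAIL = r"(?:\.(?:0|[1-9][0-9]*))*"
ADL14_TL_AT_CODE = re.compile(r"at[0-9]{4}")               # e.g. at0004
ADL2_AT_CODE = re.compile(r"at(?:0|[1-9][0-9]*)" + TAIL)   # e.g. at33
ID_CODE = re.compile(r"id(?:0|[1-9][0-9]*)" + TAIL)        # e.g. id4


class Mode(Enum):
    LEGACY_TOP_LEVEL = "1a"    # top-level legacy-coded archetype
    LEGACY_SPECIALISED = "1b"  # specialised legacy-coded archetype
    ADL2 = "2"                 # id-coded ADL2 archetype


def is_valid_node_id(mode, code, in_override_path=False):
    """Check a node id against the codes the given mode allows."""
    if mode is Mode.LEGACY_TOP_LEVEL:
        return ADL14_TL_AT_CODE.fullmatch(code) is not None
    if mode is Mode.LEGACY_SPECIALISED:
        if ADL2_AT_CODE.fullmatch(code):
            return True
        # 0-filled codes are only tolerated when overriding a parent path.
        return in_override_path and ADL14_TL_AT_CODE.fullmatch(code) is not None
    return ID_CODE.fullmatch(code) is not None


print(is_valid_node_id(Mode.LEGACY_TOP_LEVEL, "at0004"))
print(is_valid_node_id(Mode.LEGACY_SPECIALISED, "at0004", in_override_path=True))
print(is_valid_node_id(Mode.ADL2, "id4"))
```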

You probably already know how to do it, but an example of modal lexing can be seen here https://github.com/openEHR/openEHR-antlr4/blob/master/reader_adl2/src/main/antlr/Adl2Lexer.g4

Note that this ANTLR4 repo is not the 'official' one, although in my view openEHR should use it instead, because it uses modal lexing to solve various problems in the current grammars (regex handling and so on). But that is for another day ;)

@joostholslag

Could we create variants of the grammar? One with at-coding for openEHR RM users, and one with id-coding for non-openEHR users?

@wolandscat
Member

> Could we create variants of the grammar? One with at-coding for openEHR RM users, and one with id-coding for non-openEHR users?

Personally, that is what I would do. Implementing such 'modes' would not use modal lexing but semantic predicates - see here (scroll down a bit to see an example with variant forms of the Java language). The same could be done with 'adl2' and 'adl2-legacy' flags, or similar.

@MattijsK
Author

MattijsK commented Jan 6, 2026

For the sake of progress (this is one of the last blockers for ADL 2.4), we decided to simply create two grammars: one id-coded and one at-coded.

This way we can move forward with correct specification grammars.

8 participants