SPECAM-91 -- Add at-code support to ADL2 grammar #36

MattijsK · 2024-11-20T10:15:46Z

Allow at-codes in ADL2 as an alternative to id-codes. This is done by replacing the ID_CODE identifier with a node_identifier rule that accepts either ID_CODE or AT_CODE.

For local testing of the grammar we used the following archetypes:

The CKM body_weight archetype with id-codes:
openEHR-EHR-OBSERVATION.body_weight.v2.1.6.adls.txt
Our manually changed version of the CKM body_weight archetype with at-codes:
openEHR-EHR-OBSERVATION.body_weight.v2.1.6-at-coded.adls.txt

joostholslag · 2025-02-16T09:30:23Z

How does this relate to openEHR/archie#659 ? @MattijsK

borutjures · 2025-02-16T09:37:24Z

@joostholslag Maybe I can answer since it is Sunday. I'm using this repository as the "official" openEHR grammars in my tools. I believe the openEHR/archie#659 is for syncing these grammars into the Archie repository.

joostholslag · 2025-02-17T10:59:18Z

So why are they different repos?

borutjures · 2025-02-17T12:10:25Z

Not everyone is using Archie, but everyone is using the grammars.

I suspect that the grammars in this repository weren’t always 100% of what was required by Archie so they made a copy of them. I remember that we discussed some of the required changes 3 years ago and Archie’s team and I were able to iterate faster by having our own copies of the grammars. From the last review we are now both using the exact copy of the grammars in this repository.

joostholslag · 2025-02-18T07:16:09Z

Ok, check. As long as it's clear there's a proper source of truth. It might be interesting to use a git submodule in Archie to include the ITS repo, and potentially to include a branch of it in case you want to go ahead of the state of the ITS repo.

joostholslag · 2025-02-18T16:55:13Z

> From the last review

Thanks for the info. @MattijsK seemed unaware. Could you kindly point to the commit/PR that changes the source for Archie?

MattijsK · 2025-02-20T09:23:43Z

From our point of view:

The grammar in this repo should be the grammar exactly as it is specified in the specification. Any change to this grammar should be thought about carefully and also be reflected in the specification.
The grammar in Archie is an implementation grammar that should have (necessary) freedom to do (small) fixes and changes so that implementers can work with it.

If we want to make this grammar up-to-date with the grammar in Archie, we should carefully look into every difference and decide if we want to add it to this grammar.

The grammar in this repository should be the leading grammar, but making Archie point directly to this grammar will be very complicated, or even almost impossible to work with.

borutjures · 2025-03-31T13:36:35Z

I was asked to review this PR (as a non-SEC member): I'm using the same grammar as proposed in this PR and it works in my tools. This means the grammar works in at least 2 separate implementations.

sebastian-iancu · 2025-11-11T14:03:42Z

@wolandscat can you please review this PR, and perhaps also answer the question if this is the repo to modify for the upcoming AM 2.4 release

wolandscat

This change here allows 0-filling of id-codes and at-codes. Neither is allowed in standard ADL2; only the legacy at-codes from ADL1.4 archetypes have this.

In particular, the AT_CODE symbol in ADL2 specifically doesn't mean the old legacy at-codes, which are non-systematic.

I would reverse this change, and do the following:

ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
ADL14_AT_CODE: <put the old style regex here>

wolandscat

Now here, we should have:

node_identifier : ID_CODE | ADL14_AT_CODE

rongchen

Looks good, great job Mattijs!

wolandscat · 2025-11-27T12:52:03Z

As far as I can tell, the corrections I noted above have not been made. If they are not made, the grammars will not function correctly. For example, they would allow nodes in top-level archetypes to have ids like CLUSTER[at3], when it can only be CLUSTER[id4] or CLUSTER[at0003]. The current changes don't distinguish between badly formed (i.e. ADL1.4) at-codes and regular at-codes (i.e. ADL2 at-codes).

J3173 · 2025-12-02T10:24:27Z

@MattijsK and I have tried some options. This doesn't work:

ROOT_ID_CODE : ('id1'|'at0000') '.1'* ;
ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;
ADL14_AT_CODE      : 'at' ('0' | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;

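In this case the code at1234 would match both AT_CODE and ADL14_AT_CODE with the same length. The lexer resolves such a tie in favour of the rule that is defined first, which is AT_CODE, so ADL14_AT_CODE would never be produced for these codes.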
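A rough way to check this without an ANTLR toolchain is to emulate the lexer's choice in Python: take the longest match, and break ties by rule order. The sketch below is not part of the PR. The regexes are hand translations of the rules above, and `longest_prefix` and `first_token` are hypothetical helper names.

```python
import re

# Hand translation of the CODE_STR fragment (assumption: equivalent to the ANTLR rule).
CODE_STR = r"(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))*"

# Rules in the same order as in the grammar snippet above.
RULES = [
    ("ROOT_ID_CODE", re.compile(r"(?:id1|at0000)(?:\.1)*")),
    ("ID_CODE", re.compile("id" + CODE_STR)),
    ("AT_CODE", re.compile("at" + CODE_STR)),
    ("AC_CODE", re.compile("ac" + CODE_STR)),
    ("ADL14_AT_CODE", re.compile(r"at(?:0|[0-9]{4})(?:\.(?:0|[1-9][0-9]*))*")),
]


def longest_prefix(pattern, text):
    """Length of the longest prefix of text that the pattern matches in full."""
    for end in range(len(text), 0, -1):
        if pattern.fullmatch(text, 0, end):
            return end
    return 0


def first_token(text, rules=RULES):
    """Pick the rule with the longest match; the earlier rule wins a tie."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        length = longest_prefix(pattern, text)
        if length > best_len:  # strict comparison keeps the earlier rule on ties
            best_name, best_len = name, length
    return best_name, text[:best_len]


for code in ("at1234", "at0004", "at3"):
    print(code, first_token(code))
```

Under these assumptions at1234 comes out as AT_CODE, while at0004 still reaches ADL14_AT_CODE because AT_CODE can only match the shorter prefix at0.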

A first alternative would be to make the token rules more specific, while still allowing both ADL 1.4 and ADL 2.3 codes to be matched by AT_CODE:

ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' ('0' | [1-9][0-9]* | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;

This would solve some, but not all problems. CLUSTER[id0009] wouldn't be allowed anymore, but CLUSTER[at3] would still be allowed.
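The effect of this first alternative can be checked with a small Python sketch. It assumes that a node id is valid only when a single rule matches the whole code, and it uses hand-translated regexes; `accepted` is a hypothetical helper, not something from the grammar.

```python
import re

# Hand translation of the CODE_STR fragment (assumption: equivalent to the ANTLR rule).
CODE_STR = r"(?:0|[1-9][0-9]*)(?:\.(?:0|[1-9][0-9]*))*"

ID_CODE = re.compile("id" + CODE_STR)
# AT_CODE from the first alternative: regular or four-digit zero-filled head.
AT_CODE = re.compile(r"at(?:0|[1-9][0-9]*|[0-9]{4})(?:\.(?:0|[1-9][0-9]*))*")


def accepted(code):
    """Return the names of the rules that match the whole code."""
    rules = {"ID_CODE": ID_CODE, "AT_CODE": AT_CODE}
    return [name for name, pattern in rules.items() if pattern.fullmatch(code)]


for code in ("id0009", "id4", "at3", "at0003"):
    print(code, accepted(code))
```

With these translations id0009 matches no rule, whereas at3 and at0003 are both accepted as AT_CODE, which is the remaining problem described above.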

A more complete solution would be to have a separate rule for overlapping at-codes:

// Some at-codes are both valid ADL 1.4 and ADL 2.3 codes:
// Codes that start with at0 and codes at1000 to at9999.
OVERLAPPING_AT_CODE : 'at' ('0' | [1-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;

ROOT_ID_CODE : ('id1'|'at0000') '.1'* ;
ID_CODE      : 'id' CODE_STR ;
AT_CODE      : 'at' CODE_STR ;
AC_CODE      : 'ac' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*) ( '.' ('0' | [1-9][0-9]* ))* ;
ADL14_AT_CODE      : 'at' ('0' | [0-9][0-9][0-9][0-9]) ( '.' ('0' | [1-9][0-9]* ))* ;

Then, in all places where an AT_CODE is expected, we would need to allow either AT_CODE or OVERLAPPING_AT_CODE, and either ADL14_AT_CODE or OVERLAPPING_AT_CODE where we expect ADL 1.4 codes. This would make the parser rules more complex, but would solve all problems.
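As a sanity check on the overlapping rule, the following Python sketch compares it against the intersection of AT_CODE and ADL14_AT_CODE over a brute-force sample of codes. The regexes are hand translations of the rules above, and `sample_codes` and `mismatches` are hypothetical helpers; the sample covers heads of up to five digits with an optional `.1` suffix, so it is evidence rather than proof.

```python
import re
from itertools import product

# Shared tail of all three rules: zero or more dotted specialisation parts.
TAIL = r"(?:\.(?:0|[1-9][0-9]*))*"

AT_CODE = re.compile(r"at(?:0|[1-9][0-9]*)" + TAIL)
ADL14_AT_CODE = re.compile(r"at(?:0|[0-9]{4})" + TAIL)
OVERLAPPING_AT_CODE = re.compile(r"at(?:0|[1-9][0-9]{3})" + TAIL)


def sample_codes(max_digits=5):
    """Yield at-codes with every digit head up to max_digits, with and without '.1'."""
    for n in range(1, max_digits + 1):
        for digits in product("0123456789", repeat=n):
            head = "at" + "".join(digits)
            yield head
            yield head + ".1"


def mismatches():
    """Codes where OVERLAPPING_AT_CODE disagrees with 'AT_CODE and ADL14_AT_CODE'."""
    bad = []
    for code in sample_codes():
        both = bool(AT_CODE.fullmatch(code)) and bool(ADL14_AT_CODE.fullmatch(code))
        if both != bool(OVERLAPPING_AT_CODE.fullmatch(code)):
            bad.append(code)
    return bad


print(mismatches())
```

An empty result suggests that, on this sample, the overlapping rule captures exactly the codes that are valid under both conventions.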

@wolandscat Do you have a preference for one of the solutions, or maybe another suggestion?

wolandscat · 2025-12-02T13:09:29Z

Jelte, Mattijs, you are right. This is the kind of problem that occurs because of sticking to the old coding system, which is not regular. The really correct way to handle it would be with modal lexing, which means knowing whether you are lexing/parsing:

1a: a top-level (spec level = 0) legacy-coded archetype (apparently all ADL2 archetypes within openEHR at the moment), or
1b: a specialised legacy-coded ADL2 archetype
2: a proper ADL2 archetype (ADL2 archetypes outside of openEHR, and presumably inside openEHR one day in the future).

The modes would use various lexer rules differently to correctly handle the codes. You may consider this too much work to be worth it, but in principle:

Mode 1a: only ADL14_TL_AT_CODE can be a node id ('TL' = top-level, i.e. 0-filled, like 'at0004')
Mode 1b: ADL14_TL_AT_CODE can appear in override paths; new structures can only have node id = ADL2_AT_CODE (regular at-codes, like 'at33')
Mode 2: the original ADL2 spec - id-codes = ID_CODE, universally

You probably already know how to do it, but an example of modal lexing can be seen here https://github.com/openEHR/openEHR-antlr4/blob/master/reader_adl2/src/main/antlr/Adl2Lexer.g4

Note that this Antlr4 repo is not the 'official' one, although in my view openEHR should use it instead, because it uses modal lexing to solve various problems in the current grammars (regex handling and so on). But that is for another day ;)

joostholslag · 2025-12-04T06:29:22Z

Could we create variants of the grammar? One with at-coding for openEHR RM users, and one with id-coding for non-openEHR users?

wolandscat · 2025-12-04T11:29:37Z

> Could we create variants of the grammar? One with at coding for openehr RM users, and one for id coding for non-openehr?

Personally, that is what I would do. Such 'modes' are not done with modal lexing, but with semantic predicates - see here (scroll down a bit to see an example with variant forms of Java language). The same could be done with 'adl2' and 'adl2-legacy' flags, or similar.
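The flag idea can be illustrated outside ANTLR with a minimal Python sketch. It assumes that the 'adl2' variant accepts only ID_CODE node ids and that the 'adl2-legacy' variant accepts only codes of the ADL14_AT_CODE form proposed earlier in this thread; both the mapping and the `is_node_id` helper are hypothetical and do not come from either grammar.

```python
import re

# Shared tail: zero or more dotted specialisation parts.
TAIL = r"(?:\.(?:0|[1-9][0-9]*))*"

# Assumed mapping from variant flag to the node-id pattern it accepts.
NODE_ID_BY_VARIANT = {
    "adl2": re.compile(r"id(?:0|[1-9][0-9]*)" + TAIL),
    "adl2-legacy": re.compile(r"at(?:0|[0-9]{4})" + TAIL),
}


def is_node_id(code, variant):
    """True when the whole code is a valid node id under the given variant."""
    return NODE_ID_BY_VARIANT[variant].fullmatch(code) is not None


for code in ("id4", "at0003", "at3"):
    print(code, {v: is_node_id(code, v) for v in NODE_ID_BY_VARIANT})
```

Under these assumptions id4 is valid only for 'adl2', at0003 only for 'adl2-legacy', and at3 for neither, which matches the CLUSTER[...] examples given earlier in the review.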

MattijsK · 2026-01-06T09:54:34Z

For the sake of progress (this is one of the last blockers for ADL 2.4), we decided to simply create two grammars: one id-coded and one at-coded.

This way we can move forward with correct specification grammars.

Add at-code support to ADL2 grammar

05d2c88

ErikSundvall approved these changes Mar 31, 2025

MattijsK changed the title ~~Add at-code support to ADL2 grammar~~ SPECAM-91 Add at-code support to ADL2 grammar Apr 1, 2025

MattijsK changed the title ~~SPECAM-91 Add at-code support to ADL2 grammar~~ SPECAM-91 -- Add at-code support to ADL2 grammar Apr 1, 2025

joostholslag mentioned this pull request Apr 8, 2025

Remove implied second location of grammars masterAppB-syntax_spec.adoc openEHR/specifications-AM#31

Merged

sebastian-iancu assigned wolandscat Nov 11, 2025

wolandscat requested changes Nov 13, 2025

rongchen approved these changes Nov 27, 2025

8 participants
