Skip to content

Improve Gromacs input parameter parsing further  #129

@JFRudzinski

Description

@JFRudzinski

in #122 we already made some serious improvements to str_to_input_parameters(), but there are still some issues:

  1. The approach may be able to be improved:

@ladinesa suggested:

re_section = re.compile(r'(?P<indent> +?)(?P<key>\S+?) +\(*(?P<index>\d*)\)*\:')
re_value = re.compile(r'(?P<indent> +?)(?P<key>\S+?) *[:=]+ *(?P<value>[^\n]+)')

converters = [
    (re.compile(r'([-+]?\d+)'), lambda x: int(x.group(1))),
    (re.compile(r'(?i:(?:true|false))'), lambda x: x.group(1).lower() == 'true'),
    (re.compile(r'([-+]?\d+\.*\d*(?:[Ee][-+]\d+)?)'), lambda x: float(x.group(1))),
    (re.compile(r'\{(\d+),\.\.\.,(\d+)\}'), lambda x: list(range(int(x.group(1)), int(x.group(2))))),
    (re.compile(r'\{*([\d\.]+,.+)\}*'), lambda x: [float(v) for v in x.group(1).split(',')])
]

def convert_value(value):
    value = value.strip()
    for pattern, converter in converters:
        match = pattern.match(value)
        if match:
            return converter(match)
    if value.lower() == 'not available':
        return None
    return value

def parse_indented(block):
    root = {}
    stack = [(root, -1)]

    for line in block.splitlines():
        match = re_section.match(line)
        if not match:
            match = re_value.match(line)
        if not match:
            continue

        match_dct = match.groupdict()
        indent = len(match_dct.get('indent'))
        key = match_dct.get('key')
        value = match_dct.get('value')

        while stack and stack[-1][1] >= indent:
            stack.pop()

        parent = stack[-1][0]

        child = convert_value(value )if value else {}
        if key in parent:
            if isinstance(parent[key], list):
                parent[key].append(child)
            else:
                parent[key] = [parent[key], child]
        else:
            parent[key] = child

        stack.append((child, indent))
    return root

but I was not yet able to get this working completely.

  1. There are some special cases, e.g., all-lambas where the indentation method fails due to length of the keyword. These exceptions need to be dealt with specially.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions