Skip to content

feat: Add Python Bridge for Modern I/O (POSCAR/ASE)#52

Open
Liam-Deacon wants to merge 3 commits intomasterfrom
feature/modern-io-bridge
Open

feat: Add Python Bridge for Modern I/O (POSCAR/ASE)#52
Liam-Deacon wants to merge 3 commits intomasterfrom
feature/modern-io-bridge

Conversation

@Liam-Deacon
Copy link
Owner

@Liam-Deacon Liam-Deacon commented Dec 20, 2025

Modern Input/Output Bridge for CLEED

This PR implements the "Python Bridge" strategy discussed in Issue #51 to enable CLEED to consume modern crystallography formats (POSCAR, CIF, XYZ) via ASE.

Features

1. cleed-io Python Package

A new lightweight package in tools/ that provides:

  • slicer.py: An algorithm to automatically "slice" a 3D periodic slab (from VASP/ASE) into the "Bulk" and "Overlayer" components required by CLEED.
  • writer.py: Generates valid .bul and .inp files from the sliced atomic structures.
  • cli.py: A command-line tool cleed-convert for easy batch processing.

2. Supported Workflow

Users can now generate a structure in Python/ASE or VASP and immediately convert it for LEED simulation:

# Convert a VASP POSCAR
cleed-convert --input POSCAR --output my_surface

This generates my_surface.bul and my_surface.inp ready for csearch.

3. Documentation

Added docs/modern-io.md explaining the motivation, installation, and usage of the new tools.

Technical Details

  • Alignment: The slicer automatically rotates arbitrary input cells so the surface vectors lie in the $xy$-plane (CLEED convention).
  • Bulk Detection: Currently uses a heuristic (bottom $N$ layers are bulk). Future work will implement rigorous periodicity detection.
  • Dependencies: Requires ase (Atomic Simulation Environment).

Related Issues

Closes #51

Summary by Sourcery

Introduce a Python-based I/O bridge to convert modern crystallography structures into CLEED bulk/overlayer input files.

New Features:

  • Add slicer utilities to align ASE slabs and split them into bulk and overlayer regions for CLEED.
  • Add writers to generate CLEED .bul and .inp files from sliced atomic structures, including basic default parameters.
  • Provide a cleed-convert CLI for batch conversion of POSCAR/CIF/XYZ files into CLEED inputs.
  • Package the new tools as a lightweight cleed-io Python package with an installable entry point script.

Build:

  • Add a pyproject.toml in tools/ to define the cleed-io package metadata, dependencies, and console script entry point.

Documentation:

  • Add modern-io documentation describing installation and usage of the new cleed-io tools for converting modern structure formats to CLEED files.

Summary by CodeRabbit

  • New Features

    • Added a cleed-convert CLI and supporting I/O bridge for bidirectional conversion between CLEED and common structure formats (POSCAR/CIF/XYZ), with configurable bulk-layer slicing and round-trip read/write support.
  • Documentation

    • Added a comprehensive modern I/O guide covering installation, usage examples, supported formats, options, and integration notes.
  • Tests

    • Added unit tests validating slab slicing and end-to-end conversion/round-trip behavior.
  • Chores

    • Added packaging metadata and a console entry point for the new CLI.

✏️ Tip: You can customize this high-level summary in your review settings.

Implements the 'cleed-convert' tool to slice 3D slabs into CLEED .bul/.inp files.

Resolves #51
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 20, 2025

Reviewer's Guide

Introduces a new Python-based cleed-io bridge package (with slicer, writer, and CLI) plus documentation, enabling conversion of ASE-readable structures (e.g., POSCAR/CIF/XYZ) into CLEED .bul and .inp files.

Sequence diagram for cleed-convert structure conversion pipeline

sequenceDiagram
    actor User
    participant CLI as cleed_convert_CLI
    participant ASE as ASE_io
    participant Slicer as cleed_io_slicer
    participant Writer as cleed_io_writer
    participant FS as File_system

    User->>CLI: cleed-convert -i POSCAR -o ni111_co --bulk-layers 2
    CLI->>ASE: read(input, format)
    ASE-->>CLI: Atoms_object

    CLI->>Slicer: slice_slab(Atoms_object, bulk_layers_count)
    Slicer->>Slicer: align_surface(Atoms)
    Slicer->>Slicer: cluster_layers(Atoms)
    Slicer-->>CLI: bulk_atoms, overlayer_atoms

    CLI->>Writer: write_bul(output_basename.bul, bulk_atoms)
    Writer->>FS: open(.bul, write)
    Writer->>FS: write lattice_vectors, bulk_atoms, parameters
    FS-->>Writer: close(.bul)

    CLI->>Writer: write_inp(output_basename.inp, overlayer_atoms, bulk_atoms)
    Writer->>FS: open(.inp, write)
    Writer->>FS: write superstructure, overlayer_atoms, search_parameters
    FS-->>Writer: close(.inp)

    Writer-->>CLI: success
    CLI-->>User: Done.
Loading

Class diagram for cleed-io package modules and key functions

classDiagram
    class CleedIOPackage {
      +string __version__
    }

    class SlicerModule {
      +align_surface(atoms, vacuum_direction, tolerance) Atoms
      +cluster_layers(atoms, tolerance) list
      +slice_slab(atoms, bulk_layers_count) tuple
    }

    class WriterModule {
      +format_vector(label, v) string
      +format_atom(prefix, atom, phase_map, vib_default) string
      +write_bul(filename, bulk_atoms, phase_map) void
      +write_inp(filename, overlayer_atoms, bulk_atoms, phase_map) void
    }

    class CliModule {
      +main() void
    }

    class ExternalASEAtoms {
      +positions
      +cell
      +pbc
      +rotate(axis_vector, axis_name, rotate_cell)
      +get_cell()
      +get_pbc()
      +set_cell(cell)
      +get_chemical_formula()
    }

    CleedIOPackage <|-- SlicerModule
    CleedIOPackage <|-- WriterModule
    CleedIOPackage <|-- CliModule

    SlicerModule ..> ExternalASEAtoms : uses
    WriterModule ..> ExternalASEAtoms : uses
    CliModule ..> SlicerModule : calls_slice_slab
    CliModule ..> WriterModule : calls_writers
    CliModule ..> ExternalASEAtoms : obtains_from_ASE_read
Loading

File-Level Changes

Change Details Files
Implements slab alignment and heuristic slab slicing into bulk and overlayer components using ASE Atoms.
  • Adds align_surface to rotate arbitrary ASE cells so surface vectors lie in the xy-plane with surface normal ~+z, assuming slab-like input.
  • Adds cluster_layers to group atoms into z-layer stacks using a simple distance-based tolerance on z-coordinates.
  • Adds slice_slab to call alignment and layer clustering, split atoms into bulk vs overlayer by bottom N layers, and construct a heuristic bulk c-vector from inter-layer spacing.
tools/cleed_io/slicer.py
Adds writers for CLEED .bul and .inp formats based on sliced ASE structures.
  • Adds helpers to format lattice vectors and atoms with optional phase label mapping and default vibrational parameters.
  • Implements write_bul that writes lattice vectors, bulk atoms, and default VR/VI parameters from a bulk Atoms object.
  • Implements write_inp that writes a default 1x1 superstructure, overlayer atoms, and default search energy parameters, using the bulk cell for in-plane periodicity.
tools/cleed_io/writer.py
Introduces a CLI tool cleed-convert to read modern structure files via ASE and emit CLEED input files.
  • Parses CLI arguments for input path, output basename, bulk-layer count, and optional ASE format override.
  • Reads input with ase.io.read, slices the slab via slice_slab, and writes .bul and .inp using the writer helpers.
  • Adds simple progress and error reporting to stdout/stderr and non-zero exit codes on failure.
tools/cleed_io/cli.py
Defines the cleed-io Python package metadata and entry point, and exposes a top-level package API.
  • Adds pyproject.toml under tools/ configuring setuptools build-backend, core dependencies (ase, numpy, pyyaml), Python version requirement, and console script entry point cleed-convert=cleed_io.cli:main.
  • Adds a minimal __init__.py with package docstring and version constant.
tools/pyproject.toml
tools/cleed_io/__init__.py
Documents the modern I/O workflow and usage of the new conversion tools.
  • Adds modern-io.md describing motivation, installation, example commands, bulk-layer configuration, and underlying algorithm steps (alignment, slicing, separation).
docs/modern-io.md

Assessment against linked issues

Issue Objective Addressed Explanation
#51 Implement a Python-based preprocessor/bridge that uses ASE to read modern structure files (e.g., POSCAR, CIF, XYZ) and prepare data for CLEED.
#51 Provide an automated slicing algorithm that converts a 3D slab/supercell into CLEED-style Bulk and Overlayer components and outputs corresponding .bul and .inp files.
#51 Add documentation describing the motivation, installation, and usage of the new Python bridge workflow for modern I/O.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@coderabbitai
Copy link

coderabbitai bot commented Dec 20, 2025

Walkthrough

Adds a Python-based CLEED I/O bridge: documentation, an ASE-backed CLI (cleed-convert), slab alignment and layer-slicing logic, readers/writers for CLEED .bul/.inp files, package metadata/entry point, and unit tests exercising slicing and round-trip IO.

Changes

Cohort / File(s) Summary
Documentation
docs/modern-io.md
New guide describing the modern I/O workflow, cleed-convert usage, supported formats (via ASE), options (e.g., --bulk-layers), and internal slicing/alignment behaviour and conventions.
Package Init
tools/cleed_io/__init__.py
New package initializer with module docstring and __version__ = "0.1.0".
CLI
tools/cleed_io/cli.py
New CLI (cleed-convert) using argparse; supports mutually exclusive --to-cleed/--from-cleed modes, reads/writes via ASE, invokes slicer and reader/writer helpers, prints progress and exits on errors; exposes main().
Geometry processing
tools/cleed_io/slicer.py
New slicer module: align_surface(atoms), cluster_layers(atoms, tolerance=0.2), and slice_slab(atoms, bulk_layers_count=2) to orient slabs, group atoms into layers by z, and split bulk vs overlayer (contains TODOs for robust a3/unit-cell derivation).
CLEED file writers
tools/cleed_io/writer.py
New writer helpers: format_vector(), format_atom(), write_bul(filename, bulk_atoms, phase_map=None), and write_inp(filename, overlayer_atoms, bulk_atoms, phase_map=None) that emit .bul and .inp text files from ASE Atoms.
CLEED file reader
tools/cleed_io/reader.py
New reader utilities: parse_vector(line), parse_atom(line), and read_cleed_files(bul_file, inp_file) to parse .bul/.inp and reconstruct an ASE Atoms object; raises on missing lattice vectors.
Project manifest & entry point
tools/pyproject.toml
New pyproject defining build-system, metadata, dependencies (ase>=3.22.0, numpy, pyyaml), Python >=3.8, console entry cleed-convert = cleed_io.cli:main, and optional phaseshifts group.
Tests
tools/tests/test_io.py
New unit tests TestCleedIO with test_slice_slab and test_round_trip that build a Ni(111)+O slab, validate slicing, write/read .bul/.inp, and assert round-trip atom counts/formula and lattice alignment.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant CLI as cleed_io.cli
    participant ASE as ASE (reader/writer)
    participant Slicer as cleed_io.slicer
    participant Writer as cleed_io.writer
    participant Reader as cleed_io.reader
    participant FS as File System

    User->>CLI: cleed-convert input.poscar -o output --to-cleed
    CLI->>ASE: read(input.poscar)
    ASE-->>CLI: Atoms object
    CLI->>Slicer: slice_slab(Atoms, bulk_layers_count)
    Slicer-->>CLI: (bulk_atoms, overlayer_atoms)
    CLI->>Writer: write_bul(output.bul, bulk_atoms)
    Writer->>FS: create/write output.bul
    CLI->>Writer: write_inp(output.inp, overlayer_atoms, bulk_atoms)
    Writer->>FS: create/write output.inp
    FS-->>CLI: success
    CLI-->>User: print success message

    alt round-trip
      User->>CLI: cleed-convert --from-cleed output.bul output.inp -o out.poscar
      CLI->>Reader: read_cleed_files(output.bul, output.inp)
      Reader-->>CLI: Atoms object
      CLI->>ASE: write(out.poscar, Atoms)
      ASE-->>FS: write out.poscar
      CLI-->>User: print success message
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Suggested labels

documentation, enhancement

Poem

🐰 I nibble atoms, align them straight,
I slice the slab, I shape its fate,
From POSCAR fields to CLEED's embrace,
A tiny bridge in code and space,
Hop—now legacy finds a modern place.

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: introducing a Python bridge for modern I/O format conversion (POSCAR/ASE to CLEED), which is the primary objective of the PR.
Linked Issues check ✅ Passed The PR implements all core objectives from #51: ASE-based format conversion, automated slab slicing algorithm, .bul/.inp file generation, CLI tool, documentation, and support for multiple file formats through ASE.
Out of Scope Changes check ✅ Passed All changes are directly related to the Python Bridge feature: documentation, core conversion logic (slicer, reader, writer), CLI tool, package metadata, and integration tests. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link

codacy-production bot commented Dec 20, 2025

Codacy's Analysis Summary

28 new issues (≤ 0 issue)
0 new security issue
59 complexity
0 duplications

Review Pull Request in Codacy →

AI Reviewer available: add the codacy-review label to get contextual insights without leaving GitHub.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 5 issues, and left some high level feedback:

  • In align_surface, the vacuum_direction and tolerance parameters are unused and the negative-c case is explicitly ignored (pass), which makes the function’s behavior hard to reason about; consider either implementing these options or removing/renaming them and making the z-orientation handling explicit and deterministic.
  • The bulk cell construction in slice_slab uses dz * len(layers) for the a3 vector even though the comment describes a3 as the inter-layer spacing; if the intent is to represent the bulk periodicity, this should likely be just dz (or a clearly derived periodic translation), otherwise the .bul output will encode the full slab thickness instead of the bulk repeat.
  • In the CLI, ase.io.read can return a list of Atoms objects for multi-frame inputs (e.g., trajectories); it may be safer to explicitly handle that case (e.g., select the first frame or error out) so downstream slicing logic always receives a single Atoms instance.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `align_surface`, the `vacuum_direction` and `tolerance` parameters are unused and the negative-`c` case is explicitly ignored (`pass`), which makes the function’s behavior hard to reason about; consider either implementing these options or removing/renaming them and making the z-orientation handling explicit and deterministic.
- The bulk cell construction in `slice_slab` uses `dz * len(layers)` for the `a3` vector even though the comment describes `a3` as the inter-layer spacing; if the intent is to represent the bulk periodicity, this should likely be just `dz` (or a clearly derived periodic translation), otherwise the `.bul` output will encode the full slab thickness instead of the bulk repeat.
- In the CLI, `ase.io.read` can return a list of `Atoms` objects for multi-frame inputs (e.g., trajectories); it may be safer to explicitly handle that case (e.g., select the first frame or error out) so downstream slicing logic always receives a single `Atoms` instance.

## Individual Comments

### Comment 1
<location> `tools/cleed_io/slicer.py:39-44` </location>
<code_context>
+    atoms.rotate(normal, 'z', rotate_cell=True)
+    
+    # Ensure c is positive z
+    final_cell = atoms.get_cell()
+    if final_cell[2][2] < 0:
+        # Flip z? No, that mirrors the structure. 
+        # Just ensure the slab is oriented correctly.
+        # ViPErLEED: "positive z-direction pointing away from the surface into the vacuum"
+        pass
+
+    return atoms
</code_context>

<issue_to_address>
**issue (bug_risk):** The negative z-component of `c` is detected but never corrected, which can violate the stated CLEED convention.

Right now, when `final_cell[2][2] < 0` we detect the issue but do nothing, so the returned structure can violate the documented CLEED convention (+z into vacuum) without any signal to the caller. Consider either applying a proper 180° rotation around x or y to fix the sign while preserving handedness, raising an error instead of silently continuing, or clearly documenting that the orientation is not enforced.
</issue_to_address>

### Comment 2
<location> `tools/cleed_io/slicer.py:125-132` </location>
<code_context>
+    if len(layers) >= 2:
+        z1 = np.mean(aligned.positions[layers[0], 2])
+        z2 = np.mean(aligned.positions[layers[1], 2])
+        dz = z2 - z1
+        # This assumes vertical stacking.
+        # Ideally, we find the translation vector between layer 0 and layer 1.
+        
+        # Heuristic: Use the slab cell for a1, a2.
+        # Construct a3 from the layer spacing.
+        cell = aligned.get_cell()
+        bulk_atoms.set_cell([cell[0], cell[1], [0, 0, dz * len(layers)]]) # Dummy a3
+        
+    return bulk_atoms, overlayer_atoms
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Using `dz * len(layers)` for the bulk a3 vector is a questionable heuristic and likely misrepresents the bulk periodicity.

Here `dz` is the spacing between the first two layers, but a3 is scaled by `len(layers)` (including any overlayer). This will often make a3 larger than the true bulk repeat and tie it to whether an overlayer is present. If this is only a placeholder, consider using `dz` (or `dz * bulk_layer_count` if appropriate) and making that explicit in the API (e.g., flag/metadata), or allow the caller to provide the bulk a3 explicitly.

Suggested implementation:

```python
        # Heuristic: Use the slab cell for a1, a2.
        # Construct a3 from the inter-layer spacing between the first two bulk-like layers.
        # NOTE: This is a placeholder for the bulk c vector; callers should supply the
        # true bulk a3 if known.
        cell = aligned.get_cell()
        bulk_c = abs(dz)
        bulk_atoms.set_cell([cell[0], cell[1], [0, 0, bulk_c]])  # Dummy a3 based on first-layer spacing

```

To fully address the concern and make the heuristic explicit in the API:
1. Consider adding an optional parameter (e.g. `bulk_a3=None` or `bulk_c=None`) to this function so callers can provide the correct bulk periodicity when available.
2. If such a parameter is added, use it here instead of the placeholder when provided, and fall back to the `abs(dz)` heuristic only when it is not given.
3. Optionally attach a flag/metadata attribute on `bulk_atoms` (e.g. `bulk_atoms.info["a3_is_placeholder"] = True`) when using the heuristic, so downstream code can detect and handle this.
</issue_to_address>

### Comment 3
<location> `tools/cleed_io/slicer.py:5-14` </location>
<code_context>
+from ase import Atoms
+from ase.geometry import cell_to_cellpar
+
+def align_surface(atoms: Atoms, vacuum_direction='z', tolerance=1e-3) -> Atoms:
+    """
+    Aligns the atoms object such that the periodic surface vectors (a, b)
+    lie in the xy-plane (z=0) and the surface normal points along +z.
+    
+    This matches the CLEED convention where z is the surface normal.
+    """
+    # 1. Identify the non-periodic direction or the longest vector if 3D periodic
+    # For a standard slab, we assume c is the vacuum direction.
+    
+    cell = atoms.get_cell()
+    pbc = atoms.get_pbc()
+    
+    # We assume standard ASE slab convention: 3rd vector is out-of-plane
</code_context>

<issue_to_address>
**suggestion:** `vacuum_direction` (and `pbc`) are unused, which could confuse callers about supported orientations.

The implementation assumes the third cell vector is out-of-plane and always maps `cell[0]` to x, so the current behavior doesn’t reflect the `vacuum_direction`/`pbc` arguments. Either add logic to honor these parameters or remove them from the API for now so callers aren’t misled about what orientations are actually supported.

Suggested implementation:

```python
def align_surface(atoms: Atoms) -> Atoms:
    """
    Aligns the atoms object such that the periodic surface vectors (a, b)
    lie in the xy-plane (z=0) and the surface normal points along +z.

    This matches the CLEED convention where z is the surface normal.

    Note:
        The current implementation assumes the standard ASE slab convention
        where the third cell vector is the out-of-plane (vacuum) direction.
        Other orientations are not yet supported.
    """
    # 1. Identify the non-periodic direction or the longest vector if 3D periodic
    # For a standard slab, we assume c is the vacuum direction.

    cell = atoms.get_cell()

```

1. All call sites of `align_surface` must be updated to drop the `vacuum_direction` and `tolerance` arguments, e.g. `align_surface(atoms)` instead of `align_surface(atoms, vacuum_direction='z')`.
2. If the `pbc` variable is used later in this function (outside the snippet shown), reintroduce it at the point of first use; otherwise, keep it removed to avoid unused-variable warnings.
</issue_to_address>

### Comment 4
<location> `tools/cleed_io/writer.py:39-19` </location>
<code_context>
+    """
+    Writes the CLEED .inp file.
+    """
+    cell = bulk_atoms.get_cell() # Overlayer uses bulk 2D periodicity usually
+    
+    with open(filename, 'w') as f:
+        f.write(f"c: Generated by cleed-convert\n")
+        
</code_context>

<issue_to_address>
**nitpick:** `cell` is computed but never used in `write_inp`, suggesting either dead code or a missing use.

If the overlayer `.inp` truly doesn’t need lattice vectors, remove this assignment. If you expect to use the bulk 2D periodicity later (e.g., for superstructure or overlayer positioning), either hook `cell` into that logic now or add a clear TODO so its purpose is explicit.
</issue_to_address>

### Comment 5
<location> `docs/modern-io.md:7` </location>
<code_context>
+
+## Motivation
+
+Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer). 
+
+The `cleed-convert` tool bridges this gap by automatically slicing a 3D slab into the required LEED components.
</code_context>

<issue_to_address>
**nitpick (typo):** Consider using the official capitalization "Quantum ESPRESSO".

Please update "Quantum Espresso" to "Quantum ESPRESSO" to match the official project name used in their documentation and common usage.

```suggestion
Standard DFT workflows (VASP, Quantum ESPRESSO) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer). 
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +39 to +44
final_cell = atoms.get_cell()
if final_cell[2][2] < 0:
# Flip z? No, that mirrors the structure.
# Just ensure the slab is oriented correctly.
# ViPErLEED: "positive z-direction pointing away from the surface into the vacuum"
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): The negative z-component of c is detected but never corrected, which can violate the stated CLEED convention.

Right now, when final_cell[2][2] < 0 we detect the issue but do nothing, so the returned structure can violate the documented CLEED convention (+z into vacuum) without any signal to the caller. Consider either applying a proper 180° rotation around x or y to fix the sign while preserving handedness, raising an error instead of silently continuing, or clearly documenting that the orientation is not enforced.

Comment on lines +125 to +132
dz = z2 - z1
# This assumes vertical stacking.
# Ideally, we find the translation vector between layer 0 and layer 1.

# Heuristic: Use the slab cell for a1, a2.
# Construct a3 from the layer spacing.
cell = aligned.get_cell()
bulk_atoms.set_cell([cell[0], cell[1], [0, 0, dz * len(layers)]]) # Dummy a3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Using dz * len(layers) for the bulk a3 vector is a questionable heuristic and likely misrepresents the bulk periodicity.

Here dz is the spacing between the first two layers, but a3 is scaled by len(layers) (including any overlayer). This will often make a3 larger than the true bulk repeat and tie it to whether an overlayer is present. If this is only a placeholder, consider using dz (or dz * bulk_layer_count if appropriate) and making that explicit in the API (e.g., flag/metadata), or allow the caller to provide the bulk a3 explicitly.

Suggested implementation:

        # Heuristic: Use the slab cell for a1, a2.
        # Construct a3 from the inter-layer spacing between the first two bulk-like layers.
        # NOTE: This is a placeholder for the bulk c vector; callers should supply the
        # true bulk a3 if known.
        cell = aligned.get_cell()
        bulk_c = abs(dz)
        bulk_atoms.set_cell([cell[0], cell[1], [0, 0, bulk_c]])  # Dummy a3 based on first-layer spacing

To fully address the concern and make the heuristic explicit in the API:

  1. Consider adding an optional parameter (e.g. bulk_a3=None or bulk_c=None) to this function so callers can provide the correct bulk periodicity when available.
  2. If such a parameter is added, use it here instead of the placeholder when provided, and fall back to the abs(dz) heuristic only when it is not given.
  3. Optionally attach a flag/metadata attribute on bulk_atoms (e.g. bulk_atoms.info["a3_is_placeholder"] = True) when using the heuristic, so downstream code can detect and handle this.

"""
cell = bulk_atoms.get_cell()

with open(filename, 'w') as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: cell is computed but never used in write_inp, suggesting either dead code or a missing use.

If the overlayer .inp truly doesn’t need lattice vectors, remove this assignment. If you expect to use the bulk 2D periodicity later (e.g., for superstructure or overlayer positioning), either hook cell into that logic now or add a clear TODO so its purpose is explicit.


## Motivation

Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (typo): Consider using the official capitalization "Quantum ESPRESSO".

Please update "Quantum Espresso" to "Quantum ESPRESSO" to match the official project name used in their documentation and common usage.

Suggested change
Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
Standard DFT workflows (VASP, Quantum ESPRESSO) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).

Adds 'reader.py' to parse legacy CLEED files and updates CLI to support '--from-cleed' export to POSCAR/CIF. Includes unit tests.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5ba80a and 79aa2d2.

📒 Files selected for processing (6)
  • docs/modern-io.md (1 hunks)
  • tools/cleed_io/__init__.py (1 hunks)
  • tools/cleed_io/cli.py (1 hunks)
  • tools/cleed_io/slicer.py (1 hunks)
  • tools/cleed_io/writer.py (1 hunks)
  • tools/pyproject.toml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-20T09:06:56.375Z
Learnt from: CR
Repo: Liam-Deacon/CLEED PR: 0
File: doc/AGENTS.md:0-0
Timestamp: 2025-12-20T09:06:56.375Z
Learning: Applies to doc/CLEED_Manual.pdf : Keep `doc/CLEED_Manual.pdf` tracked in git as the canonical reference manual for the original CLEED program suite

Applied to files:

  • docs/modern-io.md
🧬 Code graph analysis (1)
tools/cleed_io/cli.py (2)
tools/cleed_io/slicer.py (1)
  • slice_slab (75-134)
tools/cleed_io/writer.py (2)
  • write_bul (13-33)
  • write_inp (35-56)
🪛 GitHub Check: Codacy Static Code Analysis
docs/modern-io.md

[notice] 30-30: docs/modern-io.md#L30
Expected: 1; Actual: 3


[notice] 30-30: docs/modern-io.md#L30
Lists should be surrounded by blank lines


[notice] 31-31: docs/modern-io.md#L31
Expected: 1; Actual: 3


[notice] 44-44: docs/modern-io.md#L44
Expected: 1; Actual: 2


[notice] 45-45: docs/modern-io.md#L45
Expected: 1; Actual: 2


[notice] 46-46: docs/modern-io.md#L46
Expected: 1; Actual: 2


[notice] 47-47: docs/modern-io.md#L47
Expected: 1; Actual: 3


[notice] 48-48: docs/modern-io.md#L48
Expected: 1; Actual: 3


[notice] 53-53: docs/modern-io.md#L53
Expected: 1; Actual: 3


[notice] 53-53: docs/modern-io.md#L53
Lists should be surrounded by blank lines


[notice] 54-54: docs/modern-io.md#L54
Expected: 1; Actual: 3

tools/cleed_io/cli.py

[notice] 38-38: tools/cleed_io/cli.py#L38
expected 2 blank lines after class or function definition, found 1 (E305)

tools/cleed_io/slicer.py

[warning] 3-3: tools/cleed_io/slicer.py#L3
'ase.geometry.cell_to_cellpar' imported but unused (F401)


[warning] 3-3: tools/cleed_io/slicer.py#L3
Unused cell_to_cellpar imported from ase.geometry


[notice] 16-16: tools/cleed_io/slicer.py#L16
Unused variable 'pbc'


[warning] 16-16: tools/cleed_io/slicer.py#L16
local variable 'pbc' is assigned to but never used (F841)

tools/cleed_io/writer.py

[warning] 1-1: tools/cleed_io/writer.py#L1
'numpy as np' imported but unused (F401)


[warning] 1-1: tools/cleed_io/writer.py#L1
Unused numpy imported as np


[notice] 39-39: tools/cleed_io/writer.py#L39
Unused variable 'cell'


[warning] 39-39: tools/cleed_io/writer.py#L39
local variable 'cell' is assigned to but never used (F841)


[warning] 42-42: tools/cleed_io/writer.py#L42
f-string is missing placeholders (F541)

🪛 markdownlint-cli2 (0.18.1)
docs/modern-io.md

7-7: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

🪛 Ruff (0.14.8)
tools/cleed_io/cli.py

34-34: Do not catch blind exception: Exception

(BLE001)

tools/cleed_io/slicer.py

5-5: Unused function argument: vacuum_direction

(ARG001)


5-5: Unused function argument: tolerance

(ARG001)


16-16: Local variable pbc is assigned to but never used

Remove assignment to unused variable pbc

(F841)


96-96: Avoid specifying long messages outside the exception class

(TRY003)

tools/cleed_io/writer.py

39-39: Local variable cell is assigned to but never used

Remove assignment to unused variable cell

(F841)


42-42: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: windows-latest
🔇 Additional comments (7)
tools/cleed_io/__init__.py (1)

1-6: LGTM!

Clean package initializer with appropriate version metadata.

docs/modern-io.md (1)

1-54: Excellent documentation.

The documentation clearly explains the motivation, installation, usage, and technical details of the modern I/O bridge. The examples are practical and helpful.

tools/cleed_io/cli.py (2)

34-36: Bare exception catch is acceptable for CLI error handling.

While the static analyzer flags the bare Exception catch, this is appropriate for a CLI tool's top-level error handler where all exceptions should be caught and reported to the user.


7-33: Well-structured CLI implementation.

The CLI follows best practices with clear argument parsing, progress messages, and proper error handling. The workflow is straightforward and user-friendly.

tools/cleed_io/writer.py (1)

4-11: Well-designed formatting utilities.

The format_vector and format_atom functions provide clean, reusable formatting with appropriate defaults and optional customization.

tools/cleed_io/slicer.py (2)

48-73: Solid layer clustering implementation.

The cluster_layers function correctly groups atoms by z-coordinate with appropriate tolerance handling and edge cases (empty structure, single atom).


25-36: The rotation logic is correctly implemented and aligns with ASE's rotate() method behavior. The method signature rotate(a, v) aligns vector a with direction v (not rotates around v), which is exactly what the code intends. Line 26 correctly aligns the cell[0] vector with the x-axis, and line 36 correctly aligns the surface normal with the z-axis to standardize the slab orientation for CLEED. The comments and logic match the surface crystallography convention. No critical issues found.

Likely an incorrect or invalid review comment.


## Motivation

Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Remove trailing space.

Markdownlint detected a trailing space at the end of this line.

🔎 Proposed fix
-Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer). 
+Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

7-7: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)

🤖 Prompt for AI Agents
In docs/modern-io.md around line 7, remove the trailing whitespace at the end of
the line that reads 'Standard DFT workflows (VASP, Quantum Espresso) use
"Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered"
definition (Semi-infinite Bulk + Finite Overlayer).' by deleting the extra space
character at the end so the line ends with the period; re-save the file to
ensure markdownlint no longer flags a trailing space.

Comment on lines +12 to +16
dependencies = [
"ase>=3.22.0",
"numpy",
"pyyaml"
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and read the pyproject.toml file in tools directory
find . -path "*/tools/pyproject.toml" -type f | head -1 | xargs cat -n

Repository: Liam-Deacon/CLEED

Length of output: 667


Add version constraints to numpy and pyyaml dependencies.

The numpy and pyyaml dependencies lack version constraints, which could lead to unexpected breakage if major versions introduce incompatible changes (e.g., numpy's 1.x to 2.x transition). Consider specifying minimum and ideally maximum version bounds for these dependencies to ensure stability.

🤖 Prompt for AI Agents
In tools/pyproject.toml around lines 12 to 16, numpy and pyyaml are listed
without version constraints; add appropriate version bounds (at minimum a
minimum version and preferably an upper bound) to both entries (for example pin
numpy to a compatible >= and < major bump and pyyaml similarly) so the project
avoids breaking changes from major releases; update the dependencies array to
include those constraints and run dependency resolution to confirm
compatibility.

Comment on lines +22 to +23
[project.optional-dependencies]
phaseshifts = ["phaseshifts"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add version constraint for optional dependency.

The phaseshifts optional dependency has no version constraint, which could lead to compatibility issues.

🤖 Prompt for AI Agents
In tools/pyproject.toml around lines 22-23, the optional dependency entry
"phaseshifts" has no version constraint; update the
project.optional-dependencies entry to include a semantic version specifier (for
example use a range like a minimum supported version and an upper bound to avoid
breaking changes, or pin to a specific tested version), commit the change, and
run your dependency resolver (poetry/pip-tools) to verify the constraint is
valid and the lockfile updates accordingly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
tools/cleed_io/cli.py (1)

64-65: Add blank line before module guard.

PEP 8 recommends two blank lines before top-level code following function definitions.

🔎 Proposed fix
         sys.exit(1)
 

 if __name__ == "__main__":
     main()
docs/modern-io.md (1)

7-7: Fix trailing space and capitalization.

Line 7 has a trailing space and uses "Quantum Espresso" instead of the official "Quantum ESPRESSO" capitalization.

🔎 Proposed fix
-Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer). 
+Standard DFT workflows (VASP, Quantum ESPRESSO) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79aa2d2 and a417fae.

📒 Files selected for processing (4)
  • docs/modern-io.md (1 hunks)
  • tools/cleed_io/cli.py (1 hunks)
  • tools/cleed_io/reader.py (1 hunks)
  • tools/tests/test_io.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-20T09:06:56.375Z
Learnt from: CR
Repo: Liam-Deacon/CLEED PR: 0
File: doc/AGENTS.md:0-0
Timestamp: 2025-12-20T09:06:56.375Z
Learning: Applies to doc/CLEED_Manual.pdf : Keep `doc/CLEED_Manual.pdf` tracked in git as the canonical reference manual for the original CLEED program suite

Applied to files:

  • docs/modern-io.md
🧬 Code graph analysis (2)
tools/cleed_io/cli.py (3)
tools/cleed_io/slicer.py (1)
  • slice_slab (75-134)
tools/cleed_io/writer.py (2)
  • write_bul (13-33)
  • write_inp (35-56)
tools/cleed_io/reader.py (1)
  • read_cleed_files (49-114)
tools/tests/test_io.py (3)
tools/cleed_io/slicer.py (1)
  • slice_slab (75-134)
tools/cleed_io/reader.py (1)
  • read_cleed_files (49-114)
tools/cleed_io/writer.py (2)
  • write_bul (13-33)
  • write_inp (35-56)
🪛 GitHub Check: Codacy Static Code Analysis
tools/cleed_io/cli.py

[warning] 3-3: tools/cleed_io/cli.py#L3
'os' imported but unused (F401)


[warning] 3-3: tools/cleed_io/cli.py#L3
Unused import os


[notice] 64-64: tools/cleed_io/cli.py#L64
expected 2 blank lines after class or function definition, found 1 (E305)

docs/modern-io.md

[notice] 22-22: docs/modern-io.md#L22
Expected: 1; Actual: 3


[notice] 22-22: docs/modern-io.md#L22
Lists should be surrounded by blank lines


[notice] 23-23: docs/modern-io.md#L23
Expected: 1; Actual: 3


[notice] 24-24: docs/modern-io.md#L24
Expected: 1; Actual: 3


[notice] 25-25: docs/modern-io.md#L25
Expected: 1; Actual: 3


[notice] 26-26: docs/modern-io.md#L26
Expected: 1; Actual: 3


[notice] 27-27: docs/modern-io.md#L27
Expected: 1; Actual: 3


[notice] 40-40: docs/modern-io.md#L40
Expected: 1; Actual: 3


[notice] 40-40: docs/modern-io.md#L40
Lists should be surrounded by blank lines


[notice] 41-41: docs/modern-io.md#L41
Expected: 1; Actual: 3


[notice] 44-44: docs/modern-io.md#L44
Expected: 1; Actual: 3


[notice] 44-44: docs/modern-io.md#L44
Lists should be surrounded by blank lines


[notice] 58-58: docs/modern-io.md#L58
Expected: 1; Actual: 2


[notice] 59-59: docs/modern-io.md#L59
Expected: 1; Actual: 2


[notice] 60-60: docs/modern-io.md#L60
Expected: 1; Actual: 2


[notice] 61-61: docs/modern-io.md#L61
Expected: 1; Actual: 3


[notice] 62-62: docs/modern-io.md#L62
Expected: 1; Actual: 3


[notice] 67-67: docs/modern-io.md#L67
Expected: 1; Actual: 3


[notice] 67-67: docs/modern-io.md#L67
Lists should be surrounded by blank lines


[notice] 68-68: docs/modern-io.md#L68
Expected: 1; Actual: 3

tools/cleed_io/reader.py

[warning] 49-49: tools/cleed_io/reader.py#L49
Method read_cleed_files has a cyclomatic complexity of 18 (limit is 8)

🪛 markdownlint-cli2 (0.18.1)
docs/modern-io.md

7-7: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


68-68: Files should end with a single newline character

(MD047, single-trailing-newline)

🪛 Ruff (0.14.8)
tools/cleed_io/cli.py

60-60: Do not catch blind exception: Exception

(BLE001)

tools/tests/test_io.py

35-35: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


36-36: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


37-37: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


38-38: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


51-51: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


56-56: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

tools/cleed_io/reader.py

66-66: Unnecessary mode argument

Remove mode argument

(UP015)


69-69: Call startswith once with a tuple

Merge into a single startswith call

(PIE810)


88-88: Unnecessary mode argument

Remove mode argument

(UP015)


91-91: Call startswith once with a tuple

Merge into a single startswith call

(PIE810)


101-101: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: windows-latest
🔇 Additional comments (9)
tools/cleed_io/reader.py (3)

4-16: LGTM: Simple and correct vector parsing.

The function correctly extracts the first three numeric values from a line, ignoring comments and non-numeric tokens.


26-32: Fix the coordinate parsing loop logic.

Line 32 increments idx unconditionally, even when float() raises ValueError. This could skip over valid coordinate values if a non-numeric token appears between coordinates.

🔎 Proposed fix
     coords = []
     idx = 2
     while len(coords) < 3 and idx < len(parts):
         try:
             coords.append(float(parts[idx]))
+            idx += 1
         except ValueError:
-            pass # Should not happen for coords
-        idx += 1
+            idx += 1  # Skip non-numeric token

Likely an incorrect or invalid review comment.


34-46: The symbol extraction logic is correct for all observed CLEED label formats.

The code extracts all contiguous alphabetic characters from labels like "Ni_BVH" → "Ni" and "O" → "O", which works correctly across all example CLEED files (Ca, Au, Cl, Ag, Cu, Ni, O, C). The fallback to "X" handles edge cases, though it's never triggered in practice. No CLEED files use numeric-prefixed labels like "123Ni", making the stated concern about such formats hypothetical. While optional validation using ASE's ase.data.chemical_symbols is feasible (ASE is already a project dependency), the current heuristic is sufficient and robust for real-world CLEED data.

Likely an incorrect or invalid review comment.

tools/tests/test_io.py (4)

11-18: LGTM: Clean test setup.

The test fixture creates a representative Ni(111) slab with an O adsorbate, providing good coverage for the slicing and I/O operations.


20-24: LGTM: Proper test cleanup.

Teardown correctly removes generated test files.


26-38: LGTM: Well-structured slicing test.

The test correctly validates atom counts and element symbols for the sliced bulk and overlayer components.


40-64: LGTM: Comprehensive round-trip validation.

The test correctly validates that the write-read cycle preserves atom count, composition, and in-plane lattice vectors. The comment on line 59 correctly notes that a3 may differ due to the slicer's heuristic bulk cell reconstruction.

Note: The test validates composition but not atomic position preservation. Consider adding position validation in a future enhancement:

# After reconstruction, validate positions are preserved
np.testing.assert_array_almost_equal(
    sorted(self.slab.positions[:, :2]),  # xy coordinates
    sorted(reconstructed.positions[:, :2]),
    decimal=3
)
tools/cleed_io/cli.py (2)

9-26: LGTM: Well-structured argument parser.

The mutually exclusive group for mode selection is appropriate, and argument definitions are clear with helpful descriptions.


28-62: LGTM: Clean conversion logic with appropriate error handling.

The to-cleed and from-cleed workflows are well-structured with proper validation and user feedback. The broad exception catch on line 60 is acceptable for a CLI tool to provide user-friendly error messages.

Comment on lines +1 to +68
# Modern Input/Output for CLEED

This guide explains how to use the `cleed-io` tools to convert modern crystallography files (POSCAR, CIF, XYZ) into CLEED-compatible `.bul` and `.inp` files, and vice-versa.

## Motivation

Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).

The `cleed-convert` tool bridges this gap by automatically slicing a 3D slab into the required LEED components.

## Installation

The tools require Python 3.8+ and the Atomic Simulation Environment (ASE).

```bash
pip install tools/
```

## Supported Formats

Via the **ASE** backend, `cleed-convert` supports reading and writing over 30 formats, including:
* **VASP:** `POSCAR`, `CONTCAR`, `XDATCAR`
* **Crystallography:** `.cif`
* **XYZ:** `.xyz`, `.extxyz`
* **Quantum Espresso:** `.in`, `.out`
* **Gaussian:** `.log`
* **FHI-aims:** `.in`

## Usage

### 1. Import: POSCAR $\to$ CLEED

Convert a VASP POSCAR to CLEED files:

```bash
cleed-convert --to-cleed --input POSCAR --output my_surface
```

This generates:
* `my_surface.bul`: Contains the bulk lattice vectors and atoms.
* `my_surface.inp`: Contains the overlayer atoms.

**Options:**
* `--bulk-layers N`: Specify how many bottom layers constitute the bulk (default: 2).

### 2. Export: CLEED $\to$ POSCAR

Convert CLEED simulation files back to a format suitable for visualization (VESTA, Ovito):

```bash
cleed-convert --from-cleed --input my_surface.inp --bulk my_surface.bul --output RECONSTRUCTED_POSCAR --format vasp
```

This reconstructs the full 3D slab geometry.

## How it Works

1. **Alignment:** The tool rotates the input structure so the lattice vectors $\vec{a}$ and $\vec{b}$ lie in the $xy$-plane ($z=0$), matching LEED conventions.
2. **Slicing:** Atoms are clustered into layers based on their $z$-coordinate.
3. **Separation:**
* The bottom $N$ layers are extracted to form the `.bul` file.
* All remaining layers (surface + adsorbate) are written to the `.inp` file.

## Integration with ViPErLEED

This tool supports the conventions used by ViPErLEED:
* Slab is asymmetric (Bulk at bottom).
* Vacuum is along $+z$. No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

LGTM: Comprehensive and well-structured documentation.

The documentation clearly explains the motivation, installation, usage, and technical details of the CLEED I/O bridge. The examples are practical and the integration notes with ViPErLEED conventions are helpful.

One minor improvement: add a final newline at the end of the file (line 68) to comply with POSIX standards:

 *   Slab is asymmetric (Bulk at bottom).
 *   Vacuum is along $+z$.
+
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Modern Input/Output for CLEED
This guide explains how to use the `cleed-io` tools to convert modern crystallography files (POSCAR, CIF, XYZ) into CLEED-compatible `.bul` and `.inp` files, and vice-versa.
## Motivation
Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
The `cleed-convert` tool bridges this gap by automatically slicing a 3D slab into the required LEED components.
## Installation
The tools require Python 3.8+ and the Atomic Simulation Environment (ASE).
```bash
pip install tools/
```
## Supported Formats
Via the **ASE** backend, `cleed-convert` supports reading and writing over 30 formats, including:
* **VASP:** `POSCAR`, `CONTCAR`, `XDATCAR`
* **Crystallography:** `.cif`
* **XYZ:** `.xyz`, `.extxyz`
* **Quantum Espresso:** `.in`, `.out`
* **Gaussian:** `.log`
* **FHI-aims:** `.in`
## Usage
### 1. Import: POSCAR $\to$ CLEED
Convert a VASP POSCAR to CLEED files:
```bash
cleed-convert --to-cleed --input POSCAR --output my_surface
```
This generates:
* `my_surface.bul`: Contains the bulk lattice vectors and atoms.
* `my_surface.inp`: Contains the overlayer atoms.
**Options:**
* `--bulk-layers N`: Specify how many bottom layers constitute the bulk (default: 2).
### 2. Export: CLEED $\to$ POSCAR
Convert CLEED simulation files back to a format suitable for visualization (VESTA, Ovito):
```bash
cleed-convert --from-cleed --input my_surface.inp --bulk my_surface.bul --output RECONSTRUCTED_POSCAR --format vasp
```
This reconstructs the full 3D slab geometry.
## How it Works
1. **Alignment:** The tool rotates the input structure so the lattice vectors $\vec{a}$ and $\vec{b}$ lie in the $xy$-plane ($z=0$), matching LEED conventions.
2. **Slicing:** Atoms are clustered into layers based on their $z$-coordinate.
3. **Separation:**
* The bottom $N$ layers are extracted to form the `.bul` file.
* All remaining layers (surface + adsorbate) are written to the `.inp` file.
## Integration with ViPErLEED
This tool supports the conventions used by ViPErLEED:
* Slab is asymmetric (Bulk at bottom).
* Vacuum is along $+z$.
# Modern Input/Output for CLEED
This guide explains how to use the `cleed-io` tools to convert modern crystallography files (POSCAR, CIF, XYZ) into CLEED-compatible `.bul` and `.inp` files, and vice-versa.
## Motivation
Standard DFT workflows (VASP, Quantum Espresso) use "Supercell" geometries (3D periodic boxes). CLEED requires a specific "Layered" definition (Semi-infinite Bulk + Finite Overlayer).
The `cleed-convert` tool bridges this gap by automatically slicing a 3D slab into the required LEED components.
## Installation
The tools require Python 3.8+ and the Atomic Simulation Environment (ASE).
🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis

[notice] 22-22: docs/modern-io.md#L22
Expected: 1; Actual: 3


[notice] 22-22: docs/modern-io.md#L22
Lists should be surrounded by blank lines


[notice] 23-23: docs/modern-io.md#L23
Expected: 1; Actual: 3


[notice] 24-24: docs/modern-io.md#L24
Expected: 1; Actual: 3


[notice] 25-25: docs/modern-io.md#L25
Expected: 1; Actual: 3


[notice] 26-26: docs/modern-io.md#L26
Expected: 1; Actual: 3


[notice] 27-27: docs/modern-io.md#L27
Expected: 1; Actual: 3


[notice] 40-40: docs/modern-io.md#L40
Expected: 1; Actual: 3


[notice] 40-40: docs/modern-io.md#L40
Lists should be surrounded by blank lines


[notice] 41-41: docs/modern-io.md#L41
Expected: 1; Actual: 3


[notice] 44-44: docs/modern-io.md#L44
Expected: 1; Actual: 3


[notice] 44-44: docs/modern-io.md#L44
Lists should be surrounded by blank lines


[notice] 58-58: docs/modern-io.md#L58
Expected: 1; Actual: 2


[notice] 59-59: docs/modern-io.md#L59
Expected: 1; Actual: 2


[notice] 60-60: docs/modern-io.md#L60
Expected: 1; Actual: 2


[notice] 61-61: docs/modern-io.md#L61
Expected: 1; Actual: 3


[notice] 62-62: docs/modern-io.md#L62
Expected: 1; Actual: 3


[notice] 67-67: docs/modern-io.md#L67
Expected: 1; Actual: 3


[notice] 67-67: docs/modern-io.md#L67
Lists should be surrounded by blank lines


[notice] 68-68: docs/modern-io.md#L68
Expected: 1; Actual: 3

🪛 markdownlint-cli2 (0.18.1)

7-7: Trailing spaces
Expected: 0 or 2; Actual: 1

(MD009, no-trailing-spaces)


68-68: Files should end with a single newline character

(MD047, single-trailing-newline)

🤖 Prompt for AI Agents
In docs/modern-io.md around lines 1 to 68 the file is missing a trailing newline
(POSIX requires files end with a newline); fix by editing the file and adding a
single newline character at the end of the file so the last line ends with a LF.

Comment on lines +49 to +114
def read_cleed_files(bul_file, inp_file):
"""
Reads CLEED .bul and .inp files and reconstructs the full unit cell.

Args:
bul_file (str): Path to .bul file.
inp_file (str): Path to .inp file.

Returns:
Atoms: ASE Atoms object containing Bulk + Overlayer.
"""

# 1. Parse Bulk File
a1 = a2 = a3 = None
bulk_atoms = []
bulk_symbols = []

with open(bul_file, 'r') as f:
for line in f:
line = line.strip()
if not line or line.startswith('#') or line.startswith('c:'):
continue

if line.lower().startswith('a1:'):
a1 = parse_vector(line)
elif line.lower().startswith('a2:'):
a2 = parse_vector(line)
elif line.lower().startswith('a3:'):
a3 = parse_vector(line)
elif line.lower().startswith('pb:'):
sym, pos = parse_atom(line)
bulk_symbols.append(sym)
bulk_atoms.append(pos)

# 2. Parse Input File (Overlayer)
over_atoms = []
over_symbols = []
# m1, m2 could be parsed here for superstructures

with open(inp_file, 'r') as f:
for line in f:
line = line.strip()
if not line or line.startswith('#') or line.startswith('c:'):
continue

if line.lower().startswith('po:'):
sym, pos = parse_atom(line)
over_symbols.append(sym)
over_atoms.append(pos)

# 3. Construct Unit Cell
if a1 is None or a2 is None or a3 is None:
raise ValueError("Lattice vectors (a1, a2, a3) not found in .bul file")

cell = np.array([a1, a2, a3])

# Combine atoms
all_symbols = bulk_symbols + over_symbols
all_positions = bulk_atoms + over_atoms

if not all_positions:
# Create empty cell if no atoms
return Atoms(cell=cell, pbc=True)

atoms = Atoms(symbols=all_symbols, positions=all_positions, cell=cell, pbc=True)
return atoms
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider refactoring to reduce cyclomatic complexity.

The function has a cyclomatic complexity of 18 (threshold is typically 8-10), as it handles file parsing, lattice vector extraction, atom parsing, and Atoms construction in one function.

Consider splitting into helper functions:

  • _parse_bul_file(bul_file) → returns (a1, a2, a3, bulk_symbols, bulk_atoms)
  • _parse_inp_file(inp_file) → returns (over_symbols, over_atoms)
  • read_cleed_files orchestrates and constructs the Atoms object

This would improve testability and maintainability.

🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis

[warning] 49-49: tools/cleed_io/reader.py#L49
Method read_cleed_files has a cyclomatic complexity of 18 (limit is 8)

🪛 Ruff (0.14.8)

66-66: Unnecessary mode argument

Remove mode argument

(UP015)


69-69: Call startswith once with a tuple

Merge into a single startswith call

(PIE810)


88-88: Unnecessary mode argument

Remove mode argument

(UP015)


91-91: Call startswith once with a tuple

Merge into a single startswith call

(PIE810)


101-101: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In tools/cleed_io/reader.py around lines 49-114 the read_cleed_files function is
too large and has high cyclomatic complexity; refactor by extracting two
helpers: _parse_bul_file(bul_file) that opens bul_file, extracts and returns
(a1, a2, a3, bulk_symbols, bulk_atoms) preserving current parsing logic and
error raise if lattice vectors missing, and _parse_inp_file(inp_file) that opens
inp_file and returns (over_symbols, over_atoms) using the same parsing rules;
then simplify read_cleed_files to call these helpers, build cell =
np.array([a1,a2,a3]), combine symbols/positions, and return Atoms exactly as
before (including the empty-cell branch), ensuring no change in behavior or
exceptions.

bulk_atoms = []
bulk_symbols = []

with open(bul_file, 'r') as f:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Remove unnecessary mode='r' argument.

Python's open() defaults to read mode, so the explicit mode='r' is redundant.

🔎 Proposed fix
-    with open(bul_file, 'r') as f:
+    with open(bul_file) as f:
         for line in f:
-    with open(inp_file, 'r') as f:
+    with open(inp_file) as f:
         for line in f:

Also applies to: 88-88

🧰 Tools
🪛 Ruff (0.14.8)

66-66: Unnecessary mode argument

Remove mode argument

(UP015)

🤖 Prompt for AI Agents
In tools/cleed_io/reader.py around lines 66 and 88, remove the redundant
explicit mode='r' argument from the open(...) calls (open(bul_file, 'r') ->
open(bul_file)) since open defaults to read mode; update both occurrences to
call open with just the file path and keep the existing context manager and
handling unchanged.

with open(bul_file, 'r') as f:
for line in f:
line = line.strip()
if not line or line.startswith('#') or line.startswith('c:'):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Optimize startswith checks with tuple.

Multiple startswith calls can be merged into a single call with a tuple for better performance and readability.

🔎 Proposed fix
-            if not line or line.startswith('#') or line.startswith('c:'):
+            if not line or line.startswith(('#', 'c:')):
                 continue

Apply the same pattern on line 91.

Also applies to: 91-91

🧰 Tools
🪛 Ruff (0.14.8)

69-69: Call startswith once with a tuple

Merge into a single startswith call

(PIE810)

🤖 Prompt for AI Agents
In tools/cleed_io/reader.py around lines 69 and 91, the code uses multiple
sequential startswith checks (e.g., line.startswith('#') or
line.startswith('c:')), which should be consolidated into a single startswith
call with a tuple for performance and readability; update both occurrences to
use line.startswith(('#', 'c:')) (and similarly include any other prefixes used
on line 91) so the condition becomes one startswith call with a tuple of
prefixes.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 9, 2026

@coderabbitai coderabbitai bot added documentation Improvements or additions to documentation enhancement New feature or request labels Jan 9, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Fix all issues with AI agents
In @tools/cleed_io/cli.py:
- Around line 63-64: Add two blank lines before the module-level execution block
so PEP 8 is satisfied: ensure there are exactly two blank lines separating the
preceding function definitions and the "if __name__ == '__main__':" block that
calls main(), i.e., insert a blank line before and another blank line above the
"if __name__ == '__main__':" line that invokes main().
- Line 9: The ArgumentParser description is too narrow; update the
argparse.ArgumentParser(...) call where parser is created to reflect support for
"POSCAR/CIF/XYZ" and that many other formats are supported via ASE (e.g., change
description to something like "Convert between structure files (POSCAR, CIF, XYZ
and other ASE-supported formats) and CLEED (.bul/.inp)"); modify the description
string passed to parser to mention XYZ and ASE-supported formats so the help
text matches the PR scope.
- Around line 28-46: Before calling slice_slab in the args.to_cleed branch,
validate the input structure returned by read(...) to ensure it is a slab:
verify the cell/periodicity indicates periodicity in at least two directions
(e.g., cell vectors / periodic flags), compute atom z-range (max z - min z) and
compare to the cell z length to confirm a vacuum region (cell_z_length
significantly larger than atom_z_range), and if these checks fail call
parser.error or print a clear error and exit; update the to-cleed flow around
args.to_cleed/read/slice_slab/write_bul/write_inp to perform these checks and
emit a concise error like "input does not appear to be a slab: requires >=2
periodic directions and vacuum along z" before proceeding.

In @tools/cleed_io/slicer.py:
- Around line 119-129: The bulk unit cell a3 is being set incorrectly by
multiplying dz by len(layers); instead compute the inter-layer spacing between
the first two detected layers (dz = z2 - z1) and set the bulk cell's third
lattice vector to represent a single bulk repeat along z (i.e., use [0, 0, dz]
rather than [0, 0, dz * len(layers)]). Update the call to bulk_atoms.set_cell to
use cell[0], cell[1], and the corrected a3, and if necessary add a parameter or
fallback to allow callers to override the bulk repeat vector for multi-layer
unit cells; reference aligned.get_cell(), aligned.positions, layers, dz, and
bulk_atoms.set_cell when locating and changing the code.
- Around line 45-70: The clustering compares each atom z to the first atom of
the current layer (current_z), which fails when z drifts; inside cluster_layers
update the layer reference z after adding an atom (either set current_z to the
newly added atom's z or recompute the layer mean z) so subsequent comparisons
use the running z (use the last added z for simplicity or mean for robustness),
keeping the rest of the logic (sorted_indices, tolerance, layers, current_layer)
intact.
- Around line 4-43: The if-block in align_surface checks final_cell[2][2] < 0
but does nothing; fix by ensuring the slab's out-of-plane vector points +z by
applying a 180° rotation about an in-plane axis (e.g., use atoms.rotate(180,
'x', rotate_cell=True)) when final_cell[2][2] is negative, so the cell[2]
z-component becomes positive without mirroring the slab; reference
align_surface, final_cell[2][2], atoms.rotate, and normal to locate and
implement this change (or remove the dead if-block if you prefer no-op
behavior).

In @tools/cleed_io/writer.py:
- Line 41: The string passed to f.write uses an unnecessary f-string; remove the
f prefix so the call uses a regular string literal. Locate the write call on the
file object named f that writes "c: Generated by cleed-convert\n" and change it
from using an f-string to a plain string literal (i.e., use f.write(...) ->
f.write("c: Generated by cleed-convert\n")).
- Line 38: Remove the unused local variable assignment "cell =
bulk_atoms.get_cell()" in tools/cleed_io/writer.py (or if the cell is intended
for validation/documentation, replace the assignment with a direct use such as
asserting expected periodicity or adding a brief comment using
bulk_atoms.get_cell() result), i.e., eliminate the dead variable or consume the
result via an assertion/log to justify keeping the call.
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a417fae and e174da5.

📒 Files selected for processing (3)
  • tools/cleed_io/cli.py
  • tools/cleed_io/slicer.py
  • tools/cleed_io/writer.py
🧰 Additional context used
🧬 Code graph analysis (1)
tools/cleed_io/cli.py (3)
tools/cleed_io/slicer.py (1)
  • slice_slab (72-131)
tools/cleed_io/writer.py (2)
  • write_bul (12-32)
  • write_inp (34-55)
tools/cleed_io/reader.py (1)
  • read_cleed_files (49-114)
🪛 GitHub Check: Codacy Static Code Analysis
tools/cleed_io/writer.py

[notice] 38-38: tools/cleed_io/writer.py#L38
Unused variable 'cell'


[warning] 38-38: tools/cleed_io/writer.py#L38
local variable 'cell' is assigned to but never used (F841)


[warning] 41-41: tools/cleed_io/writer.py#L41
f-string is missing placeholders (F541)

tools/cleed_io/cli.py

[notice] 63-63: tools/cleed_io/cli.py#L63
expected 2 blank lines after class or function definition, found 1 (E305)

🪛 Ruff (0.14.10)
tools/cleed_io/writer.py

38-38: Local variable cell is assigned to but never used

Remove assignment to unused variable cell

(F841)


41-41: f-string without any placeholders

Remove extraneous f prefix

(F541)

tools/cleed_io/slicer.py

93-93: Avoid specifying long messages outside the exception class

(TRY003)

tools/cleed_io/cli.py

59-59: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: windows-latest
🔇 Additional comments (2)
tools/cleed_io/writer.py (2)

12-32: LGTM - write_bul implementation is solid.

The function correctly writes the CLEED .bul format with:

  • Appropriate header and comments
  • Lattice vectors using format_vector
  • Bulk atoms using format_atom
  • Default parameters with clear comments for user adjustment

The implementation aligns well with the CLEED format requirements and provides good user guidance.


6-10: Fix potential AttributeError when phase_map is None.

Line 8 calls phase_map.get(symbol, symbol) without checking if phase_map is None, which will raise an AttributeError.

🔧 Proposed fix
 def format_atom(prefix, atom, phase_map=None, vib_default="dr3 0.025 0.025 0.025"):
     symbol = atom.symbol
-    label = phase_map.get(symbol, symbol) if phase_map else symbol
+    label = phase_map.get(symbol, symbol) if phase_map is not None else symbol
     pos = atom.position
     return f"{prefix}: {label:<6s} {pos[0]:10.4f} {pos[1]:10.4f} {pos[2]:10.4f} {vib_default}\n"

Note: The original code has the conditional in the right place, but if phase_map will be False for an empty dict {}, while if phase_map is not None correctly handles both None and empty dict cases.

Likely an incorrect or invalid review comment.

from cleed_io.reader import read_cleed_files

def main():
parser = argparse.ArgumentParser(description="Convert between structure files (POSCAR, CIF) and CLEED (.bul/.inp)")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Expand format examples in help text to match PR scope.

The description mentions only "POSCAR, CIF" but the PR objectives state support for "POSCAR/CIF/XYZ" and "many file formats through ASE". Consider updating the help text to be more inclusive.

📝 Updated description
-    parser = argparse.ArgumentParser(description="Convert between structure files (POSCAR, CIF) and CLEED (.bul/.inp)")
+    parser = argparse.ArgumentParser(description="Convert between structure files (POSCAR, CIF, XYZ, and more via ASE) and CLEED (.bul/.inp)")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parser = argparse.ArgumentParser(description="Convert between structure files (POSCAR, CIF) and CLEED (.bul/.inp)")
parser = argparse.ArgumentParser(description="Convert between structure files (POSCAR, CIF, XYZ, and more via ASE) and CLEED (.bul/.inp)")
🤖 Prompt for AI Agents
In @tools/cleed_io/cli.py at line 9, The ArgumentParser description is too
narrow; update the argparse.ArgumentParser(...) call where parser is created to
reflect support for "POSCAR/CIF/XYZ" and that many other formats are supported
via ASE (e.g., change description to something like "Convert between structure
files (POSCAR, CIF, XYZ and other ASE-supported formats) and CLEED
(.bul/.inp)"); modify the description string passed to parser to mention XYZ and
ASE-supported formats so the help text matches the PR scope.

Comment on lines +28 to +46
if args.to_cleed:
if not args.input:
parser.error("--to-cleed requires --input")

print(f"Reading {args.input}...")
atoms = read(args.input, format=args.format)

print(f"Slicing structure ({len(atoms)} atoms)...")
bulk, over = slice_slab(atoms, bulk_layers_count=args.bulk_layers)

bul_file = f"{args.output}.bul"
inp_file = f"{args.output}.inp"

print(f"Writing {bul_file}...")
write_bul(bul_file, bulk)

print(f"Writing {inp_file}...")
write_inp(inp_file, over, bulk)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add validation for slab structure in to-cleed mode.

The to-cleed workflow reads an arbitrary structure file but doesn't validate that it's actually a slab (i.e., has vacuum, correct periodicity). This could lead to confusing errors downstream in slice_slab or produce invalid output.

Consider adding basic validation:

  • Check that the structure has periodicity in at least 2 directions
  • Check for the presence of vacuum (large z-dimension compared to atom distribution)
  • Provide a clear error message if the structure doesn't appear to be a slab
💡 Example validation
             print(f"Reading {args.input}...")
             atoms = read(args.input, format=args.format)
+            
+            # Validate structure is a slab
+            pbc = atoms.get_pbc()
+            if not (pbc[0] and pbc[1]):
+                print("Warning: Structure may not be a proper slab (missing xy periodicity)", file=sys.stderr)
+            
+            cell = atoms.get_cell()
+            z_extent = atoms.positions[:, 2].ptp()  # peak-to-peak
+            if cell[2][2] < z_extent * 1.5:
+                print("Warning: Structure may not have sufficient vacuum", file=sys.stderr)
             
             print(f"Slicing structure ({len(atoms)} atoms)...")
             bulk, over = slice_slab(atoms, bulk_layers_count=args.bulk_layers)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @tools/cleed_io/cli.py around lines 28 - 46, Before calling slice_slab in the
args.to_cleed branch, validate the input structure returned by read(...) to
ensure it is a slab: verify the cell/periodicity indicates periodicity in at
least two directions (e.g., cell vectors / periodic flags), compute atom z-range
(max z - min z) and compare to the cell z length to confirm a vacuum region
(cell_z_length significantly larger than atom_z_range), and if these checks fail
call parser.error or print a clear error and exit; update the to-cleed flow
around args.to_cleed/read/slice_slab/write_bul/write_inp to perform these checks
and emit a concise error like "input does not appear to be a slab: requires >=2
periodic directions and vacuum along z" before proceeding.

Comment on lines +63 to +64
if __name__ == "__main__":
main()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add blank line before module-level code.

PEP 8 requires 2 blank lines before module-level code following a function definition.

📝 Formatting fix
         print(f"Error: {e}", file=sys.stderr)
         sys.exit(1)
 
+
 if __name__ == "__main__":
     main()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if __name__ == "__main__":
main()
print(f"Error: {e}", file=sys.stderr)
sys.exit(1)
if __name__ == "__main__":
main()
🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis

[notice] 63-63: tools/cleed_io/cli.py#L63
expected 2 blank lines after class or function definition, found 1 (E305)

🤖 Prompt for AI Agents
In @tools/cleed_io/cli.py around lines 63 - 64, Add two blank lines before the
module-level execution block so PEP 8 is satisfied: ensure there are exactly two
blank lines separating the preceding function definitions and the "if __name__
== '__main__':" block that calls main(), i.e., insert a blank line before and
another blank line above the "if __name__ == '__main__':" line that invokes
main().

Comment on lines +4 to +43
def align_surface(atoms: Atoms) -> Atoms:
"""
Aligns the atoms object such that the periodic surface vectors (a, b)
lie in the xy-plane (z=0) and the surface normal points along +z.

This matches the CLEED convention where z is the surface normal.
"""
# 1. Identify the non-periodic direction or the longest vector if 3D periodic
# For a standard slab, we assume c is the vacuum direction.

cell = atoms.get_cell()
# We assume standard ASE slab convention: 3rd vector is out-of-plane
# Check if we need to rotate.

# Vector a (cell[0]) should be along x
# Vector b (cell[1]) should be in xy plane
# Vector c (cell[2]) points up

# Standardize orientation: a along x
atoms.rotate(cell[0], 'x', rotate_cell=True)

# Now rotate around x to put b in xy plane
# We need the normal of (a, b) to be along z

current_cell = atoms.get_cell()
# Cross product of a and b
normal = np.cross(current_cell[0], current_cell[1])

# Align this normal to z
atoms.rotate(normal, 'z', rotate_cell=True)

# Ensure c is positive z
final_cell = atoms.get_cell()
if final_cell[2][2] < 0:
# Flip z? No, that mirrors the structure.
# Just ensure the slab is oriented correctly.
# ViPErLEED: "positive z-direction pointing away from the surface into the vacuum"
pass

return atoms
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, find and examine the file
fd "slicer.py" --type f

Repository: Liam-Deacon/CLEED

Length of output: 85


🏁 Script executed:

#!/bin/bash
# Read the actual file to see the current state
cat -n tools/cleed_io/slicer.py | head -50

Repository: Liam-Deacon/CLEED

Length of output: 2074


🌐 Web query:

ASE Atoms.rotate() method signature and documentation

💡 Result:

Signature:
atoms.rotate(a, v, center=(0, 0, 0), rotate_cell=False)

Description (from ASE docs/source):

  • Rotates the Atoms object either by an angle a around axis v, or rotates vector a into vector v.
  • a: angle in degrees (number) or a vector (then a is rotated into v).
  • v: axis vector (can be e.g. (0,0,1)) or axis name string ('x','-x','y', ...).
  • center: point kept fixed during rotation; a 3-tuple or one of 'COM' (center of mass), 'COP' (center of positions), 'COU' (center of cell).
  • rotate_cell: if True, the unit cell is rotated as well.

Examples:

  • atoms.rotate(90, 'z') — rotate 90° about z.
  • atoms.rotate('x', 'y') — rotate x-axis into y-axis.
  • atoms.rotate((1,0,0), (0,1,0)) — rotate vector (1,0,0) into (0,1,0).

Sources: ASE documentation and source code. [1][2]

References:
[1] ASE Atoms documentation — rotate method.
[2] ASE source implementation of Atoms.rotate.


Remove incomplete if-block or implement the orientation check logic.

Lines 37-41 contain a conditional check for negative z-component in the final cell vector, but the if-block only contains a pass statement with a comment. Either implement the necessary logic to handle cases where final_cell[2][2] < 0 (e.g., by flipping the cell orientation, rotating around the surface normal, or documenting why no action is needed) or remove the dead code entirely.

The rotation logic itself correctly uses the ASE API to align cell[0] to the x-axis and then aligns the normal vector to the z-axis, which properly positions the surface vectors in the xy-plane.

🤖 Prompt for AI Agents
In @tools/cleed_io/slicer.py around lines 4 - 43, The if-block in align_surface
checks final_cell[2][2] < 0 but does nothing; fix by ensuring the slab's
out-of-plane vector points +z by applying a 180° rotation about an in-plane axis
(e.g., use atoms.rotate(180, 'x', rotate_cell=True)) when final_cell[2][2] is
negative, so the cell[2] z-component becomes positive without mirroring the
slab; reference align_surface, final_cell[2][2], atoms.rotate, and normal to
locate and implement this change (or remove the dead if-block if you prefer
no-op behavior).

Comment on lines +45 to +70
def cluster_layers(atoms: Atoms, tolerance=0.2):
"""
Groups atoms into layers based on Z-coordinate.
Returns a list of lists of atom indices.
"""
z_coords = atoms.positions[:, 2]
sorted_indices = np.argsort(z_coords)

layers = []
if len(sorted_indices) == 0:
return layers

current_layer = [sorted_indices[0]]
current_z = z_coords[sorted_indices[0]]

for idx in sorted_indices[1:]:
z = z_coords[idx]
if abs(z - current_z) < tolerance:
current_layer.append(idx)
else:
layers.append(current_layer)
current_layer = [idx]
current_z = z

layers.append(current_layer)
return layers
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix the layer clustering logic to handle intra-layer z-coordinate drift.

The current implementation compares each atom's z-coordinate against the first atom in the current layer (current_z), which is set only when starting a new layer (line 58, 67). This causes issues when atoms within a single layer have gradually varying z-coordinates.

Example failure case:

  • Atoms with z-coordinates: [0.0, 0.15, 0.25] and tolerance=0.2
  • First atom (0.0) starts layer, current_z=0.0
  • Second atom (0.15): |0.15 - 0.0| = 0.15 < 0.2 ✓ added to layer
  • Third atom (0.25): |0.25 - 0.0| = 0.25 > 0.2 ✗ starts new layer incorrectly

The third atom should be in the same layer since |0.25 - 0.15| = 0.1 < 0.2.

🔧 Proposed fix to track running z-coordinate

Option 1: Update current_z to the last added atom's z-coordinate:

 def cluster_layers(atoms: Atoms, tolerance=0.2):
     """
     Groups atoms into layers based on Z-coordinate.
     Returns a list of lists of atom indices.
     """
     z_coords = atoms.positions[:, 2]
     sorted_indices = np.argsort(z_coords)
     
     layers = []
     if len(sorted_indices) == 0:
         return layers
         
     current_layer = [sorted_indices[0]]
     current_z = z_coords[sorted_indices[0]]
     
     for idx in sorted_indices[1:]:
         z = z_coords[idx]
         if abs(z - current_z) < tolerance:
             current_layer.append(idx)
+            current_z = z  # Update to track drift
         else:
             layers.append(current_layer)
             current_layer = [idx]
             current_z = z
             
     layers.append(current_layer)
     return layers

Option 2: Use the mean z-coordinate of the current layer:

 def cluster_layers(atoms: Atoms, tolerance=0.2):
     """
     Groups atoms into layers based on Z-coordinate.
     Returns a list of lists of atom indices.
     """
     z_coords = atoms.positions[:, 2]
     sorted_indices = np.argsort(z_coords)
     
     layers = []
     if len(sorted_indices) == 0:
         return layers
         
     current_layer = [sorted_indices[0]]
-    current_z = z_coords[sorted_indices[0]]
     
     for idx in sorted_indices[1:]:
         z = z_coords[idx]
+        current_z = np.mean(z_coords[current_layer])
         if abs(z - current_z) < tolerance:
             current_layer.append(idx)
         else:
             layers.append(current_layer)
             current_layer = [idx]
-            current_z = z
             
     layers.append(current_layer)
     return layers

Option 1 is simpler and handles gradual drift; Option 2 is more robust for noisy data.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def cluster_layers(atoms: Atoms, tolerance=0.2):
"""
Groups atoms into layers based on Z-coordinate.
Returns a list of lists of atom indices.
"""
z_coords = atoms.positions[:, 2]
sorted_indices = np.argsort(z_coords)
layers = []
if len(sorted_indices) == 0:
return layers
current_layer = [sorted_indices[0]]
current_z = z_coords[sorted_indices[0]]
for idx in sorted_indices[1:]:
z = z_coords[idx]
if abs(z - current_z) < tolerance:
current_layer.append(idx)
else:
layers.append(current_layer)
current_layer = [idx]
current_z = z
layers.append(current_layer)
return layers
def cluster_layers(atoms: Atoms, tolerance=0.2):
"""
Groups atoms into layers based on Z-coordinate.
Returns a list of lists of atom indices.
"""
z_coords = atoms.positions[:, 2]
sorted_indices = np.argsort(z_coords)
layers = []
if len(sorted_indices) == 0:
return layers
current_layer = [sorted_indices[0]]
current_z = z_coords[sorted_indices[0]]
for idx in sorted_indices[1:]:
z = z_coords[idx]
if abs(z - current_z) < tolerance:
current_layer.append(idx)
current_z = z # Update to track drift
else:
layers.append(current_layer)
current_layer = [idx]
current_z = z
layers.append(current_layer)
return layers
🤖 Prompt for AI Agents
In @tools/cleed_io/slicer.py around lines 45 - 70, The clustering compares each
atom z to the first atom of the current layer (current_z), which fails when z
drifts; inside cluster_layers update the layer reference z after adding an atom
(either set current_z to the newly added atom's z or recompute the layer mean z)
so subsequent comparisons use the running z (use the last added z for simplicity
or mean for robustness), keeping the rest of the logic (sorted_indices,
tolerance, layers, current_layer) intact.

Comment on lines +99 to +105
bulk_indices = []
for i in range(bulk_layers_count):
bulk_indices.extend(layers[i])

overlayer_indices = []
for i in range(bulk_layers_count, len(layers)):
overlayer_indices.extend(layers[i])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider using list comprehensions for cleaner code.

The index collection loops can be simplified using list comprehensions:

♻️ Optional refactor
-    bulk_indices = []
-    for i in range(bulk_layers_count):
-        bulk_indices.extend(layers[i])
-        
-    overlayer_indices = []
-    for i in range(bulk_layers_count, len(layers)):
-        overlayer_indices.extend(layers[i])
+    bulk_indices = [idx for i in range(bulk_layers_count) for idx in layers[i]]
+    overlayer_indices = [idx for i in range(bulk_layers_count, len(layers)) for idx in layers[i]]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bulk_indices = []
for i in range(bulk_layers_count):
bulk_indices.extend(layers[i])
overlayer_indices = []
for i in range(bulk_layers_count, len(layers)):
overlayer_indices.extend(layers[i])
bulk_indices = [idx for i in range(bulk_layers_count) for idx in layers[i]]
overlayer_indices = [idx for i in range(bulk_layers_count, len(layers)) for idx in layers[i]]

Comment on lines +119 to +129
if len(layers) >= 2:
z1 = np.mean(aligned.positions[layers[0], 2])
z2 = np.mean(aligned.positions[layers[1], 2])
dz = z2 - z1
# This assumes vertical stacking.
# Ideally, we find the translation vector between layer 0 and layer 1.

# Heuristic: Use the slab cell for a1, a2.
# Construct a3 from the layer spacing.
cell = aligned.get_cell()
bulk_atoms.set_cell([cell[0], cell[1], [0, 0, dz * len(layers)]]) # Dummy a3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix the bulk unit cell a3 vector calculation.

The calculation of the bulk unit cell's a3 vector is incorrect:

bulk_atoms.set_cell([cell[0], cell[1], [0, 0, dz * len(layers)]]) # Dummy a3

Issues:

  1. Multiplying by len(layers) (total layers in slab) instead of the bulk periodicity makes the bulk cell artificially large
  2. For a bulk unit cell, a3 should represent the 3D periodicity of the bulk crystal, typically the inter-layer spacing: [0, 0, dz]
  3. The comment acknowledges this is "Dummy a3" and wrong, but the error will propagate to the .bul file and cause incorrect CLEED calculations

Impact: The .bul file will contain an incorrect bulk unit cell, leading to wrong LEED calculations. This is a critical correctness issue for the core functionality of this converter.

🔧 Proposed fix for bulk unit cell
     if len(layers) >= 2:
         z1 = np.mean(aligned.positions[layers[0], 2])
         z2 = np.mean(aligned.positions[layers[1], 2])
         dz = z2 - z1
-        # This assumes vertical stacking.
-        # Ideally, we find the translation vector between layer 0 and layer 1.
         
-        # Heuristic: Use the slab cell for a1, a2.
-        # Construct a3 from the layer spacing.
+        # Set a3 to the inter-layer spacing (bulk periodicity in z)
+        # For more complex bulk structures, this may need user input
         cell = aligned.get_cell()
-        bulk_atoms.set_cell([cell[0], cell[1], [0, 0, dz * len(layers)]]) # Dummy a3
+        bulk_atoms.set_cell([cell[0], cell[1], [0, 0, dz]])
+    else:
+        # Fallback: single layer, use original a3 (may be incorrect)
+        cell = aligned.get_cell()
+        bulk_atoms.set_cell([cell[0], cell[1], cell[2]])
         
     return bulk_atoms, overlayer_atoms

Note: For complex bulk structures (e.g., multi-layer bulk unit cells), the inter-layer spacing may not equal the bulk repeat vector. Consider adding a parameter for users to specify the bulk unit cell explicitly.

🤖 Prompt for AI Agents
In @tools/cleed_io/slicer.py around lines 119 - 129, The bulk unit cell a3 is
being set incorrectly by multiplying dz by len(layers); instead compute the
inter-layer spacing between the first two detected layers (dz = z2 - z1) and set
the bulk cell's third lattice vector to represent a single bulk repeat along z
(i.e., use [0, 0, dz] rather than [0, 0, dz * len(layers)]). Update the call to
bulk_atoms.set_cell to use cell[0], cell[1], and the corrected a3, and if
necessary add a parameter or fallback to allow callers to override the bulk
repeat vector for multi-layer unit cells; reference aligned.get_cell(),
aligned.positions, layers, dz, and bulk_atoms.set_cell when locating and
changing the code.

"""
Writes the CLEED .inp file.
"""
cell = bulk_atoms.get_cell() # Overlayer uses bulk 2D periodicity usually
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused variable assignment.

The variable cell is assigned but never used. Either remove the assignment or use it for validation/documentation purposes.

🔧 Proposed fixes

Option 1: Remove the unused assignment

 def write_inp(filename, overlayer_atoms: Atoms, bulk_atoms: Atoms, phase_map=None):
     """
     Writes the CLEED .inp file.
     """
-    cell = bulk_atoms.get_cell() # Overlayer uses bulk 2D periodicity usually
     
     with open(filename, 'w') as f:

Option 2: Use it for validation or documentation

 def write_inp(filename, overlayer_atoms: Atoms, bulk_atoms: Atoms, phase_map=None):
     """
     Writes the CLEED .inp file.
     """
-    cell = bulk_atoms.get_cell() # Overlayer uses bulk 2D periodicity usually
+    # Note: overlayer uses bulk 2D periodicity (a1, a2)
+    # The superstructure matrix below assumes 1x1, could be generalized
     
     with open(filename, 'w') as f:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cell = bulk_atoms.get_cell() # Overlayer uses bulk 2D periodicity usually
def write_inp(filename, overlayer_atoms: Atoms, bulk_atoms: Atoms, phase_map=None):
"""
Writes the CLEED .inp file.
"""
with open(filename, 'w') as f:
🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis

[notice] 38-38: tools/cleed_io/writer.py#L38
Unused variable 'cell'


[warning] 38-38: tools/cleed_io/writer.py#L38
local variable 'cell' is assigned to but never used (F841)

🪛 Ruff (0.14.10)

38-38: Local variable cell is assigned to but never used

Remove assignment to unused variable cell

(F841)

🤖 Prompt for AI Agents
In @tools/cleed_io/writer.py at line 38, Remove the unused local variable
assignment "cell = bulk_atoms.get_cell()" in tools/cleed_io/writer.py (or if the
cell is intended for validation/documentation, replace the assignment with a
direct use such as asserting expected periodicity or adding a brief comment
using bulk_atoms.get_cell() result), i.e., eliminate the dead variable or
consume the result via an assertion/log to justify keeping the call.

cell = bulk_atoms.get_cell() # Overlayer uses bulk 2D periodicity usually

with open(filename, 'w') as f:
f.write(f"c: Generated by cleed-convert\n")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Remove unnecessary f-string prefix.

The f-string at line 41 has no placeholders, so the f prefix is unnecessary.

📝 Fix
-        f.write(f"c: Generated by cleed-convert\n")
+        f.write("c: Generated by cleed-convert\n")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
f.write(f"c: Generated by cleed-convert\n")
f.write("c: Generated by cleed-convert\n")
🧰 Tools
🪛 GitHub Check: Codacy Static Code Analysis

[warning] 41-41: tools/cleed_io/writer.py#L41
f-string is missing placeholders (F541)

🪛 Ruff (0.14.10)

41-41: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In @tools/cleed_io/writer.py at line 41, The string passed to f.write uses an
unnecessary f-string; remove the f prefix so the call uses a regular string
literal. Locate the write call on the file object named f that writes "c:
Generated by cleed-convert\n" and change it from using an f-string to a plain
string literal (i.e., use f.write(...) -> f.write("c: Generated by
cleed-convert\n")).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Modernize Input: Support for POSCAR/ASE Structures via Python Bridge

1 participant