LSP server: semantic tokens, workspace symbols, runtime fixes #394

Merged
lagergren merged 27 commits into master from lagergren/lsp-extend4
Feb 20, 2026

lagergren commented Feb 19, 2026

Summary

Extends the XTC LSP server with two major new capabilities -- semantic token highlighting and workspace-wide symbol search -- adds the scaffold for debug adapter protocol (DAP) support, and fixes
several runtime issues discovered during IntelliJ integration testing.

Semantic tokens (tree-sitter based)

The LSP server now provides rich semantic token data to editors, enabling context-aware syntax highlighting beyond static TextMate grammars. The SemanticTokenEncoder walks the tree-sitter AST
and classifies tokens into 18 types (class, interface, enum, method, property, parameter, type, decorator, namespace, etc.) with modifier bitmasks (declaration, static, readonly, abstract). This
lets editors visually distinguish, for example, a type reference from a variable, or a declaration from a usage. Controlled by lsp.semanticTokens in gradle.properties: initially opt-in, then enabled by default by a later commit in this PR.
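
To illustrate the modifier bitmasks and the wire format, here is a minimal Java sketch of the relative five-integer encoding that the LSP specification defines for semantic tokens. It is not the PR's SemanticTokenEncoder: the class name, the legend order, and the int[] token layout are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; names and legend order are assumptions, not the PR's code. */
final class SemanticTokenSketch {
    // Assumed legend order: the bit position of a modifier is its index in this list.
    static final List<String> TOKEN_MODIFIERS =
            List.of("declaration", "static", "readonly", "abstract");

    /** Folds modifier names into a bitmask; unknown names are ignored. */
    static int modifierBitmask(List<String> modifiers) {
        int mask = 0;
        for (String m : modifiers) {
            int idx = TOKEN_MODIFIERS.indexOf(m);
            if (idx >= 0) {
                mask |= 1 << idx;
            }
        }
        return mask;
    }

    /**
     * Encodes tokens given as {line, startChar, length, typeIndex, modifierMask}
     * into the flat LSP array: deltaLine, deltaStart, length, type, modifiers.
     */
    static int[] encode(int[][] tokens) {
        int[][] sorted = tokens.clone();
        // Tokens must be emitted in document order for the deltas to be valid.
        Arrays.sort(sorted, Comparator.comparingInt((int[] t) -> t[0]).thenComparingInt(t -> t[1]));
        int[] data = new int[sorted.length * 5];
        int prevLine = 0;
        int prevStart = 0;
        int i = 0;
        for (int[] t : sorted) {
            int deltaLine = t[0] - prevLine;
            // The start column is relative only when the token stays on the same line.
            int deltaStart = deltaLine == 0 ? t[1] - prevStart : t[1];
            data[i++] = deltaLine;
            data[i++] = deltaStart;
            data[i++] = t[2];
            data[i++] = t[3];
            data[i++] = t[4];
            prevLine = t[0];
            prevStart = t[1];
        }
        return data;
    }
}
```

With this assumed legend, a token that is both a declaration and readonly carries the mask 0b0101.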

Workspace symbol index with cross-file go-to-definition

A background indexer scans all .x files in the workspace on startup and incrementally re-indexes on file changes. The WorkspaceIndex supports 4-tier fuzzy matching (exact, prefix, CamelCase,
subsequence) for the LSP workspace/symbol request. Cross-file go-to-definition now falls back to the workspace index when the symbol isn't found in the current file, preferring type declarations
over methods/properties. The indexer uses a dedicated tree-sitter parser instance with a bounded thread pool to avoid FFM race conditions with the LSP message thread.
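
The four matching tiers could look roughly like the following Java sketch. The tier names come from the description above; the case-insensitive comparison, the definition of CamelCase matching (query matches a prefix of the name's capital-letter initials), and all identifiers are assumptions for illustration, not the WorkspaceIndex implementation.

```java
import java.util.Locale;

/** Hypothetical sketch of tiered fuzzy matching; lower tier number means a better match. */
final class FuzzyTierSketch {
    static final int EXACT = 0;
    static final int PREFIX = 1;
    static final int CAMEL_CASE = 2;
    static final int SUBSEQUENCE = 3;
    static final int NO_MATCH = -1;

    /** Returns the best tier at which the query matches the symbol name. */
    static int tier(String query, String name) {
        if (query.isEmpty()) {
            return NO_MATCH;
        }
        String q = query.toLowerCase(Locale.ROOT);
        String n = name.toLowerCase(Locale.ROOT);
        if (n.equals(q)) {
            return EXACT;
        }
        if (n.startsWith(q)) {
            return PREFIX;
        }
        // Assumption: "WI" matches "WorkspaceIndex" via its initials "WI".
        if (initials(name).toLowerCase(Locale.ROOT).startsWith(q)) {
            return CAMEL_CASE;
        }
        return isSubsequence(q, n) ? SUBSEQUENCE : NO_MATCH;
    }

    /** Collects the first character plus every upper-case character. */
    static String initials(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (i == 0 || Character.isUpperCase(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** True when the characters of q appear in n in the same order. */
    static boolean isSubsequence(String q, String n) {
        int qi = 0;
        for (int i = 0; i < n.length() && qi < q.length(); i++) {
            if (n.charAt(i) == q.charAt(qi)) {
                qi++;
            }
        }
        return qi == q.length();
    }
}
```

Results would then be sorted by tier so that exact hits come first and loose subsequence hits come last.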

DAP server scaffold and IntelliJ integration

Renamed debug-adapter to dap-server for naming consistency with lsp-server. The DAP server is a stub implementation that handles the core protocol lifecycle (initialize, setBreakpoints,
launch, attach, threads, disconnect) with placeholder responses. On the IntelliJ side, XtcDebugAdapterFactory is registered via LSP4IJ's debugAdapterServer extension point and launches the DAP
server out-of-process with a provisioned Java 25 JRE.

Adapter architecture refactor

Extracted shared default implementations from the XtcCompilerAdapter interface into AbstractXtcCompilerAdapter. All 30+ LSP methods now have input-logging stubs in the base class, so concrete
adapters only override what they actually implement. New adapter methods: getSemanticTokens(), findWorkspaceSymbols(), initializeWorkspace(), didChangeWatchedFile(), closeDocument().
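
The shape of this refactor can be sketched in Java as below: a pure-signature interface, an abstract base whose stubs log their inputs, and a concrete adapter that overrides only what it implements. The two method names are taken from the list above; the nested type names, parameter types, return types, and the in-memory log list are simplifications assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the interface / abstract-base split; not the PR's classes. */
final class AdapterSketch {
    /** Pure signatures, no default bodies. */
    interface Adapter {
        String displayName();
        List<String> findWorkspaceSymbols(String query);
        List<Integer> getSemanticTokens(String uri);
    }

    /** Base class: every capability has a stub that logs its input and returns empty. */
    abstract static class Base implements Adapter {
        // Stands in for a real logger so the sketch stays self-contained.
        final List<String> log = new ArrayList<>();

        protected void logUnimplemented(String method, String input) {
            log.add("[" + displayName() + "] " + method + "(" + input + ") not yet implemented");
        }

        @Override
        public List<String> findWorkspaceSymbols(String query) {
            logUnimplemented("findWorkspaceSymbols", "query=" + query);
            return List.of();
        }

        @Override
        public List<Integer> getSemanticTokens(String uri) {
            logUnimplemented("getSemanticTokens", "uri=" + uri);
            return List.of();
        }
    }

    /** Concrete adapter: overrides one capability, inherits the stub for the other. */
    static final class TreeSitterLike extends Base {
        @Override
        public String displayName() {
            return "TreeSitter";
        }

        @Override
        public List<String> findWorkspaceSymbols(String query) {
            return List.of("Demo." + query);
        }
    }
}
```

Calling an unimplemented method on TreeSitterLike returns an empty result and leaves a prefixed trace entry, which is the behavior the logging stubs are meant to provide.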

Runtime crash fixes

  • Incremental parsing disabled: Tree-sitter Tree.edit() was never called, so incremental reparsing used stale byte offsets, causing silent corruption. The parser now does a full reparse on
    every edit until Tree.edit() is properly wired.
  • EDT violation: JRE provisioner was calling a blocking check on the EDT; moved off-thread.
  • Defensive bounds checking: XtcNode.text now guards against out-of-bounds byte offsets from freed FFM memory.
  • Rename + semantic tokens crash: Fixed race where concurrent async LSP handlers could access a freed tree-sitter parse tree.
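
As an illustration of the defensive bounds checking, the Java sketch below slices node text by byte offset from the UTF-8 encoding of the source and returns an empty string for stale or out-of-range offsets instead of throwing. It is an assumption about the general technique, not the XtcNode.text code; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; tree-sitter reports byte offsets, Java strings index UTF-16 chars. */
final class NodeTextSketch {
    /**
     * Returns the text between two UTF-8 byte offsets, or "" when the offsets
     * are stale (negative, reversed, or past the end of the current source).
     */
    static String textAt(String source, int startByte, int endByte) {
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);
        if (startByte < 0 || startByte > endByte || endByte > utf8.length) {
            return "";
        }
        // Slicing the bytes, not the chars, keeps non-ASCII sources aligned.
        return new String(utf8, startByte, endByte - startByte, StandardCharsets.UTF_8);
    }
}
```

Using String.substring with byte offsets would misalign or throw as soon as the source contains a multi-byte character, which is why the slice is taken on the encoded bytes.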

Shared IntelliJ run configurations

Five shared run configurations in .run/ are auto-discovered by IntelliJ:

  • Run Plugin in IDE / Run Plugin Tests / Build Plugin ZIP -- IntelliJ plugin development (require includeBuildLang=true)
  • xtc build / xtc run -- compile and run XTC modules directly via the Launcher class (require xdk:installDist)

See .run/README.md for prerequisites and customization.

Build and infrastructure

  • Composite build property isolation: Replaced project.findProperty() with xdkProperties for properties that must be visible across the included build boundary
  • includeBuildLang / includeBuildAttachLang flags: lang is now a separately controllable included build -- includeBuildLang controls IDE visibility and task addressability,
    includeBuildAttachLang controls whether lang lifecycle tasks wire into the root build. Both default to false so CI and other developers are unaffected.
  • Pipeline logging: All query engine methods log per-match details, all adapter methods log inputs and results, query names identified in executeQuery output
  • ASCII-only output: Replaced Unicode arrows/em-dashes with ASCII equivalents to prevent garbled log output in non-UTF-8 viewers
  • Shared PluginPaths utility: Centralized server JAR resolution (LSP + DAP) for the IntelliJ plugin
  • Repo cleanup: Removed abandoned javatools_backend/ directory and root Makefile

Test plan

  • ./gradlew :lang:lsp-server:test -PincludeBuildLang=true -- LSP server tests including semantic tokens, workspace index, rename regression tests
  • ./gradlew :lang:intellij-plugin:test -PincludeBuildLang=true -- IntelliJ plugin tests including JRE provisioner and JAR resolution
  • Manual: open .x file in IntelliJ with plugin, verify semantic highlighting, rename a symbol, confirm no crash
  • Verify tail -f ~/.xtc/logs/lsp-server.log shows query names and per-match details

… testing guide

- Add .run/ directory with shared IntelliJ run configurations:
  Run Plugin in IDE, Run Plugin Tests, Build Plugin ZIP,
  xtc build, and xtc run (targeting FizzBuzz.x as example)
- Add lang/benchmarks/ with cold vs hot cache Gradle profile
  comparison (2m40s cold → 1m32s hot at same commit)
- Add lang/intellij-plugin/TESTING.md: comprehensive reference
  for IntelliJ plugin testing infrastructure (platform test
  framework, Starter+Driver E2E, debugging, plugin verifier,
  log harvesting, CI pipeline)
- Update gradle.properties comments and enable searchable options
- Bump IntelliJ IDE from 2025.1 to 2025.3.2 (unified distribution)
- Bump IntelliJ Platform Gradle Plugin from 2.10.5 to 2.11.0
- Use intellijIdea() instead of intellijIdeaCommunity() for 2025.3+
- Remove brittle Gradle cache path introspection from build script
- Fix race condition in TreeSitterAdapter.compile() where closing the
  old tree while async LSP handlers still held node references caused
  IllegalStateException ("Already closed"), triggering LSP4IJ restart loop
- Move close() to XtcCompilerAdapter interface with default no-op

Add "Design Decision: LSP4IJ over IntelliJ Built-in LSP" section to
PLAN_IDE_INTEGRATION.md covering DAP support, LSP feature coverage gap,
and cost/benefit analysis. Simplify the race condition comment block in
XtcLspConnectionProvider to a TODO with GitHub issue link.

Rename the lang/debug-adapter/ module to lang/dap-server/ to match the
naming convention of lang/lsp-server/. Update all Gradle build files,
documentation, and log file paths.

Register XtcDebugAdapterFactory via the debugAdapterServer extension
point in plugin.xml. The factory creates XtcDebugAdapterDescriptor
which launches the DAP server out-of-process using the same JRE
provisioning infrastructure as the LSP server. This connects LSP4IJ's
DAP client to our lang/dap-server/ stub.

Extract server JAR resolution from LSP and DAP code into shared
PluginPaths object. Error messages now show actual searched paths.

Update lsp-processes.md: fix class names (LSP4IJ, not built-in LSP),
cache location (Gradle home, not IDE system path), Java version
(25, not 24), JAR location (bin/, not lib/), and file structure.

Add KDoc to XtcDebugAdapterDescriptor explaining:
- Why AtomicBoolean notification guard is not needed for DAP
  (user-initiated sessions vs LSP auto-start race condition)
- Out-of-process JBR 21 compatibility (provisioned Java 25)
- LSP vs DAP process lifecycle differences (OSProcessStreamConnectionProvider
  vs DebugAdapterDescriptor)

Update plan-dap-debugging.md with Phase 0 (IDE-side DAP wiring) marked
complete, covering XtcDebugAdapterFactory, PluginPaths, module rename,
and architecture documentation.

- Add §1 current status: all implemented, stub, and unregistered features
- Add §8 additional LSP features: document link resolution, code lens,
  on-type formatting, linked editing, pull diagnostics, type definition
- Add §11 IntelliJ (LSP4IJ): automatic feature mapping, plugin-specific
  features, LSP4IJ considerations
- Add §12 VS Code: architecture, vscode-languageclient, DAP integration,
  JRE provisioning, feature parity comparison, extension roadmap
- Add §13 Other editors: Eclipse, Neovim, Zed, Helix, Sublime, Emacs
  with config examples, shared assets, IDE-specific vs shared diagram
- Renumber all sections for new table of contents
- Update sprint plan with Sprint 6 (additional features) and expanded
  future/compiler items
- Move Semantic Tokens Tier 1 to Sprint 1 (parallel with index build,
  no index dependency, most visible user improvement)
- Move Document Link Resolution to Sprint 2 (trivial once index exists)
- Add workspace indexing progress reporting as critical UX requirement
- Add "Compiler Adapter Milestone" section explaining what the compiler
  adapter unlocks and what it looks like architecturally
- Add VS Code JRE provisioning alternatives (per-platform builds,
  shell script, TS client, user-installed) with recommendations
- Restructure Sprint 1 into parallel Track A (index) and Track B
  (semantic tokens)

Replace sparse "Other Editors" section in plan-next-steps-lsp.md with
comprehensive multi-IDE strategy including Stack Overflow 2025 and
JRebel 2025 market share data, per-editor effort estimates, protocol
support matrix, and a 4-wave rollout plan prioritizing Neovim and
Helix first (config-only, highest ROI), then Sublime/Zed, Eclipse,
and community-contributed configs.

Add adapter interface methods for every planned LSP capability from
plan-next-steps-lsp.md: declaration, typeDefinition, implementation,
type hierarchy (prepare/supertypes/subtypes), call hierarchy
(prepare/incoming/outgoing), codeLens, onTypeFormatting, and
linkedEditingRange. Each returns null/empty and logs the full input
parameters and unimplemented state.

Add corresponding LSP handler stubs in XtcLanguageServer that
delegate to the adapter, convert types, log timing, and respond
gracefully (capabilities not advertised yet — handlers exist for
code structure and trace-along debugging).

Add data classes: TypeHierarchyItem, CallHierarchyItem,
CallHierarchyIncomingCall, CallHierarchyOutgoingCall, CodeLens,
CodeLensCommand, LinkedEditingRanges.

Enrich all existing interface default stubs with full input parameter
logging (uri, line, column, range coordinates, content length, etc.)
so every adapter call is traceable in the LSP server log.

Add missing logging to: AbstractXtcCompilerAdapter.getHoverInfo,
TreeSitterAdapter.findSymbolAt/getCompletions/getSelectionRanges,
and all XtcCompilerAdapterStub methods.

Move all "not yet implemented" default method bodies from the
XtcCompilerAdapter interface into AbstractXtcCompilerAdapter.
The interface is now pure signatures; the abstract class provides
logging defaults with [displayName] prefix for all unimplemented
features. Also deduplicate formatContent (was identical in
TreeSitterAdapter and MockXtcCompilerAdapter, now shared in the
abstract class). Suppress unused warning on healthCheck JSON-RPC
endpoint and fix broken KDoc references.

Implement syntax-level semantic token classification using a single-pass
O(n) tree-sitter AST walk. Covers 18 AST contexts (class/interface/mixin/
service/const/enum declarations, methods, constructors, properties,
variables, parameters, modules, packages, annotations, type expressions,
call expressions, member expressions). Disabled by default behind
lsp.semanticTokens=false in gradle.properties; enable with
-Plsp.semanticTokens=true. Updates PLAN_TREE_SITTER.md with status,
Tier 2 roadmap, and expanded grammar field() migration analysis.
…t test

- Mark Semantic Tokens Tier 1 as complete in plan (§1, §2, §10)
- Restructure sprint plan: completed vs recommended next work
- Highlight Workspace Symbol Index as the highest-leverage next step
- Rename SemanticTokenLegend constants to idiomatic Kotlin camelCase
  (TOKEN_TYPES→tokenTypes, TOKEN_MODIFIERS→tokenModifiers, etc.)
- Replace imperative for-loop in modifierBitmask with fold
- Add SemanticTokensVsTextMateTest (10 tests) demonstrating concrete
  benefits of semantic tokens over TextMate pattern-based highlighting:
  identifier disambiguation, type category distinction, modifier
  bitmasks, annotation classification, member expression context

Implement the workspace symbol index infrastructure (Sprint 1A) and wire
it into workspace symbols and cross-file go-to-definition (Sprint 2):

- IndexedSymbol, WorkspaceIndex (4-tier fuzzy: exact/prefix/CamelCase/
  subsequence), WorkspaceIndexer (background parallel scan with
  serialized tree-sitter parsing via parseLock)
- TreeSitterAdapter: owns index/indexer, initializeWorkspace(),
  findWorkspaceSymbols(), cross-file findDefinition() fallback,
  re-indexing on compile(), didChangeWatchedFile()
- XtcLanguageServer: workspace folder extraction, health check before
  indexing, dynamic **/*.x file watcher registration, advertise
  workspaceSymbolProvider capability
- 23 new tests (17 unit + 6 integration) for index and indexer
- Update plan with Sprint 1A/2 completion status
…code

- Enable semantic tokens and treesitter adapter by default in
  gradle.properties, fix typo ("ot" -> "to")
- Fix highlights.scm.template: member_expression field name
  (property -> member), annotation name uses qualified_name
- Simplify XtcQueries by removing redundant query patterns
- Refactor XtcNode, XtcQueryEngine, XtcParser, XtcTree for
  semantic token encoder support
- Remove unused freshUri/uriCounter from SemanticTokensVsTextMateTest
- Remove blank lines in CompilationResultTest
- Add file count to tree-sitter parse test summary output
…n tests

- Fix parser thread-safety (parseLock), deadlock ordering, findReferences,
  documentSymbol caching, closeDocument wiring, scanWorkspace error handling
- Add configurable log levels via -Plog=<level>, -Dxtc.logLevel, XTC_LOG_LEVEL
- Standardize all ~250 logger calls with [Module] prefixes ([Server], [Launcher],
  [Parser], [TreeSitter], [Mock], [Compiler], [QueryEngine], [WorkspaceIndexer])
- Expand JAR resolution tests to 13 cases covering PluginPaths.resolveInBin for
  LSP/DAP JARs across sandbox, ZIP install, and marketplace deployment layouts
- WorkspaceIndexer uses dedicated parser/queryEngine (no main-thread sharing)
- Add closeDocument() and getCachedResult() to XtcCompilerAdapter interface
- Cascading logback variable resolution: system property > env var > INFO default
- Document all configurable properties in lsp-server README
- Add known issues and follow-ups to PLAN_IDE_INTEGRATION.md
- Delete benchmarks directory (profile data, not needed for PR)
- Disable dev flags in gradle.properties (includeBuildLang=false)
- Fix XtcNode.text byte-vs-char offset mismatch for non-ASCII sources
- Fix SemanticTokensVsTextMateTest native memory leak (use .use {})
- Fix SemanticTokenEncoder.nodeKey collision by including node type hash
- Fix Windows IDE path from 2025.1 to 2025.3
- Enable lsp.semanticTokens=true by default in gradle.properties
- Update README feature matrix: semantic tokens and workspace symbols now ✅
- Update PLAN_TREE_SITTER.md: mark grammar fields, workspace symbols,
  Phase 5 cross-file support as complete
- Update PLAN_IDE_INTEGRATION.md: mark semantic tokens Phase 1 complete,
  add workspace symbols and semantic tokens to feature table,
  mark follow-up issues 3-6 as fixed
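
The cascading log-level resolution listed above (system property, then environment variable, then INFO) can be sketched in Java as follows. The names xtc.logLevel and XTC_LOG_LEVEL come from the commit message; the class, the upper-casing, and the blank-value handling are assumptions, and the actual resolution is done by logback variable substitution rather than by code like this.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of the precedence order; not the server's logback configuration. */
final class LogLevelSketch {
    /** System property wins over the environment variable; INFO is the fallback. */
    static String resolve(Properties systemProperties, Map<String, String> environment) {
        String fromProperty = systemProperties.getProperty("xtc.logLevel");
        if (fromProperty != null && !fromProperty.isBlank()) {
            return fromProperty.trim().toUpperCase(Locale.ROOT);
        }
        String fromEnvironment = environment.get("XTC_LOG_LEVEL");
        if (fromEnvironment != null && !fromEnvironment.isBlank()) {
            return fromEnvironment.trim().toUpperCase(Locale.ROOT);
        }
        return "INFO";
    }

    /** Convenience overload that reads the real process settings. */
    static String resolve() {
        return resolve(System.getProperties(), System.getenv());
    }
}
```

Passing the property and environment lookups in as parameters keeps the precedence rule easy to test without mutating process state.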
project.findProperty() and providers.gradleProperty() only see an
included build's own gradle.properties, which doesn't exist for lang/.
Properties like lsp.semanticTokens, lsp.adapter, lsp.buildSearchableOptions,
and log were silently falling back to hardcoded defaults. Use xdkProperties
which resolves through XdkPropertiesService (loads from composite root's
gradle.properties at settings time). Enable includeBuildLang by default.

1. XtcParser: disable incremental parsing (always full reparse). Passing
   oldTree without Tree.edit() caused stale byte offsets in the new tree,
   leading to StringIndexOutOfBoundsException in SemanticTokenEncoder after
   renames that change document length. Full reparse is still sub-ms.

2. XtcNode.text: add defensive bounds checking for both fast and slow
   paths — returns empty string instead of crashing on stale offsets.

3. XtcLspConnectionProvider: move JRE resolution from init{} to start().
   The init block called ProjectJdkTable.getInstance() which is prohibited
   on EDT, causing "slow operations" warnings and potential freezes.

4. Add per-match location logging to XtcQueryEngine.findAllIdentifiers
   and TreeSitterAdapter.findReferences for debugging.

Add "Known Issues" section to README documenting LSP4IJ bugs (duplicate
server spawning, unclickable "Show Logs" popup, error notifications),
IntelliJ platform issues (Ultimate plugin noise, EDT enforcement, CDS
warnings), and debugging tips with log file locations and workarounds.

Add 4 regression tests for the rename-then-semantic-tokens crash:
- Rename shortens document (exact reproduction of the reported bug)
- Rename lengthens document (reverse case)
- Rapid sequential recompilations (10 iterations, same URI)
- Folding ranges after rename (verifies line/column path)

- XtcQueryEngine.executeQuery() now logs query name (e.g., 'allDeclarations')
- findAllDeclarations, findMethodDeclarations, findImports, findImportLocations
  now log per-match details (kind, name, file, line:col)
- TreeSitterAdapter: getFoldingRanges, getSemanticTokens, getCodeActions,
  getDocumentLinks now consistently log inputs and results
- Replace all Unicode arrows (U+2192) and em-dashes (U+2014) with ASCII
  '->' and '--' across all source, test, and documentation files to prevent
  garbled output in log viewers using ISO-8859-1/Latin-1 encoding
…le()

The comment said "with incremental parsing if we have an old tree" but
XtcParser.parse() always ignores the oldTree parameter and does a full
reparse. Updated comment and removed the unused isIncremental variable
and log format string that referenced it.
lagergren requested review from cpurdy and ggleyzer on February 19, 2026 14:14
…violation

- Remove redundant log prefixes (logPrefix, TAG, [ClassName]) from all LSP
  server code; SLF4J %logger{0} already provides class names
- Audit and fix stale documentation across 20+ markdown files: version
  numbers, feature matrices, phantom XtcCompilerAdapterFull references,
  default adapter mock->treesitter, Java 23->25 version refs
- Consolidate duplicate feature matrices into single source of truth in
  PLAN_IDE_INTEGRATION.md; remove 3 completed task docs
- Add onlyIf guards to Download tasks so they SKIP when files exist
- Fix EDT SlowOperations violation in XtcNewProjectWizardStep.refreshVfs()
- Add lang composite build properties documentation to CLAUDE.md
lagergren merged commit 6c50508 into master on Feb 20, 2026
4 checks passed
lagergren deleted the lagergren/lsp-extend4 branch on February 20, 2026 14:30