Clarpse is a multi-language architectural code analysis library for building better software tools.
<dependency>
<groupId>io.github.hadi-technology</groupId>
<artifactId>clarpse</artifactId>
<version>8.2.0</version>
</dependency>
Clarpse facilitates the development of tools that operate over the higher-level, architectural details of source code, which are exposed via an easy-to-use, object-oriented API. Check out the power of Clarpse in striff-lib.
Clarpse is a multi-language parsing and analysis library that converts source code into a language-agnostic, object-oriented model. That model makes it easy to build tooling on top of architecture-level details like components, references, and structure without dealing with raw ASTs.
- Supports Java with a lightweight, architecture-focused parser.
- Supports TypeScript with compiler-accurate, tsconfig-aware parsing and resolution.
- Lightweight
- Performant
- Easy to use
- Clean API built on top of AST
- Support for parsing comments
- Java 17
- Maven 3.x
- Node.js 18/20/22 (required for TypeScript parsing)
- TypeScript compiler:
npm install -g typescript
Build the jar:
mvn clean package assembly:single
Start the HTTP API:
java -cp target/clarpse-<version>.jar com.hadi.clarpse.server.ClarpseServer
Health check:
curl -s http://localhost:8080/health
Parse a JSON request:
curl -s -X POST http://localhost:8080/parse \
-H "Content-Type: application/json" \
-d '{"language":"java","files":[{"path":"src/Foo.java","content":"package test; class Foo { void m() {} }"}]}'
Parse a zip (Java or TypeScript):
curl -s -X POST "http://localhost:8080/parse?lang=typescript" \
-H "Content-Type: application/zip" \
--data-binary @project.zip
Notes:
- TypeScript parsing requires a valid tsconfig.json in the project input.
- Environment variables: CLARPSE_PORT, CLARPSE_MAX_BYTES, CLARPSE_PARALLELISM.
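The curl calls above can also be issued from Java. The following is a minimal sketch using the JDK's built-in java.net.http client; the ClarpseApiClient class and its method names are hypothetical helpers written for this example and are not part of Clarpse. It only assumes the /health and /parse endpoints and the Content-Type headers shown above, and it makes no assumption about the shape of the response body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper (not part of Clarpse) that builds requests for the HTTP API. */
final class ClarpseApiClient {

    private final String baseUrl;

    ClarpseApiClient(String baseUrl) {
        // Drop a trailing slash so endpoint paths can be appended uniformly.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    /** Builds GET /health. */
    HttpRequest healthRequest() {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/health"))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    /** Builds POST /parse with a JSON body, mirroring the first curl example. */
    HttpRequest parseJsonRequest(String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/parse"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Builds POST /parse?lang=... with raw zip bytes, mirroring the zip example. */
    HttpRequest parseZipRequest(String lang, byte[] zipBytes) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/parse?lang=" + lang))
                .header("Content-Type", "application/zip")
                .POST(HttpRequest.BodyPublishers.ofByteArray(zipBytes))
                .build();
    }

    /** Sends a request and returns "<status> <body>" without interpreting the body. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }

    public static void main(String[] args) throws Exception {
        // Requires a server running locally, started as described above.
        ClarpseApiClient client = new ClarpseApiClient("http://localhost:8080");
        System.out.println(send(client.healthRequest()));
        String json = "{\"language\":\"java\",\"files\":[{\"path\":\"src/Foo.java\","
                + "\"content\":\"package test; class Foo { void m() {} }\"}]}";
        System.out.println(send(client.parseJsonRequest(json)));
    }
}
```

Building the requests separately from sending them keeps the request shape easy to inspect without a running server.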
Build and run the container (no local jar required):
docker build -t clarpse-api .
docker run -p 8080:8080 clarpse-api
Then call the API the same way as the local server:
curl -s -X POST http://localhost:8080/parse \
-H "Content-Type: application/json" \
-d '{"language":"java","files":[{"path":"src/Foo.java","content":"package test; class Foo { void m() {} }"}]}'
Clarpse supports a global parallelism setting for language compilers that can parse files in parallel.
- CLARPSE_PARALLELISM controls the max number of parser threads.
- Values 1 or lower force serial parsing.
- If unset, Clarpse uses min(availableProcessors, fileCount).
Example:
CLARPSE_PARALLELISM=4 mvn test
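The rules above can be sketched in a few lines of Java. This is an illustration of the documented behaviour only, not Clarpse's actual implementation: the ParallelismSketch class and resolveThreads method are hypothetical names, and two details are assumptions made for the sketch (the result never drops below one thread, and an explicit setting is not capped by the file count).

```java
/** Hypothetical sketch of the CLARPSE_PARALLELISM rules; not Clarpse's own code. */
final class ParallelismSketch {

    /**
     * @param envValue            raw value of CLARPSE_PARALLELISM, or null if unset
     * @param availableProcessors number of processors available to the JVM
     * @param fileCount           number of files to parse
     * @return the number of parser threads to use (1 means serial parsing)
     */
    static int resolveThreads(String envValue, int availableProcessors, int fileCount) {
        if (envValue == null || envValue.isBlank()) {
            // Unset: min(availableProcessors, fileCount).
            // Assumption: clamp to at least one thread when there are no files.
            return Math.max(1, Math.min(availableProcessors, fileCount));
        }
        // Non-numeric values are not handled here and raise NumberFormatException.
        int requested = Integer.parseInt(envValue.trim());
        // Values of 1 or lower force serial parsing.
        return Math.max(1, requested);
    }

    public static void main(String[] args) {
        int threads = resolveThreads(
                System.getenv("CLARPSE_PARALLELISM"),
                Runtime.getRuntime().availableProcessors(),
                12); // example file count
        System.out.println("Parser threads: " + threads);
    }
}
```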
Key areas of the repository:
- src/main/java/com/hadi/clarpse/compiler - Language compilers, project file handling, and orchestration.
- src/main/java/com/hadi/clarpse/compiler/typescript - TypeScript compiler bridge and models.
- src/main/java/com/hadi/clarpse/listener - Parse tree listeners that build the source model (Java).
- src/main/java/com/hadi/clarpse/sourcemodel - Component and package models.
- src/main/java/com/hadi/clarpse/reference - Component reference types.
- src/main/resources - Parser helpers and tool configuration (TypeScript daemon lives here).
- src/test/java - Unit and integration tests.
- src/test/resources - Test fixtures and zipped codebases used by tests.
| Term | Definition |
|---|---|
| Component | A language-independent unit of source code, typically a class, method, interface, field variable, local variable, enum, etc. |
| OOPSourceCodeModel | A representation of a codebase through a collection of Component objects. |
| Component Reference | A reference from a source component to a target component, typically arising from import statements, variable declarations, method calls, and so on. |
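To make the three terms concrete, here is a toy sketch of how they relate. The Ref, Comp, and Model types below are simplified stand-ins invented for this example; they are not Clarpse's Component, ComponentReference, or OOPSourceCodeModel classes, whose real attributes are richer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy stand-ins for the glossary terms; not the real Clarpse classes. */
final class GlossarySketch {

    /** Stand-in for a component reference: a kind plus the unique name of the target. */
    record Ref(String kind, String target) {}

    /** Stand-in for a component: a unique name, a type, child names, and outgoing references. */
    record Comp(String uniqueName, String type, List<String> children, List<Ref> references) {}

    /** Stand-in for the source code model: a collection of components keyed by unique name. */
    static final class Model {
        private final Map<String, Comp> byName = new LinkedHashMap<>();

        void insert(Comp component) {
            byName.put(component.uniqueName(), component);
        }

        Comp get(String uniqueName) {
            return byName.get(uniqueName);
        }

        /** Unique names of all components holding a reference to the given target. */
        List<String> referrersOf(String target) {
            return byName.values().stream()
                    .filter(c -> c.references().stream().anyMatch(r -> r.target().equals(target)))
                    .map(Comp::uniqueName)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        Model model = new Model();
        model.insert(new Comp("com.foo.AbstractClass", "CLASS", List.of(), List.of()));
        model.insert(new Comp("com.foo.SampleClass", "CLASS", List.of(),
                List.of(new Ref("TypeExtensionReference", "com.foo.AbstractClass"))));
        System.out.println(model.referrersOf("com.foo.AbstractClass"));
    }
}
```

Keying components by unique name is what makes "who references this component?" style queries cheap to express on top of the model.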
Build and test in three steps:
- Generate ANTLR sources: mvn generate-resources
- Run tests: mvn test
- Build the full artifact: mvn clean package assembly:single
Run a single test class:
mvn -Dtest=com.hadi.test.java.SmokeTest test
The parsing flow is:
ProjectFiles -> ClarpseProject -> ClarpseCompiler -> Language Listener -> OOPSourceCodeModel
High level steps:
- Collect files in ProjectFiles (directory, zip, or in-memory).
- ClarpseProject selects a language compiler.
- The compiler parses files and walks the parse tree.
- The language listener builds Component objects and references.
- The resulting OOPSourceCodeModel is used by downstream tooling.
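The shape of this flow can be illustrated with a deliberately tiny, self-contained toy. Nothing below uses Clarpse classes: FlowSketch, SourceFile, and the regex-based "listener" are hypothetical stand-ins, and a real compiler walks a proper parse tree rather than matching text. The sketch only mirrors the order of the steps: collect files, select a compiler by language, let a listener emit components, and gather them into a model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration of the parsing flow; not how Clarpse is implemented. */
final class FlowSketch {

    /** Stand-in for a project file: a path plus its content. */
    record SourceFile(String path, String content) {}

    /** Stand-in compilers keyed by language; each turns one file into component names. */
    static final Map<String, Function<SourceFile, List<String>>> COMPILERS =
            Map.of("java", FlowSketch::toyJavaListener);

    /** Toy listener: records one component for every "class Name" declaration it sees. */
    static List<String> toyJavaListener(SourceFile file) {
        List<String> components = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\bclass\\s+(\\w+)").matcher(file.content());
        while (matcher.find()) {
            components.add(matcher.group(1));
        }
        return components;
    }

    /** Selects a compiler for the language, runs it over every file, and collects the model. */
    static List<String> compile(List<SourceFile> files, String language) {
        Function<SourceFile, List<String>> compiler = COMPILERS.get(language);
        if (compiler == null) {
            throw new IllegalArgumentException("Unsupported language: " + language);
        }
        List<String> model = new ArrayList<>();
        for (SourceFile file : files) {
            model.addAll(compiler.apply(file));
        }
        return model;
    }

    public static void main(String[] args) {
        List<SourceFile> files = List.of(
                new SourceFile("src/Foo.java", "package test; class Foo { void m() {} }"));
        System.out.println(compile(files, "java"));
    }
}
```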
Core classes and where they live:
- Project entry and orchestration: src/main/java/com/hadi/clarpse/compiler/ClarpseProject.java
- Project inputs: src/main/java/com/hadi/clarpse/compiler/ProjectFiles.java, src/main/java/com/hadi/clarpse/compiler/ProjectFile.java
- Compiler selection and results: src/main/java/com/hadi/clarpse/compiler/CompilerFactory.java, src/main/java/com/hadi/clarpse/compiler/ClarpseCompiler.java, src/main/java/com/hadi/clarpse/compiler/CompileResult.java
- Language compilers: src/main/java/com/hadi/clarpse/compiler/ClarpseJavaCompiler.java, src/main/java/com/hadi/clarpse/compiler/typescript/ClarpseTypeScriptCompiler.java
- Parse listeners: src/main/java/com/hadi/clarpse/listener/JavaTreeListener.java
- Source model: src/main/java/com/hadi/clarpse/sourcemodel/OOPSourceCodeModel.java, src/main/java/com/hadi/clarpse/sourcemodel/Component.java, src/main/java/com/hadi/clarpse/sourcemodel/Package.java
- References: src/main/java/com/hadi/clarpse/reference/ComponentReference.java and related types in src/main/java/com/hadi/clarpse/reference
- TypeScript daemon: src/main/resources/typescript/daemon.js
Architecture docs:
docs/typescript-architecture.md
Clarpse abstracts source code into a higher level model in a language-agnostic way. This
model focuses on the architectural properties of the original code. The code snippet below
illustrates how this model can be generated from a ProjectFiles object which represents the
source code to be analyzed.
final String code = " package com.foo; "
    + " public class SampleClass extends AbstractClass { "
    + "     /** Sample Doc Comment */ "
    + "     @SampleAnnotation "
    + "     public void sampleMethod(String sampleMethodParam) throws AnException { "
    + "         SampleClassB.fooMethod(); "
    + "     } "
    + " } ";
// Register the in-memory source file with the project inputs
final ProjectFiles projectFiles = new ProjectFiles();
projectFiles.insertFile(new ProjectFile("SampleClass.java", code));
// Compile the project as Java
final ClarpseProject project = new ClarpseProject(projectFiles, Lang.JAVA);
CompileResult compileResult = project.result();
// Get the code model
OOPSourceCodeModel codeModel = compileResult.model();
// View any compile errors for any files
Collection<CompileFailure> failures = compileResult.failures();
Note that the ProjectFiles object can be initialized from a local directory, a local zip file, or an
input stream to a zip file. See ProjectFilesTest.java for more information.
TypeScript usage follows the same API, but requires Node.js and a valid tsconfig.json:
final ProjectFiles projectFiles = new ProjectFiles("/path/to/typescript-project");
final ClarpseProject project = new ClarpseProject(projectFiles, Lang.TYPESCRIPT);
CompileResult compileResult = project.result();
OOPSourceCodeModel codeModel = compileResult.model();
Next, the compiled OOPSourceCodeModel is the polyglot representation of our source code through a
collection of Component objects. Details about these components and the relationships
between them can be fetched in the following way:
codeModel.components().forEach(component -> {
System.out.println(component.name());
System.out.println(component.type());
System.out.println(component.comment());
System.out.println(component.modifiers());
System.out.println(component.children());
System.out.println(component.sourceFile());
...
// Check out the Component class for a full list of component attributes that can be retrieved
});
We can also get specific components by their unique name:
Component mainClassComponent = codeModel.get("com.foo.java.SampleClass");
mainClassComponent.name();           // --> "SampleClass"
mainClassComponent.type();           // --> CLASS
mainClassComponent.comment();        // --> "Sample Doc Comment"
mainClassComponent.modifiers();      // --> ["public"]
mainClassComponent.children();       // --> ["foo.java.SampleClass.sampleMethod(java.lang.String)"]
mainClassComponent.sourceFile();     // --> "foo.java"
mainClassComponent.references();     // --> ["SimpleTypeReference: String", "TypeExtensionReference: com.foo.AbstractClass", "SimpleTypeReference: com.foo.SampleClassB"]
// Fetch the inner method component
Component methodComponent = codeModel.get(mainClassComponent.children().get(0));
methodComponent.name();              // --> "sampleMethod"
methodComponent.type();              // --> METHOD
methodComponent.modifiers();         // --> ["public"]
methodComponent.children();          // --> ["com.foo.java.SampleClass.sampleMethod(String).sampleMethodParam"]
methodComponent.codeFragment();      // --> "sampleMethod(String)"
methodComponent.sourceFile();        // --> "foo.java"
methodComponent.references();        // --> ["SimpleTypeReference: String"]
Checklist for adding or updating a language implementation:
- Add or update the grammar in src/main/antlr4/....
- Run mvn generate-resources to regenerate parser sources.
- Add a compiler in src/main/java/com/hadi/clarpse/compiler.
- Add a listener in src/main/java/com/hadi/clarpse/listener.
- Register the language and file extensions in src/main/java/com/hadi/clarpse/compiler/Lang.java.
- Add tests under src/test/java and fixtures under src/test/resources.
- Submit an issue describing your proposed change.
- Fork the repo, develop and test your code changes.
- Run mvn test and ensure all tests pass.
- If your change requires a version bump, update pom.xml and README.md using the x.y.z scheme:
  - x = main version number (breaking changes)
  - y = feature number (new features, optional bug fixes)
  - z = hotfix number (bug fixes only)
- Submit a pull request.
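As a quick illustration of the x.y.z scheme, the sketch below computes the next version for each kind of change. VersionBump and its Change enum are hypothetical names for this example, and resetting the lower-order numbers to zero on a bump is an assumption borrowed from common semantic-versioning practice; the guidelines above do not state it explicitly.

```java
/** Hypothetical helper illustrating the x.y.z scheme; not part of the Clarpse build. */
final class VersionBump {

    enum Change { BREAKING, FEATURE, HOTFIX }

    /** Returns the next version string for the given kind of change. */
    static String bump(String version, Change change) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected x.y.z but got: " + version);
        }
        int x = Integer.parseInt(parts[0]);
        int y = Integer.parseInt(parts[1]);
        int z = Integer.parseInt(parts[2]);
        switch (change) {
            case BREAKING:
                return (x + 1) + ".0.0";          // x: breaking changes (assumed to reset y and z)
            case FEATURE:
                return x + "." + (y + 1) + ".0";  // y: new features (assumed to reset z)
            default:
                return x + "." + y + "." + (z + 1); // z: bug fixes only
        }
    }

    public static void main(String[] args) {
        System.out.println(bump("8.2.0", Change.FEATURE));
    }
}
```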