Skip to content

A compiler built in Java to understand the core concepts of compiler design, including lexical analysis, parsing, syntax tree generation, and code generation.

Notifications You must be signed in to change notification settings

yonas790/java-compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

61 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Simple C-Style Compiler

License: Unlicensed Language: Java

A complete, teaching-oriented compiler implementation in Java that compiles C-style source code into assembly-like intermediate code. This project demonstrates all major stages of compilation: lexical analysis, parsing, semantic analysis, symbol table management, and code generation.

Disclaimer: This project is designed for educational purposes and learning compiler construction concepts.

🌟 Features

  • βœ… Complete Compilation Pipeline: From source code to generated assembly instructions
  • βœ… Lexical Analysis: Tokenizes source code with support for keywords, operators, numbers, and identifiers
  • βœ… Recursive Descent Parser: Builds Abstract Syntax Tree (AST) from token stream
  • βœ… Symbol Table Management: Hierarchical scope-based symbol tables with offset tracking
  • βœ… Type System: Support for int, float, boolean, and array types
  • βœ… Code Generation: Generates stack-based assembly-like intermediate code
  • βœ… File Output: Saves compiled code to .asm files in a compiled/ directory
  • βœ… Comprehensive Test Suite: 10 test files covering various language features

πŸ“‹ Table of Contents

πŸš€ Quick Start

Requirements

  • Java Development Kit (JDK) 11 or newer

Build

From the project root directory:

# Create output directory
mkdir -p out

# Compile all Java source files
javac -d out src/**/*.java

Run

Navigate to the out directory and run the compiler:

cd out

# Compile a test file
java javaCompiler.JavaCompiler ../src/test/09_mixed_operations.tst

This will:

  1. Parse the source code and build the AST
  2. Display the symbol table tree
  3. Generate and display intermediate code
  4. Save the compiled code to compiled/09_mixed_operations.asm

πŸ—οΈ Compiler Architecture

This compiler implements a complete compilation pipeline with the following stages:

1. Lexical Analysis (Lexer)

Location: src/lexer/

The lexer tokenizes source code into a stream of tokens. It recognizes:

  • Keywords: if, else, while, do, break, true, false
  • Data Types: int, float, boolean
  • Operators: +, -, *, /, <, >, <=, >=, ==, !=, &&, ||, !
  • Literals: Integer numbers, floating-point numbers, boolean values
  • Identifiers: Variable names
  • Delimiters: {, }, (, ), [, ], ;, =

Key Classes:

  • Lexer.java - Main tokenizer
  • Token.java - Base token class
  • Keyword.java - Keyword tokens
  • Num.java - Integer literals
  • Real.java - Float literals
  • Tag.java - Token type constants

2. Syntax Analysis (Parser)

Location: src/parser/

Uses recursive descent parsing to build an Abstract Syntax Tree (AST). The parser implements the following grammar:

PROG     β†’ BLOCK
BLOCK    β†’ { DECLS STMTS }
DECLS    β†’ DECLS DECL | Ξ΅
DECL     β†’ TYPE id ;
TYPE     β†’ TYPE [num] | int | float | boolean
STMTS    β†’ STMTS STMT | Ξ΅
STMT     β†’ id = BOOL ; | if (BOOL) STMT | if (BOOL) STMT else STMT 
         | while (BOOL) STMT | do STMT while (BOOL) ;
         | break ; | BLOCK
BOOL     β†’ BOOL || JOIN | JOIN
JOIN     β†’ JOIN && EQUALITY | EQUALITY
EQUALITY β†’ EQUALITY == REL | EQUALITY != REL | REL
REL      β†’ EXPR < EXPR | EXPR <= EXPR | EXPR >= EXPR | EXPR > EXPR | EXPR
EXPR     β†’ EXPR + TERM | EXPR - TERM | TERM
TERM     β†’ TERM * UNARY | TERM / UNARY | UNARY
UNARY    β†’ !UNARY | -UNARY | FACTOR
FACTOR   β†’ (BOOL) | id | id[BOOL] | num | real | true | false

Key Features:

  • Handles operator precedence correctly
  • Supports nested scopes with proper symbol table management
  • Error reporting with line numbers

3. Intermediate Representation (AST)

Location: src/inter/

The AST is composed of two main categories:

Expressions (inter/expr/)

  • Constant - Literal values (numbers, booleans)
  • Id - Variable identifiers
  • Arith - Arithmetic operations (+, -, *, /)
  • Rel - Relational operations (<, >, <=, >=, ==, !=, &&, ||)
  • Not - Logical negation
  • Unary - Unary minus
  • Access - Array element access

Statements (inter/stmt/)

  • Set - Variable assignment
  • SetArrayElem - Array element assignment
  • If - Conditional statement
  • IfElse - Conditional with else branch
  • While - While loop
  • DoWhile - Do-while loop
  • Break - Break statement
  • StmtSeq - Statement sequence

4. Symbol Table & Type System

Location: src/symbol/

Manages variable declarations and type information with:

  • Hierarchical Scopes: Nested block scopes with parent-child relationships
  • Type Checking: Support for int, float, boolean, and array types
  • Offset Tracking: Calculates memory offsets for variables
  • Scope Resolution: Proper variable lookup through scope chain

Key Classes:

  • Env.java - Symbol table environment (one per scope)
  • Type.java - Type representations
  • Array.java - Array type with dimensions
  • SymbolTablePrinter.java - Pretty-prints the symbol table tree

5. Code Generation

Location: src/codegen/

Generates stack-based assembly-like intermediate code with instructions:

Instructions:

  • PUSH <value> - Push value onto stack
  • LOAD <var> - Load variable value onto stack
  • STORE <var> - Store top of stack to variable
  • ADD, SUB, MUL, DIV - Arithmetic operations
  • CMP <op> - Compare operation
  • NOT - Logical negation
  • JZ <label> - Jump if zero
  • JNZ <label> - Jump if not zero
  • JMP <label> - Unconditional jump
  • LABEL <name> - Label definition
  • LOADARR <array> - Load array element
  • STOREARR <array> - Store to array element

Key Classes:

  • CodeGen.java - Code generator (visitor pattern)
  • Instruction.java - Instruction representation

πŸ’» Language Features

Supported Features

Data Types

int i;
float x;
boolean done;
int arr[10];

Arithmetic Operations

x = a + b * c;
y = (a - b) / 2.0;
z = -x;

Boolean Expressions

if (x > 5 && y < 10) { ... }
while (i < 100 || !done) { ... }
if (a == b) { ... }

Control Flow

// If-else
if (condition) {
    statement;
} else {
    statement;
}

// While loop
while (condition) {
    statement;
}

// Do-while loop
do {
    statement;
} while (condition);

// Break
while (true) {
    if (condition) break;
}

Nested Scopes

{
    int x;
    x = 5;
    {
        float y;
        y = 2.5;
        {
            boolean flag;
            flag = true;
        }
    }
}

πŸ“ Project Structure

C-Compiler/
β”œβ”€β”€ README.md
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ codegen/              # Code generation
β”‚   β”‚   β”œβ”€β”€ CodeGen.java      # Main code generator
β”‚   β”‚   └── Instruction.java  # Instruction representation
β”‚   β”œβ”€β”€ inter/                # Intermediate representation (AST)
β”‚   β”‚   β”œβ”€β”€ Node.java         # Base AST node
β”‚   β”‚   β”œβ”€β”€ expr/             # Expression nodes
β”‚   β”‚   β”‚   β”œβ”€β”€ Constant.java
β”‚   β”‚   β”‚   β”œβ”€β”€ Id.java
β”‚   β”‚   β”‚   β”œβ”€β”€ Access.java
β”‚   β”‚   β”‚   β”œβ”€β”€ Expr.java
β”‚   β”‚   β”‚   β”œβ”€β”€ Rel.java
β”‚   β”‚   β”‚   β”œβ”€β”€ arith/
β”‚   β”‚   β”‚   β”‚   β”œβ”€β”€ Arith.java
β”‚   β”‚   β”‚   β”‚   └── Unary.java
β”‚   β”‚   β”‚   └── logic/
β”‚   β”‚   β”‚       └── Not.java
β”‚   β”‚   └── stmt/             # Statement nodes
β”‚   β”‚       β”œβ”€β”€ Stmt.java
β”‚   β”‚       β”œβ”€β”€ Set.java
β”‚   β”‚       β”œβ”€β”€ SetArrayElem.java
β”‚   β”‚       β”œβ”€β”€ If.java
β”‚   β”‚       β”œβ”€β”€ IfElse.java
β”‚   β”‚       β”œβ”€β”€ While.java
β”‚   β”‚       β”œβ”€β”€ DoWhile.java
β”‚   β”‚       β”œβ”€β”€ Break.java
β”‚   β”‚       └── StmtSeq.java
β”‚   β”œβ”€β”€ parser/               # Parser
β”‚   β”‚   └── Parser.java       # Recursive descent parser
β”‚   β”œβ”€β”€ lexer/                # Lexical analyzer
β”‚   β”‚   β”œβ”€β”€ Lexer.java        # Tokenizer
β”‚   β”‚   β”œβ”€β”€ Token.java        # Base token
β”‚   β”‚   β”œβ”€β”€ Keyword.java      # Keywords
β”‚   β”‚   β”œβ”€β”€ Num.java          # Integer literals
β”‚   β”‚   β”œβ”€β”€ Real.java         # Float literals
β”‚   β”‚   └── Tag.java          # Token tags
β”‚   β”œβ”€β”€ symbol/               # Symbol table & types
β”‚   β”‚   β”œβ”€β”€ Env.java          # Environment/scope
β”‚   β”‚   β”œβ”€β”€ Type.java         # Type system
β”‚   β”‚   β”œβ”€β”€ Array.java        # Array types
β”‚   β”‚   └── SymbolTablePrinter.java
β”‚   β”œβ”€β”€ javaCompiler/         # Main driver
β”‚   β”‚   └── JavaCompiler.java # Entry point
β”‚   └── test/                 # Test files
β”‚       β”œβ”€β”€ 01_empty_block.tst
β”‚       β”œβ”€β”€ 02_variable_declarations.tst
β”‚       β”œβ”€β”€ 03_simple_assignment.tst
β”‚       β”œβ”€β”€ 04_arithmetic_ops.tst
β”‚       β”œβ”€β”€ 05_boolean_expressions.tst
β”‚       β”œβ”€β”€ 06_if_else.tst
β”‚       β”œβ”€β”€ 07_while_loop.tst
β”‚       β”œβ”€β”€ 08_do_while_break.tst
β”‚       β”œβ”€β”€ 09_mixed_operations.tst
β”‚       └── 10_complex_boolean.tst
└── out/
    β”œβ”€β”€ [compiled class files]
    └── compiled/             # Generated assembly files
        └── *.asm

πŸ“ Usage Examples

Example 1: Simple Arithmetic

Input (test.tst):

{
    int x;
    int y;
    int result;
    
    x = 10;
    y = 20;
    result = x + y * 2;
}

Run:

java javaCompiler.JavaCompiler test.tst

Output:

=== Symbol Table Tree ===
Scope Level 0
  Scope Level 1
  +------------+----------+--------+
  | Identifier | Type     | Offset |
  +------------+----------+--------+
  | result     | int      | 8      |
  | y          | int      | 4      |
  | x          | int      | 0      |
  +------------+----------+--------+

=== Generated Code ===
PUSH 10
STORE x
PUSH 20
STORE y
LOAD x
LOAD y
PUSH 2
MUL
ADD
STORE result

=== Output saved to: compiled/test.asm ===

Example 2: Control Flow

Input:

{
    int i;
    i = 0;
    
    while (i < 5) {
        i = i + 1;
    }
}

Generated Code:

PUSH 0
STORE i
LABEL L0
LOAD i
PUSH 5
CMP <
JZ L1
LOAD i
PUSH 1
ADD
STORE i
JMP L0
LABEL L1

πŸ§ͺ Test Suite

The project includes 10 comprehensive test files:

Test File Description
01_empty_block.tst Empty block
02_variable_declarations.tst Variable declarations with different types
03_simple_assignment.tst Basic assignment statements
04_arithmetic_ops.tst Arithmetic operations and precedence
05_boolean_expressions.tst Boolean logic and comparisons
06_if_else.tst Conditional statements
07_while_loop.tst While loop structures
08_do_while_break.tst Do-while loops with break
09_mixed_operations.tst Complex nested structures
10_complex_boolean.tst Advanced boolean expressions

Running All Tests

Linux/macOS:

cd out
for f in ../src/test/*.tst; do
    echo "--- Running: $f ---"
    java javaCompiler.JavaCompiler "$f"
    echo ""
done

Windows (PowerShell):

cd out
Get-ChildItem ..\src\test\*.tst | ForEach-Object {
    Write-Host "--- Running: $($_.Name) ---"
    java javaCompiler.JavaCompiler $_.FullName
    Write-Host ""
}

πŸ”§ Development

Building from Source

# Clean previous build
rm -rf out

# Create output directory
mkdir -p out

# Compile
javac -d out src/**/*.java

# Navigate to output
cd out

Adding New Features

The codebase is designed to be extensible:

  1. New Operators: Add to Lexer.java and create corresponding AST nodes in inter/expr/
  2. New Statements: Create new statement classes in inter/stmt/ and update Parser.java
  3. New Data Types: Extend Type.java and update the parser's type() method
  4. Code Generation: Add new instruction types in Instruction.java and update CodeGen.java

Code Organization

  • Single Responsibility: Each class has a clear, focused purpose
  • Visitor Pattern: Code generation uses visitor pattern for AST traversal
  • Recursive Descent: Parser mirrors the grammar structure
  • Hierarchical Scopes: Symbol table uses parent pointers for scope chain

πŸŽ“ Learning Resources

This compiler is organized to demonstrate key compiler concepts:

  1. Lexical Analysis: See Lexer.java for tokenization techniques
  2. Parsing: Study Parser.java for recursive descent implementation
  3. AST Construction: Review inter/ for tree node design
  4. Symbol Tables: Examine Env.java for scope management
  5. Code Generation: Analyze CodeGen.java for instruction generation

πŸ“ˆ Future Enhancements

Potential improvements and extensions:

  • Optimization: Constant folding, dead code elimination
  • Arrays: Full multi-dimensional array support
  • Functions: Function declarations and calls
  • Structs: User-defined types
  • Error Recovery: Better error handling and recovery
  • Target Code: Generate actual machine code or bytecode
  • Virtual Machine: Implement an interpreter for the generated code
  • Debugger: Add debugging support with breakpoints

🀝 Contributing

This is an educational project. Contributions that enhance learning value are welcome!

Guidelines

  1. Keep changes focused and well-documented
  2. Add test cases for new features
  3. Maintain the teaching-oriented style
  4. Comment complex algorithms
  5. Follow existing code conventions

πŸ“ž Contact

For questions, issues, or suggestions:

  • Open an issue on the repository
  • Create a pull request with clear descriptions
  • Reach out for collaboration opportunities

πŸ“„ License

This project is unlicensed and provided as-is for educational purposes.


Happy Compiling! πŸš€

About

A compiler built in Java to understand the core concepts of compiler design, including lexical analysis, parsing, syntax tree generation, and code generation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published