- About the Project
- Recognized Token Types
- Whitespace
- Token Names
- Results
- How to Use
- Contributing
- License
This project is a Lexical Analyzer for a C-type language. It reads a source code file and outputs token names representing the building blocks of a programming language. There was a choice to implement a Lexical Analyzer, a Syntax Analyzer, or both; I chose the Lexical Analyzer.
Whitespace and comment rules, followed by the list of supported tokens:
- Zero or more whitespace characters, or comments enclosed in /* ... */, are allowed between any two tokens.
- Anything inside a comment (including the comment symbols themselves) is ignored and not counted as a token.
- Whitespace is required between two tokens that have an alphanumeric character or underscore at their adjoining edges (keywords, identifiers, and integer literals).
- Whitespace is not allowed inside tokens.
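The rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the project's actual code: the class `WhitespaceSketch` and the methods `skipIgnored`, `needsSeparator`, and `isWordChar` are hypothetical names, and "alphanumeric" is assumed to mean ASCII letters and digits.

```java
// Hypothetical helper for illustration; not taken from the project's Lexer.java.
class WhitespaceSketch {

    // Returns the index of the first character at or after pos that is
    // neither whitespace nor part of a /* ... */ comment.
    static int skipIgnored(String src, int pos) {
        while (pos < src.length()) {
            if (Character.isWhitespace(src.charAt(pos))) {
                pos++;
            } else if (src.startsWith("/*", pos)) {
                int end = src.indexOf("*/", pos + 2);
                if (end < 0) {
                    throw new IllegalStateException("unclosed comment starting at index " + pos);
                }
                pos = end + 2; // continue after the closing */
            } else {
                break; // start of a real token
            }
        }
        return pos;
    }

    // Assumption: "alphanumeric" means ASCII letters and digits.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // True when two adjacent tokens must be separated by whitespace or a comment,
    // because both touching edges are alphanumeric characters or underscores.
    static boolean needsSeparator(String left, String right) {
        return isWordChar(left.charAt(left.length() - 1)) && isWordChar(right.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(skipIgnored("  /* note */ if", 0));
        System.out.println(needsSeparator("if", "x"));
        System.out.println(needsSeparator("x", "+"));
    }
}
```

With this sketch, `if x` needs a separator because both edges are word characters, while `x+1` does not.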
Token names:
- Comma
- End_of_input
- Identifier
- Integer
- Keyword_if
- Keyword_else
- Keyword_while
- Keyword_print
- Keyword_putc
- LeftBrace
- LeftParen
- Op_add
- Op_and
- Op_assign
- Op_divide
- Op_equal
- Op_greater
- Op_greaterequal
- Op_less
- Op_lessequal
- Op_multiply
- Op_mod
- Op_negate
- Op_not
- Op_notequal
- Op_or
- Op_subtract
- RightBrace
- RightParen
- Semicolon
- String
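The token names listed here map naturally onto a Java enum. The following is a minimal sketch, not the project's actual definition: `TokenType`, `KeywordSketch`, and `classifyWord` are hypothetical names, and the keyword spellings (`if`, `else`, `while`, `print`, `putc`) are assumed from the token names.

```java
import java.util.Map;

// Hypothetical sketch; the project's own types may be organized differently.
class KeywordSketch {

    // Assumption: each keyword is spelled like the suffix of its token name.
    static final Map<String, TokenType> KEYWORDS = Map.of(
            "if", TokenType.Keyword_if,
            "else", TokenType.Keyword_else,
            "while", TokenType.Keyword_while,
            "print", TokenType.Keyword_print,
            "putc", TokenType.Keyword_putc);

    // A word that is not a keyword is treated as an identifier.
    static TokenType classifyWord(String word) {
        return KEYWORDS.getOrDefault(word, TokenType.Identifier);
    }

    public static void main(String[] args) {
        System.out.println(classifyWord("while"));
        System.out.println(classifyWord("count"));
    }
}

// One constant per token name in the list.
enum TokenType {
    Comma, End_of_input, Identifier, Integer,
    Keyword_if, Keyword_else, Keyword_while, Keyword_print, Keyword_putc,
    LeftBrace, LeftParen,
    Op_add, Op_and, Op_assign, Op_divide, Op_equal, Op_greater, Op_greaterequal,
    Op_less, Op_lessequal, Op_multiply, Op_mod, Op_negate, Op_not, Op_notequal,
    Op_or, Op_subtract,
    RightBrace, RightParen, Semicolon, String
}
```

Keeping the keywords in a lookup table means the scanner can read any word the same way and decide afterwards whether it is a keyword or an identifier.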
The Lexical Analyzer outputs the tokens to the terminal and to .lex files in your directory.
Errors such as invalid tokens, unclosed literals, and unclosed comments are displayed with their line and position.
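Reporting an error location requires turning a character offset into a line and position. The sketch below shows one way to do that; it is an assumption-laden illustration, not the project's code. The names `PositionSketch`, `lineAndColumn`, and `describeError` are hypothetical, both numbers are assumed to be 1-based, and the message wording is made up for the example.

```java
// Hypothetical sketch of line/position tracking; not the project's implementation.
class PositionSketch {

    // Returns {line, column} for the character at offset.
    // Assumption: both values are 1-based.
    static int[] lineAndColumn(String src, int offset) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset; i++) {
            if (src.charAt(i) == '\n') {
                line++;      // a newline starts the next line
                column = 1;  // and resets the column
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }

    // Assumed message format, for illustration only.
    static String describeError(int line, int column, String problem) {
        return String.format("Error at line %d, position %d: %s", line, column, problem);
    }

    public static void main(String[] args) {
        String src = "a = 1;\nb = @;";
        int[] where = lineAndColumn(src, src.indexOf('@'));
        System.out.println(describeError(where[0], where[1], "invalid token '@'"));
    }
}
```

A real scanner would more likely update the line and column counters as it consumes characters, rather than recomputing them from the start of the file for each error.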
The Lexical Analyzer scans the source code from left to right and outputs the token information in the following format:
<Line (Row)> <Position (Column)> <Token Type> <Value>
- Line number: The row number in the source code where the token was found.
- Position: The column number in the line where the token starts.
- Token Type: The type of token identified.
- Value: The actual value of the token.
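As an illustration of the four-field format, here is a minimal sketch that builds one output row. The class `TokenLineSketch`, the method `formatToken`, and the column widths are assumptions made for the example; the project's real spacing may differ.

```java
// Hypothetical formatter; field order follows the format above, widths are assumed.
class TokenLineSketch {

    // Builds "<line> <position> <token type> <value>" as one aligned row.
    // Tokens without a value (for example Semicolon) pass an empty string.
    static String formatToken(int line, int position, String tokenType, String value) {
        String row = String.format("%5d  %5d  %-15s", line, position, tokenType);
        return value.isEmpty() ? row.stripTrailing() : row + " " + value;
    }

    public static void main(String[] args) {
        System.out.println(formatToken(4, 1, "Identifier", "count"));
        System.out.println(formatToken(4, 7, "Op_assign", ""));
        System.out.println(formatToken(4, 9, "Integer", "1"));
        System.out.println(formatToken(4, 10, "Semicolon", ""));
    }
}
```

The example prints the rows one might expect for a source line such as `count = 1;` on line 4, assuming the statement starts in column 1.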
The Lexical Analyzer generates a name for the output file. The file name ends with the _new.lex suffix, and the file is saved in your directory.
Make sure you have the following installed:
- Java 23 or above: This project uses JDK 23 (Amazon Corretto). You can download JDK 24 from Oracle or Amazon Corretto.
- IDE: Any IDE of your choice (e.g., IntelliJ IDEA, Eclipse, BlueJ). This project was developed using IntelliJ IDEA 2023.3.6 (Community Edition), which you can download from JetBrains.
- JUnit 5: This project uses JUnit 5 for testing. It is already included with IntelliJ IDEA; if you need it separately, you can download it from the JUnit website.
- Gradle: This project uses Gradle to build and run. For more information, visit the Gradle website.
git clone https://github.com/Angel43v3r/CSB310_Project1-Mini-Compiler.git
Right-click Lexer.java in your IDE and then click "Run". The program generates the tokens, saves the output to a file with the .lex extension in your directory, and displays the results in the terminal.
- Project by: Jovy Ann Nelson
- Instructor: Eric Llyod
- Course: CSB340 - Programming Languages
- Project: Project 1 - Mini Compiler
- College: North Seattle College
- Term: Spring 2025
This project is licensed under the MIT License. Please refer to the LICENSE file for more details.






