Angel43v3r/CSB310_Project1-Mini-Compiler

Mini Compiler

UML Diagram

Table of Contents

  1. About the Project
  2. Recognized Token Types
  3. Whitespace
  4. Token Names
  5. Results
  6. How to Use
  7. Contributing
  8. License

About the Project

This project is a Lexical Analyzer for a C-type language. It reads a source code file and outputs the names of the tokens it finds, which are the building blocks of a program. The project offered a choice of implementing a Lexical Analyzer, a Syntax Analyzer, or both; I chose to implement the Lexical Analyzer.

Compiler Diagram

Recognized Token Types

List of supported tokens:

Operators

Symbols

Keywords

Identifiers and Literals

Special Token

Whitespace

  • Zero or more whitespace characters, or comments enclosed in /* ... */, are allowed between any two tokens.
    • Everything inside a comment, including the comment delimiters themselves, is ignored rather than counted as a token.
  • Whitespace is required between two tokens when both have an alphanumeric character or an underscore at the edge where they meet.
    • This applies to keywords, identifiers, and integer literals.
  • Whitespace is not allowed inside a token.
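
The rules above can be sketched in Java roughly as follows. This is a minimal illustration, not the project's actual code: the class `WhitespaceSkipper` and its methods are hypothetical names, and "alphanumeric" is assumed to mean ASCII letters and digits.

```java
// Hypothetical helper for illustration; names are not taken from the project.
final class WhitespaceSkipper {

    /**
     * Returns the index of the first character at or after pos that is
     * neither whitespace nor part of a block comment.
     * Throws IllegalStateException when a comment is never closed.
     */
    static int skip(String src, int pos) {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '/' && pos + 1 < src.length() && src.charAt(pos + 1) == '*') {
                // Search for the closing delimiter after the opening "/*".
                int end = src.indexOf("*/", pos + 2);
                if (end < 0) {
                    throw new IllegalStateException("unclosed comment starting at index " + pos);
                }
                pos = end + 2; // continue right after "*/"
            } else {
                break; // start of a real token
            }
        }
        return pos;
    }

    /** True when two adjacent tokens must be separated by whitespace or a comment. */
    static boolean needsSeparator(char lastOfLeft, char firstOfRight) {
        return isWordChar(lastOfLeft) && isWordChar(firstOfRight);
    }

    /** Assumption: "alphanumeric" means ASCII letters and digits. */
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }
}
```

For example, `needsSeparator('f', '1')` is true because the two characters would otherwise merge into one identifier, while `needsSeparator(')', 'x')` is false.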

Token Names

  • Comma
  • End_of_input
  • Identifier
  • Integer
  • Keyword_if
  • Keyword_else
  • Keyword_while
  • Keyword_print
  • Keyword_putc
  • LeftBrace
  • LeftParen
  • Op_add
  • Op_and
  • Op_assign
  • Op_divide
  • Op_equal
  • Op_greater
  • Op_greaterequal
  • Op_less
  • Op_lessequal
  • Op_multiply
  • Op_mod
  • Op_negate
  • Op_not
  • Op_notequal
  • Op_or
  • Op_subtract
  • RightBrace
  • RightParen
  • Semicolon
  • String
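
As a rough sketch of how a word read from the source might be mapped to one of these names, the snippet below separates keywords from identifiers. The class `WordClassifier` is a hypothetical name, and the keyword spellings (`if`, `else`, `while`, `print`, `putc`) are assumed from the token names listed above.

```java
// Hypothetical helper for illustration; not the project's actual code.
final class WordClassifier {

    /**
     * Maps an identifier-like lexeme to a token name.
     * Assumption: each keyword is spelled like the suffix of its token name.
     */
    static String tokenNameFor(String lexeme) {
        switch (lexeme) {
            case "if":    return "Keyword_if";
            case "else":  return "Keyword_else";
            case "while": return "Keyword_while";
            case "print": return "Keyword_print";
            case "putc":  return "Keyword_putc";
            default:      return "Identifier"; // anything else is a plain identifier
        }
    }
}
```

Because the whole word is compared, a name such as `whilex` is classified as an Identifier rather than as Keyword_while followed by something else.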

Results

The Lexical Analyzer prints the tokens to the terminal and writes them to a .lex file in your directory. Errors such as invalid tokens, unclosed literals, and unclosed comments are reported with the line and position where they occur.
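
One way to derive the line and position for such an error message is to count newlines up to the offending character. The sketch below assumes 1-based line and column numbers and an invented message format; `SourcePosition` is a hypothetical name, not a class from this project.

```java
// Hypothetical helper for illustration; the message wording is an assumption.
final class SourcePosition {

    /** Returns {line, column}, both 1-based, for the character at offset. */
    static int[] of(String src, int offset) {
        int line = 1;
        int column = 1;
        for (int i = 0; i < offset && i < src.length(); i++) {
            if (src.charAt(i) == '\n') {
                line++;      // a newline starts the next line
                column = 1;  // and resets the column
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }

    /** Builds an error message that carries the location of the problem. */
    static String error(String src, int offset, String message) {
        int[] p = of(src, offset);
        return "Error at line " + p[0] + ", position " + p[1] + ": " + message;
    }
}
```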

Token Output Format

The Lexical Analyzer scans the source code from left to right and outputs the token information in the following format:

<Line (Row)>  <Position (Column)>   <Token Type>    <Value>

  • Line - The row number in the source code where the token was found.
  • Position - The column number in the line where the token starts.
  • Token Type - The type of token identified.
  • Value - The actual value of the token.
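
A line in this format could be produced with `String.format`, as in the sketch below. The column widths are assumptions chosen for readability, and `TokenLine` is a hypothetical name; the real output may be spaced differently.

```java
// Hypothetical helper for illustration; column widths are assumed.
final class TokenLine {

    /**
     * Formats one token as "<line> <column> <token type> <value>".
     * Pass null as value for tokens that carry none (for example End_of_input).
     */
    static String format(int line, int column, String tokenType, String value) {
        String head = String.format("%5d  %5d   ", line, column);
        if (value == null) {
            return head + tokenType;
        }
        // Left-align the token type so the values line up in one column.
        return head + String.format("%-18s%s", tokenType, value);
    }
}
```

Calling `TokenLine.format(4, 1, "Identifier", "count")` yields a single line with the four fields in the order shown above, separated by spaces.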

Output File

The Lexical Analyzer generates the name of the output file automatically. The name ends with the suffix _new.lex, and the file is saved in your directory.
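
A name with this suffix could be built as in the following sketch. It assumes the output name is derived from the input file name by dropping its extension, which this README does not state; `OutputName` is a hypothetical name.

```java
// Hypothetical helper for illustration; deriving the name from the input file is an assumption.
final class OutputName {

    /** Turns a file name such as "prime.c" into "prime_new.lex". Expects a bare file name, not a path. */
    static String forInput(String inputFileName) {
        int dot = inputFileName.lastIndexOf('.');
        // Keep the whole name when there is no extension or the name starts with a dot.
        String base = dot > 0 ? inputFileName.substring(0, dot) : inputFileName;
        return base + "_new.lex";
    }
}
```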

How to Use

1. Prerequisites

Make sure you have the following installed:

  • Java 23 or above

    • This project uses JDK 23 (Amazon Corretto). You can download a JDK, version 23 or later, from Oracle or Amazon Corretto.
  • IDE

    • Any IDE of your choice (e.g., IntelliJ IDEA, Eclipse, BlueJ). This project was developed using IntelliJ IDEA 2023.3.6 (Community Edition), which you can download here.
  • JUnit 5

    • This project uses JUnit 5 for testing. This is already included with IntelliJ IDEA. If you need to download it, you can download it here.
  • Gradle

    • This project uses Gradle to build and run. For more information, visit Gradle.

2. Clone the Repository

git clone https://github.com/Angel43v3r/CSB310_Project1-Mini-Compiler.git

3. Run the Project

Right-click Lexer.java in your IDE and then click "Run". The program generates the tokens, displays the results in the terminal, and saves the output to a file with the extension .lex in your directory.

Contributing

  • Project by: Jovy Ann Nelson
  • Instructor: Eric Llyod
  • Course: CSB340 - Programming Languages
  • Project: Project 1 - Mini Compiler
  • College: North Seattle College
  • Term: Spring 2025

License

This project is licensed under the MIT License. Please refer to the LICENSE for more details.
