
Strike24/Assembler


C Assembler for Custom Architecture

Built with: C, Ubuntu

Project Overview

This project is a complete assembler, written in ANSI C, for a hypothetical 12-bit CPU architecture. The program takes an assembly source file (.as), expands its macros, translates it into binary machine code (.ob), and also writes memory maps for the external and entry symbols.
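Each output path can be derived from the source file's base name. As a minimal sketch, assuming the base name is passed without its `.as` extension (the README does not say how files are named on the command line), a hypothetical helper `make_filename` could build each path:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of base + ext, e.g.
 * ("prog", ".am") -> "prog.am". The caller must free() the result.
 * Returns NULL if malloc fails. */
char *make_filename(const char *base, const char *ext)
{
    size_t base_len, ext_len;
    char *name;

    assert(base != NULL && ext != NULL);
    base_len = strlen(base);
    ext_len = strlen(ext);
    name = malloc(base_len + ext_len + 1);
    if (name == NULL)
        return NULL;
    memcpy(name, base, base_len);
    memcpy(name + base_len, ext, ext_len + 1); /* also copies the '\0' */
    return name;
}
```

It would be called once per file (.am, .ob, and so on), with each returned name freed after use.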

A writeup of the project is also available in PDF format; the same content is reproduced below.

How it works - System Architecture 🛠️

The assembler uses a Two-Pass Algorithm for the translation itself, preceded by a macro pre-processing stage:

  1. Pre-Processor 🟠
    • Scans source code to identify macro definitions
    • Expands these macros inline and generates an intermediate file (.am), stripping out the definitions so the main assembler can process linear code.
  2. First Pass 🔴
    • Syntax Analysis - Parses every line to validate opcodes, operands, and addressing modes (Immediate, Direct, Struct, Register).
    • Memory Counting - Calculates the memory size of every instruction (IC) and data definition (DC). This is critical because sizes vary from line to line, so every label's address depends on the sizes of all the lines before it.
    • Symbol Table - Builds a Linked List of all labels (e.g., LOOP:, MAIN:) and assigns them memory addresses.
  3. Second Pass 🟢
    • Forward Referencing - Uses the symbol table from Pass One to fill in the addresses of labels that were referenced before they were defined.
    • Bitwise Packing - Uses bitwise operations (<<, &, |) to pack the Opcode, Source Register, Destination Register, and ARE (Attributes) into the 12-bit output words.
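The pre-processing stage can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a hypothetical `mcro NAME` ... `endmcro` definition syntax (the README does not give the keywords), takes the source as an array of lines without trailing newlines, and writes the expanded text into a caller-supplied buffer instead of a `.am` file. Macros are kept in a linked list, as the README describes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One macro definition; the list grows at the head. */
typedef struct macro {
    char name[32];
    char body[512];            /* macro lines, each ending in '\n' */
    struct macro *next;
} macro;

/* Append text to buf (capacity cap, current length *len).
 * Returns 0 instead of overflowing when the text does not fit. */
static int append(char *buf, size_t cap, size_t *len, const char *text)
{
    size_t n = strlen(text);

    if (*len + n + 1 > cap)
        return 0;
    memcpy(buf + *len, text, n + 1);
    *len += n;
    return 1;
}

static macro *find_macro(macro *head, const char *name)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head;
    return NULL;
}

static void free_macros(macro *head)
{
    while (head != NULL) {
        macro *next = head->next;
        free(head);
        head = next;
    }
}

/* Expand n source lines into out (the would-be .am text).
 * Returns 0 on success, -1 on overflow, allocation failure,
 * or a definition missing its "endmcro". */
int preprocess(const char **lines, int n, char *out, size_t cap)
{
    macro *macros = NULL, *current = NULL, *m;
    size_t out_len = 0, body_len = 0;
    int i, ok = 1, unterminated;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (i = 0; i < n && ok; i++) {
        const char *line = lines[i];

        if (current != NULL) {                        /* inside a definition */
            if (strcmp(line, "endmcro") == 0)
                current = NULL;                       /* definition complete */
            else
                ok = append(current->body, sizeof current->body, &body_len, line)
                     && append(current->body, sizeof current->body, &body_len, "\n");
        } else if (strncmp(line, "mcro ", 5) == 0) {  /* new definition */
            current = calloc(1, sizeof *current);
            if (current == NULL) {
                ok = 0;
                break;
            }
            strncpy(current->name, line + 5, sizeof current->name - 1);
            current->next = macros;
            macros = current;
            body_len = 0;
        } else if ((m = find_macro(macros, line)) != NULL) {
            ok = append(out, cap, &out_len, m->body); /* expand inline */
        } else {
            ok = append(out, cap, &out_len, line)
                 && append(out, cap, &out_len, "\n");
        }
    }
    unterminated = (current != NULL);
    free_macros(macros);   /* reached on success and on every error */
    return (ok && !unterminated) ? 0 : -1;
}
```

With the input lines `mcro m1`, `inc r1`, `endmcro`, `mov r2, r3`, `m1`, `stop`, the buffer ends up holding `mov r2, r3`, `inc r1` and `stop` on separate lines: the definition is stripped and the call is replaced by its body.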

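The first pass's memory counting and the second pass's forward-reference lookup can be sketched around a linked-list symbol table. The word counts in `instruction_words` are hypothetical (the README only says that sizes vary), and so is starting the instruction counter at 100 in the usage below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Addressing modes named in the README; NONE marks a missing operand. */
typedef enum { NONE, IMMEDIATE, DIRECT, STRUCT, REGISTER } addr_mode;

/* Hypothetical sizing rule: one word for the instruction, one per
 * operand, two for a struct operand (address + field), and a single
 * shared word when both operands are registers. */
int instruction_words(addr_mode src, addr_mode dst)
{
    int words = 1;

    if (src == REGISTER && dst == REGISTER)
        return words + 1;
    words += (src == STRUCT) ? 2 : (src != NONE);
    words += (dst == STRUCT) ? 2 : (dst != NONE);
    return words;
}

typedef struct symbol {
    char name[32];
    int address;
    struct symbol *next;
} symbol;

/* Second pass: address of name, or -1 if it was never defined. */
int lookup_symbol(const symbol *table, const char *name)
{
    for (; table != NULL; table = table->next)
        if (strcmp(table->name, name) == 0)
            return table->address;
    return -1;
}

/* First pass: record a label at the current IC.
 * Returns 0 for a duplicate label or a failed allocation. */
int add_symbol(symbol **table, const char *name, int address)
{
    symbol *s;

    assert(table != NULL && name != NULL && strlen(name) < sizeof s->name);
    if (lookup_symbol(*table, name) != -1)
        return 0;
    s = malloc(sizeof *s);
    if (s == NULL)
        return 0;
    strcpy(s->name, name);
    s->address = address;
    s->next = *table;           /* push onto the head of the list */
    *table = s;
    return 1;
}

void free_symbols(symbol *table)
{
    while (table != NULL) {
        symbol *next = table->next;
        free(table);
        table = next;
    }
}
```

A label defined after an instruction of three words (for example `IMMEDIATE, REGISTER`) lands at IC 103. In the second pass, each label operand would be resolved with `lookup_symbol`, and a result of `-1` is where an error such as `Undefined label 'LOOP_START'` would be reported.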
Error Management ⚠️

The assembler was designed to emulate real-world compilers (like gcc), prioritizing robustness and informative feedback.

  • Non-Blocking Error Detection: Instead of exiting on the first syntax error, the assembler sets an internal error_flag and continues parsing the rest of the file. This allows the user to see all syntax errors in the source code in a single run, rather than fixing them one by one.

  • Memory Safety in Failure States: A critical design challenge was ensuring memory is freed even when the program fails. I implemented a cleanup routine so that, whether the program finishes successfully or encounters syntax errors, the exit path always traverses the Linked Lists and Symbol Tables to free() every allocated block, ensuring zero memory leaks (verified with Valgrind).

  • Context Reporting: I wrote a dedicated error-reporting module that takes the current file name, line number, and error code. It prints formatted messages to stderr, making debugging the assembly code intuitive for the user.
    Example: file.as:14: Error: Undefined label 'LOOP_START'

  • Input Validation: Every string-parsing routine (whether built on strtok or on custom pointer arithmetic) includes bounds checking to prevent buffer overflows when the input file contains malformed lines (e.g., a line longer than the buffer or missing commas).
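The non-blocking reporting described above can be sketched as a function that prints in the `file:line: Error: ...` format and raises a flag instead of exiting. The error codes and messages are hypothetical stand-ins, and the output stream is a parameter (normally `stderr`) so the sketch is easy to test.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes; the real module's codes are not listed. */
typedef enum { ERR_UNDEFINED_LABEL, ERR_MISSING_COMMA, ERR_LINE_TOO_LONG } error_code;

static const char *const error_messages[] = {
    "Undefined label",
    "Missing comma between operands",
    "Line exceeds maximum length"
};

/* Stays set once raised; the caller checks it after the whole file
 * has been parsed instead of stopping at the first error. */
int error_flag = 0;

/* Print "file:line: Error: message" plus an optional quoted detail,
 * then return so parsing can continue with the next line. */
void report_error(FILE *out, const char *file, int line,
                  error_code code, const char *detail)
{
    assert(out != NULL && file != NULL);
    error_flag = 1;
    if (detail != NULL && detail[0] != '\0')
        fprintf(out, "%s:%d: Error: %s '%s'\n",
                file, line, error_messages[code], detail);
    else
        fprintf(out, "%s:%d: Error: %s\n", file, line, error_messages[code]);
}
```

Called as `report_error(stderr, "file.as", 14, ERR_UNDEFINED_LABEL, "LOOP_START")`, it prints the example line shown above.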

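A minimal sketch of a single-exit cleanup path, using the common C `goto` cleanup idiom (an assumption; the README does not say how its cleanup routine is structured). The generic `node` type stands in for the real symbol and macro nodes, and `simulate_error` is a hypothetical parameter used only to exercise the failure path.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the real symbol/macro list nodes. */
typedef struct node {
    struct node *next;
} node;

/* Everything one run allocates. All fields start as NULL, so a
 * partially built state is always safe to clean up. */
typedef struct {
    node *symbols;
    node *macros;
    unsigned *code;   /* assembled machine words */
} assembler_state;

static void free_list(node *head)
{
    while (head != NULL) {
        node *next = head->next;   /* read the link before freeing */
        free(head);
        head = next;
    }
}

/* Safe after success, after an error, or when called twice. */
void cleanup(assembler_state *st)
{
    assert(st != NULL);
    free_list(st->symbols);
    free_list(st->macros);
    free(st->code);               /* free(NULL) is a no-op */
    st->symbols = NULL;
    st->macros = NULL;
    st->code = NULL;
}

/* Every path, successful or not, leaves through the same label. */
int assemble(assembler_state *st, int simulate_error)
{
    int status = -1;
    node *n = malloc(sizeof *n);

    if (n == NULL)
        goto done;
    n->next = st->symbols;         /* pretend a label was recorded */
    st->symbols = n;
    st->code = calloc(16, sizeof *st->code);
    if (st->code == NULL || simulate_error)
        goto done;
    /* ... write the output files here (omitted) ... */
    status = 0;
done:
    cleanup(st);
    return status;
}
```

Because `cleanup` resets every pointer to NULL, a second call is harmless, which keeps the error paths simple.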
Technical Highlights 💻

  • Low-Level Memory Management
    Fully dynamic memory allocation using malloc and free. The system uses Linked Lists to manage an unknown number of symbols and macros, and every allocated block is freed on every exit path, so no memory leaks occur even if the program errors out.
  • Bitwise Manipulation
    Direct manipulation of bits to construct the binary output according to the specific Instruction Set Architecture (ISA).
  • Parsing
    A custom text parser handles whitespace, commas, and string validation without relying on heavy external libraries.
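The bit packing can be illustrated with shifts and masks. The field layout below (4-bit opcode, two 3-bit register fields, 2-bit ARE) is hypothetical, chosen only because it adds up to 12 bits; the README does not give the actual bit positions.

```c
#include <assert.h>

#define WORD_MASK 0xFFFu   /* keep only the 12 bits of a machine word */

/* Hypothetical layout:
 *   bits 11-8 opcode | bits 7-5 source register |
 *   bits 4-2 destination register | bits 1-0 ARE */
unsigned pack_word(unsigned opcode, unsigned src_reg,
                   unsigned dst_reg, unsigned are)
{
    assert(opcode < 16 && src_reg < 8 && dst_reg < 8 && are < 4);
    return ((opcode << 8) | (src_reg << 5) | (dst_reg << 2) | are) & WORD_MASK;
}

/* Extract a field: shift it down to bit 0, then mask off its width. */
unsigned get_field(unsigned word, int shift, unsigned width_mask)
{
    return (word >> shift) & width_mask;
}
```

For example, opcode 0xA with source register 3, destination register 5 and ARE 0 packs to 0xA74, and `get_field(0xA74, 5, 0x7)` recovers the 3.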

Skills Learned 💡

  • Low-Level C: pointers, structs, and dynamic memory allocation.
  • Data Structures: Implementation of Linked Lists and Hash Tables for symbol management.
  • System Programming: File I/O manipulation and binary file generation.
  • Computer Architecture: Understanding of instruction cycles, memory addressing modes (Immediate, Direct, Register), and the translation of mnemonics to binary.
  • Error Handling & Memory Management
