Phase 3: BASIC compiler frontend (zxbc) — 980/1036 parse-only parity#3
Open
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 3 foundation: type system enums (BasicType, SymbolClass, Scope, Convention), AST node tagged union, TypeInfo hierarchy, SymbolTable with scope chain, CompilerOptions struct, error/warning message system, and full CLI argument parser with all ~35 flags. Builds and runs (--version, --help) but parsing not yet implemented.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hand-written lexer ported from src/zxbc/zxblex.py with:
- All 6 states: INITIAL, string, asm, preproc, comment, bin
- 120+ token types (operators, keywords, types, preproc directives)
- ZX Spectrum string escape sequences (\{pN}, \{iN}, UDG chars, etc.)
- Hex ($XX, 0xXX, NNh), octal (NNo), binary (%NN, NNb) number formats
- Block comments /' ... '/ with nesting
- Line continuation (_, \)
- Label detection (number/ID at column 1)
- #line directive handling
- Keyword lookup table (sorted, linear scan)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
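The nested block-comment handling listed above can be illustrated with a depth counter. This is a minimal sketch in Python, not the C lexer's actual code; the function name `skip_block_comment` is hypothetical.

```python
def skip_block_comment(src: str, pos: int) -> int:
    """Return the index just past the block comment that opens at `pos`.

    Assumes src[pos:pos+2] == "/'". Nested /' ... '/ pairs are tracked
    with a depth counter; an unterminated comment raises ValueError.
    """
    if src[pos:pos + 2] != "/'":
        raise ValueError("not at a block comment opener")
    depth = 0
    i = pos
    while i < len(src):
        two = src[i:i + 2]
        if two == "/'":        # opener: go one level deeper
            depth += 1
            i += 2
        elif two == "'/":      # closer: leave one level
            depth -= 1
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated block comment")
```

A lexer built this way would resume normal tokenizing at the returned index.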
…pressions
955/1036 test files parse successfully (92%). Includes:
- Pratt expression parser with constant folding
- All major statement types (IF, FOR, WHILE, DO, DIM, LET, PRINT, etc.)
- Multi-var DIM, array initializers (=> {}), DIM AT for memory-mapped vars
- FUNCTION/SUB declarations with keyword-as-identifier param names
- String slicing (a$(x TO y), partial slices)
- Array element assignment (a(i) = expr)
- Print attributes (INK, PAPER, BRIGHT, FLASH, etc.)
- Builtin functions with optional parens and multi-arg (CHR$, LBOUND, etc.)
- PEEK(type, addr) and POKE type addr, val
- Named arguments (name:=expr)
- ON GOTO/GOSUB, SAVE/LOAD/VERIFY
- Address-of with array access (@A(i))
- Numeric labels at start of line (fixed lexer column check)
- Single-line IF with colon-separated statements and END IF
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
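For readers unfamiliar with the technique, a Pratt (precedence-climbing) expression parser that folds constants as it builds nodes can be sketched as follows. This is an illustration in Python under simplifying assumptions (three integer operators, tuples as AST nodes); the names and the operator table are hypothetical and do not mirror parser.c.

```python
import operator
import re

# Binding power and fold function per infix operator (illustrative subset).
INFIX = {
    "+": (10, operator.add),
    "-": (10, operator.sub),
    "*": (20, operator.mul),
}

def tokenize(text):
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*()]", text)

def parse_primary(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    if tok.isdigit():
        return int(tok)
    if tok[0].isalpha() or tok[0] == "_":
        return tok                      # identifier leaf
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_expression(tokens, min_prec=0):
    left = parse_primary(tokens)
    # Keep consuming operators that bind tighter than the caller's operator.
    while tokens and tokens[0] in INFIX and INFIX[tokens[0]][0] > min_prec:
        op = tokens.pop(0)
        prec, fold = INFIX[op]
        right = parse_expression(tokens, prec)
        if isinstance(left, int) and isinstance(right, int):
            left = fold(left, right)    # both sides constant: fold now
        else:
            left = (op, left, right)    # otherwise build a binary node
    return left
```

With this sketch, `parse_expression(tokenize("x + 2 * 3"))` yields a single `+` node whose right operand has already been folded to a constant.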
- Handle END WHILE as alternative to WEND
- IF without THEN (Sinclair compat): IF cond stmt
- ELSEIF with optional THEN
- Colon after THEN: IF cond THEN: stmt: works as single-line IF
- Labels don't consume block-enders (END, LOOP, NEXT, etc.)
- READ into expressions/array elements
- POKE with parentheses: POKE(addr, val)
- ERROR statement
- SUB with optional AS type (for error detection)
- DIM array AT after initializer
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lexer fixes:
- Fix indented label detection (c >= 0 in whitespace scan)
- BIN without digits returns 0 instead of consuming newline
Parser fixes:
- Redesign POKE handler with speculative parse for all forms: POKE(type, addr, val), POKE(addr, val), POKE type addr, val
- IF THEN: edge cases (THEN: followed by newline goes multi-line, END IF continuation after single-line IF)
- Expression-as-statement: skip rest of expr after ID(...)
- END IF/SUB/FUNCTION at statement level consumed as NOP
- Sub call without parens supports named args (name:=expr)
- NUMBER at statement start treated as label (indented line nums)
- AS with unknown identifier accepted as forward type ref
- Remove unused BIN state complexity (newline/comment/continuation)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…036)
- Implement parse_gfx_attributes() for PLOT/DRAW/CIRCLE matching Python attr_list grammar (INK/PAPER/BRIGHT/FLASH/OVER/INVERSE with _TMP suffix)
- Replace speculative POKE handler with deterministic RP-before-COMMA disambiguation
- Extract parse_infix() from parse_expression() for reuse in expression-as-statement (no more token-skipping loops)
- Store POKE type on AST node instead of (void) cast
- Remove unused end_kw variable
- Add CLAUDE.md rules 8 & 9: no voiding parsed values, no speculative parsing; every handler must match a Python grammar production
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…100%)
- Extract zxbpp_lib static library from zxbpp for shared use
- Link zxbc against zxbpp_lib, run preprocessor before parsing
- Add stdlib/runtime include paths (src/lib/arch/<arch>/stdlib, .../runtime) matching Python's get_include_path() behavior
- All 1036 .bas test files now parse successfully in --parse-only mode
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Matches Python's argparse prefix_chars="-+" behavior. The +W prefix is handled via argv pre-scan since ya_getopt only supports - prefix. Warning codes are stored for use by the semantic checking phase.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
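The argv pre-scan idea can be sketched as below. This is a hypothetical Python illustration, not the code in the C port: the function name, the accepted spellings (`+W150` and `+W 150`), and the example warning codes are assumptions made for the sketch.

```python
def prescan_plus_warnings(argv):
    """Split argv into (remaining args, warning codes given via +W).

    Assumed spellings: "+W<code>" or "+W <code>". Everything else is left
    untouched for the regular '-' option parser to handle afterwards.
    """
    rest, codes = [], []
    it = iter(argv)
    for arg in it:
        if arg.startswith("+W"):
            code = arg[2:] or next(it, None)   # attached or following value
            if code is None:
                raise ValueError("+W requires a warning code")
            codes.append(code)
        else:
            rest.append(arg)
    return rest, codes
```

The remaining list can then be handed to a getopt-style parser that only understands the `-` prefix.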
…rch tests
The Python project has 6 test suites beyond functional .bas tests: cmdline (CLI flags), api (config/symtable), symbols (20 AST node types), arch/backend, arch/optimizer, arch/peephole. All must be matched by the C port.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Survey of Python test suites beyond functional tests. Plan to match cmdline, config, utils, type system, and symbol table tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ly in opts
- Fix bug: --org value was parsed but silently dropped (case 'S': break;)
- Add org, heap_size, heap_address, headerless, parse_only to CompilerOptions
- Implement parse_int() utility matching Python's api/utils.py (hex, bin, $, %)
- Implement config_file.c: simple .ini reader matching Python's configparser
- Wire up -F/--config-file to load [zxbc] section with cmdline override
- Move parse_only from local variable to opts struct
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
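To make the number formats concrete, here is a minimal sketch of an integer parser that accepts the hex and binary spellings mentioned in this PR (0xNN, $NN, NNh, %NN, NNb). It is an illustration written for this description, not a copy of Python's api/utils.py or of the C parse_int(); returning None on failure is an assumption.

```python
def parse_int(text):
    """Parse a decimal, hex (0xNN, $NN, NNh) or binary (%NN, NNb) literal.

    Returns the integer value, or None if the text is not a valid number.
    Prefixes are checked before suffixes so that "0x1B" is read as hex.
    """
    s = (text or "").strip().upper()
    if not s:
        return None
    base = 10
    if s.startswith("0X"):
        base, s = 16, s[2:]
    elif s.startswith("$"):
        base, s = 16, s[1:]
    elif s.startswith("%"):
        base, s = 2, s[1:]
    elif s.endswith("H"):
        base, s = 16, s[:-1]
    elif s.endswith("B"):
        base, s = 2, s[:-1]
    try:
        return int(s, base)
    except ValueError:
        return None
```

A value such as `--org 0x8000` or `--org $8000` would then resolve to the same integer.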
New C test programs matching Python's non-functional test suites:
- test_utils (14 tests): matches tests/api/test_utils.py (parse_int formats)
- test_config (6 tests): matches tests/api/test_config.py (defaults, .ini load)
- test_types (10 tests): matches tests/symbols/test_symbolBASICTYPE.py
- test_ast (13 tests): matches tests/symbols/ node construction tests
- test_symboltable (9 tests): matches tests/api/test_symbolTable.py
- run_cmdline_tests.sh (4 tests): matches tests/cmdline/test_zxb.py
Includes test_harness.h: minimal assert-based C test framework (no deps). All 7 ctest suites pass (including existing zxbpp functional tests).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run test_utils, test_config, test_types, test_ast, test_symboltable and cmdline tests on Unix. Run test_utils and test_types on Windows. Update WIP progress tracker.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tability
- config_file.c: include compat.h for strcasecmp -> _stricmp on MSVC
- test_config.c: replace unistd.h with guarded include, use cross-platform temp file creation (tmpnam_s on MSVC, mkstemp on POSIX)
- CI: run all 5 unit test programs on Windows
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract zxbc_parse_args() into args.c/args.h for testable option parsing
- Add cmdline_set bitmask to CompilerOptions for Python "None" semantics
- Create test_cmdline.c with 15 tests matching test_zxb.py + test_arg_parser.py
- Fix ya_getopt re-entrant reset (clear static start/end on ya_optind=0)
- Verify actual option values (org, optimization_level, autorun, etc.)
- Config file values only apply when cmdline doesn't override
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
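The cmdline_set idea (remember which options were given explicitly, so that config-file values only fill the gaps) can be sketched like this. It is a Python illustration of the bitmask approach; the flag names, default values, and helper functions are hypothetical and are not taken from args.c.

```python
from dataclasses import dataclass
from enum import IntFlag

class OptBit(IntFlag):
    # One bit per option that can come from either source (illustrative subset).
    ORG = 1
    OPTIMIZATION_LEVEL = 2
    AUTORUN = 4

@dataclass
class CompilerOptions:
    org: int = 32768                 # default values here are placeholders
    optimization_level: int = 2
    autorun: bool = False
    cmdline_set: OptBit = OptBit(0)  # which options the command line supplied

def set_from_cmdline(opts, bit, name, value):
    """Store a command-line value and mark the option as explicitly set."""
    setattr(opts, name, value)
    opts.cmdline_set |= bit

def apply_config(opts, bit, name, value):
    """Apply a config-file value unless the command line already set it."""
    if opts.cmdline_set & bit:
        return False
    setattr(opts, name, value)
    return True
```

The bitmask stands in for Python's "argument is None" test, which has no direct equivalent for plain C ints and bools.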
…suites
Stage 2: Symbol table (22 tests matching tests/api/test_symbolTable.py):
- Add declare_variable() with type refs, suffix stripping, duplicate detection
- Add declare_param() with SCOPE_parameter, duplicate error messages
- Add declare_array() with TYPEREF/BOUNDLIST validation
- Add check_is_declared/check_is_undeclared with scope-aware lookup
- Error messages match Python format: "(stdin):N: error: ..."
- Suffix handling: "a%" stored as "a", type validated against suffix
Stage 3: Check module (4 tests matching tests/api/test_check.py):
- Add is_temporary_value(): STRING and VAR are not temporary, BINARY is
- Matches Python's api/check.py logic for t-prefix checking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers all tests/symbols/ Python test files:
- NOP, NUMBER (type inference + t property), STRING, BINARY, BLOCK, SENTENCE, BOUND, BOUNDLIST, ARGLIST, ARRAYACCESS, FUNCDECL, FUNCTION, LABEL, STRSLICE, TYPE, TYPEALIAS, TYPECAST, TYPEREF, VAR, VARARRAY, CONSTEXPR, ASM, VARDECL, ARRAYDECL, PARAMLIST, ARGUMENT, UNARY, BUILTIN, CALL, ARRAYINIT, ID
Added ast_number(): shared NUMBER creation with auto type inference.
Fixed ast_tag_name missing ARRAYINIT entry (ASan-caught OOB).
test_build_parsetab.py is N/A: it tests PLY table generation, which does not apply to a hand-written recursive descent parser.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README: add Phase 3 status (1036/1036 parse-only), C unit test table (132 tests), zxbc parse-only + unit test badges, fix design decisions table (recursive-descent, not flex+bison), update roadmap marker
- CHANGELOG: add 1.18.7+c3 entry, mark +c2 as internal
- c-port-plan.md: check off Phase 3 items, fix parsing approach
- WIP plan: mark test coverage complete, add all commits and progress
- VERSION: bump to 1.18.7+c3
- zxbpp: fix nested block comment tracking, builtin macro registration guard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate plan_phase3-test-coverage.md into plan_feature-phase3-zxbc_implementation.md. Add semantic analysis scope (3g-3l) covering symbol resolution, type coercion, scope management, statement semantics, and post-parse visitors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ypecast, make_binary_node
Port Python's check.py predicates (is_number, is_const, is_static, is_numeric, is_string, common_type) and symbol creation functions (make_typecast, make_binary_node, make_unary_node) to C. Wire them into the parser, replacing inline constant folding and binary/unary AST creation.
Key changes:
- check_common_type: full type promotion matching Python (float > fixed > signed integral, boolean→ubyte coercion)
- make_typecast: static number conversion with digit-loss warnings, string↔number error checking, CONSTEXPR inner casting
- make_binary_node: type coercion on operands, constant folding, CONSTEXPR wrapping for static expressions, string concatenation
- make_unary_node: constant folding for MINUS/NOT/BNOT
- symboltable_access_*: access_id, access_var, access_func, access_call, access_array, access_label; implicit declaration, class checking
- Fix operator name: MUL→MULT to match Python's convention
- Remove duplicate common_type from parser.c (now in compiler.c)
All 1036/1036 parse-only tests pass. All 132 unit tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
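The promotion order described for check_common_type (float over fixed over signed integral, with boolean coerced to ubyte) can be sketched as follows. This is a simplified Python illustration: the type names are strings, the size table and the "wider type wins, signed if either side is signed" rule for integrals are assumptions for the sketch, and string handling is left out.

```python
# Assumed byte sizes for the numeric types (illustrative).
SIZES = {"byte": 1, "ubyte": 1, "integer": 2, "uinteger": 2,
         "long": 4, "ulong": 4, "fixed": 4, "float": 5}
UNSIGNED_TO_SIGNED = {"ubyte": "byte", "uinteger": "integer", "ulong": "long"}

def common_type(a, b):
    """Return the type both operands should be promoted to, or None."""
    if a is None or b is None:
        return None
    # boolean takes part in arithmetic as ubyte
    a = "ubyte" if a == "boolean" else a
    b = "ubyte" if b == "boolean" else b
    if a == b:
        return a
    if "float" in (a, b):
        return "float"
    if "fixed" in (a, b):
        return "fixed"
    # Both integral: take the wider one, signed if either operand is signed.
    result = a if SIZES[a] > SIZES[b] else b
    if a not in UNSIGNED_TO_SIGNED or b not in UNSIGNED_TO_SIGNED:
        result = UNSIGNED_TO_SIGNED.get(result, result)
    return result
```

Under these assumptions, mixing a signed byte with an unsigned 16-bit value promotes both to a signed 16-bit integer.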
Replace bare AST_ID creation with symboltable_access_var() for variable references and symboltable_access_call() for function calls/array access. This enables:
- Implicit variable declaration (undeclared vars auto-created)
- Class-based dispatch (array→ARRAYACCESS, string var→STRSLICE, func/sub→FUNCCALL)
- Accessed flag tracking on symbol table entries
- CONST variables readable as vars in expressions
All 1036/1036 parse-only tests pass. All 132 unit tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…only
The "1036/1036 parse-only" metric only measured syntax parsing (C exits 0). The correct measure is exit-code parity with Python: 914/1036 (88%). In the remaining 122 files, Python catches semantic errors that C doesn't yet.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ety (916/1036)
- Add deprecated suffix ($%&!) type inference in access_id
- Allow CLASS_array/function/sub in access_var (not just var/const)
- Accept CLASS_const with string type in access_call for slicing
- Skip typecast/binary when types are NULL or TYPE_unknown
- Wire error_count into exit code (matching Python gl.has_errors)
- Add post-parse validation stubs (check_pending_labels/calls/classes)
- Add run_zxbc_tests.sh + 1036 Python baseline exit codes
916/1036 matching Python (3 false pos, 117 false neg).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nce (920/1036)
- Register function/sub in parent scope before body (enables recursion)
- Register parameters in function scope with suffix stripping
- Use access_id instead of access_var for bare IDs (matches Python p_id_expr)
- Parse #pragma NAME = VALUE (explicit, strict, strict_bool, etc.)
- Fix builtin without-parens precedence to PREC_UNARY (LEN x - 1)
- Fix check_is_declared to emit error when show_error is true
- Fix check_pending_labels to match Python's check.py exactly
920/1036 matching Python, 0 false positives.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… (929/1036)
- Implement strict mode type checking in DIM, function return, params
- Fix check_is_declared to emit error messages when show_error=true
- Fix Python-crashing test baselines (chr, chr1, const6)
- check_classes is effectively dead code in Python (class_ never None)
929/1036 matching Python, 0 false positives.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…1036)
- Error when SUB is used in expression context (p_id_expr matching)
- Error when assigning to CONST or SUB via LET
- Detect class mismatch on re-declaration (DECLARE FUNCTION → SUB)
- 934/1036 matching Python, 0 false positives
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…951/1036)
- FOR loop variable validated via access_id (catches CONST/FUNCTION assignment)
- LET assignment checks CLASS_function (lvalue02/03)
- IF THEN: followed by statements → single-line IF, orphaned END IF errors
- Bare ID as statement checks CLASS_var (funccall3)
- Expression-context ID(args) errors for CLASS_unknown non-string (not array nor function)
- Statement-context ID(args) allows CLASS_unknown for forward sub calls
- Removed check_pending_labels from main.c (too aggressive, caused false positives)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Labels always declared in global scope (matching Python's move_to_global_scope)
- check_pending_labels only checks CLASS_label nodes (GOTO/GOSUB targets)
- Label definitions use access_label for proper global scoping
- Labels can coexist with subs/functions of same name
- Numeric line labels registered in symbol table
- 0 false positives, 83 false negatives remaining (all semantic)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…(955/1036)
- GOSUB errors inside SUB/FUNCTION (matching Python)
- Duplicate label definition detection with file:line reference
- function_level stack push/pop during function body parsing
- Label coexistence with SUBs: skip duplicate check for non-label entries
- 0 false positives, 81 false negatives remaining
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- DATA statement errors inside SUB/FUNCTION bodies
- DIM variable redeclaration detection with file:line reference
- function_level stack properly maintained for scope checks
- Reverted over-aggressive array assignment check (array copy is valid)
- 0 false positives, 78 false negatives remaining
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ce (962/1036)
- Cannot initialize array of type string (3 tests fixed)
- DIM array => {...} AT addr disallowed (both init and AT not allowed)
- Deprecated suffix ($%&!) type inference in DIM array declarations
- 0 false positives, 74 false negatives remaining
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add check for re-definition of already-declared functions/subs, matching Python's duplicate function name error. Only triggers when the function was not forward-declared (DECLARE).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Python's --parse-only runs the full pipeline (optimizer, translator) before returning, not just parsing. Generate correct baselines by capturing main()'s return value, not just SystemExit.
- Add csrc/tests/zxbc_parse_expected/ with 1036 parse-only baselines
- Update run_zxbc_tests.sh to use parse-only baselines by default (set ZXBC_FULL=1 for full compilation baselines)
- Result: 961/1036 matched, 0 false positives, 75 false negatives (all require semantic analysis beyond parsing)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
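The matched / false positive / false negative counts quoted in these commits follow from comparing exit codes per test. The sketch below shows that bookkeeping in Python; it is not run_zxbc_tests.sh, and the dictionary-based interface is an assumption made for illustration. A false negative is a file where Python exits non-zero but C exits 0; a false positive is the reverse.

```python
def tally(expected, actual):
    """Compare exit codes per test name.

    expected: {test name: Python baseline exit code}
    actual:   {test name: C port exit code}
    Returns (matched, false_positives, false_negatives). Only the
    zero / non-zero distinction matters, not the exact code.
    """
    matched = false_pos = false_neg = 0
    for name, py_code in expected.items():
        c_code = actual[name]
        if (py_code == 0) == (c_code == 0):
            matched += 1
        elif c_code != 0:
            false_pos += 1   # C rejects a program that Python accepts
        else:
            false_neg += 1   # C accepts a program that Python rejects
    return matched, false_pos, false_neg
```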
DECLARE FUNCTION/SUB now correctly:
- Skips body parsing (no END FUNCTION/SUB expected)
- Sets forwarded=true flag
- Detects duplicate DECLARE statements
- Detects FUNCTION/SUB class mismatch with prior DECLARE
Fixes: dup_func_decl, declare4, declare5, declare6 (963/1036, 0 FP)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…plicit
- Remove !default_type guard from explicit check (Python checks always)
- Report error but continue (matching Python's non-aborting behavior)
- Route string slice names through symbol table (was bypassing checks)
- Use "variable" classname for var/unknown context
Fixes: explicit3, explicit4 (965/1036, 0 FP)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Track function calls in cs->function_calls during parsing
- Wire up check_pending_calls post-parse validation in main.c; detects forward-declared but not implemented functions/subs
- Strip deprecated suffixes ($%&!) in DIM declarations for consistent symbol table keys (matching access_id's suffix stripping)
- Add suffix type inference for scalar DIM (DIM a$ → string type)
- Use callee node directly in check_pending_calls instead of fresh lookup (handles nested scope exits correctly)
Fixes: nosub, bad_fname_err4, paramstr5, stdlib_spectranet, and more
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…FUNCTION check (972/1036, 0 FP)
- Add return type mismatch check between DECLARE and full definition
- Preserve forward declaration type when definition has implicit type
- Add parameter count, type, and byref mismatch checks for forward decls
- Add function name suffix stripping (test$ → test) for consistent symbol table keys
- Add SUB-as-FUNCTION check in check_pending_calls (FUNCCALL of CLASS_sub)
- Add global scope lookup fallback in check_pending_calls for orphaned callees
- Fixes: param3, param2, declare1-3, funccall7, subcall2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…/1036, 0 FP)
- Fix data_is_used tracking: set on READ (not DATA), track DATA in datas vec
- Add post-parse "No DATA defined" check when READ used without DATA
- Add mandatory-after-optional parameter declaration check
- Fixes: readbug, read11, restore4, optional_param1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reject READ into non-lvalue expressions (must be variable or array element)
- Reject READ of array variables without subscript (whole-array read)
- Fixes: read1, read2, read6, read7
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…g 56 FNs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
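The mandatory-after-optional parameter check is a single pass over the parameter list. A minimal Python sketch, assuming each parameter is represented as a (name, has_default) pair, which is a simplification invented for this example:

```python
def mandatory_after_optional(params):
    """Return names of mandatory parameters declared after an optional one.

    params: list of (name, has_default) pairs in declaration order.
    An empty result means the declaration order is acceptable.
    """
    seen_optional = False
    offenders = []
    for name, has_default in params:
        if has_default:
            seen_optional = True
        elif seen_optional:
            offenders.append(name)   # mandatory param following an optional one
    return offenders
```

A checker would report one error per returned name and keep parsing.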
Summary
Complete C port of the BASIC compiler frontend (zxbc), achieving 94.6% exit-code parity with Python's --parse-only mode across all 1036 functional test files.
Parse-only parity
tests/functional/arch/zx48k/*.bas
A test "matches" when C and Python produce the same exit code (0 or non-zero). Python's --parse-only still runs the full pipeline (semantic analysis, optimizer, translator) before the parse_only check, so parity requires more than just syntax parsing.
What the C port catches
Beyond syntax parsing, the C port now performs post-parse validation matching several Python semantic checks:
- DECLARE without matching implementation
- #pragma explicit undeclared variable checks
- LET on CONST variables
Remaining 56 false negatives
Suspected upstream bugs (3 tests) — @Xalior please confirm
chr, chr1, and const6 all crash in Python's optimizer with an AttributeError.
These are valid programs (e.g. LET a$ = CHR$ 33). The parser and semantic checker accept them. The crash happens because the optimizer's constant-folding resolves CHR$(65) into a SymbolSTRING node, but then visit_LET calls getattr(x, "fname") without a default, which throws on string nodes. The fix in Python would be getattr(x, "fname", None).
Our C port correctly exits 0 for these. Are these known bugs upstream? Should we match Python's crash behaviour, or is our exit-0 the correct answer here?
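The difference between the two getattr forms can be reproduced in isolation. The classes below are hypothetical stand-ins written for this illustration (they are not the upstream SymbolSTRING or call node classes); only the behaviour of the built-in getattr is the point.

```python
class StringNode:
    """Stand-in for a folded string constant; it has no `fname` attribute."""
    def __init__(self, value):
        self.value = value

class CallNode:
    """Stand-in for a node that does carry a function name."""
    def __init__(self, fname):
        self.fname = fname

def fname_strict(node):
    # Two-argument getattr raises AttributeError when the attribute is missing.
    return getattr(node, "fname")

def fname_lenient(node):
    # Three-argument getattr returns the default instead of raising.
    return getattr(node, "fname", None)
```

With these definitions, `fname_strict(StringNode("A"))` raises AttributeError while `fname_lenient(StringNode("A"))` returns None, which is the behaviour the proposed upstream fix relies on.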
POKE with array subscript (4 tests)
poke3, poke4, poke5, poke6: Python's PLY lexer emits a distinct ARRAY_ID token for identifiers previously declared as arrays. The grammar rule POKE ARRAY_ID COMMA expr COMMA expr catches multi-arg POKE with array targets as a syntax error. Our hand-written lexer doesn't distinguish ARRAY_ID from ID, so we parse the POKE differently and don't hit the error path.
Semantic analysis needed (42 tests)
These require type checking, constant evaluation, or argument validation that goes beyond our current parser-level checks:
- 01, 50, 51, bad_sigil, do_crash, sn_crash, optional_param3, lcd2, typecast2, substr_expr_err, rman62, bad_fname_err0–6, keyword_arg1, keyword_arg3, refconstparam3, pararray2–4
- 53, error_array, array_err, let_array_substr4/6/8, let_array_wrong_dims
- array11, arrlabels11/11b, const4/5/9, dim_const_crash, dim_dyn_err
- explicit7, include_error, llb, label_decl2
Syntax edge cases (7 tests)
PLY's LALR(1) grammar rejects certain constructs that our recursive-descent parser is more permissive about:
- bin02: BIN followed by identifier (PLY expects binary digit literal)
- def_func_inline: FUNCTION f() END FUNCTION on one line (PLY requires newline before body)
- due_crash: CODE as parameter name (always a keyword token in PLY)
- ifthencoendif2: ELSE: with trailing colon
- doloopuntilsplitted: LOOP UNTIL on same line as DO body via colon separator
- whileempty: WHILE ... END WHILE on one line
- let_expr_type_crash: identifier used as type name in AS
Architecture
tests/api/, tests/symbols/, tests/cmdline/ suites
tests/cmdline/test_zxb.py
zxbc flags, config file loading, --parse-only
lexer.c/h
parser.c/h
ast.c, zxbc.h
compiler.c
args.c/h, options.c/h
errmsg.c/h
Test plan
🤖 Generated with Claude Code