fix: quote numeric-leading identifier segments in SQL output #22

Open: h22rana wants to merge 14 commits into master from fix/quote-numeric-leading-identifiers

@h22rana commented Feb 12, 2026

Summary

  • Identifier path segments starting with a digit (e.g. 24h, 7d, 10m) are now quoted with the dialect-appropriate character, fixing invalid SQL output for 52 of 125 schema fields.
  • Added NeedsQuoting(), QuoteIdentifierSegment(), and ContainsQuoteCharacters() helpers to the dialect package.
  • convertVarName() now splits dotted paths, selectively quotes only segments that need it, and rejoins — normal identifiers remain unquoted.
  • Pre-quoted identifiers in var names are rejected with a clear error to prevent silent double-quoting that produced corrupt SQL.
  • NewSchema() now validates field names at load time — quoted identifiers are rejected immediately instead of causing confusing "field not defined" errors at query time. Breaking change: NewSchema() now returns (*Schema, error).
  • ContainsQuoteCharacters() rejects all three quote styles: backticks, double quotes, and single quotes. This prevents single-quoted segments (e.g. data.'24h'.tx) from slipping through validation.
  • Error messages are concise and generic: contains quote characters; use raw identifiers — the transpiler handles quoting automatically.
  • Updated docs (getting-started.md, dialects.md, operators.md, api-reference.md, schema-validation.md, error-handling.md).

Quoting per dialect

| Dialect | Quote char | Example |
| --- | --- | --- |
| BigQuery / Spanner / ClickHouse | Backtick | data.user_transaction_history.`24h`.tx.sum |
| PostgreSQL / DuckDB | Double quote | data.user_transaction_history."24h".tx.sum |

Raw identifier contract

Both schema field names and var names must use raw, unquoted identifiers. The transpiler handles all SQL quoting automatically.

| Input | Behavior |
| --- | --- |
| Schema with raw names + raw var | Quoted segments applied by transpiler (valid SQL) |
| Schema with raw names + pre-quoted var | Error: "contains quote characters" |
| Schema with quoted field names | Error at schema load time |

Test plan

  • 25 new dialect-specific identifier quoting tests (5 scenarios x 5 dialects)
  • 19 new unit tests for NeedsQuoting and QuoteIdentifierSegment
  • 10 new unit tests for ContainsQuoteCharacters (including single quotes)
  • 6 new tests for pre-quoted var name rejection (backtick, double quote, single quote)
  • 5 new tests for schema load-time validation (4 reject, 1 accept)
  • All 3,000+ existing tests pass — no regressions
  • Docs updated for identifier quoting and NewSchema signature change

NeedsQuoting returns true for identifier segments that start with a
digit or contain non-alphanumeric/underscore characters.

QuoteIdentifierSegment wraps a segment with the dialect-appropriate
quote character: backticks for BigQuery/Spanner/ClickHouse, double
quotes for PostgreSQL/DuckDB.
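A rough sketch of how these two helpers might look, based only on the description above. The Dialect type, its constants, and both signatures are assumptions; the actual dialect package may model them differently.

```go
package main

import "fmt"

// Dialect is a hypothetical enum used only for this sketch.
type Dialect int

const (
	BigQuery Dialect = iota
	Spanner
	ClickHouse
	PostgreSQL
	DuckDB
)

// NeedsQuoting reports whether a segment starts with a digit or contains
// any byte other than an ASCII letter, digit, or underscore.
func NeedsQuoting(seg string) bool {
	if seg == "" {
		return false // assumption: empty segments are handled elsewhere
	}
	if seg[0] >= '0' && seg[0] <= '9' {
		return true
	}
	for i := 0; i < len(seg); i++ {
		c := seg[i]
		isWord := c == '_' ||
			(c >= '0' && c <= '9') ||
			(c >= 'a' && c <= 'z') ||
			(c >= 'A' && c <= 'Z')
		if !isWord {
			return true
		}
	}
	return false
}

// QuoteIdentifierSegment wraps seg in double quotes for PostgreSQL and
// DuckDB, and in backticks for BigQuery, Spanner, and ClickHouse.
func QuoteIdentifierSegment(seg string, d Dialect) string {
	switch d {
	case PostgreSQL, DuckDB:
		return `"` + seg + `"`
	default:
		return "`" + seg + "`"
	}
}

func main() {
	for _, seg := range []string{"24h", "tx", "user_id", "user-id"} {
		fmt.Println(seg, NeedsQuoting(seg))
	}
	fmt.Println(QuoteIdentifierSegment("24h", BigQuery))
	fmt.Println(QuoteIdentifierSegment("24h", PostgreSQL))
}
```

The sketch does no escaping of quote characters embedded in a segment. That is only safe if such input is rejected before quoting, which a later commit in this PR introduces.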

convertVarName now splits dotted paths, detects segments that need
quoting (e.g. 24h, 7d, 10m), and wraps them with dialect-appropriate
quote characters. Normal segments remain unquoted.

Before: data.user_transaction_history.24h.tx.sum  (invalid SQL)
After:  data.user_transaction_history.`24h`.tx.sum  (BigQuery)
        data.user_transaction_history."24h".tx.sum  (PostgreSQL)

Tests cover: numeric-leading segments in comparisons, missing operator,
var with defaults, multiple numeric segments, and normal segments
remaining unquoted, all across 5 dialects (25 test cases).
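The split, selective quote, and rejoin flow can be sketched as follows. quotePath and startsWithDigit are hypothetical names; the real convertVarName signature is not shown in this PR, and the digit check here is a simplified stand-in for NeedsQuoting.

```go
package main

import (
	"fmt"
	"strings"
)

// startsWithDigit is a simplified stand-in for NeedsQuoting that only
// covers the numeric-leading case.
func startsWithDigit(seg string) bool {
	return seg != "" && seg[0] >= '0' && seg[0] <= '9'
}

// quotePath splits a dotted path, wraps only the numeric-leading segments
// in the given quote character, and rejoins the segments with dots.
func quotePath(path, quote string) string {
	segs := strings.Split(path, ".")
	for i, seg := range segs {
		if startsWithDigit(seg) {
			segs[i] = quote + seg + quote
		}
	}
	return strings.Join(segs, ".")
}

func main() {
	path := "data.user_transaction_history.24h.tx.sum"
	fmt.Println(quotePath(path, "`"))  // backtick dialects
	fmt.Println(quotePath(path, `"`)) // double-quote dialects
}
```

Quoting per segment rather than the whole path keeps ordinary identifiers byte-for-byte unchanged in the generated SQL.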

- getting-started.md: update Variable Naming section to mention
  automatic quoting of numeric-leading segments
- dialects.md: add Identifier Quoting section with per-dialect
  quote character table
- operators.md: add note linking to quoting docs in var section

convertVarName now returns an error if any path segment contains
backtick or double-quote characters. The transpiler handles quoting
automatically; users must pass raw identifiers (e.g. "24h" not
"`24h`"). This prevents silent double-quoting that produced
corrupt SQL output.

NewSchema now validates that field names contain no backtick or
double-quote characters, returning a clear error at schema load time
instead of a confusing "field not defined" error at query time.

This completes the raw-identifier contract: both schema entries and
var names must use unquoted identifiers — the transpiler handles
all SQL quoting automatically.
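For callers, the visible effect is the extra error return. The sketch below assumes a NewSchema that takes a slice of field names; only the (*Schema, error) return type comes from this PR, and the Schema layout, the parameter, and the error text are hypothetical. It rejects all three quote styles, matching the final state of the branch.

```go
package main

import (
	"fmt"
	"strings"
)

// Schema is a hypothetical stand-in holding the set of known field names.
type Schema struct {
	fields map[string]struct{}
}

// Has reports whether the schema defines the given raw field name.
func (s *Schema) Has(name string) bool {
	_, ok := s.fields[name]
	return ok
}

// NewSchema validates every field name at load time and fails fast on the
// first name that contains a backtick, double quote, or single quote.
func NewSchema(names []string) (*Schema, error) {
	s := &Schema{fields: make(map[string]struct{}, len(names))}
	for _, name := range names {
		if strings.ContainsAny(name, "`\"'") {
			return nil, fmt.Errorf("schema field %q contains quote characters; use raw identifiers", name)
		}
		s.fields[name] = struct{}{}
	}
	return s, nil
}

func main() {
	// Callers that previously ignored errors must now handle the second value.
	schema, err := NewSchema([]string{"data.user_transaction_history.24h.tx.sum"})
	if err != nil {
		fmt.Println("schema load failed:", err)
		return
	}
	fmt.Println(schema.Has("data.user_transaction_history.24h.tx.sum"))

	_, err = NewSchema([]string{"data.user_transaction_history.`24h`.tx.sum"})
	fmt.Println(err)
}
```

Failing at construction means a misquoted schema entry is reported once, with the offending name, rather than surfacing later as an unrelated lookup failure.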

Extend ContainsQuoteCharacters to check for single quotes in addition
to backticks and double quotes. Schema fields like data.'24h'.tx.sum
were slipping through validation. All three quote styles are now
rejected at both schema load time and var name resolution.

Update code comments and error messages to mention all three rejected
quote styles (backtick, double quote, single quote). Replace the
backtick-specific example in the schema error message with a generic
one so the message is accurate regardless of which quote style was used.

Show backtick, double quote, and single quote examples so users know
exactly which forms are disallowed in both schema field names and var
names.

Add StripQuoteCharacters helper that strips matching surrounding quotes
from a segment for display purposes. Error messages now show the actual
offending segment name instead of hardcoded '24h', and the segment is
displayed without its surrounding quotes for better readability.

Replace verbose per-segment examples with a generic message:
"contains quote characters; use raw identifiers — the transpiler
handles quoting automatically". Remove unused StripQuoteCharacters
helper and its tests.