fix: quote numeric-leading identifier segments in SQL output #22
Open
NeedsQuoting returns true for identifier segments that start with a digit or contain non-alphanumeric/underscore characters. QuoteIdentifierSegment wraps a segment with the dialect-appropriate quote character: backticks for BigQuery/Spanner/ClickHouse, double quotes for PostgreSQL/DuckDB.
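A minimal sketch of how these two helpers might look, based only on the description above. The `Dialect` type, its constants, and both function signatures are assumptions for illustration; the PR's actual dialect package is not shown here.

```go
package main

// Dialect is a hypothetical enum; the real dialect type may differ.
type Dialect int

const (
	BigQuery Dialect = iota
	Spanner
	ClickHouse
	PostgreSQL
	DuckDB
)

// NeedsQuoting reports whether a segment starts with a digit or contains
// anything other than ASCII letters, digits, or underscores.
func NeedsQuoting(segment string) bool {
	for i, r := range segment {
		isLetter := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
		isDigit := r >= '0' && r <= '9'
		if i == 0 && isDigit {
			return true // numeric-leading, e.g. 24h
		}
		if !isLetter && !isDigit && r != '_' {
			return true // e.g. a hyphen or a space
		}
	}
	return false // assumption: empty segments are handled elsewhere
}

// QuoteIdentifierSegment wraps a segment in the dialect's quote character:
// double quotes for PostgreSQL/DuckDB, backticks for the others.
func QuoteIdentifierSegment(segment string, d Dialect) string {
	switch d {
	case PostgreSQL, DuckDB:
		return `"` + segment + `"`
	default:
		return "`" + segment + "`"
	}
}
```

Under these assumptions, `QuoteIdentifierSegment("24h", BigQuery)` yields the backtick form and `QuoteIdentifierSegment("24h", PostgreSQL)` the double-quoted form.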
convertVarName now splits dotted paths, detects segments that need
quoting (e.g. 24h, 7d, 10m), and wraps them with dialect-appropriate
quote characters. Normal segments remain unquoted.
Before: data.user_transaction_history.24h.tx.sum (invalid SQL)
After:  data.user_transaction_history.`24h`.tx.sum (BigQuery)
        data.user_transaction_history."24h".tx.sum (PostgreSQL)
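The split, quote, and rejoin step can be sketched roughly as follows. `quoteDottedPath` and `startsWithDigit` are hypothetical stand-ins, not the PR's `convertVarName`; the sketch takes the quote character as a plain string and only checks for a leading digit, which is a simplification of the real quoting rule.

```go
package main

import "strings"

// startsWithDigit is a simplified stand-in for the PR's quoting check.
func startsWithDigit(s string) bool {
	return s != "" && s[0] >= '0' && s[0] <= '9'
}

// quoteDottedPath splits a dotted path, wraps only numeric-leading
// segments in the given quote character, and rejoins the path.
func quoteDottedPath(path, quote string) string {
	segments := strings.Split(path, ".")
	for i, seg := range segments {
		if startsWithDigit(seg) {
			segments[i] = quote + seg + quote
		}
	}
	return strings.Join(segments, ".")
}
```

Passing a backtick as `quote` reproduces the BigQuery line above, and passing a double quote reproduces the PostgreSQL line; paths with no numeric-leading segment come back unchanged.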
Tests cover: numeric-leading segments in comparisons, missing operator, var with defaults, multiple numeric segments, and normal segments remaining unquoted — all across 5 dialects (25 test cases).
- getting-started.md: update Variable Naming section to mention automatic quoting of numeric-leading segments
- dialects.md: add Identifier Quoting section with per-dialect quote character table
- operators.md: add note linking to quoting docs in var section
convertVarName now returns an error if any path segment contains backtick or double-quote characters. The transpiler handles quoting automatically, so users must pass raw identifiers (e.g. "24h", not "`24h`"). This prevents the silent double-quoting that produced corrupt SQL output.
NewSchema now validates that field names contain no backtick or double-quote characters, returning a clear error at schema load time instead of a confusing "field not defined" error at query time. This completes the raw-identifier contract: both schema entries and var names must use unquoted identifiers — the transpiler handles all SQL quoting automatically.
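As a rough illustration of load-time validation, the sketch below rejects quoted field names before a schema is built. The `Schema` struct, the `[]string` parameter, the `Has` method, and the error wording are all assumptions; only the `(*Schema, error)` return shape comes from this PR. The check also covers single quotes, which a follow-up commit added.

```go
package main

import (
	"fmt"
	"strings"
)

// Schema is a hypothetical minimal shape: a set of known field names.
type Schema struct {
	fields map[string]struct{}
}

// Has reports whether a raw field name was registered.
func (s *Schema) Has(name string) bool {
	_, ok := s.fields[name]
	return ok
}

// NewSchema fails fast on field names that contain quote characters,
// so the problem surfaces at load time instead of at query time.
func NewSchema(fieldNames []string) (*Schema, error) {
	fields := make(map[string]struct{}, len(fieldNames))
	for _, name := range fieldNames {
		if strings.ContainsAny(name, "`\"'") {
			return nil, fmt.Errorf("schema field %q contains quote characters; use raw identifiers", name)
		}
		fields[name] = struct{}{}
	}
	return &Schema{fields: fields}, nil
}
```

Callers that previously ignored the constructor's result now have to handle the returned error, which is the breaking part of the change.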
Extend ContainsQuoteCharacters to check for single quotes in addition to backticks and double quotes. Schema fields like 'data.'24h'.tx.sum' were slipping through validation. All three quote styles are now rejected at both schema load time and var name resolution.
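A sketch of the three-style check, assuming `ContainsQuoteCharacters` takes a single string and returns a bool. `validateVarPath` is a hypothetical helper showing how var name resolution might use it, and its error text is a paraphrase rather than the PR's exact message.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainsQuoteCharacters reports whether s contains a backtick,
// a double quote, or a single quote.
func ContainsQuoteCharacters(s string) bool {
	return strings.ContainsAny(s, "`\"'")
}

// validateVarPath (hypothetical) rejects a dotted var path if any
// segment carries quote characters.
func validateVarPath(path string) error {
	for _, seg := range strings.Split(path, ".") {
		if ContainsQuoteCharacters(seg) {
			return fmt.Errorf("var %q contains quote characters; use raw identifiers", path)
		}
	}
	return nil
}
```

With this check, `data.'24h'.tx.sum` is rejected in the same way as the backtick and double-quote forms, while the raw `data.24h.tx.sum` passes.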
Update code comments and error messages to mention all three rejected quote styles (backtick, double quote, single quote). Replace the backtick-specific example in the schema error message with a generic one so the message is accurate regardless of which quote style was used.
Show backtick, double quote, and single quote examples so users know exactly which forms are disallowed in both schema field names and var names.
Add StripQuoteCharacters helper that strips matching surrounding quotes from a segment for display purposes. Error messages now show the actual offending segment name instead of hardcoded '24h', and the segment is displayed without its surrounding quotes for better readability.
Replace verbose per-segment examples with a generic message: "contains quote characters; use raw identifiers — the transpiler handles quoting automatically". Remove unused StripQuoteCharacters helper and its tests.
Summary
- Numeric-leading identifier segments (e.g. `24h`, `7d`, `10m`) are now quoted with the dialect-appropriate character, fixing invalid SQL output for 52 of 125 schema fields.
- Adds `NeedsQuoting()`, `QuoteIdentifierSegment()`, and `ContainsQuoteCharacters()` helpers to the dialect package.
- `convertVarName()` now splits dotted paths, quotes only the segments that need it, and rejoins them; normal identifiers remain unquoted.
- Quote characters in `var` names are rejected with a clear error to prevent the silent double-quoting that produced corrupt SQL.
- `NewSchema()` now validates field names at load time: quoted identifiers are rejected immediately instead of causing confusing "field not defined" errors at query time. Breaking change: `NewSchema()` now returns `(*Schema, error)`.
- `ContainsQuoteCharacters()` rejects all three quote styles: backticks, double quotes, and single quotes. This prevents single-quoted segments (e.g. `data.'24h'.tx`) from slipping through validation.
- Error messages use a generic form: `contains quote characters; use raw identifiers — the transpiler handles quoting automatically`.
- Docs updated (`getting-started.md`, `dialects.md`, `operators.md`, `api-reference.md`, `schema-validation.md`, `error-handling.md`).

Quoting per dialect

- Backticks (BigQuery, Spanner, ClickHouse): data.user_transaction_history.`24h`.tx.sum
- Double quotes (PostgreSQL, DuckDB): data.user_transaction_history."24h".tx.sum

Raw identifier contract

Both schema field names and `var` names must use raw, unquoted identifiers. The transpiler handles all SQL quoting automatically.

Test plan

- `NeedsQuoting` and `QuoteIdentifierSegment`
- `ContainsQuoteCharacters` (including single quotes)
- `NewSchema` signature change