fix: support unquoted Unicode identifiers in ANSI and Postgres dialects by benfdking · Pull Request #2282 · quarylabs/sqruff

benfdking · 2026-02-05T14:22:32Z

Summary

- Replace ASCII-only character classes ([a-zA-Z0-9_]) with Unicode-aware classes (\p{L}, \p{N}) in lexer word patterns and parser identifier regexes for the ANSI and Postgres dialects.
- Fixes panics when linting SQL with unquoted multibyte identifiers (e.g. 日本語, café, über).
- Adds test fixtures for Unicode identifiers across ANSI, Postgres, and DuckDB dialects.

Closes #2067
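
To illustrate the difference between the two character classes described in the summary, here is a minimal std-only sketch. It is not the code from this PR: the function names are hypothetical, and `char::is_alphabetic` / `char::is_numeric` are used as an approximation of `\p{L}` / `\p{N}` (the Alphabetic property is slightly wider than `\p{L}`). It shows only how an ASCII-only class cuts an identifier short at the first multibyte character, while a Unicode-aware class consumes the whole word.

```rust
/// ASCII-only word character, mirroring the old class `[a-zA-Z0-9_]`.
fn is_ascii_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Unicode-aware word character, approximating `[\p{L}\p{N}_]`.
/// Assumption: `char::is_alphabetic` (the Alphabetic property) stands in
/// for `\p{L}`; it is a slightly wider set.
fn is_unicode_word_char(c: char) -> bool {
    c.is_alphabetic() || c.is_numeric() || c == '_'
}

/// Returns the leading run of word characters in `input`.
/// `char_indices` yields byte offsets that always fall on character
/// boundaries, so the slice below cannot split a multibyte character.
fn leading_word(input: &str, is_word_char: fn(char) -> bool) -> &str {
    let end = input
        .char_indices()
        .find(|&(_, c)| !is_word_char(c))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    &input[..end]
}
```

With these helpers, `leading_word("café FROM t", is_ascii_word_char)` stops before the `é`, whereas `leading_word("café FROM t", is_unicode_word_char)` takes the full identifier.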


github-actions · 2026-02-05T14:43:09Z

Benchmark for `cdcdbfe`

| Test | Base | PR | Change |
|---|---|---|---|
| DepthMap::from_parent | 53.9±0.83µs | 54.4±1.26µs | +0.93% |
| fix_complex_query | 12.5±0.26ms | 12.6±0.30ms | +0.80% |
| fix_superlong | 197.3±9.91ms | 199.6±8.56ms | +1.17% |
| parse_complex_query | 4.1±0.04µs | 4.2±0.03µs | +2.44% |
| parse_expression_recursion | 7.1±0.09µs | 7.3±0.07µs | +2.82% |
| parse_simple_query | 1057.8±17.41ns | 1053.3±16.45ns | -0.43% |

github-actions · 2026-02-05T21:49:56Z

Benchmark for `f9743fa`

| Test | Base | PR | Change |
|---|---|---|---|
| DepthMap::from_parent | 52.6±0.66µs | 54.0±2.61µs | +2.66% |
| fix_complex_query | 12.9±0.10ms | 12.9±0.39ms | 0.00% |
| fix_superlong | 225.6±8.02ms | 232.3±10.61ms | +2.97% |
| parse_complex_query | 4.1±0.06µs | 4.2±0.06µs | +2.44% |
| parse_expression_recursion | 7.0±0.10µs | 7.2±0.15µs | +2.86% |
| parse_simple_query | 1039.2±19.62ns | 1039.4±36.25ns | +0.02% |

github-actions · 2026-02-11T09:17:45Z

Benchmark for `aaaa31f`

| Test | Base | PR | Change |
|---|---|---|---|
| DepthMap::from_parent | 52.3±0.70µs | 52.9±0.84µs | +1.15% |
| fix_complex_query | 12.4±0.19ms | 12.3±0.11ms | -0.81% |
| fix_superlong | 174.8±4.72ms | 168.5±9.49ms | -3.60% |
| parse_complex_query | 4.1±0.04µs | 4.2±0.04µs | +2.44% |
| parse_expression_recursion | 7.1±0.08µs | 7.1±0.11µs | 0.00% |
| parse_simple_query | 1074.6±22.27ns | 1071.5±19.80ns | -0.29% |

benfdking force-pushed the fix/unicode-identifiers branch from cefa1ee to f87ada2 Compare February 5, 2026 21:35

benfdking added 2 commits February 11, 2026 08:51

formatting

ce4af88

benfdking force-pushed the fix/unicode-identifiers branch from f87ada2 to ce4af88 Compare February 11, 2026 09:02

benfdking wants to merge 2 commits into main from fix/unicode-identifiers
