hugefiver/randstr

randstr


A Racket library and command-line tool for generating random strings based on regex-like patterns.

Installation

To install, run:

raco pkg install randstr

Then run with:

racket -l randstr/cli "[a-z]{5}"

Usage

As a Library

(require randstr)

(randstr "[a-z]{5}")  ; Generate a random 5-letter lowercase string
(randstr "(abc|def)+")  ; Generate a string with repeated "abc" or "def"
(randstr* "[0-9]{3}" 10)  ; Generate 10 random 3-digit strings

As a Command-Line Tool

randstr "[a-z]{5}"          # Generate one random string
randstr -n 10 "[0-9]{3}"    # Generate 10 random 3-digit strings

Pattern Syntax

  • [abc] - Choose randomly from characters a, b, or c
  • [a-z] - Choose randomly from lowercase letters a through z
  • (abc|def) - Choose randomly between "abc" or "def"
  • a* - Zero or more repetitions of the preceding element
  • a+ - One or more repetitions of the preceding element
  • a? - Zero or one occurrence of the preceding element
  • . - Any character
  • \w - Word character (alphanumeric plus underscore)
  • \W - Non-word character
  • \s - Whitespace character (space, tab, newline, carriage return)
  • \S - Non-whitespace character
  • \d - Digit character (0-9)
  • \D - Non-digit character
  • {n} - Exactly n repetitions
  • {n+} - Repetition count drawn from a normal distribution with mean n (2nd order)
  • {n++} - Repetition count drawn from a normal distribution with mean n (3rd order, more concentrated)
  • {n1+n2} - Repetition count drawn from a normal distribution over the range n1..n2 (2nd order)
  • {n1++n2} - Repetition count drawn from a normal distribution over the range n1..n2 (3rd order)
  • {+n} - Shorthand for {0+n} (range 0..n, 2nd order)
  • {++n} - Shorthand for {0++n} (range 0..n, 3rd order)
  • (?<name>...) - Named group (captures the generated text for later reuse)
  • \k<name> - Backreference (inserts the text generated by the named group)
  • [:alpha:] - Alphabetic characters
  • [:digit:] - Numeric characters
  • [:alphanum:] - Alphanumeric characters
  • [:alnum:] - Alphanumeric characters (POSIX standard name)
  • [:word:] - Word characters (alphanumeric plus underscore)
  • [:blank:] - Blank characters (space and tab)
  • [:space:] - Whitespace characters
  • [:upper:] - Uppercase letters
  • [:lower:] - Lowercase letters
  • [:ascii:] - ASCII characters
  • [:cntrl:] - Control characters
  • [:graph:] - Printable characters except space
  • [:print:] - Printable characters including space
  • [:punct:] - Punctuation characters
  • [:xdigit:] - Hexadecimal digits

Nested POSIX Character Classes

The library also supports nested POSIX character classes, allowing you to mix POSIX classes with regular characters:

  • [[:upper:]0-9] - Uppercase letters and digits
  • [[:lower:]_] - Lowercase letters and underscores
  • [[:alpha:]0-9] - Alphabetic characters and digits
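
One way to picture a nested class is as a list of parts that are expanded and concatenated. The sketch below is only an illustration of that idea, not randstr's actual implementation: `char-range`, `posix-classes`, and `expand-class` are hypothetical names, the table covers just three POSIX names, and it assumes ASCII-only classes.

```racket
#lang racket/base

;; Inclusive character range, e.g. (char-range #\0 #\9).
(define (char-range lo hi)
  (for/list ([i (in-range (char->integer lo) (add1 (char->integer hi)))])
    (integer->char i)))

;; Hypothetical lookup table; ASCII only, three classes for brevity.
(define posix-classes
  (hash "upper" (char-range #\A #\Z)
        "lower" (char-range #\a #\z)
        "digit" (char-range #\0 #\9)))

;; A class is described as a list of parts:
;; a string names a POSIX class, a pair is a range, a char is a literal.
(define (expand-class parts)
  (apply append
         (for/list ([p (in-list parts)])
           (cond [(string? p) (hash-ref posix-classes p)]
                 [(pair? p) (char-range (car p) (cdr p))]
                 [else (list p)]))))

;; [[:upper:]0-9] expands to 26 letters plus 10 digits.
(length (expand-class (list "upper" (cons #\0 #\9))))
;; [[:lower:]_] expands to 26 letters plus the underscore.
(length (expand-class (list "lower" #\_)))
```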

Character Class Duplicate Handling

When a character class contains duplicate elements, each unique character is treated equally regardless of how many times it appears in the class. For example:

  • [aaabbbccc] - Equivalent to [abc]: each of a, b, c has probability 1/3, and the repeats add no weight
  • [a-cb-e] - Each of a, b, c, d, e has probability 1/5, even though b and c are covered by both ranges
  • [[:digit:]0-2] - Digits 0, 1, 2 appear in both the POSIX class and the range, but every digit still has probability 1/10

As a result, selection is uniform over the distinct characters in every character class.
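
The behavior described above can be sketched as "deduplicate first, then pick uniformly". This is a minimal illustration under that assumption; `char-range`, `unique-chars`, and `pick-char` are hypothetical helpers, not part of the randstr API (the changelog mentions hash-set lookups, whereas this sketch uses a plain list for brevity).

```racket
#lang racket/base
(require racket/list)

;; Inclusive character range, e.g. (char-range #\a #\c).
(define (char-range lo hi)
  (for/list ([i (in-range (char->integer lo) (add1 (char->integer hi)))])
    (integer->char i)))

;; Collapse an expanded class to its distinct characters,
;; keeping the first occurrence of each.
(define (unique-chars chars)
  (remove-duplicates chars char=?))

;; Pick one character uniformly from the distinct set.
(define (pick-char chars)
  (define pool (unique-chars chars))
  (list-ref pool (random (length pool))))

;; [aaabbbccc] collapses to three candidates: a, b, c.
(unique-chars (string->list "aaabbbccc"))
;; [a-cb-e] collapses to five candidates: a through e.
(unique-chars (append (char-range #\a #\c) (char-range #\b #\e)))
```

Because the pool is deduplicated before `random` is called, writing a character several times cannot raise its probability.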

Normal Distribution Quantifiers

Generate strings whose lengths follow a normal distribution, so most lengths cluster near the center of the range instead of being spread evenly:

(randstr "\\w{10+}")    ; Mean of 10, 2nd order normal distribution
(randstr "\\w{10++}")   ; Mean of 10, 3rd order (more concentrated around mean)
(randstr "\\w{5+15}")   ; Range 5-15, 2nd order normal distribution
(randstr "\\w{5++15}")  ; Range 5-15, 3rd order (values closer to center)
(randstr "\\d{+10}")    ; Shorthand for {0+10}, range 0-10
(randstr "\\d{++10}")   ; Shorthand for {0++10}, range 0-10 (3rd order)

Higher order (more + signs) means values are more concentrated around the center.
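
This README does not define "order" precisely. One common way to get this effect, shown here purely as an assumption and not as randstr's confirmed algorithm, is to average several uniform samples: the more samples averaged, the tighter the result clusters around the midpoint. `approx-normal-length` is a hypothetical name used only for this sketch.

```racket
#lang racket/base

;; Assumption: "order" k means averaging k uniform draws.
;; Returns a whole repetition count between lo and hi inclusive.
(define (approx-normal-length lo hi order)
  (define span (- hi lo))
  ;; Average `order` uniform draws from the open interval (0, 1).
  (define mean-draw
    (/ (for/sum ([i (in-range order)]) (random))
       order))
  ;; Scale into lo..hi and round to an exact integer.
  (+ lo (inexact->exact (round (* span mean-draw)))))

;; Rough analogue of {5+15}: two draws averaged.
(approx-normal-length 5 15 2)
;; Rough analogue of {5++15}: three draws averaged, tighter around 10.
(approx-normal-length 5 15 3)
```

Under this assumption, the variance of the averaged draw shrinks in proportion to 1/order, which matches the statement that more + signs concentrate values around the center.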

Named Groups and Backreferences

Capture generated content and reuse it later in the pattern:

(randstr "(?<word>\\w{4})-\\k<word>")  ; => "aBc1-aBc1" (same word twice)
(randstr "(?<id>\\d{3}):\\k<id>")       ; => "742:742" (same ID twice)
(randstr "(?<a>[A-Z]{2})(?<b>\\d{2})-\\k<a>\\k<b>")  ; => "XY42-XY42"
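
Conceptually, a named group generates its text once and records it, and each backreference replays the recorded text instead of generating again. The sketch below mimics the pattern "(?<id>\\d{3}):\\k<id>" by hand to show that idea; `random-digits` and `generate-id-pair` are hypothetical helpers, and the hash table is an assumed representation, not randstr's internals.

```racket
#lang racket/base

;; Hypothetical stand-in for generating a group's content: n random digits.
(define (random-digits n)
  (list->string
   (for/list ([i (in-range n)])
     (integer->char (+ (char->integer #\0) (random 10))))))

;; Hand-written equivalent of "(?<id>\\d{3}):\\k<id>".
(define (generate-id-pair)
  (define captures (make-hash))               ; group name -> generated text
  (hash-set! captures 'id (random-digits 3))  ; the group generates once and records
  (string-append (hash-ref captures 'id)
                 ":"
                 (hash-ref captures 'id)))    ; the backreference replays the record

(generate-id-pair)
```

Both halves of the result are always identical, because the second half is read from the capture table rather than generated independently.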

Examples

(randstr "[a-z]{5}")     ; => "kxmpr"
(randstr "[0-9][a-z]+")  ; => "3xkzm"
(randstr "(abc|def)+")   ; => "abcdefabc"
(randstr "\\w+")         ; => "abc123_"
(randstr "\\W+")         ; => "!@#$%"
(randstr "\\d+")         ; => "12345"
(randstr "\\D+")         ; => "abcde"
(randstr "[[:alpha:]]+") ; => "abcXYZ"
(randstr "[[:digit:]]+") ; => "12345"
(randstr "[[:alnum:]]+") ; => "abc123"
(randstr "[[:word:]]+")  ; => "abc123_"
(randstr "[[:blank:]]*") ; => "   "
(randstr "[[:upper:]]+") ; => "ABCXYZ"
(randstr "[[:lower:]]+") ; => "abcxyz"
(randstr "[[:xdigit:]]+") ; => "1A2B3C"
(randstr "[[:upper:]0-9]+") ; => "A3B9C"
(randstr "[[:lower:]_]+") ; => "abc_def"
(randstr "[[:alpha:]0-9]+") ; => "abc123XYZ"

Testing

The project includes comprehensive tests to verify functionality:

# Run the main test suite
racket tests/test.rkt

# Run extension tests
racket tests/test-extensions.rkt

# Run specific module tests
racket tests/char-classes-test.rkt    # Character class functionality
racket tests/generator-test.rkt       # String generation logic
racket tests/tokenizer-test.rkt       # Pattern tokenization
racket tests/utils-test.rkt           # Utility functions

# Run performance benchmarks
racket tests/benchmark.rkt

# Run optimization tests
racket tests/optimization-test.rkt

The test suite includes:

  • test.rkt: Main test suite covering all core functionality
  • char-classes-test.rkt: Tests for character class handling
  • generator-test.rkt: Tests for string generation logic
  • tokenizer-test.rkt: Tests for pattern tokenization
  • utils-test.rkt: Tests for utility functions
  • test-extensions.rkt: Extended tests for POSIX character classes and special patterns
  • benchmark.rkt: Performance benchmarking tests
  • optimization-test.rkt: Tests for performance optimizations

Development (Nix)

If you use Nix, you can get a reproducible dev environment with Racket and Just:

# Enter the dev shell (flakes)
nix develop

# Build the package
nix build

# Run the CLI
nix run -- "[a-z]{5}"

# Run tests
just test

# Compile
just compile

For non-flake setups, you can use shell.nix:

nix-shell

nixpkgs submission

This repo includes a nixpkgs-style derivation at nix/package.nix, intended to be used via callPackage. In nixpkgs, the expression typically uses fetchFromGitHub for src and then calls that derivation.

Using Just (Command Runner)

This project includes a Justfile for common development tasks. If you have Just installed, you can use these commands:

# Compile the project
just compile

# Run all tests
just test

# Build executable
just build

# Clean compiled files
just clean

# Show all available commands
just

Changelog

v0.1.1

  • ✨ New: Normal distribution quantifiers ({n+}, {n++}, {n+++} etc.)
  • ✨ New: Range normal distribution ({n1+n2}, {n1++n2}, {+n}, {++n})
  • ✨ New: Named groups (?<name>...) for capturing generated content
  • ✨ New: Backreferences \k<name> for reusing captured content

v0.1.0

  • 🎉 Initial stable release
  • 🐛 Fixed: \W no longer incorrectly matches underscore
  • ⚡ Performance: Optimized character class deduplication with O(1) hash-set lookups
  • 🧹 Cleaned up internal code architecture

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

A program that generates random strings using a regex-like syntax
