Description
Cowsay doesn't handle variable character widths well. It assumes every character is 1 column wide on display and (I think) 1 byte in the input encoding. As a result, non-English/non-Latin characters in the cows or the message text are not handled well.
This is an expansion of #65 "Use Debian's UTF-8 Patch".
Aspects:
- Messages with multi-byte encoded characters don't look right. The speech balloon width and word wrapping are wrong.
- Terminal control (escape) sequences, like color codes, are treated as visible characters.
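To make the first aspect concrete, here is a minimal sketch of how the byte count of a message can differ from its character count. It is written in Python purely for illustration (cowsay itself is Perl), and the helper name `lengths` is hypothetical, not anything in cowsay.

```python
def lengths(message, encoding="utf-8"):
    """Return (byte count in the given encoding, code point count).

    Hypothetical helper for illustration only. A balloon sized from the
    byte count will be too wide whenever the two numbers differ.
    """
    return len(message.encode(encoding)), len(message)


ascii_msg = "Hello"
umlaut_msg = "M\u00d6\u00d6\u00d6"  # "MÖÖÖ", written with escapes for clarity

ascii_lengths = lengths(ascii_msg)    # bytes and code points agree
umlaut_lengths = lengths(umlaut_msg)  # each Ö takes 2 bytes in UTF-8
```

For "MÖÖÖ" the helper reports 7 bytes but only 4 code points, so any layout based on bytes would pad the balloon by 3 columns too many.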
Badly wrapped multi-byte character example:
Considerations
The cow files distributed with cowsay are all UTF-8, regardless of what locale the user is running in or how their system is set up. (I think? Or are they actually ASCII/Latin-1, since they are Perl source code?)
Message input might be in the user's locale while the cow files are UTF-8. Custom cows (including in third-party cow herd packages) might be in other encodings, which may or may not be the same encoding as
I don't think Perl's standard library supports character width detection, so we would need a CPAN module for that. We currently don't take dependencies on any modules, so we'd need to figure out how to do that. I think we'd vendor the module (ship a copy of it in cowsay itself) to avoid creating external dependencies or complicating the install process.
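As a rough illustration of what "char width detection" means here, the sketch below estimates terminal columns from Unicode properties. It uses Python's standard `unicodedata` module as a stand-in for whatever Perl module would actually be chosen. The function names `char_width` and `display_width` are hypothetical, and the rules are an approximation (ambiguous-width characters are assumed to be 1 column), not a complete implementation.

```python
import unicodedata


def char_width(ch):
    """Approximate number of terminal columns for one code point."""
    # Nonspacing/enclosing marks and format characters take no column.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including ambiguous-width characters, is assumed
    # to take one column. Real terminals may disagree.
    return 1


def display_width(text):
    """Approximate number of terminal columns for a whole string."""
    return sum(char_width(ch) for ch in text)
```

Under these assumptions, `display_width("我愛中國人")` is 10 even though the string has only 5 code points, which is the number the balloon border would need.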
Testing
Examples:
cowsay "MÖÖÖ"
cowsay 'Привет, мир!'
cowsay 'Ищу свое лицо. Особых примет нет.'
Wide chars:
cowsay "我愛中國人"
- from Debian 769565 "widechar not good":
cowsay 'でびあん/Debian'
cowsay 谢谢你
ANSI terminal escapes:
echo 'Hello, World!' | toilet -w 100 --metal | cowsay -n
figlet "Hello World!" | toilet -f term --metal | /usr/games/cowsay -n
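For the escape-sequence case, one possible approach is to strip color codes before measuring a line. The sketch below is in Python for illustration only. It handles just CSI sequences (the `ESC [ ... m` style that color tools typically emit), and the names `ANSI_CSI`, `strip_ansi`, and `visible_length` are hypothetical. Other escape types, such as OSC sequences, are not covered.

```python
import re

# CSI sequence: ESC, "[", optional parameter bytes (0x30-0x3F),
# optional intermediate bytes (0x20-0x2F), one final byte (0x40-0x7E).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove CSI escape sequences, leaving only the printable text."""
    return ANSI_CSI.sub("", text)


def visible_length(text):
    """Count characters that remain once CSI sequences are removed."""
    return len(strip_ansi(text))


colored = "\x1b[1;31mHello\x1b[0m"  # bold red "Hello", then reset
```

Here `len(colored)` is 16 while `visible_length(colored)` is 5. The original string would still be printed, and only the measurement would use the stripped copy.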
TODO
- Determine and document which encoding(s) are supported for cowfiles.
- Make our source code UTF-8 (with use utf8;)?
- [-] Add support for detecting, and maybe explicitly setting, message input encoding.
- [-] Bump required Perl to 5.8.1 (or 5.8.7, or later), which added Unicode fixes relevant to this UTF-8 stuff?
- 5.8.0 added Unicode support and UTF8-ness of stdin/out/err/ARGV.
- 5.8.1 restored behavior of stdin/out/err and ARGV not being interpreted as UTF8 by default.
- ${^UTF8LOCALE} was added in 5.8.7.
- Support wide chars and invisible chars (like terminal escapes).
- ("Wide" in the sense that they are displayed at 2-character width, not that they're a wchar type in the encoding.)
- Add tests for multibyte and wide characters.
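The "detecting, and maybe explicitly setting, message input encoding" item could look roughly like the following, sketched in Python rather than Perl. The function name `decode_message` and its `encoding` parameter are hypothetical. The fallback to the locale's preferred encoding and the choice to replace invalid bytes rather than fail are assumptions, not decisions made in this issue.

```python
import locale
from typing import Optional


def decode_message(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode message bytes with an explicit encoding, else the locale's.

    Hypothetical sketch: an explicit setting wins, otherwise the
    encoding is taken from the user's locale.
    """
    chosen = encoding or locale.getpreferredencoding(False)
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising, so a bad byte cannot abort the whole message.
    return raw.decode(chosen, errors="replace")
```

With an explicit encoding, both `decode_message(b"M\xc3\x96", "utf-8")` and `decode_message(b"M\xd6", "latin-1")` yield the same 2-character string "MÖ", which is what the width and wrapping logic would then operate on.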
References
- Use Debian's UTF-8 Patch For Calculating the Number of Columns #65
- Third-party patches
- Perl doco