Wide chars, UTF-8, terminal escapes and colors, etc. #69

@apjanke

Cowsay doesn't handle variable character widths well. It more or less assumes that every character is 1 column wide on display and (I think) 1 byte in the input encoding. This means that non-English/non-Latin characters in the cows or the message text are not handled well.

This is an expansion of #65 "Use Debian's UTF-8 Patch".

Aspects:

  • Messages with multi-byte encoded characters don't look right. The speech balloon width and word wrapping are wrong.
  • Terminal control (escape) sequences, like color codes, are treated as visible characters.
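
To make the first aspect concrete, here is a minimal sketch of how a balloon border goes wrong when the message is measured in bytes rather than characters. It is written in Python purely for illustration (cowsay itself is Perl), and `balloon`, `byte_len`, and `char_len` are hypothetical names, not anything in the cowsay source. It also assumes the miscount comes from byte length, which is a guess.

```python
def byte_len(message: str) -> int:
    """Length of the message in UTF-8 bytes (the suspected wrong measure)."""
    return len(message.encode("utf-8"))


def char_len(message: str) -> int:
    """Length of the message in Unicode code points."""
    return len(message)


def balloon(message: str, measure) -> str:
    """Draw a one-line speech balloon, sizing the border with `measure`."""
    width = measure(message)
    top = " " + "_" * (width + 2)
    middle = "< " + message + " >"
    bottom = " " + "-" * (width + 2)
    return "\n".join([top, middle, bottom])


if __name__ == "__main__":
    msg = "Привет, мир!"
    print(balloon(msg, byte_len))  # border is wider than the text
    print(balloon(msg, char_len))  # border matches the text
```

For this message the byte count is 21 and the code point count is 12, so the byte-sized border overshoots the text by 9 columns. Counting code points is still not enough for wide or zero-width characters, which need a display-width measure.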

Badly wrapped multi-byte char example: [image]

Considerations

The cow files distributed with cowsay are all UTF-8, regardless of what locale the user is running in or how their system is set up. (I think? Or are they actually ASCII/Latin-1, since they are Perl source code?)

Message input might be in the user's locale while the cow files are UTF-8. Custom cows (including in third-party cow herd packages) might be in other encodings, which may or may not be the same encoding as

I don't think Perl's standard library supports char width detection, so we would need a CPAN module for that. We currently don't take any deps on modules, and we'd need to figure out how to do that. I think we'd vendor the module (ship a copy of it in cowsay itself), to avoid creating any external dependencies or a more complicated install process.
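
For reference, this is roughly what char width detection involves, sketched in Python because its standard library exposes the Unicode East Asian Width property. It is not a proposal for the Perl implementation, and `display_width` is a hypothetical name. The rule used here (2 columns for Wide and Fullwidth characters, 0 for combining marks, control and format characters, 1 otherwise) is a simplification; real terminals differ in places, especially on ambiguous-width characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate how many terminal columns `text` occupies.

    Simplified rule: combining marks, control and format characters
    take 0 columns, East Asian Wide/Fullwidth characters take 2,
    and everything else takes 1.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cc", "Cf"):
            continue  # zero-width: combining marks, controls, format chars
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide: drawn across two columns
        else:
            width += 1
    return width


if __name__ == "__main__":
    for sample in ["MÖÖÖ", "Привет, мир!", "日本語", "e\u0301"]:
        print(repr(sample), len(sample), display_width(sample))
```

A vendored Perl module would need to provide an equivalent of this function for both the balloon width and the word wrapping.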

Testing

Examples:

  • cowsay "MÖÖÖ"
  • cowsay 'Привет, мир!'
  • cowsay 'Ищу свое лицо. Особых примет нет.'

Wide chars:

ANSI terminal escapes:

  • echo 'Hello, World!' | toilet -w 100 --metal | cowsay -n
  • figlet "Hello World!" | toilet -f term --metal | /usr/games/cowsay -n
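
One way to keep escape sequences from counting as visible characters is to strip them before measuring a line. The sketch below is in Python for illustration only, and `ANSI_CSI` and `strip_ansi` are hypothetical names. It assumes the input only contains CSI sequences (ESC `[` ... final byte), which covers SGR color codes; other escape types such as OSC are not handled.

```python
import re

# CSI sequence: ESC [ then parameter bytes (0x30-0x3F),
# intermediate bytes (0x20-0x2F), and one final byte (0x40-0x7E).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Return `text` with CSI escape sequences removed."""
    return ANSI_CSI.sub("", text)


if __name__ == "__main__":
    colored = "\x1b[0;1;34;94mHello\x1b[0m, World!"
    print(len(colored), len(strip_ansi(colored)))
```

The stripped copy would only be used for measuring. The original line, escapes included, is what should be printed inside the balloon.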

TODO

  • Determine and document which encoding(s) are supported for cowfiles.
    • Make our source code UTF-8 (with use utf8;)?
  • [-] Add support for detecting, and maybe explicitly setting, message input encoding.
  • [-] Bump required Perl to 5.8.1 (or 5.8.7, or later), which added Unicode fixes relevant to this UTF-8 stuff?
    • 5.8.0 added Unicode support and UTF8-ness of stdin/out/err/ARGV.
    • 5.8.1 restored behavior of stdin/out/err and ARGV not being interpreted as UTF8 by default.
    • ${^UTF8LOCALE} was added in 5.8.7.
  • Support wide chars and invisible chars (like terminal escapes).
    • ("Wide" in the sense that they are displayed 2-char width; not that they're a wchar type in encoding.)
  • Add tests for multibyte and wide characters.
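
The "detecting, and maybe explicitly setting, message input encoding" item could look something like the following sketch. It is Python for illustration only, `decode_message` is a hypothetical name, and falling back to the locale's preferred encoding is one possible policy, not a decision made in this issue.

```python
import locale
from typing import Optional


def decode_message(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode raw message bytes into text.

    If `encoding` is given, it is used as an explicit override.
    Otherwise the encoding is taken from the user's locale.
    Undecodable bytes are replaced rather than raising an error.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    print(decode_message("MÖÖÖ".encode("latin-1"), "latin-1"))
    print(decode_message("Привет, мир!".encode("utf-8"), "utf-8"))
```

Once the message is decoded to text, width and wrapping calculations can work on characters instead of bytes, independent of the encoding the cow files use.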
