Some questions about Unicode aliases #74

@rljacobson

I am taking a look at named-characters.yml at @rocky's request, and it occurs to me that there are some philosophical questions to answer about which Unicode symbols should be included and how. Mathics appears to have been relatively conservative about adding Unicode aliases, so this discussion is about what to add from here on, not about removing existing symbols.

Broadly speaking, I propose the following heuristics:

  1. Unicode symbols used by Mathematica should be used in the same way by Mathics for the sake of compatibility.
  2. Unicode symbols that correspond semantically with existing mathematical symbols should be included. Example: − (U+2212, "Minus Sign") should be an alias for ASCII - even though Mathematica does not consider it so.
  3. Unicode symbols outside of the Mathematical Operators Block (and the ASCII block) should be excluded unless one of the previous heuristics includes it. Example: ✕ (U+2715, "Multiplication X") can be used for Times but is in the Dingbats Block and is thus excluded.
  4. All typographical variants of "plain"/"regular" symbols should be excluded unless included by a previous heuristic. For example, all Full Width variants, bold variants, italic variants, and so forth are excluded.
  5. Unicode symbols should not be overloaded, i.e. should not be used for more than one underlying function, unless required for Mathematica compatibility. For example, ≫ (U+226B, "Much Greater-Than") is already used for GreaterGreater and therefore should not be an alias for >> for Put. Likewise, ≪ (U+226A, "Much Less-Than") for Get, ∷ (U+2237, "Proportion") for MessageName, etc.
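
To make heuristics 3 and 4 concrete, here is a rough Python sketch of how a candidate character could be screened. It is only an illustration, not anything that exists in Mathics: the function names are made up, the block ranges are hard-coded from the Unicode block chart, and the "unless included by a previous heuristic" exceptions are not modelled.

```python
import unicodedata

# Block ranges hard-coded for this sketch (Basic Latin and Mathematical Operators).
ASCII = range(0x0000, 0x0080)
MATH_OPERATORS = range(0x2200, 0x2300)


def in_allowed_block(ch: str) -> bool:
    """Heuristic 3: by default only ASCII and Mathematical Operators qualify."""
    cp = ord(ch)
    return cp in ASCII or cp in MATH_OPERATORS


def is_typographical_variant(ch: str) -> bool:
    """Heuristic 4: flag full-width and font (bold, italic, ...) variants.

    Such characters carry a <wide> or <font> compatibility decomposition
    in the Unicode Character Database.
    """
    return unicodedata.decomposition(ch).startswith(("<wide>", "<font>"))


# Minus sign, Multiplication X, full-width plus, mathematical bold A.
for ch in ["\u2212", "\u2715", "\uFF0B", "\U0001D400"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "allowed block:", in_allowed_block(ch),
        "variant:", is_typographical_variant(ch),
    )
```

A check like this would only be a first filter; heuristics 1 and 2 still need a human decision about Mathematica compatibility and semantic correspondence.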

The general idea is to continue to be conservative while also covering compatibility and use cases we are reasonably likely to encounter. I also argue that having these heuristics written down somewhere is helpful for future contributors, whether future us or someone else, for a variety of reasons.

These heuristics do not cover all cases worthy of discussion. Here are two cases where it's not clear whether the Unicode aliases should be included:

  • ≔ (U+2254, "Colon Equals") for SetDelayed. The correspondence is not exact, but it feels close enough semantically to include under heuristic 2.
  • ⋙ (U+22D9, "Very Much Greater-Than") for PutAppend. This would be awkward to include considering ≫ (U+226B, "Much Greater-Than") cannot be an alias for Put (by heuristic 5).
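
The overloading rule in heuristic 5 is easy to check mechanically. The sketch below assumes a hypothetical alias table shaped as a dict from operator name to a list of Unicode aliases; this is not the actual layout of named-characters.yml, and the entries are just the symbols discussed in this issue.

```python
from collections import defaultdict

# Hypothetical alias table: operator name -> proposed Unicode aliases.
proposed_aliases = {
    "GreaterGreater": ["\u226B"],
    "Put": ["\u226B"],        # the assignment heuristic 5 rules out
    "SetDelayed": ["\u2254"],
    "PutAppend": ["\u22D9"],
}


def find_overloads(aliases):
    """Return every character that is claimed by more than one operator."""
    users = defaultdict(list)
    for operator, chars in aliases.items():
        for ch in chars:
            users[ch].append(operator)
    return {ch: ops for ch, ops in users.items() if len(ops) > 1}


for ch, ops in find_overloads(proposed_aliases).items():
    print(f"U+{ord(ch):04X} is overloaded by: {', '.join(ops)}")
```

A real check would also need an allowlist for overloads that Mathematica compatibility requires.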

This issue is to solicit:

  1. Discussion on the heuristics themselves.
  2. Discussion on recording them somewhere.
  3. Opinions about my two specific symbols, ≔ and ⋙.
