https://en.wikipedia.org/wiki/Numeric_character_reference
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

We should support both numeric character references (decimal, e.g. &#233;, and hexadecimal, e.g. &#xE9;) and named character references (e.g. &eacute;), and decode them into UTF-8.
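A minimal sketch of the requested behavior in Python. It is not tied to any existing codebase: `decode_char_refs` is a hypothetical name, and it assumes that the trailing semicolon is required, that Python's standard `html.entities.html5` table is an acceptable source for named references, and that invalid code points (0, surrogates, anything above U+10FFFF) become U+FFFD. It does not do HTML's Windows-1252 remapping of &#128; to &#159;.

```python
import re
from html.entities import html5

# Matches &#NNN; (decimal), &#xHHH; / &#XHHH; (hex), or &name; (named).
# Assumption: the trailing ";" is required; forms like "&amp" without it are left untouched.
_REF = re.compile(r"&(?:#([0-9]+)|#[xX]([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")


def _replace(m: re.Match) -> str:
    dec, hexa, name = m.groups()
    if dec is not None or hexa is not None:
        cp = int(dec, 10) if dec is not None else int(hexa, 16)
        # NUL, UTF-16 surrogates and values beyond Unicode cannot be emitted as UTF-8
        # text; substitute U+FFFD (assumed policy, mirroring HTML parsers).
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return "\uFFFD"
        return chr(cp)
    # html5 keys include the semicolon, e.g. "eacute;" -> "é".
    # Unknown names are kept verbatim.
    return html5.get(name + ";", m.group(0))


def decode_char_refs(text: str) -> bytes:
    """Hypothetical helper: expand character references and return UTF-8 bytes."""
    return _REF.sub(_replace, text).encode("utf-8")
```

For example, `decode_char_refs("caf&#233; caf&#xE9; caf&eacute;")` yields the UTF-8 encoding of "café café café". The three forms of the reference decode to the same code point. A real parser would probably also need to decide how to handle references without a terminating semicolon.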