Better support for utf-8 #105

@nifuki

If I understand correctly, this line in SmartHTML is meant to escape HTML special characters:

```perl
# Escape text into &#x1234; format, ignoring alphanumerics and a few special characters
$Text =~ s{([^\:\/\.\-\?\=\+\w\s&#%;)]|&(?!#?\w+;))}{"&#x".sprintf("%x", unpack('U', $1)).";"}ge;
```
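To see why non-ASCII text can be affected, here is a minimal Perl sketch that runs the same substitution on "café" twice: once as raw UTF-8 bytes and once after decoding with Encode. It assumes the text reaches SmartHTML undecoded (an assumption; this depends on how DocDB reads form input and database rows), and it uses `ord($1)` in place of the original `unpack(U,$1)`, so it illustrates the mechanism rather than reproducing DocDB exactly. `smart_escape` is a hypothetical wrapper name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The SmartHTML substitution quoted above, wrapped in a hypothetical sub so it
# can be tried on different inputs. NOTE: ord($1) replaces the original
# unpack(U, $1), so this is a sketch, not the exact DocDB code.
sub smart_escape {
    my ($Text) = @_;
    $Text =~ s{([^\:\/\.\-\?\=\+\w\s&#%;)]|&(?!#?\w+;))}{"&#x" . sprintf("%x", ord($1)) . ";"}ge;
    return $Text;
}

# "café" as raw UTF-8 bytes (assumed input form): "é" is the two bytes 0xC3 0xA9.
my $bytes = encode('UTF-8', "caf\x{e9}");

# In a byte string neither byte counts as \w, so each byte is escaped
# separately, and a browser shows the result as "cafÃ©".
my $from_bytes = smart_escape($bytes);

# Once decoded, "é" is a single character and matches \w under Unicode
# rules, so the substitution leaves it alone.
my $from_chars = smart_escape(decode('UTF-8', $bytes));

print "$from_bytes\n";    # caf&#xc3;&#xa9;
```

If this matches what DocDB does, decoding input into character strings (and encoding once on output) would let the existing regex leave accented letters alone.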

But it mangles non-ASCII characters in UTF-8 text, such as "é". Consider a UTF-8-friendly approach instead, for instance:

```perl
use HTML::Escape qw(escape_html);
$Text = escape_html($Text);
```
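The useful property of an escaper like `escape_html` is that it only touches characters that are special in HTML markup, so non-ASCII letters pass through and render correctly as long as the page is served as UTF-8. The sketch below uses a hypothetical `escape_minimal` helper covering the five usual characters (`& < > " '`); it is not HTML::Escape's implementation, and the exact set of characters `escape_html` escapes should be checked in its documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (not HTML::Escape's code): escapes only the five
# characters that are special in HTML text and attribute values.
my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;');

sub escape_minimal {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

# "é" (U+00E9) is left untouched; only markup characters become entities.
my $escaped = escape_minimal(qq{<b class="x">caf\x{e9} & co</b>});

# Encode to UTF-8 exactly once, when writing the page out.
binmode STDOUT, ':encoding(UTF-8)';
print "$escaped\n";
```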

Things get more complicated with this code:

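```perl
use HTML::Entities qw(encode_entities_numeric);

sub is_valid {
    my $self = shift;
    # Overwrites the value with its numeric-entity-encoded form (e.g. "é" becomes "&#xE9;")
    my $EscapedHTML = encode_entities_numeric($self->value);
    $self->value($EscapedHTML);
}
```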

When this is chained with SmartHTML, the text seems to get double-encoded, and characters end up displayed literally as &#x1234; in the browser. What is the purpose of encode_entities_numeric here? Is it sanitization similar to the regex in SmartHTML above? If so, and if is_valid is always followed by SmartHTML, the encode_entities_numeric call could probably just be removed. But I'm no expert in Perl or DocDB and would appreciate advice.
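One plausible way for entities to show up literally is to run an entity encoder over text that already contains entities: the `&` of the first entity is itself escaped, so the browser decodes only one layer. The sketch below uses a hypothetical `encode_numeric` that approximates the documented default behaviour of `encode_entities_numeric` (control characters, non-ASCII characters and `< & > ' "` become hex entities); whether DocDB's double encoding happens exactly this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for encode_entities_numeric: every character outside
# tab/newline/CR and printable ASCII minus < & > ' " becomes &#xHEX;.
sub encode_numeric {
    my ($text) = @_;
    $text =~ s/([^\n\r\t !#\$%(-;=?-~])/sprintf("&#x%X;", ord($1))/ge;
    return $text;
}

my $once  = encode_numeric("caf\x{e9}");  # "caf&#xE9;": browser shows "café"
my $twice = encode_numeric($once);        # "caf&#x26;#xE9;": browser shows "caf&#xE9;"

print "$once\n$twice\n";
```

The usual remedy is to keep decoded text internally and escape exactly once, at the point where it is written into HTML.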

Also, is_valid is arguably a confusing name: it doesn't actually check that anything is valid; instead, it modifies the value.
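If `is_valid` were split along these lines, the check and the transformation would live in separate functions. As a purely illustrative example (the name `is_valid_utf8` and the choice of a UTF-8 well-formedness check are assumptions, not DocDB's actual validation rules), a predicate can report whether input decodes cleanly without changing it:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical predicate: returns 1 if $octets is well-formed UTF-8, else 0.
# It only checks; the caller's value is never modified.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK flag may consume its input, so use a copy
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

print is_valid_utf8("caf\xc3\xa9") ? "valid\n" : "invalid\n";
```

Escaping would then happen separately, once, when the value is rendered.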
