@harryadel
Contributor

@harryadel harryadel commented Jan 6, 2025

Decoupling jQuery and Blaze would take substantial effort. jQuery is used in many places:

  1. HTML Parsing
DOMBackend.parseHTML = function (html) {
  return $jq.parseHTML(html, DOMBackend.getContext()) || [];
};
  2. DOM Selection
findBySelector: function (selector, context) {
  return $jq(selector, context);
}
  3. Element Teardown Detection
$jq.event.special[DOMBackend.Teardown._JQUERY_EVENT_NAME]
  4. Event Delegation/Handling
delegateEvents: function (elem, type, selector, handler) {
  $jq(elem).on(type, selector, handler);
}
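
For the event delegation item, a native replacement has to reproduce jQuery's delegated matching. The following is a minimal sketch under stated assumptions: `delegateNative` is a hypothetical name (not part of Blaze), and it relies on `Element.prototype.closest`, which IE11 lacks. jQuery also normalizes the event object, which this sketch does not attempt.

```javascript
// Hypothetical native stand-in for $jq(elem).on(type, selector, handler).
// Assumes Element.prototype.closest is available (not the case in IE11).
function delegateNative(elem, type, selector, handler) {
  function listener(event) {
    // Walk up from the event target to the nearest node matching the selector.
    const match = event.target.closest(selector);
    // Only fire for matches strictly inside the delegate element.
    if (match && match !== elem && elem.contains(match)) {
      handler.call(match, event);
    }
  }
  elem.addEventListener(type, listener);
  // Return an undo function so the caller can remove the listener later.
  return function undelegate() {
    elem.removeEventListener(type, listener);
  };
}
```

Returning the undo function keeps the internal listener reference private while still allowing teardown.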

They're ranked in terms of ease of replacement and impact. The challenge lies mostly in testing and ensuring cross-browser compatibility, so it's best to merge each change individually, do a minor release, test, and repeat until everything is merged; then we can do a major release.

I chose to start with parseHTML. It presents a nice challenge: even if we get it slightly wrong, it won't cause major errors, and it can act as an indicator of whether moving away from jQuery is doable.

This implementation and the official tests were used in constructing the new tests to ensure backwards compatibility. You can remove my code and re-add the jQuery parseHTML call, and you'll find the tests still pass.

@harryadel harryadel marked this pull request as ready for review January 7, 2025 00:05
@harryadel
Contributor Author

harryadel commented Jan 7, 2025

Collaborator

@jankapunkt jankapunkt left a comment

Thank you so much @harryadel for initiating this one!
I have one thing to discuss regarding the es5 code.

If Blaze depends on ecmascript then you're fine using ES6+ because it will be compiled down (and for browser.legacy also in compatibility mode). On top of that, the classic var is function-scoped, while let and const are block-scoped, which allows for more granular scoping and thus fewer errors.

However, this is just a suggestion and not a demand from my end and I'd like to see what others are saying.
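
The scoping difference mentioned above can be seen in a short, self-contained example. The function names are illustrative only and do not come from the PR.

```javascript
// `var` is function-scoped: every callback closes over the same `i`.
function callbacksWithVar() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks.map(function (cb) { return cb(); });
}

// `let` is block-scoped: each loop iteration gets a fresh binding of `i`.
function callbacksWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((cb) => cb());
}

console.log(callbacksWithVar()); // [ 3, 3, 3 ]
console.log(callbacksWithLet()); // [ 0, 1, 2 ]
```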

@harryadel
Contributor Author

I thought we needed to support IE, no? That's why I opted for ES5 code. If that's not a concern, I could revert to ES6.

@jankapunkt
Collaborator

My understanding is that web.browser.legacy builds in a way that IE is supported. However, I'm not 100% sure that's the case.

@StorytellerCZ
Collaborator

Plus IE is already out of official support anyway, so I think we should drop IE support.
Right now the biggest offender is Safari.

Comment on lines 42 to 45
// Return empty array for empty strings
if (html === "") {
return [];
}
Collaborator

This is covered by !html above.
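
To spell out why the explicit check is redundant: an empty string is falsy in JavaScript, so a preceding `!html` guard already returns for it. The helper name below is hypothetical and only illustrates the guard.

```javascript
// Hypothetical guard mirroring the `!html` check mentioned above.
function isEmptyInput(html) {
  // "", null, undefined, 0, NaN and false are all falsy in JavaScript.
  return !html;
}

console.log(isEmptyInput(""));     // true
console.log(isEmptyInput(null));   // true
console.log(isEmptyInput("<p/>")); // false
```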

Is this implementation made by you or based on some existing one? It's not trivial (at least to me, because I'm not sure what "fancy stuff" jQuery is doing and what "IE quirks" we are talking about), and I'm wondering if there's maybe a different small and maintained library we could use instead.

Another difference I can see is that jQuery removes scripts by default (see keepScripts), which is not implemented here (and it's missing in the tests).

Contributor Author

@harryadel harryadel Jan 9, 2025

> This is covered by !html above.

Good catch!

> Is this implementation made by you or based on some existing one? It's not trivial

It's a mix of everything really. As I've said, this example was a source of inspiration, along with jQuery's implementation.

> I'm wondering if there's maybe a different small and maintained library we could use instead.

There are definitely lots of libraries like htmlparser2, but again, they're not a drag-and-drop kind of replacement. They require fine-tuning to be backwards compatible. I'd love to be proven wrong if someone out there knows of a 1:1 alternative.

> Another difference I can see is that jQuery removes scripts by default (see keepScripts), which is not implemented here (and it's missing in the tests).

You're right. That can be added.

All in all, this PR can be used as a springboard for further discussion to find our best solution. There are native APIs out there that can be integrated, like createHTMLDocument and DOMParser, or we could use NPM libraries along with other modifications.
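
As a rough illustration of the createHTMLDocument route, here is a minimal sketch. It is not the PR's implementation: `parseHTMLNative` is a hypothetical name, and the document is passed in as `doc` (normally `window.document`) purely to keep the example self-contained.

```javascript
// Hypothetical sketch: parse HTML in an inert document created from `doc`.
// `doc` is passed in explicitly (normally `window.document`).
function parseHTMLNative(html, doc) {
  if (typeof html !== 'string' || html === '') {
    return [];
  }
  // createHTMLDocument returns a separate document with no browsing context.
  const inert = doc.implementation.createHTMLDocument('');
  inert.body.innerHTML = html;
  // Copy the live NodeList into a plain array, as jQuery.parseHTML returns one.
  return Array.from(inert.body.childNodes);
}
```

One known limitation of this approach: parsing inside a body element drops bare table fragments such as a leading tr or td, so a wrapper element is still needed for those inputs.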

Contributor Author

@harryadel harryadel Jan 14, 2025

@radekmie I made new modifications. Please recheck.

The only drawback is how garbage input gets handled now:

<#if><tr><p>Test</p></tr><#/if>  // Garbage input
// jQuery returns a length of 1
// Current solution returns 4 

jQuery returns a length of 1 because it attempts to maintain a root element for garbage input, whereas the new implementation returns 4 because it creates a new element for each tag. I feel it's a small price to pay to avoid over-engineering the current solution.

Also, regarding the keepScripts part you mentioned (jQuery removes script tags by default): this is now covered by a test case, and https://github.com/apostrophecms/sanitize-html is used to handle other XSS vectors. So in theory, when it comes to security, the current implementation is better than the previous one.

EDIT: It appears that HTML sanitization causes problems due to event removal; we might need to stick to removing only the script tag and call it a day 🤷. We'll see.
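
The "script tag only" option could look roughly like the sketch below. This is an assumption about one possible shape, not code from the PR: `removeScripts` is a hypothetical helper that expects an already parsed container (an element or a template's content fragment).

```javascript
// Hypothetical helper: drop <script> elements from an already parsed container,
// leaving every other element and attribute untouched.
function removeScripts(container) {
  const scripts = Array.from(container.querySelectorAll('script'));
  scripts.forEach(function (script) {
    // parentNode.removeChild is used instead of Element.remove() for older browsers.
    if (script.parentNode) {
      script.parentNode.removeChild(script);
    }
  });
  // Report how many script elements were removed.
  return scripts.length;
}
```

Because nothing else is touched, inline event handler attributes survive, which matches the trade-off described in the EDIT above.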

@distalx
Contributor

distalx commented Jan 27, 2025

Should we consider use of document.createDocumentFragment()? It might help improve performance by reducing repaints when appending multiple child nodes. ref.
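
A minimal sketch of the DocumentFragment idea, assuming a hypothetical `appendAll` helper that receives the document as `doc` (normally `window.document`); how much this helps in practice would need measuring.

```javascript
// Hypothetical helper: append many nodes with a single insertion into `parent`.
// `doc` is passed in explicitly (normally `window.document`).
function appendAll(parent, nodes, doc) {
  const fragment = doc.createDocumentFragment();
  nodes.forEach(function (node) {
    // The fragment is not part of the rendered document.
    fragment.appendChild(node);
  });
  // Appending the fragment moves all of its children into `parent` at once.
  parent.appendChild(fragment);
  return parent;
}
```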

@jankapunkt
Collaborator

@distalx I think we can put this on the list for improvements as it definitely makes sense for larger lists etc. However, for now I'd like to stay as close as possible to the existing code's behavior in order to not break things unless really necessary.

Contributor

Copilot AI left a comment

Pull request overview

This PR attempts to replace jQuery's parseHTML function with a native implementation to reduce jQuery dependency in the Blaze templating library. The change introduces a new implementation using the sanitize-html npm package for HTML sanitization and native DOM APIs for parsing. However, this represents a fundamental departure from jQuery's behavior, introducing HTML sanitization where none existed before.

Key changes:

  • Replaces jQuery.parseHTML() with a native implementation using sanitize-html and DOM APIs
  • Adds comprehensive test coverage for HTML parsing edge cases, whitespace handling, and XSS prevention
  • Introduces sanitize-html@2.11.0 as a new npm dependency

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 13 comments.

File | Description
packages/blaze/dombackend.js | Implements new parseHTML function with sanitize-html integration and table element wrapping logic
packages/blaze/package.js | Adds sanitize-html@2.11.0 dependency
packages/blaze/render_tests.js | Adds extensive test suite for parseHTML functionality and XSS prevention

Critical Concerns:

The implementation has several critical issues that need to be addressed:

  1. Breaking API Change: jQuery's parseHTML did NOT sanitize HTML - it only parsed it. This implementation fundamentally changes the behavior by stripping content, which will break applications that rely on parseHTML preserving all HTML (e.g., for template compilation).

  2. Security Configuration Contradicts Tests: The sanitization config explicitly allows event handlers (onclick, onmouseover, etc.) while tests expect them to be stripped. This creates both a security vulnerability and test failures.

  3. Browser Compatibility: The implementation uses HTMLTemplateElement which is not supported in IE11, contradicting the PR's stated goal of ensuring cross-browser compatibility.

  4. Incomplete Implementation: The context parameter is accepted but never used, breaking the API contract. Table element wrapping is incomplete (missing col/colgroup cases).

The PR's intention to decouple from jQuery is valuable, but this implementation needs significant rework to either:

  • Match jQuery's behavior exactly (no sanitization), or
  • Clearly document this as a breaking change with proper migration guidance

Comment on lines +46 to +64
if (tableElementMatch) {
const tagName = tableElementMatch[1].toLowerCase();
// Create appropriate container based on the table element
switch (tagName) {
case 'td':
case 'th':
container = document.createElement('tr');
break;
case 'tr':
container = document.createElement('tbody');
break;
case 'tbody':
case 'thead':
case 'tfoot':
container = document.createElement('table');
break;
default:
container = document.createElement('template');
}

Copilot AI Dec 12, 2025

The table element wrapping logic is incomplete. Elements like 'colgroup' and 'col' should be wrapped in a table element for proper parsing, but they fall through to the default case which uses a template element. This will likely cause parsing issues. The switch statement should include a case for 'col' and 'colgroup' similar to tbody/thead/tfoot.
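
One way to make the wrapping logic easier to extend is a lookup table instead of a switch. The sketch below is an assumption, not the PR's code: `TABLE_WRAPPERS` and `containerTagFor` are hypothetical names. It also maps `col` to `colgroup` rather than `table`, on the assumption that parsing a bare col inside a table container would produce an implied colgroup instead of the col itself; that choice departs from the suggestion above and should be verified in target browsers.

```javascript
// Hypothetical lookup: which element should act as the innerHTML container
// for a fragment whose first tag is `tagName`.
const TABLE_WRAPPERS = {
  td: 'tr',
  th: 'tr',
  tr: 'tbody',
  tbody: 'table',
  thead: 'table',
  tfoot: 'table',
  caption: 'table',
  colgroup: 'table',
  col: 'colgroup',
};

function containerTagFor(tagName) {
  const key = String(tagName).toLowerCase();
  // Fall back to <template> for everything that is not table-related.
  return Object.prototype.hasOwnProperty.call(TABLE_WRAPPERS, key)
    ? TABLE_WRAPPERS[key]
    : 'template';
}
```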

Copilot uses AI. Check for mistakes.
'a': ['href', 'target', 'rel'],
'img': ['src', 'alt', 'width', 'height'],
'iframe': ['src', 'width', 'height', 'frameborder', 'allowfullscreen'],
'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],

Copilot AI Dec 12, 2025

The attribute name 'stuff' appears to be a placeholder or typo in the input element's allowed attributes list. This should either be removed or replaced with a valid attribute name if it was intended to be something else.

Suggested change
- 'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
+ 'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength'],

Comment on lines +69 to +133
// Sanitize the HTML with sanitize-html
const cleanHtml = sanitizeHtml(html, {
allowedTags: [
// Basic elements
'div', 'span', 'p', 'br', 'hr', 'b', 'i', 'em', 'strong', 'u',
'a', 'img', 'pre', 'code', 'blockquote',
// Lists
'ul', 'ol', 'li', 'dl', 'dt', 'dd',
// Headers
'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
// Table elements
'table', 'thead', 'tbody', 'tfoot',
'tr', 'td', 'th', 'col', 'colgroup',
// Form elements
'input', 'textarea', 'select', 'option', 'label', 'button',
// Other elements
'iframe', 'article', 'section', 'header', 'footer', 'nav',
'aside', 'main', 'figure', 'figcaption', 'audio', 'video',
'source', 'canvas', 'details', 'summary'
],
allowedAttributes: {
'*': [
'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
// Allow event handlers
'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
],
'a': ['href', 'target', 'rel'],
'img': ['src', 'alt', 'width', 'height'],
'iframe': ['src', 'width', 'height', 'frameborder', 'allowfullscreen'],
'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
'textarea': ['rows', 'cols', 'wrap', 'placeholder', 'disabled', 'readonly', 'required', 'minlength', 'maxlength'],
'select': ['multiple', 'disabled', 'required', 'size'],
'option': ['value', 'selected', 'disabled'],
'button': ['type', 'disabled'],
'col': ['span', 'width'],
'td': ['colspan', 'rowspan', 'headers'],
'th': ['colspan', 'rowspan', 'headers', 'scope']
},
allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'],
allowedSchemesByTag: {
'img': ['data']
},
allowedSchemesAppliedToAttributes: ['href', 'src', 'cite'],
allowProtocolRelative: true,
parser: {
lowerCaseTags: false, // Preserve tag case for proper testing
decodeEntities: true
},
// Preserve empty attributes
transformTags: {
'*': function(tagName, attribs) {
// Convert null/undefined attributes to empty strings
Object.keys(attribs).forEach(key => {
if (attribs[key] === null || attribs[key] === undefined) {
delete attribs[key];
}
});
return {
tagName,
attribs
};
}
}
});

Copilot AI Dec 12, 2025

Using sanitize-html fundamentally changes the behavior of parseHTML compared to jQuery's implementation. The jQuery.parseHTML function did NOT sanitize HTML - it simply parsed it into DOM nodes. This is a breaking change that alters the API contract. Applications relying on parseHTML to preserve all HTML content (including scripts and event handlers for legitimate use cases like template compilation) will break. Consider implementing parseHTML without sanitization, as the original function was for parsing, not sanitizing.

Comment on lines +988 to +996
{
html: "<div><p>Test</p><img src='x' onerror='alert(\"XSS\")'></div>",
description: "Prevents event handler injection",
checks: (result) => {
test.equal(result.length, 1, "Should parse into a single element");
const div = result[0];
const img = div.querySelector('img');
test.isNotNull(img, "Image element should be preserved");
test.isFalse(img.hasAttribute('onerror'), "Event handler should be stripped");

Copilot AI Dec 12, 2025

The test expects that onerror event handlers should be stripped (line 996), but the sanitize-html configuration explicitly allows onclick, onmouseover, and other event handlers (lines 92-94). This test will fail with the current implementation because the configuration permits these handlers. Either fix the sanitization configuration to strip all event handlers, or update the test to match the actual behavior.
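
A mismatch between an allowlist and the tests can be checked mechanically. The sketch below is a hypothetical helper (not part of sanitize-html or the PR) that lists every `on*` attribute an `allowedAttributes` object of the quoted shape would let through. Note that the quoted allowlist does not name `onerror`, so which handlers a given test exercises matters.

```javascript
// Hypothetical check: list every inline event handler attribute (on*) that an
// allowedAttributes object of the shape quoted above would let through.
function findAllowedHandlers(allowedAttributes) {
  const found = [];
  Object.keys(allowedAttributes).forEach(function (tag) {
    allowedAttributes[tag].forEach(function (attr) {
      // Only plain string entries are inspected in this sketch.
      if (typeof attr === 'string' && /^on[a-z]+$/i.test(attr)) {
        found.push(tag + ':' + attr);
      }
    });
  });
  return found;
}

const sample = {
  '*': ['class', 'id', 'onclick', 'onmouseover'],
  a: ['href', 'target', 'rel'],
};
console.log(findAllowedHandlers(sample)); // [ '*:onclick', '*:onmouseover' ]
```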

Comment on lines +63 to +139
container = document.createElement('template');
}
} else {
container = document.createElement('template');
}

// Sanitize the HTML with sanitize-html
const cleanHtml = sanitizeHtml(html, {
allowedTags: [
// Basic elements
'div', 'span', 'p', 'br', 'hr', 'b', 'i', 'em', 'strong', 'u',
'a', 'img', 'pre', 'code', 'blockquote',
// Lists
'ul', 'ol', 'li', 'dl', 'dt', 'dd',
// Headers
'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
// Table elements
'table', 'thead', 'tbody', 'tfoot',
'tr', 'td', 'th', 'col', 'colgroup',
// Form elements
'input', 'textarea', 'select', 'option', 'label', 'button',
// Other elements
'iframe', 'article', 'section', 'header', 'footer', 'nav',
'aside', 'main', 'figure', 'figcaption', 'audio', 'video',
'source', 'canvas', 'details', 'summary'
],
allowedAttributes: {
'*': [
'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
// Allow event handlers
'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
],
'a': ['href', 'target', 'rel'],
'img': ['src', 'alt', 'width', 'height'],
'iframe': ['src', 'width', 'height', 'frameborder', 'allowfullscreen'],
'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
'textarea': ['rows', 'cols', 'wrap', 'placeholder', 'disabled', 'readonly', 'required', 'minlength', 'maxlength'],
'select': ['multiple', 'disabled', 'required', 'size'],
'option': ['value', 'selected', 'disabled'],
'button': ['type', 'disabled'],
'col': ['span', 'width'],
'td': ['colspan', 'rowspan', 'headers'],
'th': ['colspan', 'rowspan', 'headers', 'scope']
},
allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'],
allowedSchemesByTag: {
'img': ['data']
},
allowedSchemesAppliedToAttributes: ['href', 'src', 'cite'],
allowProtocolRelative: true,
parser: {
lowerCaseTags: false, // Preserve tag case for proper testing
decodeEntities: true
},
// Preserve empty attributes
transformTags: {
'*': function(tagName, attribs) {
// Convert null/undefined attributes to empty strings
Object.keys(attribs).forEach(key => {
if (attribs[key] === null || attribs[key] === undefined) {
delete attribs[key];
}
});
return {
tagName,
attribs
};
}
}
});

// Parse the sanitized HTML
container.innerHTML = cleanHtml;

// Return the nodes, handling both template and regular elements
return Array.from(container instanceof HTMLTemplateElement ? container.content.childNodes : container.childNodes);

Copilot AI Dec 12, 2025

Using HTMLTemplateElement is not supported in Internet Explorer 11 and older browsers. The code creates a template element on line 63 and 66, then checks if it's an HTMLTemplateElement on line 139. For browsers that don't support template elements, this will fail. The PR description mentions cross-browser compatibility as a key concern, but this implementation breaks IE11 support that jQuery.parseHTML provided.
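
If legacy browsers remain in scope, template support can be feature-detected rather than assumed. A minimal sketch, with hypothetical helper names and the document passed in as `doc` (normally `window.document`); the div fallback is an assumption and would still need the table wrapping discussed elsewhere in this review.

```javascript
// Hypothetical feature check; `doc` is passed in (normally `window.document`).
function supportsTemplate(doc) {
  // Browsers with native <template> support expose a `content` fragment on it.
  return 'content' in doc.createElement('template');
}

function createContainer(doc) {
  // Fall back to a plain <div> where <template> is unsupported.
  return doc.createElement(supportsTemplate(doc) ? 'template' : 'div');
}
```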

Comment on lines +91 to +94
'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
// Allow event handlers
'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'

Copilot AI Dec 12, 2025

Allowing inline event handlers (onclick, onmouseover, etc.) in the sanitization configuration defeats the purpose of XSS prevention. These event handler attributes can execute arbitrary JavaScript code and are a common XSS attack vector. The XSS prevention tests (lines 976-1019) expect these handlers to be stripped, but the configuration explicitly allows them, which means the tests are actually verifying incorrect behavior.

Suggested change
- 'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
- // Allow event handlers
- 'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
- 'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
+ 'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*'

},
allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'],
allowedSchemesByTag: {
'img': ['data']

Copilot AI Dec 12, 2025

Allowing 'javascript:' protocol in iframe src attributes is a critical security vulnerability. The allowedSchemes list doesn't include 'javascript', but the test on line 1001-1011 expects javascript: URLs to be stripped. However, sanitize-html may not strip javascript: URLs from iframes by default unless explicitly configured. The configuration should explicitly exclude javascript: protocol or use disallowedTagsMode to ensure proper sanitization.
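
To make the scheme concern concrete, here is a standalone sketch of a scheme allowlist check. It is a hypothetical helper and not sanitize-html's own logic; the whitespace stripping is a deliberately conservative simplification of how URL parsers treat tabs, newlines, and leading whitespace.

```javascript
// Hypothetical helper, not sanitize-html's own logic: report whether a URL
// attribute value uses one of the allowed schemes.
function hasAllowedScheme(value, allowedSchemes) {
  // Conservative simplification: strip all ASCII whitespace and control
  // characters so that obfuscated schemes are still recognized.
  const cleaned = String(value).replace(/[\u0000-\u0020]/g, '');
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(cleaned);
  if (!match) {
    return true; // relative or protocol-relative URL, no scheme to check
  }
  return allowedSchemes.indexOf(match[1].toLowerCase()) !== -1;
}
```

A test suite could run such a check against every `src` and `href` in the parsed output to confirm that no disallowed scheme survives, regardless of how the sanitizer is configured.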

Suggested change
- 'img': ['data']
+ 'img': ['data'],
+ 'iframe': ['http', 'https']

Comment on lines +976 to +986
Tinytest.add("blaze - security - XSS prevention in HTML parsing", function (test) {
const xssTestCases = [
{
html: "<div><p>Test</p><script>alert('XSS')</script></div>",
description: "Prevents inline script execution",
checks: (result) => {
test.equal(result.length, 1, "Should parse into a single element");
const div = result[0];
test.equal(div.querySelector('script'), null, "Script tag should be removed");
test.equal(div.querySelector('p').textContent, "Test", "Safe content should be preserved");
}

Copilot AI Dec 12, 2025

The XSS prevention test expects script tags to be removed (line 984), but this represents a change in behavior from jQuery.parseHTML which would preserve script tags (though not execute them). If the goal is to replace jQuery.parseHTML with equivalent functionality, these tests are validating incorrect behavior. The tests should either be removed, or the implementation should be clearly documented as intentionally diverging from jQuery's behavior for security reasons.

Comment on lines +854 to +858
const selfClosing = "<div/>Content";
const selfClosingResult = Blaze._DOMBackend.parseHTML(selfClosing);
test.equal(selfClosingResult.length, 1);
test.equal(selfClosingResult[0].nodeName, "DIV");
test.equal(selfClosingResult[0].nodeType, Node.ELEMENT_NODE);

Copilot AI Dec 12, 2025

The self-closing tag test on line 854 expects parseHTML to handle "<div/>Content" by creating a DIV element with "Content" as text content. However, sanitize-html may parse this differently than jQuery, potentially creating an empty div followed by a text node. The test should verify the actual structure that results from this input, including checking if "Content" is inside or outside the div element.

Comment on lines +19 to +21
// Check if createHTMLDocument is supported directly
if (document.implementation && document.implementation.createHTMLDocument) {
DOMBackend._context = document.implementation.createHTMLDocument("");

Copilot AI Dec 12, 2025

The comment mentions checking for createHTMLDocument support directly, but the original code checked via jQuery.support.createHTMLDocument which may have included additional browser-specific checks or polyfills. The direct document.implementation check might not account for all edge cases that jQuery handled. Consider verifying that this simplified check works correctly across all target browsers, especially older ones.
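
An existence check alone may be weaker than what jQuery did. The sketch below adds a small behavioral probe on top of it. The probe string and the helper name are assumptions modeled loosely on jQuery's support test, so compare against the jQuery source before relying on it; the document is passed in as `doc` (normally `window.document`).

```javascript
// Hypothetical helper; `doc` is passed in (normally `window.document`).
function supportsCreateHTMLDocument(doc) {
  if (!doc.implementation || !doc.implementation.createHTMLDocument) {
    return false;
  }
  // Behavioral probe beyond mere existence: parse two sibling elements and
  // confirm that both survive.
  const body = doc.implementation.createHTMLDocument('').body;
  body.innerHTML = '<form></form><form></form>';
  return body.childNodes.length === 2;
}
```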

@jankapunkt jankapunkt added this to the 3.1 milestone Dec 17, 2025

5 participants