Replace jquery parseHTML with native alternative #474
base: master
jankapunkt
left a comment
Thank you so much @harryadel for initiating this one!
I have one thing to discuss regarding the es5 code.
If Blaze depends on ecmascript then you're fine using ES6+ because it will be compiled down (and for browser.legacy also in compatibility mode). On top of that, the classic var is function-scoped, while let and const are block-scoped, which allows for more granular scoping and thus fewer errors.
However, this is just a suggestion and not a demand from my end and I'd like to see what others are saying.
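To make the scoping point concrete, here is a minimal sketch. The two helper names are made up for illustration and are not part of Blaze. It shows a loop variable declared with `var` being shared by every closure, while `let` gives each iteration its own binding.

```javascript
// Hypothetical helpers for illustration only; not part of Blaze.

// `var` is scoped to the enclosing function, so all closures see the same `i`.
function collectWithVar() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks.map(function (cb) { return cb(); });
}

// `let` is scoped to the block, so each loop iteration gets a fresh `i`.
function collectWithLet() {
  const callbacks = [];
  for (let i = 0; i < 3; i++) {
    callbacks.push(() => i);
  }
  return callbacks.map((cb) => cb());
}

console.log(collectWithVar()); // [ 3, 3, 3 ]
console.log(collectWithLet()); // [ 0, 1, 2 ]
```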
I thought we needed to support IE, no? That's why I opted for ES5 code. If that's not a concern, I could revert to ES6.
My understanding is, that
Plus, IE is already out of official support anyway, so I think we should drop IE support.
packages/blaze/dombackend.js
Outdated
    // Return empty array for empty strings
    if (html === "") {
      return [];
    }
This is covered by !html above.
Is this implementation made by you or based on some existing one? It's not trivial (at least to me, because I'm not sure what "fancy stuff" is jQuery doing and what "IE quirks" are we talking about) and I'm wondering if there's maybe a different small and maintained library we could use instead.
Another difference I can see is that jQuery removes scripts by default (see keepScripts), which is not implemented here (and it's missing in the tests).
> This is covered by !html above.

Good catch!

> Is this implementation made by you or based on some existing one? It's not trivial

It's a mix of everything really; as I've said, this example was a source of inspiration, along with jQuery's implementation.

> I'm wondering if there's maybe a different small and maintained library we could use instead.

There are definitely lots of libraries, like htmlparser2, but again they're not drop-in replacements. They require fine-tuning to be backwards compatible. I'd love to be proven wrong if someone out there knows of a 1:1 alternative.

> Another difference I can see is that jQuery removes scripts by default (see keepScripts), which is not implemented here (and it's missing in the tests).

You're right. That can be added.
All in all, this PR can be used as a springboard for further discussion to find the best solution. There are native APIs that can be integrated, like createHTMLDocument and DOMParser, or we could use NPM libraries along with other modifications.
@radekmie I made new modifications. Please recheck.
The only drawback is how garbage input gets handled now:
    <#if><tr><p>Test</p></tr><#/if> // Garbage input
    // jQuery returns a length of 1
    // Current solution returns 4
jQuery returns a length of 1 because it attempts to maintain a root element for garbage input, whereas the new implementation returns 4 because it creates a new element for each tag. I feel it's a small price to pay to avoid over-engineering the current solution.
Also, regarding the keepScripts point you mentioned (jQuery removes script tags by default): this is now covered by a test case, and https://github.com/apostrophecms/sanitize-html is used to handle other XSS vectors. So in theory, when it comes to security, the current implementation is better than the previous one.
EDIT: It appears that HTML sanitization causes problems due to event removal, so we might need to stick to removing only script tags and call it a day 🤷. We'll see.
Should we consider use of
@distalx I think we can put this on the list of improvements, as it definitely makes sense for larger lists etc. However, for now I'd like to stay as compliant as possible with the existing code behavior, so as not to break things unless really necessary.
Pull request overview
This PR attempts to replace jQuery's parseHTML function with a native implementation to reduce jQuery dependency in the Blaze templating library. The change introduces a new implementation using the sanitize-html npm package for HTML sanitization and native DOM APIs for parsing. However, this represents a fundamental departure from jQuery's behavior, introducing HTML sanitization where none existed before.
Key changes:
- Replaces jQuery.parseHTML() with a native implementation using sanitize-html and DOM APIs
- Adds comprehensive test coverage for HTML parsing edge cases, whitespace handling, and XSS prevention
- Introduces sanitize-html@2.11.0 as a new npm dependency
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| packages/blaze/dombackend.js | Implements new parseHTML function with sanitize-html integration and table element wrapping logic |
| packages/blaze/package.js | Adds sanitize-html@2.11.0 dependency |
| packages/blaze/render_tests.js | Adds extensive test suite for parseHTML functionality and XSS prevention |
Critical Concerns:
The implementation has several critical issues that need to be addressed:
- Breaking API Change: jQuery's parseHTML did NOT sanitize HTML - it only parsed it. This implementation fundamentally changes the behavior by stripping content, which will break applications that rely on parseHTML preserving all HTML (e.g., for template compilation).
- Security Configuration Contradicts Tests: The sanitization config explicitly allows event handlers (onclick, onmouseover, etc.) while tests expect them to be stripped. This creates both a security vulnerability and test failures.
- Browser Compatibility: The implementation uses HTMLTemplateElement which is not supported in IE11, contradicting the PR's stated goal of ensuring cross-browser compatibility.
- Incomplete Implementation: The context parameter is accepted but never used, breaking the API contract. Table element wrapping is incomplete (missing col/colgroup cases).
The PR's intention to decouple from jQuery is valuable, but this implementation needs significant rework to either:
- Match jQuery's behavior exactly (no sanitization), or
- Clearly document this as a breaking change with proper migration guidance
    if (tableElementMatch) {
      const tagName = tableElementMatch[1].toLowerCase();
      // Create appropriate container based on the table element
      switch (tagName) {
        case 'td':
        case 'th':
          container = document.createElement('tr');
          break;
        case 'tr':
          container = document.createElement('tbody');
          break;
        case 'tbody':
        case 'thead':
        case 'tfoot':
          container = document.createElement('table');
          break;
        default:
          container = document.createElement('template');
      }
Copilot
AI
Dec 12, 2025
The table element wrapping logic is incomplete. Elements like 'colgroup' and 'col' should be wrapped in a table element for proper parsing, but they fall through to the default case which uses a template element. This will likely cause parsing issues. The switch statement should include a case for 'col' and 'colgroup' similar to tbody/thead/tfoot.
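One way the missing cases could be covered is sketched below. The function name `containerTagFor`, the lookup table, and the tag-matching regex are all hypothetical and only loosely modeled on jQuery's wrapMap. Note one assumption that differs from the comment above: `col` is mapped to a `colgroup` container rather than `table`, on the expectation that the HTML fragment parser would otherwise insert an implicit `colgroup` around it.

```javascript
// Hypothetical lookup of the container element to parse a fragment in.
// A Map is used so that tag names such as "constructor" cannot hit
// properties inherited from Object.prototype.
const CONTAINER_BY_TAG = new Map([
  ['td', 'tr'],
  ['th', 'tr'],
  ['tr', 'tbody'],
  ['col', 'colgroup'],
  ['tbody', 'table'],
  ['thead', 'table'],
  ['tfoot', 'table'],
  ['colgroup', 'table'],
  ['caption', 'table'],
]);

// Returns the tag name of the container that the leading element of `html`
// should be parsed inside, falling back to 'template' for everything else.
function containerTagFor(html) {
  // Assumption: only the first tag at the start of the string matters.
  const match = /^\s*<([a-z][a-z0-9-]*)/i.exec(html);
  if (!match) {
    return 'template';
  }
  return CONTAINER_BY_TAG.get(match[1].toLowerCase()) || 'template';
}

console.log(containerTagFor('<col span="2">'));      // colgroup
console.log(containerTagFor('<colgroup></colgroup>')); // table
console.log(containerTagFor('<TD>cell</TD>'));       // tr
console.log(containerTagFor('<div>plain</div>'));    // template
```

The returned name would then be passed to document.createElement in place of the switch statement; whether that matches jQuery's output for every table fragment would still need to be checked against the existing tests.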
    'a': ['href', 'target', 'rel'],
    'img': ['src', 'alt', 'width', 'height'],
    'iframe': ['src', 'width', 'height', 'frameborder', 'allowfullscreen'],
    'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
Copilot
AI
Dec 12, 2025
The attribute name 'stuff' appears to be a placeholder or typo in the input element's allowed attributes list. This should either be removed or replaced with a valid attribute name if it was intended to be something else.
Suggested change:
    - 'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
    + 'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength'],
    // Sanitize the HTML with sanitize-html
    const cleanHtml = sanitizeHtml(html, {
      allowedTags: [
        // Basic elements
        'div', 'span', 'p', 'br', 'hr', 'b', 'i', 'em', 'strong', 'u',
        'a', 'img', 'pre', 'code', 'blockquote',
        // Lists
        'ul', 'ol', 'li', 'dl', 'dt', 'dd',
        // Headers
        'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
        // Table elements
        'table', 'thead', 'tbody', 'tfoot',
        'tr', 'td', 'th', 'col', 'colgroup',
        // Form elements
        'input', 'textarea', 'select', 'option', 'label', 'button',
        // Other elements
        'iframe', 'article', 'section', 'header', 'footer', 'nav',
        'aside', 'main', 'figure', 'figcaption', 'audio', 'video',
        'source', 'canvas', 'details', 'summary'
      ],
      allowedAttributes: {
        '*': [
          'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
          // Allow event handlers
          'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
          'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
        ],
        'a': ['href', 'target', 'rel'],
        'img': ['src', 'alt', 'width', 'height'],
        'iframe': ['src', 'width', 'height', 'frameborder', 'allowfullscreen'],
        'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
        'textarea': ['rows', 'cols', 'wrap', 'placeholder', 'disabled', 'readonly', 'required', 'minlength', 'maxlength'],
        'select': ['multiple', 'disabled', 'required', 'size'],
        'option': ['value', 'selected', 'disabled'],
        'button': ['type', 'disabled'],
        'col': ['span', 'width'],
        'td': ['colspan', 'rowspan', 'headers'],
        'th': ['colspan', 'rowspan', 'headers', 'scope']
      },
      allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'],
      allowedSchemesByTag: {
        'img': ['data']
      },
      allowedSchemesAppliedToAttributes: ['href', 'src', 'cite'],
      allowProtocolRelative: true,
      parser: {
        lowerCaseTags: false, // Preserve tag case for proper testing
        decodeEntities: true
      },
      // Preserve empty attributes
      transformTags: {
        '*': function(tagName, attribs) {
          // Convert null/undefined attributes to empty strings
          Object.keys(attribs).forEach(key => {
            if (attribs[key] === null || attribs[key] === undefined) {
              delete attribs[key];
            }
          });
          return {
            tagName,
            attribs
          };
        }
      }
    });
Copilot
AI
Dec 12, 2025
Using sanitize-html fundamentally changes the behavior of parseHTML compared to jQuery's implementation. The jQuery.parseHTML function did NOT sanitize HTML - it simply parsed it into DOM nodes. This is a breaking change that alters the API contract. Applications relying on parseHTML to preserve all HTML content (including scripts and event handlers for legitimate use cases like template compilation) will break. Consider implementing parseHTML without sanitization, as the original function was for parsing, not sanitizing.
    {
      html: "<div><p>Test</p><img src='x' onerror='alert(\"XSS\")'></div>",
      description: "Prevents event handler injection",
      checks: (result) => {
        test.equal(result.length, 1, "Should parse into a single element");
        const div = result[0];
        const img = div.querySelector('img');
        test.isNotNull(img, "Image element should be preserved");
        test.isFalse(img.hasAttribute('onerror'), "Event handler should be stripped");
Copilot
AI
Dec 12, 2025
The test expects that onerror event handlers should be stripped (line 996), but the sanitize-html configuration explicitly allows onclick, onmouseover, and other event handlers (lines 92-94). This test will fail with the current implementation because the configuration permits these handlers. Either fix the sanitization configuration to strip all event handlers, or update the test to match the actual behavior.
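The mismatch can be checked mechanically with a small helper. This is a simplified, hypothetical model of how an attribute allow-list with a `'*'` tag key and trailing-`*` globs might be evaluated. It is not sanitize-html's actual code, and `isAttributeAllowed` is a made-up name. Under this model the quoted configuration lets `onclick` through, while `onerror`, which is not in the list, would be rejected, so the `onerror` assertion on its own may not expose the problem.

```javascript
// Simplified, hypothetical model of an attribute allow-list check.
function isAttributeAllowed(allowedAttributes, tagName, attrName) {
  const own = (key) =>
    Object.prototype.hasOwnProperty.call(allowedAttributes, key)
      ? allowedAttributes[key]
      : [];
  // Patterns under '*' apply to every tag; tag-specific ones are added on top.
  const patterns = [...own('*'), ...own(tagName.toLowerCase())];
  const name = attrName.toLowerCase();
  return patterns.some((pattern) =>
    pattern.endsWith('*')
      ? name.startsWith(pattern.slice(0, -1)) // e.g. 'data-*' matches 'data-id'
      : pattern === name
  );
}

// Excerpt of the allow-list quoted in the review above.
const allowedAttributes = {
  '*': [
    'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
    'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
    'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset',
  ],
  img: ['src', 'alt', 'width', 'height'],
};

console.log(isAttributeAllowed(allowedAttributes, 'div', 'onclick')); // true
console.log(isAttributeAllowed(allowedAttributes, 'img', 'onerror')); // false
console.log(isAttributeAllowed(allowedAttributes, 'div', 'data-id')); // true
```

A test that uses one of the explicitly allowed handlers, such as `onclick`, would be a more direct probe of whether the configuration and the test's intent agree.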
          container = document.createElement('template');
      }
    } else {
      container = document.createElement('template');
    }

    // Sanitize the HTML with sanitize-html
    const cleanHtml = sanitizeHtml(html, {
      allowedTags: [
        // Basic elements
        'div', 'span', 'p', 'br', 'hr', 'b', 'i', 'em', 'strong', 'u',
        'a', 'img', 'pre', 'code', 'blockquote',
        // Lists
        'ul', 'ol', 'li', 'dl', 'dt', 'dd',
        // Headers
        'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
        // Table elements
        'table', 'thead', 'tbody', 'tfoot',
        'tr', 'td', 'th', 'col', 'colgroup',
        // Form elements
        'input', 'textarea', 'select', 'option', 'label', 'button',
        // Other elements
        'iframe', 'article', 'section', 'header', 'footer', 'nav',
        'aside', 'main', 'figure', 'figcaption', 'audio', 'video',
        'source', 'canvas', 'details', 'summary'
      ],
      allowedAttributes: {
        '*': [
          'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
          // Allow event handlers
          'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
          'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
        ],
        'a': ['href', 'target', 'rel'],
        'img': ['src', 'alt', 'width', 'height'],
        'iframe': ['src', 'width', 'height', 'frameborder', 'allowfullscreen'],
        'input': ['type', 'value', 'placeholder', 'checked', 'disabled', 'readonly', 'required', 'pattern', 'min', 'max', 'step', 'minlength', 'maxlength', 'stuff'],
        'textarea': ['rows', 'cols', 'wrap', 'placeholder', 'disabled', 'readonly', 'required', 'minlength', 'maxlength'],
        'select': ['multiple', 'disabled', 'required', 'size'],
        'option': ['value', 'selected', 'disabled'],
        'button': ['type', 'disabled'],
        'col': ['span', 'width'],
        'td': ['colspan', 'rowspan', 'headers'],
        'th': ['colspan', 'rowspan', 'headers', 'scope']
      },
      allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'],
      allowedSchemesByTag: {
        'img': ['data']
      },
      allowedSchemesAppliedToAttributes: ['href', 'src', 'cite'],
      allowProtocolRelative: true,
      parser: {
        lowerCaseTags: false, // Preserve tag case for proper testing
        decodeEntities: true
      },
      // Preserve empty attributes
      transformTags: {
        '*': function(tagName, attribs) {
          // Convert null/undefined attributes to empty strings
          Object.keys(attribs).forEach(key => {
            if (attribs[key] === null || attribs[key] === undefined) {
              delete attribs[key];
            }
          });
          return {
            tagName,
            attribs
          };
        }
      }
    });

    // Parse the sanitized HTML
    container.innerHTML = cleanHtml;

    // Return the nodes, handling both template and regular elements
    return Array.from(container instanceof HTMLTemplateElement ? container.content.childNodes : container.childNodes);
Copilot
AI
Dec 12, 2025
Using HTMLTemplateElement is not supported in Internet Explorer 11 and older browsers. The code creates a template element on line 63 and 66, then checks if it's an HTMLTemplateElement on line 139. For browsers that don't support template elements, this will fail. The PR description mentions cross-browser compatibility as a key concern, but this implementation breaks IE11 support that jQuery.parseHTML provided.
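If older browsers do need to be supported, one option is to feature-detect the template element instead of assuming it exists. The sketch below is only an illustration: the helper names are hypothetical, and the document is passed in as a parameter (an assumption made here so the logic can be exercised outside a browser).

```javascript
// Hypothetical helpers; `doc` is any Document-like object.

// A browser that implements <template> exposes a `content` fragment on it.
function supportsTemplate(doc) {
  return 'content' in doc.createElement('template');
}

// Prefer <template>, otherwise fall back to a plain <div> container.
function createParsingContainer(doc) {
  return doc.createElement(supportsTemplate(doc) ? 'template' : 'div');
}

// Read the parsed nodes without referencing the HTMLTemplateElement global,
// which would throw a ReferenceError where that constructor is undefined.
function parsedNodesOf(container) {
  return Array.from((container.content || container).childNodes);
}
```

In a browser this would be called as `createParsingContainer(document)`. A plain `div` fallback would not parse table-only fragments such as `<tr>` correctly, so the table wrapper logic would still be needed on that path.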
    'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
    // Allow event handlers
    'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
    'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
Copilot
AI
Dec 12, 2025
Allowing inline event handlers (onclick, onmouseover, etc.) in the sanitization configuration defeats the purpose of XSS prevention. These event handler attributes can execute arbitrary JavaScript code and are a common XSS attack vector. The XSS prevention tests (lines 976-1019) expect these handlers to be stripped, but the configuration explicitly allows them, which means the tests are actually verifying incorrect behavior.
Suggested change:
    - 'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*',
    - // Allow event handlers
    - 'onclick', 'onmouseover', 'onmouseout', 'onkeydown', 'onkeyup', 'onkeypress',
    - 'onfocus', 'onblur', 'onchange', 'onsubmit', 'onreset'
    + 'class', 'id', 'style', 'title', 'role', 'data-*', 'aria-*'
    },
    allowedSchemes: ['http', 'https', 'ftp', 'mailto', 'tel', 'data'],
    allowedSchemesByTag: {
      'img': ['data']
Copilot
AI
Dec 12, 2025
Allowing 'javascript:' protocol in iframe src attributes is a critical security vulnerability. The allowedSchemes list doesn't include 'javascript', but the test on line 1001-1011 expects javascript: URLs to be stripped. However, sanitize-html may not strip javascript: URLs from iframes by default unless explicitly configured. The configuration should explicitly exclude javascript: protocol or use disallowedTagsMode to ensure proper sanitization.
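For reference, the effect of a scheme allow-list can be sketched as a pure function. This is a simplified illustration with hypothetical helper names, not sanitize-html's implementation. Under this model a `javascript:` URL is rejected whenever `javascript` is absent from the list, provided the check is actually applied to the attribute in question (the quoted configuration lists `src` in allowedSchemesAppliedToAttributes).

```javascript
// Hypothetical helpers sketching an allow-list check for URL schemes.

// Extracts the lower-cased scheme of a URL, or null for relative URLs.
function schemeOf(url) {
  // Drop whitespace and control characters, which can be used to disguise
  // a scheme (for example "java\tscript:").
  const cleaned = url.replace(/[\u0000-\u0020]+/g, '');
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(cleaned);
  return match ? match[1].toLowerCase() : null;
}

// Relative and protocol-relative URLs have no scheme and are let through,
// which mirrors allowProtocolRelative: true in the quoted configuration.
function isUrlAllowed(url, allowedSchemes) {
  const scheme = schemeOf(url);
  return scheme === null || allowedSchemes.includes(scheme);
}

const allowedSchemes = ['http', 'https', 'ftp', 'mailto', 'tel', 'data'];

console.log(isUrlAllowed('javascript:alert(1)', allowedSchemes));   // false
console.log(isUrlAllowed('https://example.com/a', allowedSchemes)); // true
console.log(isUrlAllowed('/relative/path', allowedSchemes));        // true
```

Note that `data` is in the global list here, so a `data:` URL in an iframe `src` would pass this check. Restricting schemes per tag, as the suggestion below does, is one way to narrow that.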
Suggested change:
    - 'img': ['data']
    + 'img': ['data'],
    + 'iframe': ['http', 'https']
    Tinytest.add("blaze - security - XSS prevention in HTML parsing", function (test) {
      const xssTestCases = [
        {
          html: "<div><p>Test</p><script>alert('XSS')</script></div>",
          description: "Prevents inline script execution",
          checks: (result) => {
            test.equal(result.length, 1, "Should parse into a single element");
            const div = result[0];
            test.equal(div.querySelector('script'), null, "Script tag should be removed");
            test.equal(div.querySelector('p').textContent, "Test", "Safe content should be preserved");
          }
Copilot
AI
Dec 12, 2025
The XSS prevention test expects script tags to be removed (line 984), but this represents a change in behavior from jQuery.parseHTML which would preserve script tags (though not execute them). If the goal is to replace jQuery.parseHTML with equivalent functionality, these tests are validating incorrect behavior. The tests should either be removed, or the implementation should be clearly documented as intentionally diverging from jQuery's behavior for security reasons.
    const selfClosing = "<div/>Content";
    const selfClosingResult = Blaze._DOMBackend.parseHTML(selfClosing);
    test.equal(selfClosingResult.length, 1);
    test.equal(selfClosingResult[0].nodeName, "DIV");
    test.equal(selfClosingResult[0].nodeType, Node.ELEMENT_NODE);
Copilot
AI
Dec 12, 2025
The self-closing tag test on line 854 expects parseHTML to handle "<div/>Content" by creating a DIV element with "Content" as text content. However, sanitize-html may parse this differently than jQuery, potentially creating an empty div followed by a text node. The test should verify the actual structure that results from this input, including checking if "Content" is inside or outside the div element.
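Where "Content" ends up likely depends on which jQuery version is being compared against. As far as I know, jQuery releases before 3.5 ran input through an htmlPrefilter that expanded XHTML-style self-closing tags, while later releases leave the string alone. The sketch below models that older expansion with a regex based on the one in those releases. Treat both the regex and the function name `expandSelfClosing` as assumptions for illustration.

```javascript
// Assumed model of the pre-3.5 jQuery prefilter: rewrite "<tag ... />" as
// "<tag ...></tag>" for every element that is not a void element.
const SELF_CLOSING_TAG =
  /<(?!area|br|col|embed|hr|img|input|link|meta|param)(([a-z][^\/\0>\x20\t\r\n\f]*)[^>]*)\/>/gi;

function expandSelfClosing(html) {
  return html.replace(SELF_CLOSING_TAG, '<$1></$2>');
}

console.log(expandSelfClosing('<div/>Content'));    // <div></div>Content
console.log(expandSelfClosing('<div class="a"/>')); // <div class="a"></div>
console.log(expandSelfClosing('<br/>'));            // <br/>
```

With the expansion, the fragment becomes an empty div followed by a text node (two nodes). Without it, an HTML parser ignores the trailing slash on `<div/>` and puts "Content" inside the div (one node), which is what the test's length of 1 assumes.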
    // Check if createHTMLDocument is supported directly
    if (document.implementation && document.implementation.createHTMLDocument) {
      DOMBackend._context = document.implementation.createHTMLDocument("");
Copilot
AI
Dec 12, 2025
The comment mentions checking for createHTMLDocument support directly, but the original code checked via jQuery.support.createHTMLDocument which may have included additional browser-specific checks or polyfills. The direct document.implementation check might not account for all edge cases that jQuery handled. Consider verifying that this simplified check works correctly across all target browsers, especially older ones.
Decoupling jQuery from Blaze will take substantial effort. jQuery is used in many places:
They're ranked in terms of ease of replacement and impact. The challenge lies mostly in testing and ensuring cross-browser compatibility, so it's best to merge each change individually, do a minor release, test, and repeat until everything is merged; then we can do a major release.
I chose to start with parseHTML. It presents a nice challenge: even if we get it slightly off, it won't cause major errors, and it can act as an indicator of whether moving off jQuery is doable.
This implementation and the official tests were used in constructing the new tests to ensure backwards compatibility. You can remove my code, re-add the jQuery parseHTML function, and you'll find the tests still pass.