chore(l10n): extract strings, add docs, load locales on demand#452
chore(l10n): extract strings, add docs, load locales on demand#452
Conversation
|
ba08f2e was deployed to: https://fred-pr452.review.mdn.allizom.net/ |
|
Converted to draft, as it has merge conflicts. |
bcolsson
left a comment
There was a problem hiding this comment.
I understand that more work is happening around localization at some point in the future, but I noticed some major issues that could cause headaches down the road, so adding comments. I'd suggest making changes before asking localizers to translate these. (If the expectation is to use Pontoon, then the complicated Fluent logic wouldn't be supported well either.)
| footer-copyright = Portions of this content are ©1998–{ $year } by individual mozilla.org contributors. Content available under <a data-l10n-name="cc">a Creative Commons license</a>. | ||
| search-modal-site-search = Site search for <em>{ $query }</em> | ||
| site-search-search-stats = Found { $results } documents. | ||
| site-search-suggestion-matches = |
There was a problem hiding this comment.
While this is technically valid fluent it's best to avoid this pattern. This should be two strings - otherwise you can't guarantee that all patterns will have translations (reference).
Typically you want to remove the logic from the strings themselves, so it's preferable to have a -greaterthan and a -equals string then have the logic happening outside of fluent to choose the correct string.
| compat-support-flags = | ||
| { NUMBER($has_added) -> | ||
| [one] From version { $version_added } | ||
| *[other] { "" } | ||
| }{ $has_last -> | ||
| [one] | ||
| { NUMBER($has_added) -> | ||
| *[zero] Until { $versionLast } users | ||
| [one] { " " }until { $versionLast } users | ||
| } | ||
| *[zero] | ||
| { NUMBER($has_added) -> | ||
| *[zero] Users | ||
| [one] { " " }users | ||
| } | ||
| } | ||
| { " " }must explicitly set the <code data-l10n-name="name">{ $flag_name }</code>{ " " } | ||
| { $flag_type -> | ||
| *[preference] preference | ||
| [runtime_flag] runtime flag | ||
| }{ NUMBER($has_value) -> | ||
| [one] { " " }to <code data-l10n-name="value">{ $flag_value }</code> | ||
| *[other] { "" } | ||
| }{ "." } | ||
| { $flag_type -> | ||
| [preference] To change preferences in { $browser_name }, visit { $browser_pref_url }. | ||
| *[other] { "" } | ||
| } |
There was a problem hiding this comment.
This is untranslatable. Even in English I barely understand what a final string would look like, so compounding translations with different word order, you'll have a very difficult time to get something comprehensible out of this.
This should be split into multiple strings. E.g.
string-a = From version { version }
string-b = From version { version } until {version} users
ETc.
| compat-experimental = Experimental | ||
| compat-nonstandard = Non-standard | ||
| compat-no = No | ||
| compat-support-full = Full support |
There was a problem hiding this comment.
I'm finding this and multiple other strings with the exact same ID. You'll need to find a way to avoid duplication.
| compat-legend-preview = In development. Supported in a pre-release version. | ||
| compat-legend-no = { compat-support-no } | ||
| compat-legend-unknown = Compatibility unknown | ||
| compat-legend-experimental = { compat-experimental }. Expect behavior to change in the future. |
There was a problem hiding this comment.
I sort of understand why you want to use a reference here and below, but there is the potential for localization issues here. It'd be simpler for localizers to just have the word instead of a message reference.
|
@bcolsson thanks for your comments, once we've got the mechanics of our string scraping sorted out in this PR, we'll open others to resolve what you've noted, and make some of the auto-generated ids more meaningful. @caugner I've updated this PR in line with what we discussed a little while ago in Berlin: removing the ability to add strings without defining an id. I temporarily used some of that logic in the migration script to update the current id-less instances to ones with ids. Commits are mostly atomic - they may not all stand alone, because for some of them I simply split up my previous large commit more logically - but should be much nicer to review than before. |
| "l10n:extract": "node l10n/parser/extract.js", | ||
| "l10n:lint": "node l10n/parser/extract.js --lint" |
There was a problem hiding this comment.
How about:
| "l10n:extract": "node l10n/parser/extract.js", | |
| "l10n:lint": "node l10n/parser/extract.js --lint" | |
| "l10n": "node l10n/parser/extract.js", | |
| "l10n:lint": "node l10n/parser/extract.js --lint" |
Rather than extract.js maybe it could be called l10n/cli/index.js?
| const locale = req.path.split("/")[1]; | ||
| if (locale && /^q[a-t][a-z]$/.test(locale)) { | ||
| req.path = req.path.replace(locale, "en-US"); | ||
| } |
There was a problem hiding this comment.
I don't understand what this does, can you explain in a code comment?
There was a problem hiding this comment.
Is this intentionally not alphabetically ordered?
| * Runs on either client or server, | ||
| * and returns the client or server context respectively | ||
| * @returns {import("./types.js").SymmetricContext} | ||
| * @returns {import("./types.js").SymmetricContext | undefined} |
There was a problem hiding this comment.
Should the JSDoc comment explain when this is undefined?
| * @import { TextElement } from "@fluent/syntax"; | ||
| */ | ||
|
|
||
| class AccentTransformer extends Transformer { |
There was a problem hiding this comment.
Can you add a JSDoc comment explaining what this transformer does?
l10n/parser/extractor.js
Outdated
| export async function extract() { | ||
| const manualStrings = await readFile( | ||
| fileURLToPath(import.meta.resolve("../locales/en-US.ftl")), | ||
| "utf8", | ||
| ); | ||
| const fluentResource = parse(manualStrings, {}); | ||
|
|
||
| const project = new Project({}); | ||
| project.addSourceFilesAtPaths( | ||
| path.join(__dirname, "..", "..", "components", "**", "*.js"), | ||
| ); | ||
|
|
||
| /** @type {Map<string, string>} */ | ||
| const map = new Map(); | ||
|
|
||
| for (const file of project.getSourceFiles()) { | ||
| for (const taggedTemplate of file.getDescendantsOfKind( | ||
| SyntaxKind.TaggedTemplateExpression, | ||
| )) { | ||
| const tagNode = taggedTemplate.getTag(); | ||
| if (Node.isCallExpression(tagNode)) { | ||
| // e.g. this.l10n("foobar")`barfoo` | ||
| const expr = tagNode.getExpression(); | ||
| if (Node.isPropertyAccessExpression(expr) && isL10nTag(expr)) { | ||
| const [arg] = tagNode.getArguments(); | ||
| if (Node.isStringLiteral(arg)) { | ||
| const key = arg.getLiteralValue(); | ||
| const value = getLiteralValue(taggedTemplate); | ||
| map.set(key, value); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Could we split this up a bit, to make some parts of it unit-testable?
| let message; | ||
|
|
||
| if (this.locale === "qa") { | ||
| if (this.locale === "qai") { |
| - `qaa`: "accented" locale: adds accents to all characters, duplicates some vowels to create longer strings, wraps string in square brackets to help detect truncation | ||
| - `qai`: "id" locale: replaces strings with their identifiers, wrapped in square brackets | ||
|
|
||
| The `qai` locale works all the time, the `qaa` locale must be manually generated with `node ./parser/transform.js` |
There was a problem hiding this comment.
Maybe make node ./parser/transform.js a script in package.json?
| article-footer-last-modified = This page was last modified on <time data-l10n-name="date">{ $date }</time> by <a data-l10n-name="contributors">MDN contributors</a>. | ||
| article-footer-source-title = Folder: { $folder } (Opens in a new tab) | ||
| baseline-asterisk = Some parts of this feature may have varying levels of support. | ||
| baseline-high-extra = This feature is well established and works across many devices and browser versions. It’s been available across browsers since { $date }. | ||
| baseline-low-extra = Since { $date }, this feature works across the latest devices and browser versions. This feature might not work in older devices or browsers. | ||
| baseline-not-extra = This feature is not Baseline because it does not work in some of the most widely-used browsers. | ||
| baseline-supported-in = Supported in { $browsers } | ||
| baseline-unsupported-in = Not widely supported in { $browsers } | ||
| baseline-supported-and-unsupported-in = Supported in { $supported }, but not widely supported in { $unsupported } | ||
| homepage-hero-title = Resources for Developers,<br> by Developers | ||
| homepage-hero-description = Documenting <a data-l10n-name="css">CSS</a>, <a data-l10n-name="html">HTML</a>, and <a data-l10n-name="js">JavaScript</a>, since 2005. | ||
| not-found-title = Page not found | ||
| not-found-description = Sorry, the page <code data-l10n-name="url">{ $url }</code> could not be found. | ||
| not-found-fallback-english = <strong data-l10n-name="strong">Good news:</strong> The page you requested exists in <em data-l10n-name="em">English</em>. | ||
| not-found-fallback-search = The page you requested doesn't exist, but you could try a site search for: | ||
| not-found-back = Go back to the home page |
There was a problem hiding this comment.
What would be nice is if it detected multiple lines with common prefix and added a line between them, for readability.
Gets things to a stage where all strings are - in theory - l10n-able. Next steps will be improving the l10n experience, and further optimising which strings we ship to the client.
The added README is a good source for some of what's going on here.