Description
As a wiki author working in a language with a non-Latin script, I want to be able to link to a page like [[Гильдии]] (the equivalent of [[Guilds]] in Russian). Currently wiki strips out all the non-Latin characters from page titles, so Гильдии converts to a slug that is the empty string.
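The behavior described above can be reproduced with a small sketch. It assumes the current slug rule replaces whitespace with hyphens, drops every character outside `A-Z`, `a-z`, `0-9` and `-`, and lowercases the result; the function name `current_as_slug` is hypothetical and this is not the actual asSlug source.

```python
import re


def current_as_slug(title: str) -> str:
    """Approximate the existing slug rule (assumed, not copied from wiki)."""
    # Whitespace becomes a hyphen.
    hyphenated = re.sub(r"\s", "-", title)
    # Everything that is not an ASCII letter, digit, or hyphen is dropped.
    ascii_only = re.sub(r"[^A-Za-z0-9-]", "", hyphenated)
    return ascii_only.lower()


print(repr(current_as_slug("Guilds")))     # 'guilds'
print(repr(current_as_slug("Гильдии")))    # '' (every letter is stripped)
print(repr(current_as_slug("по-русски")))  # '-' (only the hyphen survives)
```

Under this rule, every title written purely in a non-Latin script collapses to the same slug, which is why a second such page collides with the first.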
In the specific case of the Russian language, the alphabet is very phonetic, so many Russian websites have software to solve this problem by mapping the Cyrillic letters to Latin letters or clusters of Latin letters, in this case Gil'dii. However, it seems likely that such a "Russian mode" would not be the best solution for wiki.
It looks to me like there is a fork in the road. I will call the two general paths I see "incompatible slug" and "compatible slug" as a best effort to describe this.
Incompatible slug
We could opt to present non-Latin page title characters directly in the URLs and let the backend map those to and from some kind of encoding for the filenames (or even filenames in a subset of UTF-8). This is the general approach Wikipedia takes. When someone links to [[Гильдии]], Wikipedia percent-encodes the UTF-8 bytes of that Unicode page title into the link, like <a href="/wiki/%D0%93%D0%B8%D0%BB%D1%8C%D0%B4%D0%B8%D0%B8">. Users see /wiki/Гильдии in the URL bar. I don't know exactly what happens on the back end.
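To make the encoding step concrete, here is a minimal sketch using Python's standard `urllib.parse`. It only illustrates the percent-encoding of the link; it says nothing about how Wikipedia or wiki would store the page.

```python
from urllib.parse import quote, unquote

path = "/wiki/Гильдии"

# quote() percent-encodes the UTF-8 bytes and leaves "/" untouched by default.
encoded = quote(path)
print(encoded)  # /wiki/%D0%93%D0%B8%D0%BB%D1%8C%D0%B4%D0%B8%D0%B8

# unquote() reverses it, which is what the browser shows in the URL bar.
print(unquote(encoded))  # /wiki/Гильдии
```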
I suspect this approach has a different texture than the current approach of Federated Wiki. I think it would represent a change in direction and would have repercussions that would upset the "just enough" ethos that characterizes this project.
Compatible slug
Alternatively, we could convert non-Latin letters to some slug that is compatible with the current backend.
This is pretty much exactly how domain names with non-Latin letters are handled: they are mapped to Latin letters using Punycode (in which Гильдии is lowercased and encodes to xn--c1aclbap3j), and are then compatible with existing DNS. Slugs that are not based primarily on Latin characters are unreadable to humans, but they are unambiguously decodable to a lowercased version of the non-Latin input.
If we were to add Punycode encoding to the asSlug method, I think a lot of things would just work. I base this on how well wiki already works when I create a page by typing [[по-русски]]: this discards all the letters and creates the slug -, which works fine as long as I don't make another such page.
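A rough sketch of what a Punycode-aware slug function might look like, written in Python for illustration only. The names `as_punycode_slug` and `slug_to_text` are hypothetical, the cleanup rules are assumptions, and the real asSlug would need an equivalent Punycode library on the client side.

```python
import re

ACE_PREFIX = "xn--"  # same prefix that internationalized domain names use


def as_punycode_slug(title: str) -> str:
    """Slugify a title, Punycode-encoding it when non-ASCII letters remain."""
    text = re.sub(r"\s", "-", title).lower()
    # Keep letters and digits of any script plus hyphens; drop the rest.
    text = re.sub(r"[^\w-]|_", "", text)
    if text.isascii():
        return text  # Latin titles keep today's readable slug
    return ACE_PREFIX + text.encode("punycode").decode("ascii")


def slug_to_text(slug: str) -> str:
    """Decode a Punycode slug back to its lowercased text; pass others through."""
    if slug.startswith(ACE_PREFIX):
        return slug[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return slug


slug = as_punycode_slug("Гильдии")
print(slug)                # an ASCII-only slug starting with xn--
print(slug_to_text(slug))  # гильдии
print(as_punycode_slug("по-русски") != slug)  # True: no more collisions
```

Because the encoding is reversible, two different non-Latin titles no longer share a slug, while purely Latin titles are unaffected.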
I have not yet dug into the code for sitemaps and searches, but I would imagine that this code would need to become aware of Punycode-decoding slugs that begin with xn--. But once that's in, I think it would be quite transparent.
- Author links to [[Гильдии]]
- Slug is calculated to be xn--c1aclbap3j
- Click to create the page, the file is created with filename xn--c1aclbap3j
- Reader types гил into the search bar
- Search code has seen xn--c1aclbap3j in the sitemap information and decoded that to гильдии for search matching purposes
- гил matches as a substring of гильдии just as guil matches as a substring of guilds for a page named Guilds.
- Reader is given a link to the page with slug xn--c1aclbap3j and clicks it
- Wiki sees the Punycode and while the page loads displays the slug in lowercased non-Latin characters like гильдии, which is replaced with Гильдии when the page title loads
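The search part of this walkthrough could look something like the following sketch. It assumes the sitemap is just a list of slugs and that matching is a plain substring test; `display_slug` and `search` are hypothetical names, not existing wiki functions.

```python
ACE_PREFIX = "xn--"


def display_slug(slug: str) -> str:
    """Return the human-readable form of a slug for matching and display."""
    if slug.startswith(ACE_PREFIX):
        try:
            return slug[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        except ValueError:
            # UnicodeError is a ValueError; fall back to the raw slug.
            return slug
    return slug


def search(sitemap_slugs: list[str], query: str) -> list[str]:
    """Return the slugs whose decoded form contains the lowercased query."""
    needle = query.lower()
    return [s for s in sitemap_slugs if needle in display_slug(s)]


# Build a tiny sitemap; the Cyrillic slug is produced the same way a
# Punycode-aware asSlug might produce it.
guild_slug = ACE_PREFIX + "гильдии".encode("punycode").decode("ascii")
sitemap = ["guilds", "welcome-visitors", guild_slug]

print(search(sitemap, "guil") == ["guilds"])   # True
print(search(sitemap, "гил") == [guild_slug])  # True
print(display_slug(guild_slug))                # гильдии
```

The returned values are still the stored slugs, so links and filenames stay ASCII while matching happens on the decoded text.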
So the only real user-visible wart is the presence of the Punycode in the link itself.
This could even be a stepping stone to eventually keeping the client experience in native languages in future steps.
I may be missing other uses of the slug that need to be accounted for.
This issue is an offshoot of conversations at fedwiki/wiki-client#103 and #139.