@CarlosNihelton CarlosNihelton commented Feb 5, 2026

Fixes: #6716

Including a small peasant fix in a comment about the WSL2 /init being proprietary (no longer the case since WSL2 was open sourced last year).

Also using the echo. syntax instead of echo to avoid a very unlikely corner case where the environment variable is set to whitespace only:

C:\Users\João Martín😁>set unknown=  # There is a space here

C:\Users\João Martín😁>echo %unknown%
ECHO is ON

C:\Users\João Martín😁>echo.%unknown%


Proposed Commit Message

fix(WSL): Always subprocess cmd.exe in UTF-16 mode  # no more than 72 characters

As we manipulate paths acquired by subprocessing cmd.exe inside WSL,
by using it in UTF-16 mode we ensure a predictable output when the strings
are not ASCII-compatible, such as reading the user profile when it contains special characters.

Fixes GH-6716

Additional Context

Test Steps

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Peasant comment fix: /init is now open source (as part of WSL2).

Fixes: canonical#6716
That function can now throw UnicodeDecodeError, which inherits from
ValueError, so we should catch ValueError as before.
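
The exception relationship in that commit note can be checked directly. A minimal sketch, assuming two hypothetical helpers, `decode_cmd_output` and `safe_decode` (neither is a cloud-init function), that decode the UTF-16LE bytes printed by `cmd.exe /u`:

```python
def decode_cmd_output(raw: bytes) -> str:
    """Decode bytes produced by `cmd.exe /u` (UTF-16LE). Hypothetical helper."""
    return raw.decode("utf-16-le")


def safe_decode(raw: bytes):
    """Return the decoded text, or None when the bytes are not valid UTF-16LE."""
    try:
        return decode_cmd_output(raw)
    except ValueError:  # also covers UnicodeDecodeError, a ValueError subclass
        return None


# UnicodeDecodeError -> UnicodeError -> ValueError
print(issubclass(UnicodeDecodeError, ValueError))

valid = safe_decode("C:\\Users\\Jo\u00e3o".encode("utf-16-le"))
invalid = safe_decode(b"\x00\xd8")  # lone high surrogate: invalid UTF-16LE
print(valid is not None, invalid is None)
```

Catching `ValueError` therefore keeps the handler unchanged while still covering the new decode failure.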
@holmanb holmanb self-assigned this Feb 5, 2026

holmanb commented Feb 5, 2026

Thanks for this contribution @CarlosNihelton! A couple of requests:

  1. Could you please add some test coverage for ds-identify? (tests/unittests/test_ds_identify.py)
  2. Could you please run cloud-init collect-logs on a system that booted with these changes and attach the tarball?

Also, for my own understanding, I would like to know what environment variables are set by the calling processes for both ds-identify and cloud-init's Python code. Could I ask you to instrument each of these and share the results? In ds-identify, something like debug 1 $(env) would work. In the Python code, logging the content of os.environ would suffice.
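
For the Python side, the instrumentation could look roughly like this. It is only a sketch: the logger name `env-debug` and the helpers `format_environ` and `log_environ` are made up for illustration and are not part of cloud-init.

```python
import logging
import os

LOG = logging.getLogger("env-debug")  # hypothetical logger name


def format_environ(environ=None) -> str:
    """Render environment variables one per line, sorted for stable diffs."""
    env = os.environ if environ is None else environ
    return "\n".join(f"{key}={env[key]}" for key in sorted(env))


def log_environ() -> None:
    """Log the environment of the current process at debug level."""
    LOG.debug("environment of calling process:\n%s", format_environ())
```

Sorting the keys makes it easier to compare the output with what ds-identify reports.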


CarlosNihelton commented Feb 5, 2026

Hi @holmanb!

Here are the logs from a system with a username containing non-ASCII characters and with the datasource and ds-identify patched: cloud-init.tar.gz

Regarding adding coverage to ds-identify, I need to ask you a deeper question. AFAICT, to make this changeset testable I'd need to break this assignment into two steps: _RET=$(/init "$exepath" /u /c "$@" 2>/dev/null | iconv -f UTF-16LE -t UTF-8). Otherwise they are replaced by mocks and there is nothing to cover (as is currently the case). But POSIX shells don't like NUL bytes in the middle of strings, and UTF-16 has plenty of them.

j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ var=$(cmd.exe /U /C echo.%USERPROFILE%)
-bash: warning: command substitution: ignored null byte in input
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ profile=$( echo "$_RET" | iconv -f UTF-16LE -t UTF-8)
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ echo $profile
㩃啜敳獲䩜慍瑲滭😁਍

So, I need something like base64 in the middle of this process to turn the UTF-16 bytes into something the shell can store in a variable, and then recover them by piping into iconv.

j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ _RET=$(cmd.exe /U /C echo.%USERPROFILE% | base64)
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ echo $_RET
QwA6AFwAVQBzAGUAcgBzAFwASgBvAOMAbwAgAE0AYQByAHQA7QBuAD3YAd4NAAoA
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ profile=$(echo $_RET | base64 -d | iconv -f UTF-16LE -t UTF-8)
j@DESKTOP-551PQ9O:/mnt/c/Users/João Martín😁$ echo $profile
C:\Users\João Martín😁
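
The round trip can be cross-checked outside the shell. A short Python sketch that decodes the exact base64 text echoed above (the variable names are illustrative only):

```python
import base64

# Base64 text exactly as echoed in the session above.
encoded = "QwA6AFwAVQBzAGUAcgBzAFwASgBvAOMAbwAgAE0AYQByAHQA7QBuAD3YAd4NAAoA"

raw = base64.b64decode(encoded)   # UTF-16LE bytes, NUL bytes intact
text = raw.decode("utf-16-le")    # same conversion as iconv -f UTF-16LE -t UTF-8
profile = text.rstrip("\r\n")     # drop the CRLF that cmd.exe appends

print(ascii(profile))
```

Because base64 output is plain ASCII, nothing is lost when the shell stores it in a variable.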

With the base64 approach, I'd remove the pipe to iconv from the WSL_run_cmd function and put it at the call site. Then we can further test the function WSL_profile_dir() by mocking WSL_run_cmd to output UTF-16 encoded data.
But that comes at the cost of adding two calls to base64.

diff --git a/tools/ds-identify b/tools/ds-identify
index c2a6d69ea..e4b8507ea 100755
--- a/tools/ds-identify
+++ b/tools/ds-identify
@@ -1710,7 +1710,7 @@ WSL_run_cmd() {
     shift
     # Using the '/u' flag to enforce Unicode (UTF-16 LE), thus we need to decode it afterwards.
     # It's more reliable than the default ANSI Code Pages for anything above the ASCII range.
-    _RET=$(/init "$exepath" /u /c "$@" 2>/dev/null | iconv -f UTF-16LE -t UTF-8)
+    _RET=$(/init "$exepath" /u /c "$@" 2>/dev/null | base64)
 }
 
 WSL_profile_dir() {
@@ -1725,6 +1725,7 @@ WSL_profile_dir() {
             # to output the Windows user profile directory path, which is
             # held by the environment variable %USERPROFILE%.
             WSL_run_cmd "$cmdexe" "echo.%USERPROFILE%"
-             profiledir="${_RET%%[[:cntrl:]]}"
+            profiledir=$(echo $_RET | base64 -d | iconv -f UTF-16LE -t UTF-8)
+             profiledir="${profiledir%%[[:cntrl:]]}"
             if [ -n "$profiledir" ]; then
                 # wslpath is a program supplied by WSL itself that translates Windows and Linux paths,
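
On the test side, a mocked WSL_run_cmd would need to emit base64 text of UTF-16LE data. A rough Python sketch of how such a payload could be built and checked. Both helpers, `mock_wsl_run_cmd_payload` and `decode_like_call_site`, are hypothetical and do not exist in tests/unittests/test_ds_identify.py.

```python
import base64
import unicodedata


def mock_wsl_run_cmd_payload(profile: str) -> str:
    """Base64 of `profile` as `cmd.exe /u` would emit it (UTF-16LE plus CRLF)."""
    raw = (profile + "\r\n").encode("utf-16-le")
    return base64.b64encode(raw).decode("ascii")


def decode_like_call_site(payload: str) -> str:
    """Mirror `base64 -d | iconv -f UTF-16LE -t UTF-8` and the shell cleanup."""
    text = base64.b64decode(payload).decode("utf-16-le")
    text = text.rstrip("\n")  # $(...) strips trailing newlines
    if text and unicodedata.category(text[-1]) == "Cc":
        text = text[:-1]      # ${profiledir%%[[:cntrl:]]} drops one control char
    return text


payload = mock_wsl_run_cmd_payload("C:\\Users\\Jo\u00e3o Mart\u00edn\U0001F601")
print(payload)
print(ascii(decode_like_call_site(payload)))
```

For the path used in this thread, the generated payload matches the base64 string captured in the session above.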

WDYT?

Development

Successfully merging this pull request may close these issues.

WSL: datasource fails to find user-data if Windows username contains non-ASCII chars
