Introduction
As I understand it, ZooKeeper stores data internally as an array of bytes, which should make it encoding-agnostic. However, when I use a zk client library, I expect the data I store in ZooKeeper to come back with the same encoding it had when I stored it.
The problem
UTF-8 encoded strings stored in ZooKeeper with zk.create come back as ASCII-8BIT encoded strings (displaying the raw UTF-8 bytes) when retrieved with zk.get. This happens for characters in the ASCII range (where it is not too bad) and also for characters outside the ASCII range (where it is more problematic, because those strings cannot be encoded back to UTF-8 without raising Encoding::UndefinedConversionError).
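The failure mode can be shown without a ZooKeeper server at all. The following is a minimal sketch that only simulates the retrieved value by re-tagging the same bytes as ASCII-8BIT; the variable names are illustrative and no zk call is involved.

```ruby
# Simulate the value returned by zk.get: identical bytes, binary encoding tag.
original  = "\u00E9"                                   # 'é' as a UTF-8 string
retrieved = original.dup.force_encoding('ASCII-8BIT')  # stand-in for zk.get(path).first

puts original.encoding                    # encoding tag of the stored value
puts retrieved.encoding                   # encoding tag of the simulated result
puts original.bytes == retrieved.bytes    # the bytes themselves are unchanged

# Transcoding a binary string with bytes >= 0x80 to UTF-8 fails.
encode_error = nil
begin
  retrieved.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  encode_error = e
  puts "encode failed: #{e.class}"
end

# A binary string containing only ASCII bytes transcodes without error.
ascii_only = 'abc'.dup.force_encoding('ASCII-8BIT')
ascii_as_utf8 = ascii_only.encode('UTF-8')
puts ascii_as_utf8.encoding
```

This is why the ASCII-only case is merely inconvenient while the non-ASCII case breaks: encode tries to convert each byte, whereas the bytes only need to be re-labelled.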
Workaround: in application code, force the encoding after retrieving the data (e.g. data.force_encoding('UTF-8')).
PS: Using Ruby 1.9.3 and 2.1 on Ubuntu 14.04 LTS (amd64).
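The workaround above could be wrapped in a small helper so that application code does not repeat the force_encoding call. This is only a sketch: utf8_from_zk is a hypothetical name, not part of the zk gem, and it assumes the application only ever stores UTF-8 text in the nodes it reads.

```ruby
# Hypothetical helper (not part of the zk gem): re-tag node data as UTF-8.
# Assumes the data was originally written as UTF-8 text.
def utf8_from_zk(data)
  return nil if data.nil?                   # tolerate nodes without data
  str = data.dup.force_encoding('UTF-8')    # re-label the bytes, no conversion
  raise ArgumentError, 'node data is not valid UTF-8' unless str.valid_encoding?
  str
end

# Stand-in for zk.get(path).first: UTF-8 bytes tagged as binary.
raw   = "\u00E9".dup.force_encoding('ASCII-8BIT')
fixed = utf8_from_zk(raw)

puts fixed.encoding
puts fixed == "\u00E9"
```

With a real client, the call would look like utf8_from_zk(zk.get(path).first). The valid_encoding? check is a design choice: it surfaces nodes that actually hold binary data instead of silently mislabelling them.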
How to reproduce
Save and run the following script (zk must be installed or part of the current bundle):
#!/usr/bin/env ruby1.9.1
# -*- encoding: utf-8 -*-
require 'zk'

def encoding_bug(zk, val, path='/testme-encoding')
  puts "* we would expect the original value and its copy retrieved from zk to be the same"
  puts "* however the retrieved value has lost its original encoding and must be force-encoded"
  puts "* to be usable with eg. JSON.encode() which casts non-UTF-8 strings to UTF-8."
  puts "original value #{val.inspect}, with encoding: " + val.encoding.inspect
  zk.create(path, val)
  begin
    # zk.get returns [data, stat]; only the data is of interest here
    val2 = zk.get(path).first
    puts "retrieved value #{val2.inspect}, with encoding: " + val2.encoding.inspect
    print "attempting to encode val2 to UTF-8, its real encoding => "
    begin
      val2.encode('UTF-8')
      raise "should be failing with Encoding::UndefinedConversionError!"
    rescue Encoding::UndefinedConversionError => e
      puts "as expected, raises " + e.inspect
    end
    print "attempting to force encoding to 'UTF-8', its original encoding => "
    begin
      val2.force_encoding('UTF-8')
      puts "succeeds, val2 is now " + val2.inspect
    rescue => e
      puts "encountered an unexpected exception: " + e.inspect
    end
  ensure
    # always remove the test node, even if a step above raised
    zk.delete(path)
  end
end

uri = ARGV.first || 'localhost:2181'
encoding_bug(ZK.new(uri), 'é')