UTF-8 encoded values lose their encoding on zk.get() #81

@jperville

Introduction

As I understand it, ZooKeeper stores data internally as an array of bytes, which should make it encoding-agnostic. However, when I use a zk client library, I expect the data I store in ZooKeeper to come back with the same encoding it went in with.

The problem

UTF-8 encoded strings stored in ZooKeeper with zk.create turn into ASCII-8BIT encoded strings (displaying the raw UTF-8 bytes) after being retrieved with zk.get. This happens for characters in the ASCII range (where it is mostly harmless) and also for characters outside the ASCII range (where it is more problematic, because those strings cannot be encoded back to UTF-8 without raising Encoding::UndefinedConversionError).
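
The symptom can be illustrated without a ZooKeeper server. The sketch below assumes that the value returned by zk.get behaves like a copy of the stored string carrying the same bytes but tagged ASCII-8BIT; `simulate_retrieved` is a hypothetical stand-in for the round trip, not part of the zk gem.

```ruby
# Hypothetical stand-in for a zk.create / zk.get round trip:
# same bytes, but tagged as ASCII-8BIT (binary). No ZooKeeper involved.
def simulate_retrieved(val)
  val.dup.force_encoding('ASCII-8BIT')
end

# Returns true when encoding the string to UTF-8 raises
# Encoding::UndefinedConversionError, false when it succeeds.
def utf8_conversion_fails?(str)
  str.encode('UTF-8')
  false
rescue Encoding::UndefinedConversionError
  true
end

original  = "\u00e9"                      # 'é' as a UTF-8 string
retrieved = simulate_retrieved(original)

puts "original encoding:  #{original.encoding.inspect}"
puts "retrieved encoding: #{retrieved.encoding.inspect}"
puts "same bytes?         #{retrieved.bytes == original.bytes}"

# ASCII-only data survives the conversion; non-ASCII data does not.
puts "ascii fails?        #{utf8_conversion_fails?(simulate_retrieved('abc'))}"
puts "non-ascii fails?    #{utf8_conversion_fails?(retrieved)}"
```

The bytes are intact in both cases; only the encoding tag differs, which is why the ASCII case goes unnoticed while the non-ASCII case raises.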

Workaround: in application code, force the encoding after retrieving the data (e.g. data.force_encoding('UTF-8')).
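
A minimal sketch of that workaround, with one extra precaution: since ZooKeeper nodes may also hold genuinely binary data, the helper only re-tags the string when the bytes are valid UTF-8. The name `utf8_or_binary` is hypothetical (it is not part of the zk gem), and the binary strings below stand in for the result of zk.get(path).first.

```ruby
# Hypothetical helper, not part of the zk gem: re-tag bytes read from
# ZooKeeper as UTF-8 when they form valid UTF-8, otherwise return the
# data untouched (still ASCII-8BIT).
def utf8_or_binary(data)
  return data if data.nil?
  candidate = data.dup.force_encoding('UTF-8')
  candidate.valid_encoding? ? candidate : data
end

# Stands in for zk.get(path).first: the UTF-8 bytes of 'é', tagged binary.
raw   = "\u00e9".dup.force_encoding('ASCII-8BIT')
fixed = utf8_or_binary(raw)
puts fixed.encoding.inspect
puts fixed == "\u00e9"

# Bytes that are not valid UTF-8 are left alone.
invalid = [0xff, 0xfe].pack('C*')
puts utf8_or_binary(invalid).encoding.inspect
```

Working on a copy (dup) keeps the original string untouched, unlike calling force_encoding directly on the retrieved value.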

PS: Using Ruby 1.9.3 and 2.1 on Ubuntu 14.04 LTS (amd64).

How to reproduce

Save and run the following script (zk must be installed or part of the current bundle):

#!/usr/bin/env ruby1.9.1
# -*- encoding: utf-8 -*-

require 'zk'

def encoding_bug(zk, val, path='/testme-encoding')
  puts "* we would expect the original value and its copy retrieved from zk to be the same"
  puts "* however the retrieved value has lost its original encoding and must be force-encoded"
  puts "* to be usable with e.g. JSON.encode(), which casts non-UTF-8 strings to UTF-8."

  puts "original value #{val.inspect}, with encoding: " + val.encoding.inspect
  zk.create(path, val)
  begin
    val2 = zk.get(path).first
    puts "retrieved value #{val2.inspect}, with encoding: " + val2.encoding.inspect
    print "attempting to encode val2 to UTF-8, its real encoding => "
    begin
      val2.encode('UTF-8')
      raise "should be failing with Encoding::UndefinedConversionError!"
    rescue Encoding::UndefinedConversionError => e
      puts "as expected, raises " + e.inspect
    end
    print "attempting to force encoding to 'UTF-8', its original encoding => "
    begin
      val2.force_encoding('UTF-8')
      puts "succeeds, val2 is now " + val2.inspect
    rescue => e
      puts "encountered an unexpected exception: " + e.inspect
    end
  ensure
    zk.delete(path)
  end
end

uri = ARGV.first || 'localhost:2181'
encoding_bug(ZK.new(uri), 'é')
