How should we encode text?

If we work from Unicode, we could in theory design a variable-length encoding built directly on base-5 digits, instead of simply tossing UTF-8 bytes into the block encoding.  This might store text more efficiently while still covering the entire Unicode character set.
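
To make the idea concrete, here is a rough sketch of one possible scheme, not a proposal for the final spec. It assumes digits 0-3 carry the code point value (most significant digit first, no leading zeros) and digit 4 is reserved as an end-of-character marker. The names `TERMINATOR`, `encode_text`, and `decode_text` are made up for illustration, and nothing here is tied to how the block encoding actually packs digits.

```python
TERMINATOR = 4  # assumption: digit 4 never carries data, it only ends a character


def encode_codepoint(cp: int) -> list[int]:
    """Write one code point as base-4 payload digits followed by the terminator."""
    digits = []
    while cp > 0:
        digits.append(cp % 4)
        cp //= 4
    digits.reverse()           # most significant digit first
    digits.append(TERMINATOR)  # U+0000 becomes just [4]
    return digits


def encode_text(text: str) -> list[int]:
    """Encode a string as a flat list of base-5 digits (values 0-4)."""
    out = []
    for ch in text:
        out.extend(encode_codepoint(ord(ch)))
    return out


def decode_text(digits: list[int]) -> str:
    """Inverse of encode_text; rejects invalid digits and truncated input."""
    chars = []
    value = 0
    pending = 0  # payload digits seen since the last terminator
    for d in digits:
        if d == TERMINATOR:
            chars.append(chr(value))  # chr() raises ValueError above U+10FFFF
            value = 0
            pending = 0
        elif 0 <= d < 4:
            value = value * 4 + d
            pending += 1
        else:
            raise ValueError(f"not a base-5 digit: {d!r}")
    if pending:
        raise ValueError("input ends in the middle of a character")
    return "".join(chars)


if __name__ == "__main__":
    sample = "A"
    print(encode_text(sample))               # [1, 0, 0, 1, 4]
    print(decode_text(encode_text(sample)))  # A
```

Note that this naive version is not denser than UTF-8: "A" takes five base-5 digits, roughly 11.6 bits at log2(5) ≈ 2.32 bits per digit, versus 8 bits in UTF-8. A real spec would probably need a smarter allocation of digit sequences (for example, shorter codes for common characters) to get the efficiency gain.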

Any ideas on how to spec this out?