Kangaderoo has a patch which potentially fixes storage and display issues for users who do not have their database text storage as utf8.
Functions and variables added to Charset to disect the patch
This change is needed to skip a nasty behaviour: if the string triggered
a decoding error, it will trigger one *AGAIN* if the string is printed
to console. By writing directly to sys.stderr we skip the
locale/conversion issues and get the troublesome string directly in a
file where it is stored as a raw sequence of octets.
This encoder is used to handle input from HH conversion, which needs to
end up as UTF-8 in the database. Switch the open-coded routine from
Database.py to this common routine so all encodings now take place in
the same file.
It seems that running encoder.encode() on a latin1/latin9 string results
in, yes a bloody UnicodeDecodeError. Decode error on .encode()...
Really. This way the modification from non-unicode string to real
unicode appears to work better.
If the system (display) locale is UTF-8, there is no need to encode to
either direction. In fact, running the .encode() routine appears to
mangle a valid UTF-8 string to a worse condition, effectively breaking
it.