This change avoids a nasty behaviour: a string that triggered a decoding
error triggers one *AGAIN* when it is printed to the console. Writing
directly to sys.stderr skips the locale/conversion issues and puts the
troublesome string straight into a file, where it is stored as a raw
sequence of octets.
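A minimal sketch of the idea, written in Python 3 syntax for
illustration (the original code is presumably Python 2, where
sys.stderr.write() accepts byte strings directly). The helper name
write_raw is hypothetical and not taken from the codebase:

```python
import sys


def write_raw(data, stream=None):
    """Write data to a binary stream without a locale-dependent encoding step.

    write_raw is a hypothetical helper name used only for this sketch.
    """
    if stream is None:
        # sys.stderr.buffer is the binary layer underneath text-mode stderr.
        stream = sys.stderr.buffer
    if isinstance(data, str):
        # Escape anything non-ASCII so that logging can never raise
        # a second encoding error.
        data = data.encode("ascii", "backslashreplace")
    # Bytes are written as-is: the raw octets end up in the log.
    stream.write(data + b"\n")
    stream.flush()
```

With stderr redirected to a file, write_raw(b"caf\xe9") stores the
offending octets unchanged, so they can be inspected later with a hex
viewer.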
This encoder handles input from HH conversion, which needs to end up as
UTF-8 in the database. Replace the open-coded routine in Database.py
with this common routine so that all encoding now takes place in the
same file.
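It seems that running encoder.encode() on a latin1/latin9 string results
in, yes, a bloody UnicodeDecodeError. A decode error on .encode()...
Really. Python 2 first decodes a byte string implicitly as ASCII before
encoding it, which is where the error comes from. With this change the
conversion from non-unicode string to real unicode appears to work
better.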
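The decode-first approach can be sketched as follows, again in Python 3
syntax for illustration. The function name to_utf8 and the default
source encoding are assumptions made for this example, not the routine
actually committed:

```python
def to_utf8(raw, source_encoding="iso8859-15"):
    """Return UTF-8 bytes for raw, decoding byte input explicitly first.

    to_utf8 and its latin9 default are assumptions for this sketch.
    """
    if isinstance(raw, bytes):
        # Decode with the real source encoding instead of relying on an
        # implicit ASCII decode, which fails on any byte above 0x7f.
        text = raw.decode(source_encoding)
    else:
        # Already real unicode text: nothing to decode.
        text = raw
    return text.encode("utf-8")
```

The explicit decode matters because latin1 and latin9 disagree on a
handful of bytes: 0xa4 is the euro sign in latin9 but the generic
currency sign in latin1, so the source encoding has to be chosen
deliberately rather than guessed.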