8 Commits

Author SHA1 Message Date
Eric Blade
e56cb24ed1 to_utf8 returns what was passed in if unicode() errors because it's already encoded 2010-01-29 12:01:51 -05:00
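Note: the fallback described in this commit can be sketched roughly as follows. This is a Python 3 approximation, not the project's code: the original targets Python 2's unicode(), and the name to_utf8_sketch and the source_encoding parameter are hypothetical.

```python
def to_utf8_sketch(value, source_encoding="ascii"):
    """Return UTF-8 bytes for value, or value unchanged if decoding fails.

    Hypothetical helper: a failed decode is taken to mean the bytes are
    already encoded, so they are passed through untouched.
    """
    if isinstance(value, str):
        # Real text: encoding to UTF-8 is always safe.
        return value.encode("utf-8")
    try:
        return value.decode(source_encoding).encode("utf-8")
    except UnicodeDecodeError:
        # Assumed to be encoded already; hand back exactly what came in.
        return value
```

With the default source encoding, plain ASCII input round-trips, while bytes that are already UTF-8 fail the decode and come back as the same object.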
Mika Bostrom
7f04ed88f4 Use proper encoding name
When the system locale is Unicode, the second item returned by
locale.getdefaultlocale() is "UTF8", not "utf-8".
2010-01-26 08:01:46 +02:00
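Note: one way to avoid depending on how the encoding name is spelled is to normalise it through the codec registry before comparing. A minimal sketch, assuming a hypothetical helper is_utf8 that is not part of the original code:

```python
import codecs


def is_utf8(encoding_name):
    """Return True if encoding_name is any accepted spelling of UTF-8."""
    try:
        # codecs.lookup() resolves aliases such as "UTF8" or "utf_8"
        # to one canonical codec name.
        return codecs.lookup(encoding_name).name == "utf-8"
    except (LookupError, TypeError):
        # Unknown codec name, or None when no locale encoding is set.
        return False
```

A caller could pass locale.getdefaultlocale()[1] (or whatever the platform reports) straight in, without caring whether it reads "UTF8", "UTF-8" or "utf-8".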
Mika Bostrom
85c9070ec8 Write charmap-related errors directly to stderr
This change is needed to skip a nasty behaviour: if the string triggered
a decoding error, it will trigger one *AGAIN* if the string is printed
to console. By writing directly to sys.stderr we skip the
locale/conversion issues and get the troublesome string directly in a
file where it is stored as a raw sequence of octets.
2010-01-24 22:17:03 +02:00
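Note: the idea of bypassing the text layer when reporting a bad string might look like this in Python 3, where the raw octets go to the binary buffer behind sys.stderr. The function name report_undecodable and the stream parameter are assumptions for illustration; the original Python 2 code simply wrote to sys.stderr.

```python
import sys


def report_undecodable(raw, stream=None):
    """Write the offending bytes to a binary stream without re-encoding."""
    if stream is None:
        # sys.stderr.buffer is the underlying binary stream, so no
        # locale conversion is applied to the bytes written here.
        stream = sys.stderr.buffer
    stream.write(b"Could not decode: " + raw + b"\n")
    stream.flush()
```

Redirecting stderr to a file then stores the troublesome string exactly as it was received, which is what makes it possible to inspect afterwards.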
Mika Bostrom
33277ce68b Add new encoder
This encoder is used to handle input from HH conversion, which needs to
end up as UTF-8 in the database. Switch the open-coded routine from
Database.py to this common routine so all encodings now take place in
the same file.
2010-01-24 21:11:46 +02:00
Mika Bostrom
04c345ae1f Use a different "unicoder" for db strings
It seems that running encoder.encode() on a latin1/latin9 string results
in, yes, a bloody UnicodeDecodeError. Decode error on .encode()...
Really. This way the modification from non-unicode string to real
unicode appears to work better.
2010-01-21 21:46:14 +02:00
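Note: the surprising error has a known cause in Python 2. Calling .encode() on a byte string first decodes it implicitly as ASCII, and that hidden step fails on any byte above 0x7F. The sketch below reproduces the hidden step explicitly in Python 3 and shows the explicit decode that avoids it. The function names are hypothetical, and latin-1 as the source encoding is an assumption.

```python
def implicit_ascii_step_fails(raw):
    """Mimic the ASCII decode that Python 2 ran before encoding a byte string."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return True
    return False


def latin1_to_utf8(raw):
    """Decode explicitly from latin-1, then encode the text as UTF-8."""
    text = raw.decode("latin-1")  # bytes -> real unicode text
    return text.encode("utf-8")   # unicode text -> UTF-8 bytes
```

Decoding with the correct source encoding first means the encode step only ever sees real unicode text, so the implicit ASCII conversion never happens.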
Mika Bostrom
dda00b6b10 Catch character encoding errors 2010-01-21 21:31:19 +02:00
Mika Bostrom
e915b0b62c Allow bypassing the codec
If the system (display) locale is UTF-8, there is no need to encode in
either direction. In fact, running the .encode() routine appears to
mangle a valid UTF-8 string to a worse condition, effectively breaking
it.
2010-01-21 21:23:13 +02:00
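Note: a rough Python 3 sketch of the bypass, together with one way a valid UTF-8 string can get mangled when it is pushed through a codec it does not need. The names to_display_bytes and double_encode are hypothetical, and decoding as latin-1 is only one possible cause of the breakage described above.

```python
import codecs


def to_display_bytes(utf8_bytes, display_encoding):
    """Transcode UTF-8 bytes for display, skipping the codec when not needed."""
    if codecs.lookup(display_encoding).name == "utf-8":
        # Display locale is already UTF-8: pass the data through untouched.
        return utf8_bytes
    return utf8_bytes.decode("utf-8").encode(display_encoding, "replace")


def double_encode(utf8_bytes):
    """Show the damage: treat UTF-8 bytes as latin-1, then encode again."""
    return utf8_bytes.decode("latin-1").encode("utf-8")
```

For the two-byte UTF-8 sequence of "é" (C3 A9), double_encode produces four bytes (C3 83 C2 A9), which a UTF-8 terminal renders as "Ã©" instead of the intended character.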
Mika Bostrom
34bf2bd8e9 Use better function name 2010-01-21 18:12:45 +02:00