Andrew Gallagher andrewg at andrewg.com
Mon May 23 23:54:47 CEST 2016

On 23 May 2016, at 23:24, Robert J. Hansen <rjh at sixdemonbag.org> wrote:

>> In the case of "all 8-bit characters, no 7-bit" you're dealing with
>> either a practical joker or EBCDIC. Same thing really...
> Or KOI-8R/Windows-1251.

I'd forgotten about that. Or any of the ISO-8859 parts that encode non-Latin scripts. Or Shift-JIS. Or... Or... :-(

> Yeah, that's what I'm afraid of.  It's not valid UTF-8 encodings that
> trouble me: it's having to deal with unknown encodings.

One of the little-appreciated advantages of UTF-8 is that its horribly inefficient byte-level encoding is so distinctive: text in a legacy 8-bit encoding almost never happens to form valid UTF-8 multi-byte sequences.
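
To illustrate what I mean by "distinctive", here is a rough Python sketch (not anything GnuPG actually does; the function name and the sample strings are made up for the example). It simply attempts a strict UTF-8 decode and treats failure as "some other encoding":

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same short greeting in several encodings (illustrative samples only).
samples = {
    "utf-8": "Привет, мир".encode("utf-8"),
    "koi8-r": "Привет, мир".encode("koi8_r"),
    "windows-1251": "Привет, мир".encode("cp1251"),
    "shift-jis": "こんにちは".encode("shift_jis"),
}

for name, raw in samples.items():
    print(f"{name}: {looks_like_utf8(raw)}")
```

Only the UTF-8 sample passes; the others trip over a lead byte that is not followed by a continuation byte, or start with a continuation byte. Two caveats: pure ASCII is also valid UTF-8, and a very short legacy string can validate by accident (the Latin-1 bytes C3 A9 decode happily as "é"), so this tells you "probably UTF-8" and says nothing about which legacy encoding you have otherwise.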

I'm afraid this is one more case where the only reasonable position to take is "speak UTF-8 or the management cannot be held responsible"... ;-)


