Right now, the Kerika user interface is entirely in English, but we have users worldwide and many of them use Kerika with other languages, e.g. Greek, Japanese, Korean, etc.
When you export data from a Task Board or Scrum Board that includes non-English characters, the foreign characters are actually preserved correctly as part of the exported data, but if you need to then import data into some other program, like Microsoft Word or Excel, you need to make sure the other program correctly correctly interprets the text as being in UTF-8 format.
UTF-8 is a coding standard that can handle all possible characters, so it works with languages like Greek, Japanese, etc. which don’t use the Roman alphabet.
For a long time now, UTF-8 has been the only global standard that works across all languages, because of its inherent flexibility in handling different character sets.
When you do an export of data from a Kerika Task Board or Scrum Board, we create the CSV files in UTF-8 format, and include what’s called the Byte Order Mark (BOM) in the first octect of the exported file.
Including a BOM is the best way to let all kinds of third-party programs know that the file is encoding in UTF-8: it’s a standard way of saying to other programs, “Hey, guys! This text may contain non-English characters.”
And for the most part, including a BOM works just fine with CSV exports from Kerika: Google Spreadsheets interprets that correctly, Microsoft Excel on Windows interprets that correctly, but not…
EXCEL ON MACS
Many version of Excel for Macs, going back to Office 2007 at least, have a bug that doesn’t correctly process the BOM character. Why this bug persisted for so long is a mystery, but there we are…
The effect of this bug is that an exported file from Kerika, containing non-English characters, will not display correctly inside Excel on Mac, although it will display correctly with other Mac programs, like the simple Text Edit.
There’s not much we can do about this bug, unfortunately.
THE TECHNICAL BACKGROUND TO ALL THIS:
BOMs are used signify what’s called the “endianess” of the file.
Endianess is a really ancient concept: in fact, most software developers who learned programming in the last couple of decades have no idea what this is about. You can learn about endianess from Wikipedia; the short summary is that when 8-bit bytes are combined to make words, e.g. for 32-bit or 64-bit microprocessors, different manufacturers had adopted one of two conventions for organizing these bytes.
For Big-Endian systems the most significant byte was in the smallest address space, for Little-Endian systems the most significant byte was in the largest address space.
(If you have a number like 12345, for example, the “1” is the most significant digit and the “5” is the least significant. In a Big-Endian system this would be stored as “1 2 3 4 5″; in a Little-Endian system it would be stored as “5 4 3 2 1″. So, when you get presented with any number, you really need to know which of the two systems you are using, because the interpretation of the same digits would be wildly different.)
(About a dozen years ago Joel Spolsky, former PM for Excel, wrote a great article on the origins and use of BOM, for those who want to learn more about the technical details.)
Why this affects Kerika at all? Because when you do an export of cards from Kerika, the export job is run on a virtual machine running on Amazon Web Services.
We have no idea what kind of physical hardware is being used by AWS, and we are not supposed to care either: we shouldn’t have to worry about whether we are generating the CSV file using a little- or big-endian machine, and whether the user is going to open that file with a little- or big-endian machine.
That’s the whole point of using UTF-8 and a BOM: to make it possible for files to be more universally shared.