Invisible Characters on Stage!

Best practices in IT sometimes change, and the handling of certain invisible characters in text files is one example. End of Line (EOL) conventions have evolved, especially to ease integration with version control systems. In the same way, the Byte Order Mark (BOM) in Unicode text files is used less and less.

With 4D v19 R2, 4D has evolved smoothly to follow these best practices, giving you more flexibility along the way.

REMINDER

End of line (EOL)

Lines in text files are separated by one or more End of Line characters. On Windows, it's a combination of two characters:

Carriage Return (#13, named CR), followed by
Line Feed (#10, named LF).

On macOS, it was the legacy CR character alone, inherited from classic Mac OS.

Version control systems like Git don't treat a lone CR as an EOL character, so LF must be used instead.
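To make the three conventions concrete, here is a minimal sketch in Python (not 4D code) that counts each kind of line ending in a byte string and converts everything to LF. The helper names `count_eols` and `normalize_eol` are hypothetical and exist only for this illustration.

```python
def count_eols(data: bytes) -> dict:
    """Count CRLF pairs, lone CR, and lone LF line endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        # Every CRLF pair contains one CR and one LF, so subtract the pairs.
        "CR": data.count(b"\r") - crlf,
        "LF": data.count(b"\n") - crlf,
    }


def normalize_eol(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


sample = b"windows\r\nlegacy mac\runix\nend"
print(count_eols(sample))
print(normalize_eol(sample))
```

A check like this can help spot files that still contain lone CR characters before they are committed to a Git repository.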

Byte Order Mark (BOM)

A Unicode text file can also start with a few invisible bytes, named the Byte Order Mark, that identify the Unicode encoding used and, for UTF-16 and UTF-32, its byte order.

4D followed the best practices of the time when introducing Unicode, writing text files with a BOM by default. But now that UTF-8 has nearly become the standard text file format, the BOM is used less and less.
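As an illustration of what these leading bytes look like, the following Python sketch (not 4D code) checks whether a byte string starts with one of the standard Unicode BOMs. The function name `detect_bom` is hypothetical; the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# UTF-32 marks are tested before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the encoding announced by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbfhello"))  # file written with a UTF-8 BOM
print(detect_bom(b"hello"))              # file written without a BOM
```

A file without a BOM simply returns `None` here; a reader then has to rely on a default encoding, which in practice is usually UTF-8.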

NEW 4D CONVENTION ON TEXT FILES

From now on, following these trends, 4D writes text files without a BOM, and on macOS, 4D uses LF as the EOL character. This is completely automatic for all files written by 4D, such as 4DSettings, 4dm, 4DForm, and so on.

You won't encounter any problem when opening these files in a previous 4D version, even if they were written with 4D v19 R2, because previous versions can already open files without a BOM that use LF as the EOL character.

Compatibility settings

Want your own code to follow 4D's behavior? New compatibility settings allow TEXT TO DOCUMENT and File.setText() to generate files without a BOM and to use LF as the EOL character on macOS when the optional “charSet” and “breakMode” parameters are omitted. This is fully automatic; you don't need to rewrite your source code to benefit from the new behavior.

NEW CHARACTER SETS

The new compatibility settings apply to TEXT TO DOCUMENT and File.setText() only when they are called without the optional “charSet” or “breakMode” parameters.
The “breakMode” parameter itself is unchanged: you can still use it to force a specific EOL character.

In the same way, you may want to control precisely whether the files generated by TEXT TO DOCUMENT and File.setText() contain a BOM. To write a file without a BOM, pass a Unicode character set name ending with “-no-bom”, such as “UTF-8-no-bom” or “UTF-16-no-bom”. To write a file with a BOM, keep using the existing character set names without the “-no-bom” suffix, such as “UTF-8” or “UTF-16”.
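The byte-level difference between the two families of character set names can be sketched in Python (not 4D code). The helper `encode_like` below is hypothetical and only mimics the naming described above; it is not how 4D implements these character sets. In particular, the little-endian byte order chosen for UTF-16 is an assumption made for the example.

```python
import codecs


def encode_like(text: str, charset: str) -> bytes:
    """Hypothetical helper: encode text with or without a BOM, based on the name."""
    name = charset.lower()
    no_bom = name.endswith("-no-bom")
    if no_bom:
        name = name[: -len("-no-bom")]

    if name == "utf-8":
        body = text.encode("utf-8")
        return body if no_bom else codecs.BOM_UTF8 + body
    if name == "utf-16":
        # Assumption: little-endian byte order, chosen only for this sketch.
        body = text.encode("utf-16-le")
        return body if no_bom else codecs.BOM_UTF16_LE + body
    raise ValueError(f"unsupported character set in this sketch: {charset}")


print(encode_like("A", "UTF-8"))         # BOM bytes EF BB BF, then the text
print(encode_like("A", "UTF-8-no-bom"))  # the text only
```

The only difference between the two outputs is the leading mark; the encoded text that follows is identical.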

