WINE mishandles UTF-8 when reading a file in a CUI app

A simple CUI application (MITLM) reads in a text file and outputs a text file. It wasn't written with Unicode in mind, but it handles UTF-8 just fine (because that's the beauty of UTF-8). However, when run under WINE, non-ASCII characters get mangled. What could be the reason for this? Since MITLM is open source I've recompiled it for Linux, where it works fine, but I'm still curious.

Related

Vim – document navigation doesn't work on Cyrillic text in Windows but works in Linux

Good morning!
Recently I installed gVim on Windows 10 and started vimtutor. My native language is Russian, and vimtutor was translated by default.
After reaching lesson 2.1 I found out that I can't use dw to delete words. With this command I can delete one or sometimes two letters of a word, but I can't delete the whole word as vimtutor says. Example text from vimtutor:
Несколько слов рафинад в этом предложении автокран излишни.
For testing purposes I inserted some text using Latin characters and tested dw. Everything deletes correctly.
So when I use gVim on Windows 10 I can't complete vimtutor, because it works incorrectly with non-Latin characters. I found a similar question here; the answer was "don't use Cyrillic characters". Unfortunately, the person answering didn't fully understand the problem: the question was about editing non-Latin text, while the answer was about using non-Latin characters in command mode (which is not a problem for me).
I continued my research and found out that the console version of Vim on Windows 10 has the same problem: I can't edit text containing Cyrillic characters.
Then I booted my openSUSE i3 system and launched vimtutor there. Suddenly all commands worked correctly and I could complete vimtutor (even though it mostly consists of Cyrillic text).
Am I missing some setup step in Windows, or is this a bug? Why does dw fail only on non-Latin words, and only on Windows?
After creating an issue on GitHub (https://github.com/vim/vim/issues/8588) I got a response from habamax about the problem. It seems that older versions of Vim for Windows do not use UTF-8 by default. Editing the vimrc (see the sketch below) or using a nightly build (as habamax suggested in the issue) solves the problem.
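A minimal vimrc sketch of that fix, assuming (per habamax's diagnosis) that the internal encoding defaulting to the system codepage is the culprit; the fileencodings value shown is one common choice, not the only one:

" Use UTF-8 internally so multibyte (e.g. Cyrillic) words are
" treated as whole characters by motions like dw.
set encoding=utf-8
" Try UTF-8 first when detecting the encoding of existing files.
set fileencodings=ucs-bom,utf-8,default,latin1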

Delphi 2007 compilation with the codepage parameter and TeamCity

I'm trying to compile a project written in Delphi 2007 with the parameter --codepage:1252. On one machine, with Windows 10, everything is OK and Spanish strings display correctly. When I do the same on a computer with Windows 8, the --codepage:1252 parameter changes nothing.
I need this because the computer where it isn't working runs the TeamCity agent.
Has anyone had a similar problem? Or is it possible to display Spanish characters properly in a non-Unicode application without changing the Windows setting for non-Unicode programs, which requires a system restart and is problematic for the TeamCity server?
EDIT:
On both computers the language for non-Unicode programs is set to Polish, and:
Windows 10 - compile with Spanish characters, copy the exe to the Spanish PC: the Spanish characters do not display correctly.
Windows 10 - compile with Spanish characters and --codepage:1252 set in the .dproj ("1252"), copy the exe to the Spanish PC: the Spanish characters display correctly.
Windows 10 - compile with Spanish characters and the --codepage:1252 parameter on the command line using dcc32.exe, copy the exe to the Spanish PC: the Spanish characters do not display correctly.
Windows 8 - compile with Spanish characters, copy the exe to the Spanish PC: the Spanish characters do not display correctly.
Windows 8 - compile with Spanish characters and the --codepage:1252 parameter, copy the exe to the Spanish PC: the Spanish characters do not display correctly.
What is the difference between compiling from the Delphi IDE and compiling with dcc32.exe? The output should be the same, because Delphi invokes dcc32 in the same way; I can see that in the build output.
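For reference, the command-line invocation in question looks like this (the project name is illustrative):
dcc32.exe --codepage:1252 MyProject.dpr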
UPDATE:
After more tests, my conclusion is:
Compiling from the Delphi IDE with "--codepage:1252" works when all files are in ANSI format. When I change them to UTF-8, it no longer works.
From the command line it doesn't work in any case or combination.
Delphi 2007 is the last version of Delphi that still uses an Ansi-based RTL/VCL. The native GUI components use the OS's default Ansi codepage, so it does not matter that your source code is encoded as Latin1/Spanish (which is what the --codepage:1252 parameter tells the compiler) if the GUI cannot display Spanish data correctly on a non-Spanish machine.
As dummzeuch mentioned, your source code should be saved as UTF-8 instead of Latin1/Spanish. But if you want to display Unicode data at runtime from an Ansi-based executable, you will have to convert your data to Unicode and use third-party Unicode GUI components, such as the TNT Unicode controls. Also see Handling a Unicode String in Delphi Versions <= 2007.
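A minimal sketch of that conversion, assuming the TNT controls are installed; TTntEdit and the TntStdCtrls unit belong to that third-party library, ShowUtf8File is an illustrative name, and UTF8Decode is a standard RTL helper:

uses SysUtils, Classes, TntStdCtrls;

procedure ShowUtf8File(const FileName: string; Target: TTntEdit);
var
  Stream: TFileStream;
  Raw: UTF8String;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Stream.Size);
    if Stream.Size > 0 then
      Stream.ReadBuffer(Raw[1], Stream.Size);
  finally
    Stream.Free;
  end;
  { UTF8Decode yields a WideString; assigning it to a plain Ansi TEdit
    would squeeze it back through the system codepage and mangle it. }
  Target.Text := UTF8Decode(Raw);
end;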
Otherwise, bite the bullet and upgrade to Delphi 2009 or later, which use a native Unicode-based RTL/VCL. Delphi has been a fully Unicode product since 2008; it is time to ditch ANSI.
You could, on the computer where it works correctly, change the file format to UTF-8 and save it (I'm not sure whether it is possible to do that automatically for all files in a project). Then it should work on any computer, regardless of its codepage.
To change the file format, use the context menu of the editor window.

Newlines not displayed in Windows

open OUT, ">output.txt";
print OUT "Hello\nWorld";
When I run the above Perl code on a Unix system and then transfer output.txt to Windows and open it in Notepad, it shows as:
HelloWorld
What do I need to do to get the newlines displaying properly in Windows?
Text file line endings are platform-specific. If you're creating a file intended for the Windows platform then you should use
open OUT, '>:crlf', 'output.txt' or die $!;
Then you can just
print OUT "Hello\nworld!\n";
as normal
The :crlf PerlIO layer is the default for Perl executables built for Windows, so you don't need to add it to code that creates files on its intended platform. For portable software you can check the host system by examining the built-in variable $^O, as in the sketch below.
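A minimal sketch of that check (the filename is illustrative):

use strict;
use warnings;

# Perl on Windows already applies :crlf by default, but selecting the
# layer explicitly makes the intent clear when the same script must
# run anywhere. $^O names the OS this perl was built for.
my $layer = $^O eq 'MSWin32' ? ':crlf' : ':raw';
open my $out, ">$layer", 'output.txt' or die $!;
print {$out} "Hello\nWorld\n";
close $out or die $!;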
Windows uses carriage-return + linefeed:
print OUT "Hello\r\nWorld";
I wrote the File::Edit::Portable module, which eliminates these problems. Although you can use it to write (along with many other things), you only need the read() functionality in this case.
Install the module, and at the top of your script, add:
use File::Edit::Portable;
When opening/reading the file, you can just do:
my $rw = File::Edit::Portable->new;
my $fh = $rw->read('file.txt');
No matter what the line endings are or what platform you're on, it does all of the cross-platform work in the background so you don't have to. That way, you can open the file on any system, regardless of what line endings you've decided to use, and it just works.
Newline handling is editor-specific (a number of answers claim it is OS-specific, but in real life that is not true in general). However, it is true that on DOSish systems the longstanding convention is to use CRLF to indicate EOL (see also Why is fwrite writing more than I tell it to?).
If you open this file in almost any editor other than Notepad, you will notice that the text is properly displayed on two lines, with an indicator in the status bar or some other place showing that the file was opened in Unix or LF mode.
Unless you intend your file solely for viewing with Notepad, you don't have anything to worry about: every other tool on Windows will deal with it fine.
However, Notepad does expect a CRLF sequence to mark the end of each line. If you do want to cater to it, you can just output "\r\n" as kizeloo suggests, though I prefer to use output layers when they are necessary.
Note that if you view such a file in an editor that requires a single LF to signify EOL, you may see ^M or other characters denoting the CR.

Working with UTF-8 encoded Tcl files on Windows

We are trying to convert the source files of a Tcl/Tk application to UTF-8, because that is the default charset of the platforms we use for development (Linux and OS X).
Our problem is that Windows uses "cp1252" as the system encoding, and because of this it displays labels and buttons containing (for example) German umlauts incorrectly.
The only solution we have found so far is to add "-encoding utf-8" to all "wish" calls and "source" commands.
(There is also "encoding system utf-8", but the documentation says you shouldn't use it because of problems with system calls.)
Is there a way to tell Tcl that it should use UTF-8 as the default encoding for all source files, or is there another solution to this problem?
The solution suggested in the Tcl chat:
Create and use your own versions of "open" and "source" (like "my_open" and "my_source") which then call the original commands with "-encoding utf-8"; a sketch follows.
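A minimal sketch of such a wrapper for "source" (the command name follows the chat's suggestion; the same pattern works for "open"):

# my_source: like the built-in source, but defaults to UTF-8.
proc my_source {args} {
    if {"-encoding" in $args} {
        # The caller already chose an encoding; pass through unchanged.
        uplevel 1 [linsert $args 0 ::source]
    } else {
        uplevel 1 [linsert $args 0 ::source -encoding utf-8]
    }
}

# Usage: my_source somefile.tcl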

QtQuick 2 file size of download differs on Windows

I wrote a small download routine in the C++ part of my QtQuick2 app. It works perfectly when I build the app for Mac OS 10.9.
For testing I download this file, and when the download finishes I verify it against the given MD5 checksum b3215c06647bc550406a9c8ccc378756.
Only when I build the app on a Windows PC does the verification fail. On a second look I noticed that the size of the downloaded file differs with each download, while the "size on disk" stays the same every time.
Do you have any idea what might trigger this strange behaviour on Windows?
Thanks in advance.
If it helps to solve the problem I can show you my download script, but it's a pretty simple "read-all-write-to-file" routine which runs every two seconds.
Could the binary/text writing mode affect the result?
UPD: if you use QFile with QIODevice::Text, it may behave differently depending on the platform. From the Qt documentation:
When reading, the end-of-line terminators are translated to '\n'. When writing, the end-of-line terminators are translated to the local encoding, for example '\r\n' for Win32.
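A minimal sketch of the corresponding fix, assuming the payload arrives as a QByteArray (the function name is illustrative): open the file without QIODevice::Text, so no newline translation takes place and the bytes on disk match the bytes received.

#include <QFile>
#include <QByteArray>
#include <QString>

// Write downloaded bytes verbatim. Omitting QIODevice::Text keeps
// QFile in binary mode, so '\n' bytes are not expanded to "\r\n"
// on Windows and the MD5 of the file matches the payload.
bool saveDownload(const QString &path, const QByteArray &data)
{
    QFile file(path);
    if (!file.open(QIODevice::WriteOnly))
        return false;
    return file.write(data) == data.size();
}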
