Implications of using a non-Unicode library with a Unicode-built application - Windows

I would like to use a non-Unicode library from my Unicode-built MFC application. However, I'm not sure whether problems such as unintended memory allocations or incorrect string handling could occur inside the non-Unicode library.
Please explain any implications or provide a resource page.

Whether or not an application is Unicode is a compile-time, not a run-time, distinction - there is no inherent reason why a Unicode executable can't load an ANSI DLL. If the application and DLL both used MFC they would be linked to different MFC runtimes, which could cause problems, but as that isn't the case you should be fine.
Where you need to take care is to ensure that any string data transferred between the DLL and the application is interpreted consistently. Mostly this just means converting between ANSI and Unicode as necessary, and Windows provides API functions which let you do this easily enough.
You should, however, check the header files for any data types that are interpreted differently when compiling for Unicode than when compiling for ANSI. For example, if one of the DLL functions was declared as
DWORD process_string(TCHAR * string)
then the non-unicode library would interpret TCHAR as char, but your application would interpret it as wchar_t, hiding the fact that you need to convert the string to ANSI before calling the function.
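As a sketch of that conversion (assuming the DLL expects strings in the system ANSI code page; process_string here is just the example declaration from above, compiled on the DLL side as a function taking char*), the call could be wrapped like this:

#include <windows.h>
#include <string>
#include <vector>

// Hypothetical ANSI DLL export: in the DLL's own (non-Unicode) build, TCHAR expanded to char.
extern "C" DWORD process_string(char* string);

DWORD call_process_string(const std::wstring& wide)
{
    // Convert the application's UTF-16 string to the system ANSI code page
    // before handing it to the non-Unicode DLL.
    int len = WideCharToMultiByte(CP_ACP, 0, wide.c_str(), -1,
                                  nullptr, 0, nullptr, nullptr);
    std::vector<char> ansi(len);
    WideCharToMultiByte(CP_ACP, 0, wide.c_str(), -1,
                        ansi.data(), len, nullptr, nullptr);
    return process_string(ansi.data());
}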

Related

Is it possible to call the Windows API from Forth?

In C/C++, Windows executables are linked against static libraries that import DLL files containing Windows API procedures.
But how do we access those procedures from Forth code (e.g. GForth)? Is it possible at all?
I'm aware that there's Win32Forth capable of doing Win32 stuff, but I'm interested in how (and if) this could be done in Forth implementations that lack this functionality out of the box (yet do run on the target OS and are potentially able to interact with it on a certain level).
What currently comes to mind is loading the DLL files in question and somehow locating the address of a procedure to execute - but then, execute it how? (All I know is that the Windows API uses the stdcall
convention). And how do we locate a procedure without a C header? (I'm very new to Forth and just a bit less new to C++. Please bear with me if my musings are nonsense).
In the general case, to implement a foreign function interface (FFI) for dynamically loaded libraries in some Forth system as an extension (i.e., without changing source code and recompilation), we need the dlopen and dlsym functions, a Forth assembler, and intimate knowledge of the Forth system's organization and ABI.
Sometimes it can be done even without an assembler. For example, though SP-Forth has an FFI, foreign calls were also implemented in pure Forth, as a result of native code generation and the union of the return stack with the native hardware stack.
Regarding Gforth, it seems that version 0.7.9 (see releases) doesn't have an FFI for the stdcall calling convention out of the box (it supports cdecl only), although it has dlopen and dlsym, and an assembler. So it should be feasible to implement an FFI for stdcall.
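For reference, whatever the Forth system does internally, the steps are the same ones a C/C++ program performs with the Win32 loader API: load the DLL, look up the procedure by name, and call it through a stdcall (callee-cleans-the-stack) pointer. A minimal sketch, using MessageBoxA purely as an example of a stdcall function:

#include <windows.h>

int main()
{
    // Load the DLL and locate a procedure by name - the moral equivalent
    // of dlopen/dlsym on Unix.
    HMODULE user32 = LoadLibraryA("user32.dll");
    if (!user32) return 1;

    // MessageBoxA uses the stdcall convention (WINAPI expands to __stdcall).
    typedef int (WINAPI *MessageBoxA_t)(HWND, LPCSTR, LPCSTR, UINT);
    MessageBoxA_t msgbox =
        reinterpret_cast<MessageBoxA_t>(GetProcAddress(user32, "MessageBoxA"));
    if (!msgbox) return 1;

    msgbox(nullptr, "Hello from a dynamically resolved call", "Demo", MB_OK);
    FreeLibrary(user32);
    return 0;
}

An FFI without C headers has to supply the same information the typedef carries here: how many stack cells to pass and the fact that the callee removes them.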
Yes, you could do this in Gforth according to its documentation. The biggest problem will be dealing with callbacks, which the Windows API relies on rather heavily. There is an unsupported package to deal with this; see 5.25.6 Callbacks. I have not attempted this myself in Gforth, but the documentation looks adequate.
You might also want to check MPE's VFXForth. From their website:
Windows API Access
VFX Forth can access all the standard Windows API calls, as well as functions in any other DLLs. The function interface allows API calls to be defined by cut and paste from other language reference manuals, for example:
EXTERN: int PASCAL CreateDialogIndirectParam( HINSTANCE, void *,HWND, WNDPROC, LPARAM );
EXTERN: int PASCAL SetWindowText( HANDLE, LPSTR );
EXTERN: HANDLE PASCAL GetDlgItem( HANDLE, int );
This is down the page a bit at VFX Forth for Windows.
As I do my Forth on Mac and Linux, I can't work through the Windows for Gforth to provide more detail, sorry.
Gforth 0.7.9 provides Windows API calls generated by Swig from the Windows header files. The C interface uses a wrapper library, which is compiled by the C compiler, to pass parameters from the Forth stack to the system functions; as the C compiler understands stdcall, and the header files declare Windows API as stdcall, this "just works".
As all pre-generated C bindings live in the directory "unix" (for historical reasons), include unix/win32.fs gives you the win32 part of the Windows API.
Callbacks in the event loop are still a problem, as Gforth is a Cygwin program, and Cygwin has its special event loop task... but I hope that problem can be fixed.

Cross-platform unicode in C/C++: Which encoding to use?

I'm currently working on a hobby project (C/C++) which is supposed to work on both Windows and Linux, with full support for Unicode. Sadly, Windows and Linux use different encodings making our lives more difficult.
In my code I'm trying to keep the data in as universal a form as possible, making it easy to handle on both Windows and Linux. On Windows, wchar_t is encoded as UTF-16 by default, and on Linux as UCS-4 (correct me if I'm wrong).
My software opens files with _wfopen (UTF-16 paths) on Windows and fopen (UTF-8 paths) on Linux, and writes data to files in UTF-8. So far it's all doable. Until I decided to use SQLite.
SQLite's C/C++ interface allows for one- or two-byte encoded strings.
Of course this does not work with wchar_t on Linux, as wchar_t on Linux is 4 bytes by default. Therefore, writing to and reading from SQLite requires conversion on Linux.
Currently the code is getting cluttered with special cases for Windows/Linux. I was hoping to stick to the standard idea of storing data in wchar_t:
wchar_t in Windows: Filepaths without a problem, reading/writing to sqlite without a problem. Writing data to a file should be done in UTF-8 anyway.
wchar_t in Linux: Exception for the filepaths due to UTF-8 encoding, conversion before reading/writing to sqlite (wchar_t), and the same for windows when writing data to a file.
After reading (here) I was convinced I should stick to wchar_t in Windows. But after getting all that to work, the trouble began with porting to Linux.
Currently I'm thinking of redoing it all to stick with plain char (UTF-8), because it works on both Windows and Linux, keeping in mind that I need to WideCharToMultiByte every string on Windows to achieve UTF-8. Using plain char*-based strings will greatly reduce the number of special cases for Linux/Windows.
Do you have any experience with unicode for cross-platform? Any thoughts about the idea of simply storing data in UTF-8 instead of using wchar_t?
UTF-8 on all platforms, with just-in-time conversion to UTF-16 for Windows, is a common tactic for cross-platform Unicode.
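A minimal sketch of that tactic, assuming std::string holds UTF-8 everywhere and conversion happens only at the Windows API boundary (the helper and function names here are made up for illustration):

#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Convert a UTF-8 std::string to UTF-16 just before calling a -W API.
std::wstring to_utf16(const std::string& utf8)
{
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], len);
    wide.resize(len > 0 ? len - 1 : 0);   // drop the terminator written by the API
    return wide;
}
#endif

// The rest of the code base keeps std::string in UTF-8 on both platforms.
bool remove_file(const std::string& utf8_path)
{
#ifdef _WIN32
    return DeleteFileW(to_utf16(utf8_path).c_str()) != 0;
#else
    return std::remove(utf8_path.c_str()) == 0;  // Linux: the path bytes are already UTF-8
#endif
}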
Our software is cross-platform as well, and we faced similar problems. We decided that our goal is to have the least amount of conversions possible. This means that we use wchar_t on Windows and char on Unix/Mac.
We do this by supporting _T and LPCTSTR and similar on Unix and by having generic functions that easily convert between std::string and std::wstring. We also have a generic std::basic_string<TCHAR> (tstring) which we use in most cases.
So far this works quite well. Basically, most functions take a tstring or an LPCTSTR, and those which don't will get their parameters converted from a tstring. That means that most of the time we don't convert our strings and just pass most parameters through.
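Roughly, that setup can be sketched as below; the Unix-side definitions of TCHAR, LPCTSTR and _T are hand-rolled here and only illustrate the idea:

#include <string>
#include <iostream>

#ifdef _WIN32
#include <windows.h>   // TCHAR, LPCTSTR
#include <tchar.h>     // _T()
#else
typedef char TCHAR;    // hand-rolled equivalents for Unix/Mac
typedef const TCHAR* LPCTSTR;
#define _T(x) x
#endif
typedef std::basic_string<TCHAR> tstring;

// Most functions take a tstring or LPCTSTR, so strings usually pass
// through unconverted on both platforms.
void log_message(const tstring& msg)
{
#ifdef _UNICODE
    std::wcout << msg << L"\n";
#else
    std::cout << msg << "\n";
#endif
}

int main()
{
    log_message(_T("started"));  // narrow on Unix, wide in a Unicode Windows build
}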

Specifying the encoding when saving a file in a Windows app

I am writing a program that handles mostly Unicode text. The C standard library function 'fopen' provides for writing characters to a file in UTF-8 by including "ccs=UTF-8" in the mode string argument (a Microsoft extension). It seems that the Windows API 'CreateFile' does not give such provision. Must I use 'fopen' then?
This is specific to programming under Windows, using Visual Studio and Microsoft tools. My personal advice is not to use fopen with the extended syntax; otherwise there will be compatibility issues later when porting your application to other operating systems. When under Windows, do it the Windows way: use CreateFile.
The contents of the file are defined not by the file-opening function, but the actual data you write. After you get the file handle (either by fopen or CreateFile), you can write in UTF-8, or ANSI, or whatever you like.
Note that some encodings are conventionally marked with a byte-order mark (BOM) at the beginning of the file.
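For example, writing UTF-8 through CreateFile directly, with an optional UTF-8 byte-order mark, might look like this sketch (the helper name is made up):

#include <windows.h>
#include <string>

// Write a UTF-8 string to a file using CreateFile/WriteFile.
// The three-byte UTF-8 BOM is optional; some tools expect it, many don't want it.
bool write_utf8_file(const wchar_t* path, const std::string& utf8, bool with_bom)
{
    HANDLE file = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    DWORD written = 0;
    bool ok = true;
    if (with_bom) {
        static const unsigned char bom[] = { 0xEF, 0xBB, 0xBF };
        ok = WriteFile(file, bom, sizeof bom, &written, nullptr) != 0;
    }
    if (ok)
        ok = WriteFile(file, utf8.data(), static_cast<DWORD>(utf8.size()),
                       &written, nullptr) != 0;
    CloseHandle(file);
    return ok;
}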

Why isn't UTF-8 allowed as the "ANSI" code page?

The Windows _setmbcp function allows any valid code page...
(except UTF-7 and UTF-8, which are not supported)
OK, not supporting UTF-7 makes sense: Characters have non-unique representations and that introduces complexity and security risks.
But why not UTF-8?
As I understand it, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output to "ANSI". This is what I've been doing manually. So why can't Windows do it for me?
The "ANSI" codepage is basically legacy: Windows 9X era. All modern software should be Unicode (that is, UTF-16) based anyway.
Basically, when the Ansi code page stuff was originally designed, UTF-8 wasn't even invented and so support for multi-byte encodings was rather haphazard (i.e. most Ansi code pages are single byte, with the exception of some East Asian code pages which are one-or-two byte). Adding support for "proper" multi-byte encodings was probably deemed not worth the effort when all new development should be done in UTF-16 anyway.
_setmbcp() is a VC++ RTL function, not a Win32 API function. It only affects how the RTL interprets strings. It has no effect whatsoever on Win32 API A functions. When they call their W counterparts internally, the A functions always use MultiByteToWideChar() and WideCharToMultiByte() specifying codepage 0 (CP_ACP) to use the system default Ansi codepage for the conversions.
Michael Kaplan, an internationalization expert from Microsoft, tried to answer this on his blog.
Basically his explanation is that even though the "ANSI" versions of Windows API functions are meant to handle different code pages, historically there was an implicit expectation that character encodings would require at most two bytes per code point. UTF-8 doesn't meet that expectation, and changing all of those functions now would require a massive amount of testing.
The reason is exactly as stated in jamesdlin's answer and the comments below it: MBCS is the same as DBCS in Windows, and some functions don't work with characters that are longer than 2 bytes.
Microsoft said that a UTF-8 locale might break some functions as they were written to assume multibyte encodings used no more than 2 bytes per character, thus code pages with more bytes such as UTF-8 (and also GB 18030, cp54936) could not be set as the locale.
https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
So UTF-8 was allowed in functions like read/write, but not as a locale.
However, Microsoft has since fixed that, so now we can use UTF-8 as a locale. In fact, MS has even started recommending the ANSI APIs (-A) again, instead of the Unicode (-W) versions as before. There are some new options in MSVC: /execution-charset:utf-8 and /utf-8 to set the charset, and you can also set the ActiveCodePage property in the appxmanifest of a UWP app.
Since Windows 10 insider build 17035 (before those options were introduced), there has also been a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox for setting the locale code page to UTF-8.
To open that dialog box, open the Start menu, type "region", and select Region settings > Additional date, time & regional settings > Change date, time, or number formats > Administrative.
After enabling it you can call setlocale() to change to a UTF-8 locale:
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
UTF-8 Support
You can also use this in older Windows versions
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
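As a small sketch of what the UTF-8 locale enables (the file name is just an example, written out as explicit UTF-8 bytes):

#include <clocale>
#include <cstdio>

int main()
{
    // Needs the UCRT from Windows SDK 17134 or later, per the note above.
    std::setlocale(LC_ALL, ".UTF8");

    // char strings passed to the CRT are now treated as UTF-8, so a plain
    // narrow fopen can open a path containing non-ASCII characters.
    std::FILE* f = std::fopen("gr\xC3\xBC\xC3\x9F.txt", "w");  // "grüß.txt" as UTF-8 bytes
    if (f) {
        std::fputs("UTF-8 text\n", f);
        std::fclose(f);
    }
    return 0;
}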
See also
Is it possible to set “locale” of a Windows application to UTF-8?

MSVC _open/_close/etc

Why are the APIs _open, _close, and other standard file I/O functions prefixed with an underscore? Aren't these part of some standard?
open/close are part of some Unix standards (POSIX, SUS, etc.), but Windows is not a Unix.
You'll note that the ANSI C standard library functions like fopen do not have the single underscore decoration.
Because Windows isn't a Unix, there may have been a time, long ago, when the Unix-style APIs were not available. Because of this, client code could have been written that defined functions like open and close. To maintain compatibility with existing code, when the Unix-style APIs were added, they could be added with leading underscores, because identifiers with leading underscores are reserved for the implementation. In other words, no existing code should be defining a function named _open.
"Portable" code targetting the Unix style apis can then be relatively easily compiled via use of macros (or aliases implemented at the linker level), since that code, targetting unix, knows it didn't define any functions named open/close etc.
