Ruby: FormatMessageW trouble

What's a correct way to call FormatMessageW in Ruby?
require 'win32api'
FormatMessage = Win32API.new 'kernel32', 'FormatMessageW', 'IPIIPII', 'I'
msg = '\0' * 255
FormatMessage.call 0x00001000 | 0x00000100, nil, 6, 1024, msg, 0, 0
FormatMessage returns a non-zero result, but msg does not contain a readable message. What's wrong?

I believe this is the code you are looking for:
require 'win32api'
FORMAT_MESSAGE_FROM_SYSTEM = 0x1000
FormatMessage = Win32API.new 'kernel32', 'FormatMessageW', 'IPIIPIP', 'I'
msgw = ("\x00" * 256).force_encoding("UTF-16LE")
count = FormatMessage.call FORMAT_MESSAGE_FROM_SYSTEM, nil, 6, 1033, msgw, msgw.size, nil
msgw = msgw[0, count]
msg = msgw.encode("UTF-8")
puts msg
When I run this with Ruby, the output is "The handle is invalid.", which is the correct Windows error message for error code 6.
There were some problems with your original code.
The last argument to FormatMessageW is a pointer, so you should use P instead of I, especially if you want it to work on 64-bit Windows.
In Ruby, '\0' (with single quotes) is actually a two-byte ASCII string consisting of a backslash and the digit zero, not a single null byte; you can confirm this by running p '\0'.bytes.to_a (see the short demonstration after this list of problems). It looks like you tried to allocate 255 bytes, but you actually allocated 510. You should also allocate an even number of bytes, because wide characters in Windows take 2 bytes each.
As #theB pointed out, your first argument to FormatMessageW was wrong, since you specified that FormatMessageW should allocate its own buffer.
You specified language code 1024. I can't find a definition for that. Maybe you meant 1033, which is "English - United States". Specifying 1024 doesn't seem to actually cause problems though.
You should use force_encoding to set the encoding of your string to UTF-16LE, because that is the encoding used for wide strings in Windows (or if it's not exactly the same, at least it is compatible most of the time).
The 6th argument to FormatMessageW should be the number of characters in your buffer (which is the number of bytes divided by 2, by the way). Your code just passed 0 for that argument.
Strings in Ruby can contain any arbitrary bytes, including null characters, but it's not necessarily a good idea to let that happen because things like String#size will return surprising results. FormatMessageW returns the number of characters in the formatted message, so we can use that to trim off the null characters at the end. (Conveniently, FormatMessageW returns 0 if there is an error, so our trimming would result in an empty string.)
You should use String#encode to convert your string from UTF-16LE to UTF-8 because UTF-8 strings are much easier to operate on and print in Ruby.
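To make the '\0' point above concrete, here is a quick check you can run in irb:
# single-quoted '\0' is a backslash followed by the digit zero
p '\0'.bytes.to_a   # => [92, 48]
# double-quoted "\0" is one actual null byte
p "\0".bytes.to_a   # => [0]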
If you don't care about internationalization and unicode, you could have just used FormatMessageA instead. Here is some code that will work for that:
require 'win32api'
FORMAT_MESSAGE_FROM_SYSTEM = 0x1000
FormatMessage = Win32API.new 'kernel32', 'FormatMessageA', 'IPIIPIP', 'I'
msg = ("\x00" * 256).force_encoding("ASCII-8BIT")
count = FormatMessage.call FORMAT_MESSAGE_FROM_SYSTEM, nil, 6, 1033, msg, msg.size, nil
msg = msg[0, count]
puts msg
P.S. DWORD is an unsigned integer type. I am not sure what the right letter for that is in Ruby's Win32API class; it might be that I represents a signed integer, and should be replaced by something else.
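For what it's worth, the legacy Win32API class also accepts 'L' (long) as a parameter and return type; whether that behaves any differently from 'I' for DWORD-sized arguments is an assumption I have not verified:
require 'win32api'
# untested variant declaring the DWORD arguments and the return value as 'L' (long)
FormatMessage = Win32API.new 'kernel32', 'FormatMessageW', 'LPLLPLP', 'L'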

Related

How to convert a char to a libc::c_char?

I have a C function:
Node * first_element_by_path(const Node * node, const char * path, char delimiter);
And a Rust glue function:
pub fn first_element_by_path(node: *mut CNode, path: *const c_char, delimiter: c_char) -> *mut CNode;
It expects a c_char as the delimiter. I want to pass a char to it, but c_char is an i8, not a char. How can I convert a Rust char to i8 or c_char in this case?
You are asking the question:
How do I fit a 32-bit number into an 8-bit value?
Which has the immediate answer: "throw away most of the bits":
let c = rust_character as libc::c_char;
However, that should cause you to stop and ask the questions:
Are the remaining bits in the right encoding?
What about all those bits I threw away?
Rust chars allow encoding all Unicode scalar values. What is your desired behavior for this code:
let c = '💩' as libc::c_char;
It's probably not to create the value -87, a non-ASCII value! Or this less-silly and perhaps more realistic variant, which is -17:
let c = 'ï' as libc::c_char;
You then have to ask: what does the C code mean by a character? What encoding does the C code think strings are? How does the C code handle non-ASCII text?
The safest thing may be to assert that the value is within the ASCII range:
let c = 'ï';
let v = c as u32;
assert!(v <= 127, "Invalid C character value");
let v = v as libc::c_char;
Instead of asserting, you could also return a Result type that indicates that the value was out of range.
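A minimal sketch of that Result-based variant (the function name and error representation are my own illustration, not an established API):
/// Convert a Rust char to a C char, failing if it is outside the ASCII range.
fn to_c_char(c: char) -> Result<libc::c_char, char> {
    let v = c as u32;
    if v <= 127 {
        Ok(v as libc::c_char)
    } else {
        Err(c) // let the caller decide what to do with non-ASCII input
    }
}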
should I change my function (the one that will call the glue function) to receive a c_char instead of a char?
That depends. That may just be pushing the problem further up the stack; now every caller has to decide how to create the c_char and worry about the values between 128 and 255. If the semantics of your code are such that the value has to be an ASCII character, then encode that in your types. Specifically, you can use something like the ascii crate.
In either case, you push the possibility for failure into someone else's code, which makes your life easier at the potential expense of making the caller more frustrated.

Why does XFetchBuffer() return null instead of the clipboard contents?

int N, atom;
atom = XInternAtom (display, "CLIPBOARD", false);
char *c = XFetchBuffer(display, &N, atom);
The code above is supposed to get the string from the clipboard, but it only returns null, and N is 0 as well.
XFetchBuffer works with cut buffers, not with the clipboard. Cut buffers are hardly ever used these days. Note that the buffer argument XFetchBuffer accepts is not an Atom but a plain integer index; these are not the same thing.
If you need the clipboard, you need to follow ICCCM and write lots more code.
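For reference, here is a rough C sketch of what the ICCCM approach looks like: ask the clipboard owner to convert its selection, wait for SelectionNotify, then read the property it filled in (error handling and INCR transfers for large selections are omitted, and the property name is arbitrary):
#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
    Display *display = XOpenDisplay(NULL);
    Window window = XCreateSimpleWindow(display, DefaultRootWindow(display),
                                        0, 0, 1, 1, 0, 0, 0);
    Atom clipboard = XInternAtom(display, "CLIPBOARD", False);
    Atom utf8 = XInternAtom(display, "UTF8_STRING", False);
    Atom prop = XInternAtom(display, "MY_CLIP_DATA", False);

    /* ask the selection owner to convert the clipboard to UTF8_STRING
       and store the result in a property on our window */
    XConvertSelection(display, clipboard, utf8, prop, window, CurrentTime);

    XEvent ev;
    do {
        XNextEvent(display, &ev);
    } while (ev.type != SelectionNotify);

    if (ev.xselection.property != None) {
        Atom type;
        int format;
        unsigned long nitems, bytes_after;
        unsigned char *data = NULL;

        XGetWindowProperty(display, window, prop, 0, (~0L), False,
                           AnyPropertyType, &type, &format,
                           &nitems, &bytes_after, &data);
        printf("%.*s\n", (int)nitems, data);
        XFree(data);
    }

    XCloseDisplay(display);
    return 0;
}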

Powerbuilder: ImportFile of UTF-8 (Converting UTF-8 to ANSI)

My PowerBuilder version is 6.5; I cannot use a higher version, as this is what I am supporting.
My problem is that when I do dw_1.ImportFile(file), the first row, first column contains a funny string like this:
ï»¿
I didn't understand this until I tried opening the file, saving it as a new text file, and importing that new file, which worked flawlessly, without the funny string.
My conclusion is that this happens because the original file is UTF-8 (as shown in Notepad++) while the new file is ANSI. The file I am trying to import is generated automatically by a 3rd party, and my users don't want the extra step of converting it by hand.
How do I force-convert these files to ANSI in PowerBuilder? If there is no way, I might have to do a command-prompt conversion; any ideas?
The weird ï»¿ characters are the (optional) UTF-8 BOM, which tells editors that the file is UTF-8 encoded (it can otherwise be difficult to tell unless the file contains a character above code 127). You cannot just strip it off: if your file contains any character above 127 (accents or any special char), you will still have garbage in your displayed data (for example: é -> Ã©, € -> â‚¬, ...), where each special character becomes 2 to 4 garbage chars.
I recently needed to convert some UTF-8 encoded strings to "ANSI" Windows-1252 encoding. With PB 10 or later, re-encoding between UTF-8 and ANSI is as simple as
b = blob(s, encodingutf8!)
s2 = string(b, encodingansi!)
But string() and blob() do not support an encoding argument before PB 10.
What you can do is to read the file yourself, skip the BOM, ask Windows to convert the string encoding via MultiByteToWideChar() + WideCharToMultiByte() and load the converted string in the DW with ImportString().
Proof of concept to get the file contents (with this reading method, the file cannot be bigger than 2GB):
string ls_path, ls_file, ls_chunk, ls_ansi
ls_path = sle_path.text
int li_file
if not fileexists(ls_path) then return
li_file = FileOpen(ls_path, streammode!)
if li_file > 0 then
    FileSeek(li_file, 3, FromBeginning!) //skip the utf-8 BOM
    //read the file by blocks, FileRead is limited to 32kB
    do while FileRead(li_file, ls_chunk) > 0
        ls_file += ls_chunk //concatenate in loop works but is not so performant
    loop
    FileClose(li_file)
    ls_ansi = utf8_to_ansi(ls_file)
    dw_tab.importstring(text!, ls_ansi)
end if
utf8_to_ansi() is a global function; it was written for PB9, but it should work the same with PB6.5:
global type utf8_to_ansi from function_object
end type
type prototypes
function ulong MultiByteToWideChar(ulong CodePage, ulong dwflags, ref string lpmultibytestr, ulong cchmultibyte, ref blob lpwidecharstr, ulong cchwidechar) library "kernel32.dll"
function ulong WideCharToMultiByte(ulong CodePage, ulong dwFlags, ref blob lpWideCharStr, ulong cchWideChar, ref string lpMultiByteStr, ulong cbMultiByte, ref string lpDefaultChar, ref boolean lpUsedDefaultChar) library "kernel32.dll"
end prototypes
forward prototypes
global function string utf8_to_ansi (string as_utf8)
end prototypes
global function string utf8_to_ansi (string as_utf8);
//convert utf-8 -> ansi
//use a wide-char native string as pivot
constant ulong CP_ACP = 0
constant ulong CP_UTF8 = 65001
string ls_wide, ls_ansi, ls_null
blob lbl_wide
ulong ul_len
boolean lb_flag
setnull(ls_null)
lb_flag = false
//get utf-8 string length converted as wide-char
setnull(lbl_wide)
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, 0)
//allocate buffer to let windows write into
ls_wide = space(ul_len * 2)
lbl_wide = blob(ls_wide)
//convert utf-8 -> wide char
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, ul_len)
//get the final ansi string length
setnull(ls_ansi)
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, 0, ls_null, lb_flag)
//allocate buffer to let windows write into
ls_ansi = space(ul_len)
//convert wide-char -> ansi
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, ul_len, ls_null, lb_flag)
return ls_ansi
end function

WideCharToMultiByte: when is lpUsedDefaultChar TRUE?

I am trying to understand WideCharToMultiByte, and I was wondering when lpUsedDefaultChar would be set to TRUE.
Here is a sample: what should lpszW be in order for the flag to be set to TRUE?
lpszW = L"__WHAT SHOULD_BE_HERE__";
int c = ??;
BOOL fUsedDefaultChar = false;
DWORD dwSize = WideCharToMultiByte(CP_ACP, 0, lpszW, c, myOutStr ,myOutLen, NULL, &fUsedDefaultChar);
http://msdn.microsoft.com/en-us/library/dd374130(VS.85).aspx
Any books/tutorials for understanding Unicode/UTF stuff would be great.
Thanks!
Anything that is not present in the current code page will map to ? (by default), and the flag pointed to by lpUsedDefaultChar will be set to a value != FALSE.
Windows-1252 is probably the most common code page, and most of its characters map to the same values in Unicode.
Take Ω (the ohm sign), for example: it is probably not present in whatever your current code page is, and therefore will not map to a valid narrow character:
BOOL fUsedDefaultChar=FALSE;
DWORD dwSize;
char myOutStr[MAX_PATH];
WCHAR lpszW[10]=L"Hello";
*lpszW=0x2126; //ohm sign, you could also use the \u2126 syntax if your compiler supports it.
dwSize = WideCharToMultiByte(CP_ACP, 0, lpszW, -1, myOutStr ,MAX_PATH, NULL, &fUsedDefaultChar);
printf("%d %s\n",fUsedDefaultChar,myOutStr); //This prints "1 ?ello" on my system
The MSDN documentation is very clear about when lpUsedDefaultChar is set to TRUE:
lpDefaultChar [in] Optional. Pointer to the character to use if a character cannot be represented in the specified code page. The application sets this parameter to NULL if the function is to use a system default value. To obtain the system default character, the application can call the GetCPInfo or GetCPInfoEx function.
lpUsedDefaultChar [out] Optional. Pointer to a flag that indicates if the function has used a default character in the conversion. The flag is set to TRUE if one or more characters in the source string cannot be represented in the specified code page. Otherwise, the flag is set to FALSE. This parameter can be set to NULL.
That does not leave much room for misunderstanding, in my opinion.
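As a side note, the GetCPInfoEx function mentioned in that excerpt lets you inspect which default character will be substituted; a small illustration of my own (not part of the original answer):
#include <windows.h>
#include <stdio.h>

int main(void)
{
    CPINFOEXA info;
    // query the active ANSI code page and print its default replacement character
    if (GetCPInfoExA(CP_ACP, 0, &info))
        printf("Code page %u substitutes '%c' for unmappable characters\n",
               info.CodePage, info.DefaultChar[0]);
    return 0;
}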

Converting LPCWSTR with WideCharToMultiByte. Need help

I have a function like this:
BOOL WINAPI MyFunction(HDC hdc, LPCWSTR text, UINT cbCount){
char AnsiBuffer[255];
int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer , 0, NULL, NULL);
if(written > -1) AnsiBuffer[written] = '\0';
if(written>0){
ofstream myfile;
myfile.open ("C:\\example.txt", ios::app);
myfile.write(AnsiBuffer, sizeof(AnsiBuffer));
myfile.write("\n", 1);
myfile.close();
}
....
When I display the input LPCWSTR text with MessageBoxW(), the text shows up fine. When I try to convert it to multibyte, the return value looks normal (e.g. 22, 45, etc.), but the result is a string of gibberish (e.g. ÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌÌ). Suggestions?
I see two problems:
1) You are passing 0 to WideCharToMultiByte for the size of the multibyte buffer. If you read the documentation, this results in the function returning the NUMBER of bytes needed, but performing no actual conversion. (This is to allow you to subsequently allocate a buffer of the correct size and call the function again.)
2) In myfile.write, sizeof(AnsiBuffer) will result in 255 bytes being written regardless of what is in the buffer; sizeof is a compile-time calculation that returns the size of a variable. You should replace it with the written variable, which holds the length of the converted string.
You need to pass the length of the buffer to the API, instead of passing 0. When you pass 0, the function returns the required length of the buffer, but doesn't write to it. You're seeing the results of the uninitialized array.
Here's the right call, with the 255 in the right place:
int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount, AnsiBuffer , 255, NULL, NULL);
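Putting both fixes together, the relevant part of the function might look like this (a sketch based on the snippets above; the function name is illustrative and it has not been tested against the rest of your code):
#include <windows.h>
#include <fstream>

void LogAnsiText(LPCWSTR text, UINT cbCount)
{
    char AnsiBuffer[255];
    // pass the real buffer size so the conversion is actually performed
    int written = WideCharToMultiByte(CP_ACP, 0, text, cbCount,
                                      AnsiBuffer, sizeof(AnsiBuffer), NULL, NULL);
    if (written > 0)
    {
        std::ofstream myfile("C:\\example.txt", std::ios::app);
        myfile.write(AnsiBuffer, written); // 'written', not sizeof(AnsiBuffer)
        myfile.write("\n", 1);
    }
}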
