Does the Windows API RegGetValue require a direct descendant for its lpSubKey parameter?
Will this work?
RegGetValue(HKEY_LOCAL_MACHINE,
L"Software\\Microsoft\\Windows NT\\CurrentVersion", L"ProductName",
RRF_RT_REG_SZ, NULL, outdata, &outdata_size);
Edit: I had a leading backslash (\\) and Windows doesn't like it! I also converted the UTF-8 strings to UTF-16 wide strings (Windows-style).
Does the Windows API RegGetValue require a direct descendant for its lpSubKey parameter?
No, it does not.
You can specify a path just as you've shown. You also don't need the leading path separator (\\).
But the code you've shown may or may not work. Not because it specifies a path to the string, but because you're probably mixing Unicode and ANSI strings. From your user name (unixman), I assume that you're relatively new to Windows programming, so it's worth noting that Windows applications are entirely Unicode and have been for more than a decade. You should therefore always compile your code as Unicode and prefix string literals with L (to indicate a wide, or Unicode, string).
Likewise, make sure that outdata is declared as an array of wchar_t.
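For reference, here is a minimal self-contained sketch of the same call with everything declared as wide (Unicode) data; the buffer size and error handling are illustrative rather than taken from the question, and you need to link against advapi32:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    wchar_t outdata[256];                  // wide buffer, not char
    DWORD outdata_size = sizeof(outdata);  // size in bytes, not characters

    LONG rc = RegGetValueW(HKEY_LOCAL_MACHINE,
        L"Software\\Microsoft\\Windows NT\\CurrentVersion", L"ProductName",
        RRF_RT_REG_SZ, NULL, outdata, &outdata_size);

    if (rc == ERROR_SUCCESS)
        wprintf(L"%ls\n", outdata);        // prints the product name
    return 0;
}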
Related
What is the difference in calling the Win32 API functions that have an A character appended to the end as opposed to a W character?
I know the A means ASCII and the W means wide character (Unicode), but what is the difference in the input or the output?
For example, if I call GetDefaultCommConfigA, will it fill my COMMCONFIG structure with ASCII strings instead of WCHAR strings? (Or vice versa for GetDefaultCommConfigW?)
In other words, how do I know what encoding a string is in, ASCII or Unicode? Is it determined by which version of the function I call, A or W?
I have found this question, but I don't think it answers my question.
The A functions use ANSI (not ASCII) strings as input and output, and the W functions use Unicode strings instead (UCS-2 on NT4 and earlier, UTF-16 on W2K and later). Refer to MSDN for more details.
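To make that concrete, here is a small sketch (using GetComputerName rather than GetDefaultCommConfig, and with illustrative buffer sizes) showing that the suffix dictates whether you pass char or wchar_t buffers:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    char    nameA[MAX_COMPUTERNAME_LENGTH + 1];
    wchar_t nameW[MAX_COMPUTERNAME_LENGTH + 1];
    DWORD lenA = sizeof(nameA);                      // length in characters
    DWORD lenW = sizeof(nameW) / sizeof(nameW[0]);   // length in characters

    GetComputerNameA(nameA, &lenA);   // fills an ANSI (char) buffer
    GetComputerNameW(nameW, &lenW);   // fills a Unicode (wchar_t) buffer

    printf("ANSI:    %s\n", nameA);
    printf("Unicode: %ls\n", nameW);
    return 0;
}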
Most WinAPI calls have both a Unicode and an ANSI version.
For example:
function MessageBoxA(hWnd: HWND; lpText, lpCaption: LPCSTR; uType: UINT): Integer; stdcall; external user32;
function MessageBoxW(hWnd: HWND; lpText, lpCaption: LPCWSTR; uType: UINT): Integer; stdcall; external user32;
When should I use the ANSI function rather than calling the Unicode function?
Just as (rare) exceptions to the posted comments/answers...
One may choose to use the ANSI calls in cases where UTF-8 is expected and supported. For example, WriteConsoleA'ing UTF-8 strings in a console set to use a TrueType font and running under chcp 65001.
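A sketch of that scenario, assuming a console whose font can display the characters (the string itself is just an example):

#include <windows.h>

int main(void)
{
    // Programmatic equivalent of "chcp 65001": switch the console output
    // code page to UTF-8, then write raw UTF-8 bytes through the A call.
    SetConsoleOutputCP(CP_UTF8);

    const char utf8[] = "caf\xC3\xA9 \xE2\x82\xAC\n";   // "café €" encoded as UTF-8
    DWORD written;
    WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),
                  utf8, (DWORD)(sizeof(utf8) - 1), &written, NULL);
    return 0;
}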
Another oddball exception is functions that are primarily implemented as ANSI, where the Unicode "W" variant simply converts to a narrow string in the active codepage and calls the "A" counterpart. For such a function, and when a narrow string is available, calling the "A" variant directly saves a redundant double conversion. Case in point is OutputDebugString, which fell into this category until Windows 10 (I just noticed https://msdn.microsoft.com/en-us/library/windows/desktop/aa363362.aspx which mentions that a call to WaitForDebugEventEx - only available since Windows 10 - enables true Unicode output for OutputDebugStringW).
Then there are APIs which, even though dealing with strings, are natively ANSI. For example, GetProcAddress only exists in an ANSI variant that takes an LPCSTR argument, since names in the export tables are narrow strings.
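A small sketch of that asymmetry; the DLL and export used here are just examples:

#include <windows.h>

typedef int (WINAPI *MessageBoxW_t)(HWND, LPCWSTR, LPCWSTR, UINT);

int main(void)
{
    HMODULE user32 = LoadLibraryW(L"user32.dll");    // module name may be wide...
    if (user32) {
        MessageBoxW_t pMessageBoxW =
            (MessageBoxW_t)GetProcAddress(user32, "MessageBoxW");  // ...but the export name is always narrow
        if (pMessageBoxW)
            pMessageBoxW(NULL, L"Hello", L"GetProcAddress demo", MB_OK);
        FreeLibrary(user32);
    }
    return 0;
}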
That said, by and large most string-related APIs are natively Unicode and one is encouraged to use the "W" variants. Not all the newer APIs even have an "A" variant any longer (e.g. CommandLineToArgvW). From the horse's mouth https://msdn.microsoft.com/en-us/library/windows/desktop/ff381407.aspx:
Windows natively supports Unicode strings for UI elements, file names, and so forth. Unicode is the preferred character encoding, because it supports all character sets and languages. Windows represents Unicode characters using UTF-16 encoding, in which each character is encoded as a 16-bit value. UTF-16 characters are called wide characters, to distinguish them from 8-bit ANSI characters.
[...]
When Microsoft introduced Unicode support to Windows, it eased the transition by providing two parallel sets of APIs, one for ANSI strings and the other for Unicode strings.
[...]
Internally, the ANSI version translates the string to Unicode. The Windows headers also define a macro that resolves to the Unicode version when the preprocessor symbol UNICODE is defined or the ANSI version otherwise.
[...]
Most newer APIs in Windows have just a Unicode version, with no corresponding ANSI version.
[ NOTE ] The post was edited to add the last two paragraphs.
The simplest rule to follow is this: only use the ANSI variants on systems that do not have the Unicode variant. That is, on Windows 95, 98, and ME, which are the versions of Windows that do not support Unicode.
These days, it is exceptionally unlikely that you will be targeting such versions, and so in all probability you should always just use the Unicode variants.
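In practice that means building with the preprocessor symbol UNICODE defined and letting the generic macro names resolve to the W functions, as the quoted MSDN text describes. A minimal sketch (the messages are purely illustrative):

#define UNICODE        // make the generic names resolve to the W variants
#include <windows.h>

int main(void)
{
    // MessageBox is a macro: with UNICODE defined it expands to MessageBoxW,
    // so the string literals must be wide (L"...").
    MessageBox(NULL, L"Hello", L"Unicode build", MB_OK);

    // The explicit variants exist regardless of the macro:
    MessageBoxA(NULL, "Narrow text (converted internally)", "Explicit A", MB_OK);
    MessageBoxW(NULL, L"Wide text", L"Explicit W", MB_OK);
    return 0;
}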
Is there any character that is guaranteed not to appear in any file path on Windows or Unix/Linux/OS X?
I need this because I want to join together a few file paths into a single string, and then split them apart again later.
In the comments, Harry Johnston writes:
The generic solution to this class of problem is to encode the file paths before joining them. For example, if you're dealing with single-byte strings, you could convert them to hex strings; so "hello" becomes "68656c6c6f". (Obviously that isn't the most efficient solution!)
That is absolutely correct. Please don't try to do anything "tricky" with filenames and reserved characters, because it will eventually break in some weird corner case and your successor will have a heck of a time trying to repair the damage.
In fact, if you're trying to be portable, I strongly recommend that you never attempt to create any filenames including any characters other than [a-z0-9_]. (Consider that common filesystems on both Windows and OS X can operate in case-insensitive mode, where FooBar.txt and FOOBAR.TXT are the same identifier.)
A decently compact encoding scheme for practical use would be to make a "whitelisted set" such as [a-z0-9_], and encode any character ch outside your "whitelisted set" as printf("_%02x", ch). So hello.txt becomes hello_2etxt, and hello_world.txt becomes hello_5fworld_2etxt.
Since every _ is escaped, you can use double-_ as a separator: the encoded string hello_2etxt__goodbye___2e_2e uniquely identifies the list of filenames ['hello.txt', 'goodbye', '..'].
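Here is an illustrative sketch of that scheme in C; the helper name and whitelist handling are mine, not part of the answer, and decoding (the exact inverse) is omitted:

#include <stdio.h>

// Escape every character outside [a-z0-9] as "_%02x"; note that '_' itself
// is escaped (to "_5f"), which is what makes "__" safe as a separator.
static void encode_name(const char *name, FILE *out)
{
    for (const unsigned char *p = (const unsigned char *)name; *p; ++p) {
        if ((*p >= 'a' && *p <= 'z') || (*p >= '0' && *p <= '9'))
            fputc(*p, out);
        else
            fprintf(out, "_%02x", *p);
    }
}

int main(void)
{
    const char *paths[] = { "hello.txt", "goodbye", ".." };
    for (int i = 0; i < 3; ++i) {
        if (i) fputs("__", stdout);   // separator between encoded names
        encode_name(paths[i], stdout);
    }
    fputc('\n', stdout);              // prints: hello_2etxt__goodbye___2e_2e
    return 0;
}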
You can use a newline character, or specifically CR (decimal code 13) or LF (decimal code 10) if you like. Whether this is suitable or not depends on what requirements you have with regard to displaying the concatenated string to the user - with this approach, it will print its parts on separate lines - which may be very good or very bad for the purpose (or you may not care...).
If you need the concatenated string to print on a single line, edit your question to specify this additional requirement, and we can go from there.
I am trying to put Unicode characters (using a custom font) into a string which I then display using Quartz, but Xcode doesn't like the escape codes for some reason, and I'm really stuck.
CGContextShowTextAtPoint (context, 15, 15, "\u0066", 1);
It doesn't like this (Latin lowercase f) and says it is an "invalid universal character".
CGContextShowTextAtPoint (context, 15, 15, "\ue118", 1);
It doesn't complain about this but displays nothing. When I open the font in FontForge, it shows the glyph as there and valid. Also Font Book validated the font just fine. If I use the font in TextEdit and put in the Unicode character with the character viewer Unicode table, it appears just fine. Just Quartz won't display it.
Any ideas why this isn't working?
The "invalid universal character" error is due to the definition in C99: Essentially \uNNNN escapes are supposed to allow one programmer to call a variable føø and another programmer (who might not be able to type ø) to refer to it as f\u00F8\u00F8. To make parsing easier for everyone, you can't use a \u escape for a control character or a character that is in the "basic character set" (perhaps a lesson learned from Java's unicode escapes which can do crazy things like ending comments).
The second error is probably because "\ue118" is getting compiled to the UTF-8 sequence "\xee\x84\x98", which is three chars. CGContextShowTextAtPoint() assumes that one char (byte) is one glyph, and CGContextSelectFont() only supports the encodings kCGEncodingMacRoman (which decodes those bytes to "ÓÑò") and kCGEncodingFontSpecific (what happens there is anyone's guess). The docs say not to use CGContextSetFont() (which does not specify the char-to-glyph mapping) in conjunction with CGContextShowText() or CGContextShowTextAtPoint().
If you know the glyph number, you can use CGContextShowGlyphs(), CGContextShowGlyphsAtPoint(), or CGContextShowGlyphsAtPositions().
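A hedged sketch of that approach; the font name and glyph index below are placeholders, not values from the question:

#include <CoreFoundation/CoreFoundation.h>
#include <CoreGraphics/CoreGraphics.h>

static void drawCustomGlyph(CGContextRef context)
{
    // Select the font directly by PostScript name (hypothetical name here).
    CGFontRef font = CGFontCreateWithFontName(CFSTR("MyCustomFont"));
    CGContextSetFont(context, font);
    CGContextSetFontSize(context, 24.0);

    // A glyph index (e.g. as shown by FontForge), not a Unicode code point.
    CGGlyph glyphs[1] = { 123 };
    CGContextShowGlyphsAtPoint(context, 15, 15, glyphs, 1);

    CGFontRelease(font);
}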
I just changed the font to use standard alphanumeric characters in the end. Much simpler.