How to display UTF-8 in a HITextView (MLTE) control?

I have a UTF-8 string which I want to display in an HITextView (MLTE) control. Theoretically, HITextView requires either "Text" or UTF-16, so I'm converting:
UniChar uniput[STRSIZE];
ByteCount converted, unilen;

err = ConvertFromTextToUnicode(C2UInfo, len, output,
                               kUnicodeUseFallbacksMask,
                               0, NULL, 0, NULL,
                               sizeof(UniChar) * STRSIZE,
                               &converted, &unilen, uniput);
err = TXNSetData(MessageObject, kTXNUnicodeTextData, uniput, unilen,
                 kTXNEndOffset, kTXNEndOffset);
I have defined the converter C2UInfo as follows:
UnicodeMapping uMapping;
uMapping.unicodeEncoding = CreateTextEncoding(kTextEncodingUnicodeV2_0,
                                              kUnicodeCanonicalDecompVariant,
                                              kUnicode16BitFormat);
uMapping.otherEncoding   = GetTextEncodingBase(kUnicodeUTF8Format);
uMapping.mappingVersion  = kUnicodeUseLatestMapping;
err = CreateTextToUnicodeInfo(&uMapping, &C2UInfo);
It works fine for plain old ASCII characters, but multi-byte UTF-8 is being mapped to the wrong characters. For example, æ (LATIN SMALL LETTER AE) is being mapped to 疆 (CJK UNIFIED IDEOGRAPH-7586).
I've tried checking and unchecking "Output Text in Unicode" in Interface Builder, and I've tried varying some of the conversion constants, with no effect.
This is being built with Xcode 3.2.6 using the MacOSX10.5.sdk and tested under 10.6.

The “Text” that ConvertFromTextToUnicode expects is probably the same “Text” that is one of your two options for MLTE. If you had the sort of “Text” that ConvertFromTextToUnicode converts from, you could just pass it to MLTE directly.
(For the record, “Text” is almost certainly either MacRoman or whatever is dictated by the user's locale-determined current script.)
Instead, you should use a Text Encoding Converter. Create one, use it, finish using it, and dispose of it when you're done.
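For example, a minimal sketch (assuming utf8 and utf8Len hold your UTF-8 bytes, and reusing STRSIZE and MessageObject from the question; error handling abbreviated):
TECObjectRef tec = NULL;
TextEncoding utf8Encoding  = CreateTextEncoding(kTextEncodingUnicodeDefault,
                                                kTextEncodingDefaultVariant,
                                                kUnicodeUTF8Format);
TextEncoding utf16Encoding = CreateTextEncoding(kTextEncodingUnicodeDefault,
                                                kTextEncodingDefaultVariant,
                                                kUnicode16BitFormat);
err = TECCreateConverter(&tec, utf8Encoding, utf16Encoding);
if (err == noErr) {
    UniChar   utf16[STRSIZE];
    ByteCount bytesRead = 0, bytesWritten = 0;
    err = TECConvertText(tec,
                         (ConstTextPtr)utf8, utf8Len, &bytesRead,
                         (TextPtr)utf16, sizeof(utf16), &bytesWritten);
    if (err == noErr) {
        // TXNSetData takes a byte count, which TECConvertText already reports.
        err = TXNSetData(MessageObject, kTXNUnicodeTextData,
                         utf16, bytesWritten, kTXNEndOffset, kTXNEndOffset);
    }
    TECDisposeConverter(tec);
}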
There are two other ways.
One is to create a CFString from the UTF-8, then Get its characters. You would do this instead of using a TEC. It's functionally equivalent and possibly a little bit easier. On the other hand, you don't get to reuse the converter, for whatever that's worth.
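A sketch of this approach (again assuming utf8 and utf8Len; the CFString does the UTF-8 decoding for you):
CFStringRef str = CFStringCreateWithBytes(kCFAllocatorDefault,
                                          (const UInt8 *)utf8, utf8Len,
                                          kCFStringEncodingUTF8, false);
if (str != NULL) {
    CFIndex length = CFStringGetLength(str);   // length in UTF-16 code units
    UniChar *buffer = (UniChar *)malloc(length * sizeof(UniChar));
    CFStringGetCharacters(str, CFRangeMake(0, length), buffer);
    err = TXNSetData(MessageObject, kTXNUnicodeTextData,
                     buffer, length * sizeof(UniChar),
                     kTXNEndOffset, kTXNEndOffset);
    free(buffer);
    CFRelease(str);
}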
The other, since you have an HITextView, would be to create a CFString from the UTF-8 and just use that. Like Cocoa objects, HIToolbox objects have an inheritance hierarchy; since an HITextView is a kind of HIView, HIViewSetText should just work. (And if not, try HIViewSetValue.)
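For example (textView here is a hypothetical HIViewRef for your HITextView):
CFStringRef str = CFStringCreateWithBytes(kCFAllocatorDefault,
                                          (const UInt8 *)utf8, utf8Len,
                                          kCFStringEncodingUTF8, false);
err = HIViewSetText(textView, str);   // works because HITextView is an HIView
CFRelease(str);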
The last method also gets you that much closer to your eventual move away from MLTE/HITextView, since it's essentially what you'll do with an NSTextView. (HITextView and MLTE are deprecated.)

Related

What is the point of google.protobuf.StringValue?

I've recently encountered all sorts of wrappers in Google's protobuf package. I'm struggling to imagine the use case. Can anyone shed some light: what problem were these intended to solve?
Here's one of the documentation links: https://developers.google.com/protocol-buffers/docs/reference/csharp/class/google/protobuf/well-known-types/string-value (it says nothing about what this can be used for).
One thing that differs in behavior between this and the simple string type is that such a field will be written less efficiently (a couple of extra bytes, plus a redundant memory allocation). For the other wrappers the story is even worse, since the repeated variants of those fields will be written inefficiently (Google's official Protobuf serializer doesn't support packed encoding for non-numeric types).
Neither seems to be desirable. So, what's this all about?
There are a few reasons, mostly to do with where these are used - see struct.proto.
StringValue can be null; string often can't be in a language interfacing with protobufs. For example, in Go strings are always set: the "zero value" for a string is "", the empty string, so it's impossible to distinguish between "this value is intentionally set to empty string" and "there was no value present". StringValue can be null, and so solves this problem. It's especially important when they're used in a StructValue, which may represent arbitrary JSON: to do so it needs to distinguish between a JSON key that was set to empty string (a StringValue holding an empty string) and a JSON key that wasn't set at all (a null StringValue).
Also, if you look at struct.proto, you'll see that these aren't fully fledged message types in the proto - they're all generated from message Value, which has a oneof kind { number_value, string_value, bool_value, ... }. By using a oneof, struct.proto can represent a variety of different values in one field. Again this makes sense considering what struct.proto is designed to handle - arbitrary JSON - where you don't know what type of value a given JSON key has ahead of time.
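To make that concrete, here's a small sketch using the generated C++ well-known types (accessor names as generated from struct.proto and wrappers.proto):
#include <google/protobuf/struct.pb.h>
#include <google/protobuf/wrappers.pb.h>

// A default-constructed Value has no kind set at all -- "no value present":
google::protobuf::Value v;
// v.kind_case() == google::protobuf::Value::KIND_NOT_SET

// Setting the string member of the oneof, even to "", is distinguishable:
v.set_string_value("");
// v.kind_case() == google::protobuf::Value::kStringValue

// StringValue is a message, so a StringValue field can itself be unset (null),
// unlike a plain string field:
google::protobuf::StringValue s;
s.set_value("hello");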
In addition to George's answer: you can't use a Protobuf primitive as the parameter or return value of a gRPC procedure; wrappers such as StringValue give you a message type to use in that position.

Convert AnsiString to UnicodeString in Lazarus with FreePascal

I found similar topics here but none of them had the solution to my question, so I am asking it in a new thread.
A couple of days ago, I changed the format in which the preferences of an application I am developing are saved, from INI to JSON.
I use the jsonConf unit for this.
A sample of the code I use to save a key-value pair in the file is shown below.
procedure TMyClass.SaveSettings();
var
  c: TJSONConfig;
begin
  c := TJSONConfig.Create(nil);
  try
    c.Filename := m_settingsFilePath;
    c.SetValue('/Systems/CustomName', m_customName);
  finally
    c.Free;
  end;
end;
In my code, m_customName is an AnsiString variable. The TJSONConfig.SetValue procedure requires both the key and the value to be of UnicodeString type. The application compiles fine, but I get warnings such as:
Warning: Implicit string type conversion from "AnsiString" to "UnicodeString".
Some of the warnings say there is potential data loss.
Of course I could go and change everything to UnicodeString, but that is too risky. I haven't seen any issues so far from ignoring these warnings, but they show up all the time, and it might cause issues on a different PC.
How do I fix this?
To avoid the warning, do an explicit conversion; this way you tell the compiler that you know what you are doing (I hope...). In the case of c.SetValue the expected type is a UnicodeString (UTF-16). m_customName should be declared as a string unless there is a good reason to do otherwise (see below); anything else may trigger unwanted internal conversions.
A string in Lazarus is UTF8-encoded by default. Therefore, you can use the function UTF8Decode() for the conversion from UTF8 to Unicode, or UTF8ToUTF16() (unit LazUTF8).
var
  c: TJSONConfig;
  m_customName: String;
...
  c.SetValue('/Systems/CustomName', UTF8Decode(m_customName));
You say above that the key-value pairs are in a file. The conversion then depends on the encoding of the file. Normally I open the file in a good text editor and find the encoding somewhere - Notepad++, for example, displays the name of the encoding at the right end of the status bar. Suppose the encoding is that of codepage 1252 (Latin-1). These are ansistrings; therefore you can declare the strings read from the file as ansistring. Because UTF8 strings are so common in Lazarus, there is no direct conversion from ansistring to Unicode, and you must convert to UTF8 first. In the unit LConvEncoding you find many conversion routines between various encodings. Select CP1252ToUTF8() to go to UTF8, and then apply UTF8Decode() to finally get Unicode.
var
  c: TJSONConfig;
  m_customName: ansistring;
...
  c.SetValue('/Systems/CustomName', UTF8Decode(CP1252ToUTF8(m_customName)));
The FreePascal compiler 3.0 can handle many of these conversions automatically using strings with predefined encodings. But I think explicit conversions make it very clear what is happening. And FPC 3.0 still emits the warnings you want to avoid...

Predefined Windows icons: Unicode

I am assigning to the lpszIcon member of the MSGBOXPARAMSW structure (notice the W). I want to use one of the predefined icons like IDI_APPLICATION or IDI_WARNING, but they are all ASCII (defined as MAKEINTRESOURCE). I tried doing this:
MSGBOXPARAMSW mbp = { 0 };
mbp.lpszIcon = (LPCWSTR) IDI_ERROR;
but then no icon was displayed at all. So how can I use the Unicode versions of the IDI_ icons?
There is no ANSI or Unicode variant of a numeric resource ID. The code that you use to set lpszIcon is correct. It is idiomatic to use the MAKEINTRESOURCE macro rather than a cast, but the cast has identical meaning. Your problem lies in the other code, the code that we cannot see.
Reading between the lines, I think that you are targeting ANSI or MBCS. You tried to use MAKEINTRESOURCE but that expands to MAKEINTRESOURCEA. That's what led you to cast. You should have used MAKEINTRESOURCEW to match MSGBOXPARAMSW. That would have resolved the compilation error you encountered. You could equally have changed the project to target UNICODE.
But none of that explains why the icon does not appear in the dialog. There has to be a problem elsewhere. If the dialog appears then the most likely explanation is that you have set hInstance to a value other than NULL. But the code to set lpszIcon is correct, albeit not idiomatic.
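Putting that together, a sketch consistent with the advice above (NULL hInstance, and the same cast as in the question):
#include <windows.h>

MSGBOXPARAMSW mbp = { 0 };
mbp.cbSize      = sizeof(mbp);
mbp.hwndOwner   = NULL;
mbp.hInstance   = NULL;                 // NULL so the ID refers to a system icon
mbp.lpszText    = L"Something went wrong.";
mbp.lpszCaption = L"Error";
mbp.dwStyle     = MB_OK | MB_USERICON;  // MB_USERICON tells it to use lpszIcon
mbp.lpszIcon    = (LPCWSTR)IDI_ERROR;   // same meaning as a MAKEINTRESOURCEW form
MessageBoxIndirectW(&mbp);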

Convert case of wide characters, given the LCID (Visual C++)

I have some existing Visual C++ code where I need to add the conversion of wide character strings to upper or lower case.
I know there are pitfalls to this (such as the Turkish "I"), but most of these can be ironed out if you know the language. Fortunately, in this area of the code I know the LCID value (locale ID), which I guess is the same as knowing the language.
As LCID is a Windows type, is there a Windows function that will convert wide strings to upper or lower case?
The C runtime function _towupper_l() sounds like it would be ideal but it takes a _locale_t parameter instead of LCID, so I guess it's unsuitable unless there is a completely reliable way of converting an LCID to a _locale_t.
The function you're searching for is called LCMapString, and it is part of the Windows NLS APIs. The LCMAP_UPPERCASE flag maps characters to uppercase, while the LCMAP_LOWERCASE flag maps characters to lowercase.
For applications targeting Windows Vista and later, there is an Ex variant that works on locale names instead of identifiers, which are what Microsoft now says you should prefer to use.
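For example (a sketch; the locale name and buffer size here are arbitrary illustrations):
wchar_t buf[64];
int written = LCMapStringEx(L"tr-TR",          // locale name instead of LCID
                            LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                            L"istanbul", -1,   // -1 means NUL-terminated input
                            buf, 64,
                            NULL, NULL, 0);    // version info / reserved / sort handle
if (written == 0)
{
    // Call GetLastError() to find out what went wrong.
}
// With linguistic casing under tr-TR this yields L"İSTANBUL" (dotted capital I).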
In fact, in the CRT implementation provided with VS 2010 (and presumably other versions as well), functions such as _towupper_l ultimately end up calling LCMapString after they extract the locale ID (LCID) from the specified _locale_t.
If you're like me, and less familiar with the i18n APIs than you should be, you probably already know about the CharUpper, CharLower, CharUpperBuff, and CharLowerBuff family of functions. These have been the old standbys from the early days of Windows for altering the case of chars/strings, but as their documentation warns:
Note that CharXxx always maps uppercase I to lowercase I ("i"), even when the current language is Turkish or Azeri. If you need a function that is linguistically sensitive in this respect, call LCMapString.
What it neglects to mention is filled in by a couple of posts on Michael Kaplan's wonderful blog on internationalization issues: What does "linguistic casing" mean?, How best to alter case. The executive summary is that you achieve the same results as the CharXxx family of functions by calling LCMapString and not specifying the LCMAP_LINGUISTIC_CASING flag, whereas you can be linguistically sensitive by ensuring that you do specify the LCMAP_LINGUISTIC_CASING flag.
Sample code:
std::wstring test(L"Does my code pass the Turkey test?");
if (!LCMapStringW(lcid,               /* your LCID, defined elsewhere */
                  LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                  test.c_str(),       /* input string */
                  static_cast<int>(test.length()),  /* length of input string */
                  &test[0],           /* output buffer (can reuse input) */
                  static_cast<int>(test.length()))) /* length of output buffer (same as input) */
{
    // Uh-oh! Something went wrong in the call to LCMapString, so you need to
    // handle the error somehow here.
    // A good start is calling GetLastError to determine the error code.
}

Convert NSValue to NSDecimalNumber

I've got a plist (created in Xcode) with an array full of "Numbers" (0.01, 1, 2, 6) that unpacks into NSValues when reconstituted with initWithContentsOfFile. How can I turn these NSValues into NSDecimalNumbers that I can add together? They will be treated as currency values, so I only need a precision of 2 (maybe 4) decimal places.
I've tried saving the plist values as "String" instead of "Number" and using NSDecimalNumber's initWithString to set the value, but NSValue doesn't respond to stringValue.
Seems like dealing with numbers is particularly confusing in Cocoa. So many container formats in so many frameworks... :-(
You should be able to directly store your numbers as strings in the property list. You don't need to do any NSValue wrapping for NSStrings when storing them in a plist. I'd recommend keeping the numbers in your application as NSDecimals or NSDecimalNumbers to avoid any floating-point errors, reading them from the plist using initWithString:locale:, and writing them to the plist using descriptionWithLocale:. Storing and retrieving the decimals as strings avoids any to-and-from floating point conversion errors.
The first lesson to learn is that when representing currency, use integers instead of floating-point (decimal) numbers if you want any kind of accuracy. (Divide by 100.0 whenever you need to display cents, etc.) Computers are flawless with binary (base 2) but if you try to represent in binary something that can't be broken down into factors of 1/(2^n), you'll run into precision errors. (Try 0.1 + 0.1 and see what you get.)
That said, the XML tag within which you specify the number definitely makes a difference in how the values are interpreted in terms of Cocoa classes when you use something like -[NSArray initWithContentsOfFile:] to "reconstitute" it. Consult the plist man page and this Apple article for more details and examples.
To accomplish what you're asking, make sure you're using <real> or <integer> (and the matching closing tag) around the values in your plist. (Property List Editor and Xcode should automatically use the correct one based on whether the number has a decimal point.) In my tests, both real and integer numbers were read in as NSNumber objects. NSDecimalNumber is a subclass of NSNumber, but I'm not entirely sure how the toll-free bridging with CFNumber works in all cases. Experimentation is probably the best way to figure that out.
