How to change an NSString's encoding? - cocoa

I have an NSArray of NSStrings. The output below is what NSLog prints for the array.
Here is the code I have implemented:
NSMetadataQuery *query = [[NSMetadataQuery alloc] init];
.....
NSArray *queryResults = [[query results] copy];
for (NSMetadataItem *item in queryResults)
{
id value = [item valueForAttribute: kMDItemAlbum];
[databaseArray addObject: value];
}
"The Chronicles Of Narnia: Prince Caspian",
"Taste the First Love",
"Once (Original Soundtrack)",
"430 West Presents Detroit Calling",
"O\U0308\U00d0\U00b9u\U0301\U00b0\U00aeA\U0300O\U0308A\U0300O\U0308I\U0301A\U030a-O\U0301a\U0300A\U0302\U00a1",
"\U7ea2\U96e8\U6d41\U884c\U7f51",
"I\U0300\U00ab\U00bc\U00abO\U0303A\U030aE\U0300y\U0301\U00b7a\U0301",
"A\U0303n\U0303\U00b8e\U0300\U00b2I\U0300C\U0327U\U0300",
"\U00bb\U00b3A\U0308i\U0302O\U0303\U00bdO\U0301N\U0303",
"American IV (The Man Comes Aro",
"All That We Needed",
Now how can I change the human-unreadable strings to human-readable strings? Thanks.

Looking past the escaping done by description (e.g., \U0308), the strings are wrong (e.g., “Öйú°®ÀÖÀÖÍÅ-Óà¡”) because the data you got was wrong.
That's probably not Spotlight's fault. (You could verify that by trying a different ID3-tag library.) Most probably, the files themselves contain poorly-encoded tags.
To fix this:
1. Encode it in the 8-bit encoding that matches the characters. You can't just pick an encoding (like “ASCII”, which Cocoa mapped to ISO Latin 1 the last time I checked) at random; you need to use the encoding that contains all of the characters in the input and encodes them correctly for what you're going to do next. Try ISO Latin 1, ISO Latin 9, Windows codepage 1252, and MacRoman, in that order.
2. Decode the encoded data as UTF-8. If this fails, go back to step 1 and try a different encoding.
If step 2 succeeds on any attempt, that is your valid data (unless you're very unlucky). If it fails on all attempts, the data is unrecoverable and you may want to warn the user that their input files contain bogus tags.
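The two steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper name RepairMisdecodedString is made up, the code assumes ARC, ISO Latin 9 is left out because Foundation has no ready-made NSStringEncoding constant for it, and the precomposition call is my own addition (the logged strings look decomposed, and 8-bit encodings only contain precomposed letters).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the original answer. Assumes ARC.
// Returns the repaired string, or nil if no candidate encoding yields valid UTF-8.
static NSString *RepairMisdecodedString(NSString *garbled) {
    // Assumption: compose "O" + U+0308 into a single character first, because
    // the 8-bit encodings below cannot represent combining marks.
    NSString *composed = [garbled precomposedStringWithCanonicalMapping];

    // Candidate encodings, in the order suggested above (ISO Latin 9 omitted).
    const NSStringEncoding candidates[] = {
        NSISOLatin1StringEncoding,
        NSWindowsCP1252StringEncoding,
        NSMacOSRomanStringEncoding,
    };
    const size_t count = sizeof(candidates) / sizeof(candidates[0]);

    for (size_t i = 0; i < count; i++) {
        // Step 1: turn the characters back into bytes.
        // nil means the string contains characters outside this encoding.
        NSData *bytes = [composed dataUsingEncoding:candidates[i]];
        if (bytes == nil) {
            continue;
        }
        // Step 2: reinterpret those bytes as UTF-8; nil means they are not valid UTF-8.
        NSString *decoded = [[NSString alloc] initWithData:bytes
                                                  encoding:NSUTF8StringEncoding];
        if (decoded != nil) {
            return decoded;
        }
    }
    return nil;
}
```

A nil result means none of the attempts worked, so the caller should keep the original string. That is also what happens for strings that were fine to begin with but contain non-ASCII characters. Plain ASCII strings come back unchanged.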

Parsing these kinds of strings isn't particularly easy: see this SO post for background. It has links to other SO posts with specific ways of handling this problem.

These strings are utf-8 encoded. You can decode them by:
NSString *myDecoded = [NSString stringWithUTF8String:myEscapedString];
So to process your complete array 'completeArray' you can convert to a const char* first and then back into NSString:
NSMutableArray *processed = [NSMutableArray arrayWithCapacity:completeArray.count];
for (NSString* s in completeArray) {
// Note: cStringUsingEncoding: returns NULL if s contains characters outside the encoding.
[processed addObject:[NSString stringWithUTF8String:[s cStringUsingEncoding:NSASCIIStringEncoding]]];
}

Related

Displaying currency signs in UILabel on Xcode

I am trying to display the glyphs for currencies using either the HTML format or the Unicode one. With the former I tried all sorts of operations, including stringWithUTF8String, decodeFromPercentEscapeString, CFURLCreateStringByReplacingPercentEscapesUsingEncoding, and stringByReplacingPercentEscapesUsingEncoding, but none of them succeeded in turning the HTML entity for € into a euro sign. With Unicode the situation was slightly better, as:
NSString *aStr = [NSString stringWithUTF8String:[@"\u20ac" UTF8String]];
actually prints a euro sign, but for a reason I do not understand, if I provide the string as the result of a method, the Unicode code gets displayed instead.
What is the standard way of displaying euro, dollar, or pound signs in a UILabel?
UILabel automatically resolves unicode strings, no need to decode:
label.text = @"\u0024"; // dollar
label.text = @"\u20ac"; // euro
Refer to fileformat.info for the encoding name.
What about directly copying and pasting those characters in your string as :
NSString *aStr = @"Euro-€, Dollar-$, Pound-£";
label.text=aStr;
I found a very simple solution:
float fareValue=/*float value*/;
NSNumber* fareNumber=[NSNumber numberWithFloat:fareValue];
NSString* formatted=[NSNumberFormatter localizedStringFromNumber:fareNumber numberStyle:NSNumberFormatterCurrencyStyle];
Thanks everyone
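For anyone who needs a specific currency symbol rather than the one for the device's current locale, here is a small sketch of the same NSNumberFormatter idea with an explicit locale. The helper name CurrencyString and the locale identifiers are only examples, and the exact output (spacing, symbol position) depends on the system's locale data.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: format an amount as currency for a given locale identifier.
static NSString *CurrencyString(NSNumber *amount, NSString *localeIdentifier) {
    NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
    formatter.numberStyle = NSNumberFormatterCurrencyStyle;
    // The locale decides which currency symbol and separators are used.
    formatter.locale = [NSLocale localeWithLocaleIdentifier:localeIdentifier];
    return [formatter stringFromNumber:amount];
}
```

Usage would look like label.text = CurrencyString(@12.5, @"en_GB"); for pounds, or @"de_DE" for euros.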
Swift 3
"\u{00A3}"
The '@' is no longer needed

ASCII to NSData

This is another crack at my MD5 problem. I know the issue is with the ASCII character © (0xa9, 169). Either it is the way I am inserting the character into the string, or it's a higher vs. lower byte problem.
If I
NSString *source = [NSString stringWithFormat:@"%c", 0xa9];
NSData *data = [source dataUsingEncoding:NSASCIIStringEncoding];
NSLog(@"\n\n ############### source %@ \ndata desc %@", source, [data description]);
CC_MD5([data bytes], [data length], result);
return [NSString stringWithFormat:
@"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x",
result[0], result[1], result[2], result[3],
result[4], result[5], result[6], result[7],
result[8], result[9], result[10], result[11],
result[12], result[13], result[14], result[15]
];
Result:
######### source ©
[data description] = (null)
md5: d41d8cd98f00b204e9800998ecf8427e
values: int 169 char ©
When I change the encoding to
NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source length]];
The result is
######### source ©
[data description] = <c2>
md5: 6465dad1d31752be3f3283e8f70feef7
When I change the encoding to
NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source lengthOfBytesUsingEncoding:NSUTF8StringEncoding]];
The result is
############### source © len 2
[data description] = <c2a9>
md5: a541ecda3d4c67f1151cad5075633423
When I run the same function in Java I get
">>>>> msg## \251 \251
md5 a252c2c85a9e7756d5ba5da9949d57ed
The question is what is the best way to get the same byte in objC as I get in Java?
“ASCII to NSData” makes no sense, because ASCII is an encoding; if you have encoded characters, then you have data.
An encoding is a transformation of ideal Unicode characters (code points) into one-or-more-byte units (code units), possibly in sequences such as UTF-16's surrogate pairs.
An NSString is more or less an ideal Unicode object. It contains the characters of the string, in Unicode, irrespective of any encoding*.
ASCII is an encoding. UTF-8 is also an encoding. When you ask the string for its UTF8String, you are asking it to encode its characters as UTF-8.
NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source length]];
The result is
######### source ©
[data description] = <c2>
That's because you passed the wrong length. The string's length (in characters) is not the same as the number of code units (bytes, in this case) in some encoding.
The correct length is strlen([source UTF8String]), but it's easier for you and faster at run time to use dataUsingEncoding: to ask the string to create the NSData object for you.
When I change the encoding to
NSData *data = [NSData dataWithBytes:[source UTF8String] length:[source lengthOfBytesUsingEncoding:NSUTF8StringEncoding]];
You didn't change the encoding. You're still encoding it as UTF-8.
Use dataUsingEncoding:.
The question is what is the best way to get the same byte in objC as I get in Java?
Use the same encoding.
There is no such thing as “extended ASCII”. There are several different encodings that are based on (or at least compatible with) ASCII, including ISO 8859-1, ISO 8859-9, MacRoman, Windows codepage 1252, and UTF-8. You need to decide which one you mean and tell the string to encode its characters with that.
Better yet, continue using UTF-8—it is almost always the right choice for mostly-ASCII text—and change your Java code instead.
NSData *data = [source dataUsingEncoding:NSASCIIStringEncoding];
Result:
[data description] = (null)
True ASCII can only encode 128 possible characters. Unicode includes all of ASCII unchanged, so the first 128 code points in Unicode are what ASCII can encode. Anything else, ASCII cannot encode.
I've seen NSASCIIStringEncoding behave as equivalent to NSISOLatin1StringEncoding before; it sounds like they might have changed it to be a pure ASCII encoding, and if that's the case, that's a good thing. There is no copyright symbol in ASCII. What you see here is the correct result.
*This is not quite true; the characters are exposed as UTF-16, so any characters outside the Basic Multilingual Plane are exposed as surrogate pairs, not whole characters as they would be in a truly ideal string object. This is a trade-off. In Swift, the built-in String type is a perfect ideal Unicode object; characters are characters, never divided until encoded. But when working with NSString (whether in Swift or in Objective-C), as far as you are concerned, you should treat it as an ideal string.
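To make the difference between characters and encoded bytes concrete, here is a minimal sketch. The helper name EncodedBytes is made up for illustration; it only wraps dataUsingEncoding:allowLossyConversion:. The guess that the Java side hashes the single byte 0xA9 (octal \251) is an assumption based on the output quoted in the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: encode a string without lossy conversion.
// Returns nil if the encoding cannot represent every character.
static NSData *EncodedBytes(NSString *source, NSStringEncoding encoding) {
    return [source dataUsingEncoding:encoding allowLossyConversion:NO];
}

static void LogCopyrightSignEncodings(void) {
    NSString *source = @"\u00a9"; // the copyright sign; [source length] is 1

    // UTF-8 needs two bytes for U+00A9: c2 a9.
    NSData *utf8 = EncodedBytes(source, NSUTF8StringEncoding);
    // ISO Latin 1 needs one byte: a9 (octal \251, as in the Java output).
    NSData *latin1 = EncodedBytes(source, NSISOLatin1StringEncoding);
    // ASCII has no copyright sign; the question's output shows (null) here.
    NSData *ascii = EncodedBytes(source, NSASCIIStringEncoding);

    NSLog(@"characters: %lu, utf8: %@, latin1: %@, ascii: %@",
          (unsigned long)[source length], utf8, latin1, ascii);
}
```

Whichever encoding you pick, hash the NSData returned for that encoding and use the same encoding on the Java side.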
Thanks to GBegan's explanation in another post I was able to cobble this together.
// Keep the low 8 bits of each UTF-16 code unit and collect them in one buffer.
NSMutableData *bytes = [NSMutableData dataWithCapacity:[s length]];
for (NSUInteger i = 0; i < [s length]; i++) {
    unichar number = [s characterAtIndex:i];
    unsigned char byte = (unsigned char)number;
    [bytes appendBytes:&byte length:1];
}

NSString's isEqualToString: seems to erroneously report non-equality

I'm trying to compare the equality of two multi-line strings. I'm getting one of the strings from a web service, and the other I'm getting from iTunes via the Scripting Bridge. The strings from the web service are eventually transferred to iTunes, so if I do that and then re-compare the strings, ideally they'd be identical.
However, when comparing strings like this, it seems that isEqualToString: always returns non-equality. I'm testing this by testing equality of a string from iTunes that originally came from the web service, and a string directly from the web service.
Logging both strings to the Console produces output that appears identical. Logging the lengths of the strings produces identical lengths.
I've also tried comparing the strings using some other methods. For example, I converted them to ASCII strings to make sure it wasn't some Unicode issue:
NSData *iTunesStringData = [[self iTunesString] dataUsingEncoding:NSASCIIStringEncoding
allowLossyConversion:YES];
NSData *webServiceStringData = [[self webServiceString] dataUsingEncoding:NSASCIIStringEncoding
allowLossyConversion:YES];
NSString *newiTunesString = [[[NSString alloc] initWithData:iTunesStringData encoding:NSASCIIStringEncoding] autorelease];
NSString *newWebServiceString = [[[NSString alloc] initWithData:webServiceStringData encoding:NSASCIIStringEncoding] autorelease];
BOOL result = [newiTunesString isEqualToString:newWebServiceString];
Same problem, not equal. I've tried comparing just the first character:
NSComparisonResult result = [newiTunesString compare:newWebServiceString
options:NSLiteralSearch
range:NSMakeRange(0,1)
locale:[NSLocale currentLocale]];
Does not return NSOrderedSame. I've logged these first characters to the Console and they seem identical. I also considered differences in carriage returns, and tried replacing @"\r" with @"" in both strings before comparing, which doesn't work (and besides, that shouldn't affect equality of just the first character). I don't want to remove @"\n" characters because I want to preserve the multiple lines.
What's going on? Any other ideas?
It turns out this problem was related to line endings. But since I'm comparing multi-line strings, I didn't want to completely strip out the newlines. I normalized the line endings like so:
NSString *normalizediTunesString = [[[self iTunesString] componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] componentsJoinedByString:@"\n"];
NSString *normalizedWebServiceString = [[[self webServiceString] componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] componentsJoinedByString:@"\n"];
Then, comparing the strings via compare: worked as expected.
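One caveat with splitting on the newline character set: each CR and each LF counts as its own separator, so a CRLF pair leaves an empty component and ends up as two line feeds. If one source might use CRLF, a variant like the following may be safer. It is only a sketch; the helper name NormalizeLineEndings is made up.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: convert CRLF and bare CR line endings to LF.
static NSString *NormalizeLineEndings(NSString *text) {
    // Replace CRLF first so each pair becomes a single LF.
    NSString *result =
        [text stringByReplacingOccurrencesOfString:@"\r\n"
                                        withString:@"\n"
                                           options:NSLiteralSearch
                                             range:NSMakeRange(0, [text length])];
    // Then convert any remaining bare CRs.
    return [result stringByReplacingOccurrencesOfString:@"\r"
                                             withString:@"\n"
                                                options:NSLiteralSearch
                                                  range:NSMakeRange(0, [result length])];
}
```

Both strings would be passed through the helper before calling isEqualToString:.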
Just guessing here, but maybe use this to clean up your strings:
stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]

Converting NSData to an NSString representation is failing

I have an NSData object which I am trying to turn into an NSString using the following line of code:
NSString *theData = [[NSString alloc] initWithData:photo encoding:NSASCIIStringEncoding];
Unfortunately I am getting the following result, instead of my desired binary output (can I expect a binary output here?);
ÿØÿà
I'd appreciate any help.
Thanks. Ricky.
If you want to transform some arbitrary binary data into a human readable string (for example, of a series of hex values) you are using the wrong method. What you are doing is interpreting the data itself as a string in ASCII encoding.
To simply log the data to a file or to stdout, you can use [theData description].
What you mean by "binary output" is unclear. If you're expecting the string to contain text along the lines of "01010100011110110" or "0x1337abef", you are mistaken about how NSString works. NSString's initWithData:encoding: tries to interpret the data's bytes as though they were the bytes of a string in a particular encoding. It's the opposite of NSString's dataUsingEncoding: — you can call initWithData:encoding: with the result of dataUsingEncoding: and get back the exact same string.
If you want to transform the data into, say, a human-readable string of hex digits, you'll need to do the transformation yourself. You could do something like this:
// Each byte becomes two hex digits, so reserve twice the data length.
NSMutableString *binaryString = [NSMutableString stringWithCapacity:[data length] * 2];
const unsigned char *bytes = [data bytes];
for (NSUInteger i = 0; i < [data length]; i++) {
    [binaryString appendFormat:@"%02x", bytes[i]];
}
You cannot parse binary data with the initWithData: method. If you want the hexadecimal string of the contents then you can use the description method of NSData.

Easy way to set a single character of an NSString to uppercase

I would like to change the first character of an NSString to uppercase. Unfortunately, - (NSString *)capitalizedString converts the first letter of every word to uppercase. Is there an easy way to convert just a single character to uppercase?
I'm currently using:
NSRange firstCharRange = NSMakeRange(0,1);
NSString* firstCharacter = [dateString substringWithRange:firstCharRange];
NSString* uppercaseFirstChar = [firstCharacter uppercaseString];
NSMutableString* capitalisedSentence = [dateString mutableCopy];
[capitalisedSentence replaceCharactersInRange:firstCharRange withString:uppercaseFirstChar];
Which seems a little convoluted but at least makes no assumptions about the encoding of the underlying unicode string.
A very similar approach to what you have, but a little more condensed:
NSString *capitalisedSentence =
[dateString stringByReplacingCharactersInRange:NSMakeRange(0,1)
withString:[[dateString substringToIndex:1] capitalizedString]];
Since NSString is immutable, what you have seems to be a good way to do what you want to do. The implementations of (NSString*)uppercaseString and similar methods probably look very much like what you've written, as they return a new NSString instead of modifying the one you sent the message to.
I had a similar requirement, but it was for characters within the string. Assuming i is the index of the character you want to uppercase, this worked for me:
curword = [curword stringByReplacingCharactersInRange:NSMakeRange(i,1)
withString:[[curword substringWithRange:NSMakeRange(i, 1)] capitalizedString]];
If you profile these solutions, they are much slower than doing this:
NSMutableString *capitalizedString = [NSMutableString stringWithString:originalString];
NSString *firstChar = [[capitalizedString substringWithRange:NSMakeRange(0,1)] uppercaseString];
[capitalizedString replaceCharactersInRange:NSMakeRange(0, 1) withString:firstChar];
In testing on an iPhone 4 running iOS 5:
@doomspork's solution ran in 0.115750 ms
while the above ran in 0.064250 ms.
In testing on the Simulator running iOS 5:
@doomspork's solution ran in 0.021232 ms
while the above ran in 0.007495 ms.
Aiming for maximum readability, make a category on NSString and give it this function:
NSString *capitalizeFirstLetter(NSString *string) {
NSString *firstCapital = [string substringToIndex:1].capitalizedString;
return [string stringByReplacingCharactersInRange:NSMakeRange(0, 1) withString:firstCapital];
}
Then in your code where you want it:
NSString *capitalizedSentence = capitalizeFirstLetter(dateString);
This kind of code rarely belongs in the spot where you need it and should generally be factored away into a utility class or a category to improve legibility.
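For reference, an actual category version could look something like this. The category and method names are invented for the example (a real project would probably add a prefix to avoid clashes), and the handling of empty strings and composed character sequences is my own addition rather than something from the answers above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical category; the names are illustrative only.
@interface NSString (FirstLetterCapitalization)
- (NSString *)stringByCapitalizingFirstLetter;
@end

@implementation NSString (FirstLetterCapitalization)

- (NSString *)stringByCapitalizingFirstLetter {
    if ([self length] == 0) {
        return self; // nothing to capitalize, and avoids a range exception
    }
    // Take the whole first composed character (a letter plus any combining
    // marks, or a surrogate pair) instead of assuming it is one code unit.
    NSRange first = [self rangeOfComposedCharacterSequenceAtIndex:0];
    NSString *upper = [[self substringWithRange:first] uppercaseString];
    return [self stringByReplacingCharactersInRange:first withString:upper];
}

@end
```

The call site then reads NSString *capitalizedSentence = [dateString stringByCapitalizingFirstLetter];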

Resources