Cocoa: Extracting "A" from "Æ"

I have a bunch of NSStrings. For each one I would like to grab the first character and sort it into a bucket in the range A-Z, with # as a catch-all for anything that doesn't fit.
Different graphemes (I believe that's the correct word after some wiki'ing) have been giving me trouble. For example, I would like to extract A from "Æ".
I have taken a look at CFStringTransform, normalize and fold, but none of them had the desired effect.
Is there a reliable way of doing this? All the strings I'm working with are UTF8 if that makes a difference.

Æ cannot be broken down into components. It is not a compound glyph of A+E, but is a separate glyph. Compound glyphs are things like a + ` (à).

The thing about "Æ" is that it is a single character in itself (U+00C6), not a combination of two different characters, so you can't extract the A from it.
Edit:
Although you could check whether the string equals "Æ" and, if it does, replace it with "A", or convert it to its decimal value (198) and subtract 133, which would give you "A" (65).

Did you want to get rid of all æ?
This should work if you do.
NSString *string = @"Æaæbcdef";
string = [string stringByReplacingOccurrencesOfString:@"æ" withString:@"a"];
string = [string stringByReplacingOccurrencesOfString:@"Æ" withString:@"A"];
Edit
Rereading, you only seem to want the first character:
NSString *string = @"Æaæbcdef";
NSString *firstChar = [string substringToIndex:1];
firstChar = [firstChar stringByReplacingOccurrencesOfString:@"æ" withString:@"a"];
firstChar = [firstChar stringByReplacingOccurrencesOfString:@"Æ" withString:@"A"];
NSString *finalString = [NSString stringWithFormat:@"%@%@", firstChar, [string substringFromIndex:1]];
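
Since Æ has no Unicode decomposition, the A-Z / # bucketing asked for in the question probably needs an explicit mapping for such letters. The following is a minimal sketch in plain C (it also compiles as Objective-C), assuming the input is valid UTF-8. The names first_code_point, kSpecial and bucket_for_string are made up for illustration, and the table is a deliberately tiny, hypothetical one that you would extend for your own data.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the first code point of s. Assumes s is valid, NUL-terminated UTF-8.
   Returns 0 for an empty string. */
static uint32_t first_code_point(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] < 0x80)
        return p[0];
    if ((p[0] & 0xE0) == 0xC0)
        return ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    if ((p[0] & 0xF0) == 0xE0)
        return ((uint32_t)(p[0] & 0x0F) << 12) |
               ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    return ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12) |
           ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
}

/* Hypothetical table of letters that Unicode does not decompose.
   The choice of bucket for each one is an assumption, not a standard. */
static const struct { uint32_t cp; char bucket; } kSpecial[] = {
    { 0x00C6, 'A' }, /* Æ */
    { 0x00E6, 'A' }, /* æ */
    { 0x0152, 'O' }, /* Œ */
    { 0x0153, 'O' }, /* œ */
    { 0x00D8, 'O' }, /* Ø */
    { 0x00F8, 'O' }, /* ø */
};

/* Returns 'A'..'Z' for the first character of utf8, or '#' as the catch-all. */
char bucket_for_string(const char *utf8)
{
    uint32_t cp = first_code_point(utf8);
    if (cp >= 'A' && cp <= 'Z')
        return (char)cp;
    if (cp >= 'a' && cp <= 'z')
        return (char)(cp - 'a' + 'A');
    for (size_t i = 0; i < sizeof kSpecial / sizeof kSpecial[0]; i++)
        if (kSpecial[i].cp == cp)
            return kSpecial[i].bucket;
    return '#';
}
```

Accented letters such as É land in # in this sketch. In Cocoa you would probably strip diacritics first (for example with stringByFoldingWithOptions:locale: and NSDiacriticInsensitiveSearch) and only fall back to a table like this for the letters that folding leaves untouched.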

Related

Add "newline" character in localizable.strings

How to add a newline character in localizable.strings?
I tried putting \n, but no success.

Using \n should just work. With this line in "Localizable.strings":
"abc" = "foo\nbar";
and this code:
NSString *s = NSLocalizedString(@"abc", nil);
NSLog(@"%@", s);
I get the output
2013-05-02 14:14:45.931 test[4088:c07] foo
bar

Just adding newlines in the .strings file also works
"str" = "Hi ,
this is .
in a new line,
";

This works in a UILabel and a UITextView as long as you set the appropriate number of lines:
testLabel.numberOfLines = 2;
You could also set this to 0, which lets the label use as many lines as it needs. Also make sure the label is big enough to show multiple lines, or else the text will be cut off.

This will not work in Localizable.strings. You have to create two keys and then insert the \n between the two localized strings when concatenating them.

Unicode with format

I want to add a bunch of Emoji icons to an array. From my earlier question I found out how to write the Emoji icons in an NSString.
Now I want to make a loop and add these icons to an array. This should be fairly easy as the unicodes are in certain ranges so something like the following should do it:
for (int i = 0; i < 10; i++)
    [someArray addObject:[NSString stringWithFormat:@"\U0001F43%i", i]];
Problem is, when doing so I get an error saying:
Incomplete universal character name.
Does anyone know of a way to do this?

That's because the escape sequence \Uxxxxxxxx is evaluated by the compiler, which replaces it with the corresponding Unicode code point. Then, at runtime, stringWithFormat: replaces the format specifier %i with the decimal representation of i. The final string is the concatenation of the character corresponding to \Uxxxxxxxx and the characters representing i. stringWithFormat: replaces characters with other characters; it doesn't alter existing characters.
But the problem here is that the compiler sees an incomplete escape sequence, as you only wrote 7 hexadecimal digits. So it's not able to generate the string and raises an error.
The solution is to generate the character (a simple integer value) at runtime and create a string with it using +[NSString stringWithCharacters:length:].
But if you look in the headers, you'll see that NSString stores its characters as unichar, which is defined as an unsigned short, i.e. a 16-bit value, whereas the Unicode code point U+1F430 (🐰) requires at least 17 bits.
So you cannot use a single unichar character to represent that code point. But don't worry: you can use two characters to represent it.
Lost? Here's the explanation. Unicode doesn't define characters; it defines code points, which are arbitrary integer values in the range U+0000 – U+10FFFF. The implementation then decides how to represent those code points using characters. The implementation may use any data type it wants for characters as long as it manages to represent all valid code points. The simplest solution would be to use 32-bit integers, but that would require too much memory, as most of the code points you use are in the first Unicode plane (U+0000 – U+FFFF). So NSString stores the code points in the UTF-16 encoding, which uses 16-bit characters.
In UTF-16, every code point beyond U+FFFF is stored using a pair of characters (known as a surrogate pair) in the range 0xD800 – 0xDFFF (the corresponding code points are explicitly reserved in the Unicode standard).
In conclusion, any valid Unicode code point may be represented using one or two unichar characters. The method to do so is described there. And here is a simple implementation:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
    // NOTE: As I edited the answer, you'll find a simpler implementation of
    // this function below
    unichar characters[2];
    NSUInteger length;
    if ( codePoint <= 0xD7FF || (codePoint >= 0xE000 && codePoint <= 0xFFFF) ) {
        // Basic Multilingual Plane: a single UTF-16 code unit
        characters[0] = codePoint;
        length = 1;
    }
    else if ( codePoint >= 0x10000 && codePoint <= 0x10ffff ) {
        // Supplementary planes: encode as a surrogate pair
        codePoint -= 0x10000;
        characters[0] = 0xD800 + (codePoint >> 10);
        characters[1] = 0xDC00 + (codePoint & 0x3ff);
        length = 2;
    }
    else {
        length = 0; // invalid code point
    }
    return [NSString stringWithCharacters:characters length:length];
}
Now that we can generate a string from any valid code point, we just need to update the code to use the function we wrote before:
for (int i = 0; i < 10; i++)
[someArray addObject:stringWithCodePoint(0x0001F430 + i)];
EDIT: I just figured out a simpler method to get an NSString from a code point. It works by using -[NSString initWithBytes:length:encoding:] and the NSUTF32StringEncoding encoding:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
NSString *string = [[NSString alloc] initWithBytes:&codePoint length:4 encoding:NSUTF32StringEncoding];
// You may remove the next 3 lines if you use ARC
#if ! __has_feature(objc_arc)
[string autorelease];
#endif
return string;
}

Note this similar question. As one of its answers explains, backslash escapes in a string literal are evaluated at compile time. If you want to make a Unicode character using a \Uxxxx escape, the xxxx all need to be numbers in the string literal.
What you can do instead, as per another answer is use the format specifier %C -- not together with the \Uxxxx escape, but on its own -- and pass in the full character code as an integer. (Actually, a wchar_t, which is a 32-bit integer on Mac OS X now, which you'll need since the character code you're looking for is more than 16 bits long.) To put this together with a base, you can just add the integers:
wchar_t base = 0x0001F430; // unfamiliar? we start with 0x for hexadecimal integers
for (int i = 0; i < 10; i++)
    [someArray addObject:[NSString stringWithFormat:@"%C", base + i]];
There's also stringWithCharacters: but that explicitly takes a (16-bit) unichar, so you'd need to use a character sequence to encode your emoji in UTF-16.

Use %C instead of %i
so:
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%C", i]];

Proper validation with [NSScanner: scanInteger] in Cocoa

I am converting a string to a signed integer via -[NSScanner scanInteger:], but it seems to accept values such as '123abc' as '123' instead of reporting an error on invalid input.
I can do my own custom validation, but would prefer to find an API which will do the conversion and fail on '123abc'.
By the way, 'abc123' does fail with scanInteger, which is good.

I don't think that using a scanner is the way to do this. You could, but there are easier ways. I would use the NSString method rangeOfCharacterFromSet: to check for non-digit characters.
NSCharacterSet *notDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
NSUInteger nonDigits = [enteredString rangeOfCharacterFromSet:notDigits].length;
If nonDigits is not zero, then the user has entered something other than a digit. If you want to allow decimal points then you would have to create your own set that contains everything other than digits and the decimal point.
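
If you want one call that both converts and rejects trailing junk, the C standard library's strtol reports where parsing stopped, so "123abc" can be told apart from "123". The sketch below is a swapped-in technique, not an NSScanner API. The function name parse_whole_integer is made up for illustration, and it assumes that leading whitespace and a sign are acceptable (strtol allows both).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true and stores the value in *out only if all of text is a
   decimal integer. Leaves *out untouched on failure. */
bool parse_whole_integer(const char *text, long *out)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (end == text)
        return false;   /* no digits at all, e.g. "abc123" or "" */
    if (*end != '\0')
        return false;   /* trailing junk, e.g. "123abc" */
    if (errno == ERANGE)
        return false;   /* value does not fit in a long */
    *out = value;
    return true;
}
```

With an NSString you would pass [enteredString UTF8String] as text. The same idea should also work while staying with NSScanner: call scanInteger: and then require that isAtEnd returns YES, so that nothing is left unscanned.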

Split NSString preserving quoted substrings

I need to split up a string which is separated by commas while preserving any quoted substrings (which may have commas too).
String example:
NSString *str = @"One,Two,\"This is part three, I think\",Four";
for (id item in [str componentsSeparatedByString:@","])
    NSLog(@"%@", item);
This returns:
One
Two
"This is part three
I think"
Four
The correct result (respecting quoted substrings) should be:
One
Two
"This is part three, I think"
Four
Is there a reasonable way to do this, without re-inventing or re-writing quote-aware parsing routines?

Let's think about this a different way. What you have is a comma-separated string, and you want the fields in the string.
There's some code for that:
https://github.com/davedelong/CHCSVParser
And you'd do this:
NSString *str = @"One,Two,\"This is part three, I think\",Four";
NSArray *lines = [str CSVComponents];
for (NSArray *line in lines) {
    for (NSString *field in line) {
        NSLog(@"field: %@", field);
    }
}

Here is a C# answer to the same problem.
C# Regex Split - commas outside quotes
You could probably use the same Regex in Objective-C
NSString split Regex with ,(?=(?:[^']*'[^']*')*[^']*$)
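
If neither a CSV library nor a regex is an option, the quote-aware split itself is small. Here is a minimal sketch in plain C (usable from Objective-C), assuming fields are quoted with double quotes only and that the quotes should stay in the output, as in the question's expected result. The name split_outside_quotes is made up for illustration. It is not a full CSV parser: doubled quotes and line breaks inside fields are not interpreted.

```c
#include <stddef.h>

/* Splits line in place on commas that are outside double quotes.
   Stores up to max_fields pointers into line in fields and returns how many
   were stored. Text beyond the last stored field is dropped. */
size_t split_outside_quotes(char *line, char **fields, size_t max_fields)
{
    size_t count = 0;
    int in_quotes = 0;

    if (max_fields == 0)
        return 0;
    fields[count++] = line;

    for (char *p = line; *p != '\0'; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;     /* entering or leaving a quoted run */
        } else if (*p == ',' && !in_quotes) {
            *p = '\0';                  /* terminate the current field */
            if (count == max_fields)
                break;
            fields[count++] = p + 1;    /* next field starts after the comma */
        }
    }
    return count;
}
```

Because the function writes NUL bytes into the buffer, pass it a mutable copy of the string rather than a string literal. For the example string from the question it yields four fields, with the third one still wrapped in its quotes.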

NSFileManager contentsOfDirectoryAtPath encoding problem with samba path

I mount an SMB path using this code:
urlStringOfVolumeToMount = [urlStringOfVolumeToMount stringByAddingPercentEscapesUsingEncoding:NSMacOSRomanStringEncoding];
NSURL *urlOfVolumeToMount = [NSURL URLWithString:urlStringOfVolumeToMount];
FSVolumeRefNum returnRefNum;
FSMountServerVolumeSync( (CFURLRef)urlOfVolumeToMount, NULL, NULL, NULL, &returnRefNum, 0L);
Then I get the contents of some paths:
NSMutableArray *content = (NSMutableArray *)[[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:&error];
My problem is that every path in the "content" array that contains special characters (ü, for example) comes back as two encoded characters: ü becomes u¨.
When I log the bytes using:
[contentItem dataUsingEncoding:NSUTF8StringEncoding];
it gives me 75cc88, which is u (75) and ¨ (cc88).
What I expected is the ü character encoded in UTF-8. In bytes, that should be c3bc.
I've tried converting my path using the ISOLatin1 and MacOSRoman encodings, but as long as the returned path already has two separate characters instead of one for ü, every conversion gives me two encoded characters.
If someone can help, thanks.
My configuration: French localization, running Snow Leopard.

urlStringOfVolumeToMount = [urlStringOfVolumeToMount stringByAddingPercentEscapesUsingEncoding:NSMacOSRomanStringEncoding];
Unless you specifically need MacRoman for some reason, you should probably be using UTF-8 here.
NSMutableArray *content = (NSMutableArray *)[[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:&error];
My problem is that every path in the "content" array that contains special characters (ü, for example) comes back as two encoded characters: ü becomes u¨.
You're expecting composed characters and getting decomposed sequences.
Since you're getting the pathnames from the file-system, this is not a problem: The pathnames are correct as you're receiving them, and as long as you pass them to something that does Unicode right, they will display correctly as well.

Well, four years later I'm struggling with the same thing, but with åäö in my case.
It took a lot of time to find the simple solution.
NSString has the necessary comparator built in.
Comparing aString with anotherString, where one comes from the array returned by NSFileManager's contentsOfDirectoryAtPath:error:, is as simple as:
if( [aString compare:anotherString] == NSOrderedSame )
The compare: method takes care of converting both strings into a comparable canonical form. In effect: "if they look the same, they are the same".
