Best way to escape characters like newline and double-quote in NSString - cocoa

Say I have an NSString (or NSMutableString) containing:
I said "Hello, world!".
He said "My name's not World."
What's the best way to turn that into:
I said \"Hello, world!\".\nHe said \"My name\'s not World.\"
Do I have to manually use -replaceOccurrencesOfString:withString: over and over to escape characters, or is there an easier way? These strings may contain characters from other alphabets/languages.
How is this done in other languages with other string classes?

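If percent-escaping is acceptable, use stringByAddingPercentEscapesUsingEncoding: with NSUTF8StringEncoding. Note that this produces URL-style escapes such as %22 for a double quote, not backslash escapes.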

I don't think there is any built-in method to "escape" a particular set of characters.
If the set of characters you wish to escape is well-defined, I'd probably stick with the simple solution you proposed and replace each character with its escaped form directly.
Be warned that if your source string already has escaped characters in it, then you'll probably want to avoid "double-escaping" them. One way of achieving this would be to go through and "unescape" any escaped character strings in the string before then escaping them all again.
If you need to support a variable set of escaped characters, take a look at the NSScanner methods "scanUpToCharactersFromSet:intoString:" and "scanCharactersFromSet:intoString:". You could use these methods on NSScanner to cruise through a string, copying the parts from the "scanUpTo" section into a mutable string unchanged, and copying the parts from a particular character set only after escaping them.
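The "unescape first, then escape everything again" idea from this answer could be sketched in plain C roughly as follows. This is only an illustration under assumptions: the helper name unescape_backslashes is hypothetical (it is not a Foundation API), it works on UTF-8 bytes, and it only knows the \n and \t escapes plus "backslash followed by any other character".

```c
#include <stddef.h>

/* Hypothetical helper: reverses simple backslash escapes.
   `out` must have room for strlen(in) + 1 bytes; the output never grows. */
size_t unescape_backslashes(const char *in, char *out)
{
    char *o = out;
    while (*in) {
        if (*in == '\\' && in[1] != '\0') {
            in++;                              /* skip the backslash */
            switch (*in) {
            case 'n': *o++ = '\n'; break;
            case 't': *o++ = '\t'; break;
            default:  *o++ = *in;  break;      /* \" \' \\ and anything else */
            }
            in++;
        } else {
            *o++ = *in++;                      /* ordinary byte, or a lone trailing backslash */
        }
    }
    *o = '\0';
    return (size_t)(o - out);
}
```

Running a string through this before escaping it means already-escaped input is not escaped a second time.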

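This will escape double quotes in an NSString:
NSString *escaped = [originalString stringByReplacingOccurrencesOfString:@"\"" withString:@"\\\""];
Be careful to escape the escape character (the backslash) itself as well, and to do so before escaping anything else, so that the backslashes you insert are not escaped again.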

I think in cases like these, it's useful to operate on a character at a time, either in UniChars or UTF8 bytes. If you're using UTF-8, then vis(3) will do most of the work for you (see below). Can I ask why you want to escape a single-quote within a double-quoted string? How are you planning to handle multi-byte characters? In the example below, I'm using UTF-8, encoding 8-bit characters using C-Style octal escapes. This can also be undone by unvis(3).
#import <Foundation/Foundation.h>
#import <vis.h>
@interface NSString (Escaping)
- (NSString *)stringByEscapingMetacharacters;
@end
@implementation NSString (Escaping)
- (NSString *)stringByEscapingMetacharacters
{
    const char *UTF8Input = [self UTF8String];
    // Worst case: every input byte becomes a four-byte octal escape.
    char *UTF8Output = [[NSMutableData dataWithLength:strlen(UTF8Input) * 4 + 1] mutableBytes];
    char ch, *och = UTF8Output;
    while ((ch = *UTF8Input++))
        if (ch == '\'' || ch == '\\' || ch == '"')
        {
            // Quotes and the backslash itself get a backslash prefix.
            *och++ = '\\';
            *och++ = ch;
        }
        else if (isascii(ch))
            // vis(3) renders newline and tab as C-style escapes (\n, \t).
            och = vis(och, ch, VIS_NL | VIS_TAB | VIS_CSTYLE, *UTF8Input);
        else
            // Bytes of multi-byte UTF-8 sequences become octal escapes.
            och += sprintf(och, "\\%03hho", ch);
    return [NSString stringWithUTF8String:UTF8Output];
}
@end
int
main(int argc, const char *argv[])
{
    NSAutoreleasePool *pool = [NSAutoreleasePool new];
    NSLog(@"%@", [@"I said \"Hello, world!\".\nHe said \"My name's not World.\"" stringByEscapingMetacharacters]);
    [pool drain];
    return 0;
}

This is a snippet I have used in the past that works quite well:
- (NSString *)escapeString:(NSString *)aString
{
    NSMutableString *returnString = [[NSMutableString alloc] init];
    for (NSUInteger i = 0; i < [aString length]; i++) {
        unichar c = [aString characterAtIndex:i];
        // if char needs to be escaped
        if (('\\' == c) || ('\'' == c) || ('"' == c)) {
            [returnString appendFormat:@"\\%C", c];
        } else {
            // %C formats a unichar; %c would mangle non-ASCII characters
            [returnString appendFormat:@"%C", c];
        }
    }
    return [returnString autorelease];
}
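The snippet above leaves newlines untouched, while the question also wants them turned into \n. A single-pass sketch in plain C that handles both might look like this. The function name escape_for_literal is made up for illustration, and the input is assumed to be UTF-8 (for example from -UTF8String). Bytes of multi-byte characters are copied through unchanged, so text in other alphabets survives.

```c
#include <stddef.h>

/* Hypothetical helper: backslash-escapes quotes, backslashes, newlines and tabs.
   `out` needs room for up to 2 * strlen(in) + 1 bytes. */
size_t escape_for_literal(const char *in, char *out)
{
    char *o = out;
    for (; *in; in++) {
        switch (*in) {
        case '\\': case '"': case '\'':
            *o++ = '\\'; *o++ = *in;  break;   /* prefix with a backslash */
        case '\n':
            *o++ = '\\'; *o++ = 'n';  break;   /* newline becomes the two characters \n */
        case '\t':
            *o++ = '\\'; *o++ = 't';  break;
        default:
            *o++ = *in;               break;   /* includes UTF-8 bytes >= 0x80 */
        }
    }
    *o = '\0';
    return (size_t)(o - out);
}
```

Because every input byte is examined exactly once, there is no ordering problem between escaping the backslash and escaping the other characters.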

Do this:
NSString * encodedString = (NSString *)CFURLCreateStringByAddingPercentEscapes(
NULL,
(CFStringRef)unencodedString,
NULL,
(CFStringRef)@"!*'();:@&=+$,/?%#[]",
kCFStringEncodingUTF8 );
Reference: http://simonwoodside.com/weblog/2009/4/22/how_to_really_url_encode/
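For comparison, here is a rough sketch of what percent-escaping does at the byte level, in plain C. It is not the CoreFoundation implementation. The name percent_encode_utf8 is hypothetical, and the sketch makes a stricter assumption than the call above: it escapes every byte except the RFC 3986 "unreserved" characters (letters, digits, and - . _ ~).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: percent-encodes UTF-8 bytes.
   `out` needs room for up to 3 * strlen(in) + 1 bytes. */
size_t percent_encode_utf8(const char *in, char *out)
{
    char *o = out;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        int unreserved = *p < 0x80 &&
            (isalnum(*p) || *p == '-' || *p == '.' || *p == '_' || *p == '~');
        if (unreserved)
            *o++ = (char)*p;                            /* copy as-is */
        else
            o += sprintf(o, "%%%02X", (unsigned)*p);    /* e.g. space -> %20 */
    }
    *o = '\0';
    return (size_t)(o - out);
}
```

Note that this kind of escaping suits URLs. It does not produce the backslash form asked for in the question.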

You might even want to look into using a regex library (there are a lot of options available, RegexKit is a popular choice). It shouldn't be too hard to find a pre-written regex to escape strings that handles special cases like existing escaped characters.

Related

Xcode scanf char count

I found out that in an Xcode command-line tool you can read an int into your program with scanf.
When I tried this for an NSString, it didn't work, and I found out that scanf returns an integer. So my question is: what do you use to read an NSString and save it into a variable, the way this does for an int:
int number;
scanf("%i", &number);
EDIT:
Now I found some code, but it only gives me the first char:
char naamchar[40];
int nChars = scanf("%39s", naamchar);
NSString * naam = [[NSString alloc] initWithBytes:naamchar
length:nChars
encoding:NSUTF8StringEncoding];
naam is only 1 char :(
EDIT SOLUTION:
char naamchar[40];
scanf("%39s", naamchar);
NSString * naam = [NSString stringWithCString:naamchar encoding:NSUTF8StringEncoding];
...
man 3 scanf mentions:
These functions return the number of input items assigned.
nChars is set to the number of items matched (in this case 1), not the number of characters in the string.
Try replacing nChars with strlen(naamchar), i.e.
char naamchar[40];
int nChars = scanf("%39s", naamchar);
NSString * naam = [[NSString alloc] initWithBytes:naamchar
length:strlen(naamchar)
encoding:NSUTF8StringEncoding];
Be sure to check that nChars == 1 before using naamchar; a return value of 0 or EOF indicates that there was no input to scan.
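A minimal sketch of that check in plain C follows. It uses sscanf on a string instead of scanf on stdin, purely so the example does not depend on terminal input. The helper name scan_word is hypothetical; the 40-byte buffer mirrors naamchar from the question.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scans one whitespace-delimited word from `input`
   into `buf` (40 bytes, as in the question). Returns the word's length
   in bytes, or -1 if nothing could be scanned. */
long scan_word(const char *input, char buf[40])
{
    /* The return value counts converted items (here at most 1), not characters. */
    int items = sscanf(input, "%39s", buf);
    if (items != 1)
        return -1;               /* 0 or EOF: buf was not filled in */
    return (long)strlen(buf);    /* the length comes from the buffer itself */
}
```

The resulting buffer can then be handed to stringWithCString:encoding: as in the accepted solution.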

Most efficient way to pull first non-whitespace line from NSTextView?

What is the most efficient way to pull the first non-whitespace line from an NSTextView?
For example, if the text is:
\n
\n
\n
This is the text I want \n
\n
Foo bar \n
\n
The result would be "This is the text I want".
Here is what I have:
NSString *content = self.textView.textStorage.string;
NSInteger len = [content length];
NSInteger i = 0;
// Scan past leading whitespace and newlines
while (i < len && [[NSCharacterSet whitespaceAndNewlineCharacterSet] characterIsMember:[content characterAtIndex:i]]) {
    i++;
}
// Now, scan to first newline
while (i < len && ![[NSCharacterSet newlineCharacterSet] characterIsMember:[content characterAtIndex:i]]) {
    i++;
}
// Grab the substring up to that newline
NSString *resultWithWhitespace = [content substringToIndex:i];
// Trim leading and trailing whitespace/newlines from the substring
NSString *result = [resultWithWhitespace stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
Is there a better, more efficient way?
I'm thinking of putting this in the -textStorageDidProcessEditing: NSTextStorageDelegate method so I can get it as the text is edited. That's why I'd like the method to be as efficient as possible.
Just use NSScanner which is designed for this sort of thing:
NSString* output = nil;
NSScanner* scanner = [NSScanner scannerWithString:yourString];
[scanner scanCharactersFromSet:[NSCharacterSet whitespaceAndNewlineCharacterSet] intoString:NULL];
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&output];
output = [output stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
Note that it's much faster if you can scan up to a particular character rather than a character set:
[scanner scanUpToString:@"\n" intoString:&output];
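If the scanner ever turns out to be a bottleneck, the same two-step scan (skip leading whitespace, then read up to the next newline) can be done on a C string without allocating anything. This is only a sketch: the function name first_nonblank_line is made up, the input is assumed to be UTF-8, and only ASCII whitespace is recognised.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first non-blank line inside
   `text` and stores its length (without surrounding whitespace) in *len.
   Returns NULL if the text contains only whitespace. */
const char *first_nonblank_line(const char *text, size_t *len)
{
    const unsigned char *p = (const unsigned char *)text;
    while (*p && isspace(*p))                 /* skip leading blanks and empty lines */
        p++;
    if (*p == '\0') {
        *len = 0;
        return NULL;
    }
    const unsigned char *end = p;
    while (*end && *end != '\n' && *end != '\r')   /* run to the end of this line */
        end++;
    while (end > p && isspace(end[-1]))            /* trim trailing whitespace */
        end--;
    *len = (size_t)(end - p);
    return (const char *)p;
}
```

The result is a pointer and length into the original buffer, so it stays valid only as long as that buffer does.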

Check if NSString instance is contained in an NSArray

I have an array with a bunch of strings and I want to check if a certain string is contained in the array. If I use the containsObject: message on the array, I'm getting correct results. Do all NSString objects with the same string point to the same object? Or why is the containsObject: working?
NSArray *stringArray = [NSArray arrayWithObjects:@"1",@"2",@"3",anotherStringValue, nil];
if([stringArray containsObject:@"2"]){
    //DO SOMETHING
}
Yes, hard-coded NSStrings (string literals, that is, any @"..." in your source code) are turned into strings that exist indefinitely while your process is running.
However, NSArray's containsObject: method calls isEqual: on its objects, hence even a dynamically created string such as [NSString stringWithFormat:@"%d", 2] would yield YES in your sample snippet.
This is because NSString's isEqual: (or more precisely its isEqualToString:) method is implemented to be content aware (vs. comparing pointer identities) and thus returns YES for any pair of strings containing the very same sequence of characters (at time of comparison), no matter how and when they were created.
To check for equal (pointer-)identity you'd have to enumerate your array and compare via
NSString *yourString = @"foo";
BOOL identicalStringFound = NO;
for (NSString *someString in stringArray) {
    if (someString == yourString) {
        identicalStringFound = YES;
        break;
    }
}
(which you most likely wouldn't want, though).
Or in a more convenient fashion:
BOOL identicalStringFound = [stringArray indexOfObjectIdenticalTo:yourString] != NSNotFound;
(you most likely wouldn't want this one either).
Summing up:
So the reason you're getting a positive reply from containsObject: is NOT because literal strings share the same constant instance, BUT because containsObject: by convention calls isEqual:, which is content aware.
You might want to read the (short) documentation for isEqual: from the NSObject protocol.
containsObject: performs a value check, not a pointer check. It uses the isEqual: method defined by NSObject and overridden by other objects for testing. Therefore, if two strings contain the same sequence of characters, they will be considered the same.
The distinction between pointer testing and value testing is very important in some cases. Constant strings defined in source code are combined by the compiler so that they are the same object. However, strings created dynamically are not the same object. Here is an example program which will demonstrate this:
#import <Foundation/Foundation.h>
int main(int argc, char **argv) {
    NSAutoreleasePool *p = [NSAutoreleasePool new];
    NSString *constantString = @"1";
    NSString *constantString2 = @"1";
    NSString *dynamicString = [NSString stringWithFormat:@"%i",1];
    NSArray *theArray = [NSArray arrayWithObject:constantString];
    // Pointer comparisons: same object or not?
    if(constantString == constantString2) NSLog(@"constantString == constantString2");
    else NSLog(@"constantString != constantString2");
    if(constantString == dynamicString) NSLog(@"constantString == dynamicString");
    else NSLog(@"constantString != dynamicString");
    // Value comparison: same characters or not?
    if([constantString isEqual:dynamicString]) NSLog(@"[constantString isEqual:dynamicString] == YES");
    else NSLog(@"[constantString isEqual:dynamicString] == NO");
    NSLog(@"theArray contains:\n\tconstantString: %i\n\tconstantString2: %i\n\tdynamicString: %i",
          [theArray containsObject:constantString],
          [theArray containsObject:constantString2],
          [theArray containsObject:dynamicString]);
    [p drain];
    return 0;
}
The output of this program is:
2011-04-27 17:10:54.686 a.out[41699:903] constantString == constantString2
2011-04-27 17:10:54.705 a.out[41699:903] constantString != dynamicString
2011-04-27 17:10:54.706 a.out[41699:903] [constantString isEqual:dynamicString] == YES
2011-04-27 17:10:54.706 a.out[41699:903] theArray contains:
constantString: 1
constantString2: 1
dynamicString: 1
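The same distinction exists in plain C, where comparing two char pointers with == tests identity and strcmp tests content. The sketch below is only an analogy to containsObject: and indexOfObjectIdenticalTo:, and the function names contains_equal and contains_identical are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value check, analogous to containsObject: (which uses isEqual:). */
bool contains_equal(const char *const *items, size_t count, const char *needle)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(items[i], needle) == 0)   /* same characters? */
            return true;
    return false;
}

/* Pointer check, analogous to indexOfObjectIdenticalTo:. */
bool contains_identical(const char *const *items, size_t count, const char *needle)
{
    for (size_t i = 0; i < count; i++)
        if (items[i] == needle)              /* same address? */
            return true;
    return false;
}
```

A string built at run time in its own buffer will pass the first check and fail the second, just like dynamicString in the program above.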
You can use containsObject: to find out whether a certain string exists in the array:
NSArray *stringArray = [NSArray arrayWithObjects:@"1",@"2",@"3",anotherStringValue, nil];
if ( [stringArray containsObject: stringToFind] ) {
    // if found
} else {
    // if not found
}

NSTask Output Formatting

I'm using an NSTask to grab the output from /usr/bin/man. I'm getting the output but without formatting (bold, underline). Something that should appear like this:
Bold text with underline
(note the italic text is actually underlined, there's just no formatting for it here)
Instead gets returned like this:
BBoolldd text with _u_n_d_e_r_l_i_n_e
I have a minimal test project at http://cl.ly/052u2z2i2R280T3r1K3c that you can download and run; note the window does nothing; the output gets logged to the Console.
I presume I need to somehow interpret the NSData object manually but I have no idea where to start on that. I'd ideally like to translate it to an NSAttributedString but the first order of business is actually eliminating the duplicates and underscores. Any thoughts?
What is your actual purpose? If you want to show a man page, one option is to convert it to HTML and render it with a Web view.
Parsing man’s output can be tricky because it is processed by groff using a terminal processor by default. This means that the output is tailored to be shown on terminal devices.
One alternative solution is to determine the actual location of the man page source file, e.g.
$ man -w bash
/usr/share/man/man1/bash.1.gz
and manually invoke groff on it with -a (ASCII approximation) and -c (disable colour output), e.g.
$ gunzip -c /usr/share/man/man1/bash.1.gz | groff -c -a -Tascii -man
This will result in an ASCII file without most of the formatting. To generate HTML output,
$ gunzip -c /usr/share/man/man1/bash.1.gz | groff -Thtml -man
You can also specify these options in a custom configuration file for man, e.g. parseman.conf, and tell man to use that configuration file with the -C option instead of invoking man -w, gunzip, and groff. The default configuration file is /private/etc/man.conf.
Also, you can probably tailor the output of the terminal device processor by passing appropriate options to grotty.
Okay, here's the start of my solution, though I would be interested in any additional (easier?) ways to do this.
The output returned from the Terminal is UTF-8 encoded, but decoding it with NSUTF8StringEncoding alone doesn't give the expected string. The reason is the way the output that NSTask captures is formatted.
The letter N is 0x4e in UTF-8. But the NSData corresponding to a bold N is 0x4e 0x08 0x4e. 0x08 corresponds to a Backspace. So for a bold letter, Terminal prints letter-backspace-letter.
An underlined c (shown as italic above) is 0x63 in UTF-8. The NSData contains 0x5f 0x08 0x63, with 0x5f corresponding to an underscore. So for underlining, Terminal prints underscore-backspace-letter.
I really don't see any way around this at this point besides just scanning the raw NSData for these sequences. I'll probably post the source to my parser here once I finish it, unless anybody has any existing code. As the common programming phrase goes, never write yourself what you can copy. :)
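The two byte patterns described above (letter-backspace-letter for bold, underscore-backspace-letter for underline) can be recognised in a single pass. The following C sketch is an illustration only, not the parser from this thread. The names strip_overstrike and the STYLE_ constants are hypothetical, and it assumes each overstruck character is a single byte (plain ASCII).

```c
#include <stddef.h>

enum { STYLE_PLAIN = 0, STYLE_BOLD = 1, STYLE_UNDERLINE = 2 };

/* Hypothetical helper: copies `in` to `text` with overstrike sequences
   collapsed, recording one style per output character in `styles`.
   `text` needs strlen(in) + 1 bytes, `styles` needs strlen(in) bytes.
   Returns the number of characters written to `text`. */
size_t strip_overstrike(const char *in, char *text, unsigned char *styles)
{
    size_t n = 0;
    while (*in) {
        if (in[1] == '\b' && in[2] != '\0') {
            if (in[0] == '_') {              /* _ BS c  -> underlined c */
                text[n] = in[2]; styles[n] = STYLE_UNDERLINE;
                n++; in += 3;
                continue;
            }
            if (in[0] == in[2]) {            /* c BS c  -> bold c */
                text[n] = in[2]; styles[n] = STYLE_BOLD;
                n++; in += 3;
                continue;
            }
        }
        text[n] = *in; styles[n] = STYLE_PLAIN;   /* anything else is copied as-is */
        n++; in++;
    }
    text[n] = '\0';
    return n;
}
```

Runs of equal values in styles could then be turned into attribute ranges on an attributed string.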
Follow-Up:
I've got a good, fast parser together for taking man output and replacing the bold/underlined output with bold/underlined formatting in an NSMutableAttributedString. Here's the code if anybody else needs to solve the same problem:
// inputData must be an NSMutableData holding the raw output of man.
NSMutableIndexSet *boldChars = [[NSMutableIndexSet alloc] init];
NSMutableIndexSet *underlineChars = [[NSMutableIndexSet alloc] init];
const char backspace = 0x08;
NSData *bData = [NSData dataWithBytes:&backspace length:1];
NSRange testRange = NSMakeRange(1, [inputData length] - 1);
NSRange bRange = NSMakeRange(0, 0);
do {
    bRange = [inputData rangeOfData:bData options:0 range:testRange];
    if (bRange.location == NSNotFound || bRange.location > [inputData length] - 2) break;
    const char * buff = [inputData bytes];
    if (buff[bRange.location - 1] == 0x5f) {
        // it's an underline
        //NSLog(@"Undr %c\n", buff[bRange.location + 1]);
        [inputData replaceBytesInRange:NSMakeRange(bRange.location - 1, 2) withBytes:NULL length:0];
        [underlineChars addIndex:bRange.location - 1];
        testRange = NSMakeRange(bRange.location, [inputData length] - (bRange.location));
    } else if (buff[bRange.location - 1] == buff[bRange.location + 1]) {
        // It's a bold
        //NSLog(@"Bold %c\n", buff[bRange.location + 1]);
        [inputData replaceBytesInRange:NSMakeRange(bRange.location - 1, 2) withBytes:NULL length:0];
        [boldChars addIndex:bRange.location - 1];
        testRange = NSMakeRange(bRange.location, [inputData length] - (bRange.location));
    } else {
        // a backspace that is part of neither pattern; skip past it
        testRange.location = bRange.location + 1;
        testRange.length = [inputData length] - testRange.location;
    }
} while (testRange.location <= [inputData length] - 3);
NSMutableAttributedString *str = [[NSMutableAttributedString alloc] initWithString:[[NSString alloc] initWithData:inputData encoding:NSUTF8StringEncoding]];
NSFont *font = [NSFont fontWithDescriptor:[NSFontDescriptor fontDescriptorWithName:@"Menlo" size:12] size:12];
NSFont *boldFont = [[NSFontManager sharedFontManager] convertFont:font toHaveTrait:NSBoldFontMask];
[str addAttribute:NSFontAttributeName value:font range:NSMakeRange(0, [str length])];
__block NSUInteger begin = [underlineChars firstIndex];
__block NSUInteger end = begin;
[underlineChars enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
    if (idx - end < 2) {
        // it's the next item to the previous one
        end = idx;
    } else {
        // it's a split, so drop in the accumulated range and reset
        [str addAttribute:NSUnderlineStyleAttributeName value:[NSNumber numberWithInt:NSSingleUnderlineStyle] range:NSMakeRange(begin, (end-begin)+1)];
        begin = idx;
        end = begin;
    }
    if (idx == [underlineChars lastIndex]) {
        [str addAttribute:NSUnderlineStyleAttributeName value:[NSNumber numberWithInt:NSSingleUnderlineStyle] range:NSMakeRange(begin, (end-begin)+1)];
    }
}];
begin = [boldChars firstIndex];
end = begin;
[boldChars enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
    if (idx - end < 2) {
        // it's the next item to the previous one
        end = idx;
    } else {
        // it's a split, so drop in the accumulated range and reset
        [str addAttribute:NSFontAttributeName value:boldFont range:NSMakeRange(begin, (end-begin)+1)];
        begin = idx;
        end = begin;
    }
    // flush the final run (compare against boldChars, not underlineChars)
    if (idx == [boldChars lastIndex]) {
        [str addAttribute:NSFontAttributeName value:boldFont range:NSMakeRange(begin, (end-begin)+1)];
    }
}];
Another method would be to convert the man page to PostScript source code, run that through the PostScript-to-PDF converter, and put that into a PDFView.
The implementation would be similar to Bavarious's answer, just with different arguments to groff (-Tps instead of -Thtml).
This would be the slowest solution, but also probably the best for printing.

How to parse numeric value in string representation in Cocoa?

As the title says.
I tested NSScanner, but it accepted some strange strings (e.g. 123aaa).
Is there any way to convert string<->number strictly?
You can easily roll your own. Test whether the entire string was scanned, or whether there are additional characters.
NSScanner *scanner = [NSScanner localizedScannerWithString:str];
int i;
if (![scanner scanInt:&i] || [scanner scanLocation] < [str length]) {
    // str is not an int, or contains additional characters
    ...
} else {
    // str contains only an int
    ...
}
NSScanner isn't that high-level. You'll have to validate the string yourself.
One way would be to scan characters up to the set of digits, assert that that failed, then scan the digits, then scan to the end and assert that that failed.
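The "did the whole string get consumed?" test works the same way with the C library. Below is a minimal sketch using strtol. The name parse_int_strict is hypothetical, and the sketch assumes plain decimal input with no locale-specific digit grouping.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: succeeds only if the entire string is one decimal int. */
bool parse_int_strict(const char *s, int *out)
{
    if (isspace((unsigned char)*s))
        return false;                 /* strtol would silently skip leading whitespace */
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)
        return false;                 /* no digits at all */
    if (*end != '\0')
        return false;                 /* trailing characters, e.g. "123aaa" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return false;                 /* does not fit in an int */
    *out = (int)v;
    return true;
}
```

With this, "123" and "-42" parse, while "123aaa", " 12" and the empty string are rejected.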
