NSFileManager contentsOfDirectoryAtPath encoding problem with samba path - cocoa

I mount an SMB path using this code:
urlStringOfVolumeToMount = [urlStringOfVolumeToMount stringByAddingPercentEscapesUsingEncoding:NSMacOSRomanStringEncoding];
NSURL *urlOfVolumeToMount = [NSURL URLWithString:urlStringOfVolumeToMount];
FSVolumeRefNum returnRefNum;
FSMountServerVolumeSync( (CFURLRef)urlOfVolumeToMount, NULL, NULL, NULL, &returnRefNum, 0L);
Then I get the contents of some paths:
NSMutableArray *content = (NSMutableArray *)[[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:&error];
My problem is that every path in the "content" array that contains special characters (ü, for example) comes back as two characters: ü becomes u¨
When I log the bytes using:
[contentItem dataUsingEncoding:NSUTF8StringEncoding];
it gives me 75cc88, which is u (75) followed by ¨ (cc88).
What I expected is the ü character encoded in UTF-8. In bytes, it should be c3bc.
I've tried converting my path using the ISOLatin1 and MacOSRoman encodings, but since the path already contains two separate characters instead of one for ü, every conversion gives me two encoded characters.
If someone can help, thanks.
My configuration: French localization, running Snow Leopard.

urlStringOfVolumeToMount = [urlStringOfVolumeToMount stringByAddingPercentEscapesUsingEncoding:NSMacOSRomanStringEncoding];
Unless you specifically need MacRoman for some reason, you should probably be using UTF-8 here.
NSMutableArray *content = (NSMutableArray *)[[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:&error];
My problem is that every path in the "content" array that contains special characters (ü, for example) comes back as two characters: ü becomes u¨
You're expecting composed characters and getting decomposed sequences.
Since you're getting the pathnames from the file-system, this is not a problem: The pathnames are correct as you're receiving them, and as long as you pass them to something that does Unicode right, they will display correctly as well.
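To make the composed/decomposed distinction concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) that decodes the two byte sequences from the question into code points. The function name decode_utf8 is hypothetical, and the decoder assumes well-formed input and does no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a UTF-8 byte buffer into code points.
   Returns the number of code points written to out (at most max_out).
   Sketch only: assumes well-formed input and performs no validation. */
static size_t decode_utf8(const unsigned char *bytes, size_t len,
                          uint32_t *out, size_t max_out)
{
    assert(bytes != NULL && out != NULL);
    size_t n = 0;
    size_t i = 0;
    while (i < len && n < max_out) {
        unsigned char b = bytes[i];
        uint32_t cp;
        size_t extra; /* number of continuation bytes that follow */
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        i++;
        while (extra > 0 && i < len) {
            cp = (cp << 6) | (bytes[i] & 0x3F); /* append 6 payload bits */
            i++;
            extra--;
        }
        out[n++] = cp;
    }
    return n;
}
```

Feeding it the bytes 75 cc 88 yields two code points, U+0075 (u) and U+0308 (combining diaeresis), while c3 bc yields the single code point U+00FC (ü). Both are valid UTF-8 for the same visible character; they differ only in normalization form.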

Well, four years later I'm struggling with the same thing, but with åäö in my case.
It took a lot of time to find the simple solution.
NSString has the necessary comparator built in.
Comparing aString with anotherString, where one comes from the array returned by NSFileManager's contentsOfDirectoryAtPath:error:, is as simple as:
if( [aString compare:anotherString] == NSOrderedSame )
The compare: method treats canonically equivalent strings as equal, so composed and decomposed forms match. In effect: "if they look the same, they are the same".

Related

How to convert utf-8 encoded string to Turkish characters in Xcode?

I have a web service in PHP, and I encode the string in UTF-8 like this:
$str_output = mb_convert_encoding("MATEMATİK", "UTF-8");
$data_array = array('name' => $str_output);
echo json_encode($data_array);
I get this string from the web service in Xcode: MATEMAT\u00ddK
I couldn't convert this string to the Turkish string.
My json_dictionary looks like this:
2014-01-08 16:17:22.274 test_app[6432:70b] {
name = "MATEMAT\U00ddK";
}
I tried this encoding method, but it didn't work for me:
NSString * name = [json_dictionary objectForKey:@"name"];
NSString * correctString = [NSString stringWithCString:[name cStringUsingEncoding:NSUTF8StringEncoding] encoding:NSWindowsCP1254StringEncoding];
I got null.
If I use NSUTF8StringEncoding, I get:
MATEMATÝK
I also tried NSISOLatin1StringEncoding, NSISOLatin2StringEncoding, and others.
Thanks.
iOS is correctly decoding the \u00dd when you use NSUTF8StringEncoding (which is what you should be using). That's LATIN CAPITAL LETTER Y WITH ACUTE. The letter you want is LATIN CAPITAL LETTER I WITH DOT ABOVE, which is \u0130.
That suggests the problem is on your PHP side. If I had to guess, I'd suspect that the İ in your source file is not itself in the encoding that PHP expects. You may need to pass the "from" encoding to mb_convert_encoding, depending on what encoding your editor is using.
I would strongly recommend that you stay in UTF-8 entirely if possible, and avoid creating a CP1254 (Turkish) string at all. UTF-8 is capable of encoding all the characters you need. In that case, you may be able to avoid the mb_convert_encoding call entirely.
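One guess that is consistent with the symptom: the PHP source file was saved as Windows-1254, where İ is the single byte 0xDD, and that byte was then interpreted as ISO-8859-1. The sketch below, in plain C, shows how the same byte maps to two different code points. The function names are hypothetical, and the Windows-1254 table covers only the six letters that differ from ISO-8859-1 (the 0x80-0x9F block is left out).

```c
#include <assert.h>
#include <stdint.h>

/* ISO-8859-1 maps every byte straight to the code point with the same value. */
static uint32_t latin1_to_code_point(unsigned char b)
{
    return b;
}

/* Windows-1254 (Turkish) agrees with ISO-8859-1 except for six letters
   and the 0x80-0x9F block, which this sketch does not handle. */
static uint32_t cp1254_to_code_point(unsigned char b)
{
    assert(b < 0x80 || b >= 0xA0); /* 0x80-0x9F is out of scope here */
    switch (b) {
    case 0xD0: return 0x011E; /* G with breve, capital */
    case 0xDD: return 0x0130; /* I with dot above, capital */
    case 0xDE: return 0x015E; /* S with cedilla, capital */
    case 0xF0: return 0x011F; /* g with breve, small */
    case 0xFD: return 0x0131; /* dotless i, small */
    case 0xFE: return 0x015F; /* s with cedilla, small */
    default:   return b;
    }
}
```

Under that assumption, byte 0xDD read as ISO-8859-1 gives U+00DD (Ý), which matches the \u00dd in the JSON, while the same byte read as Windows-1254 gives the intended U+0130 (İ).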

Cocoa: Extracting "A" from "Æ"

I have a bunch of NSStrings from which I would like to grab the first character and match it against the range A-Z, with # as a catch-all for anything that doesn't fit.
Different graphemes (I believe that's the correct word after some wiki'ing) have been giving me trouble. For example, I would like to extract A from "Æ".
I have taken a look at CFStringTransform, normalize and fold, but none of them had the desired effect.
Is there a reliable way of doing this? All the strings I'm working with are UTF-8, if that makes a difference.
Æ cannot be broken down into components. It is not a compound glyph of A+E, but a separate glyph. Compound glyphs are things like a+`.
The thing about "Æ" is that it is a single character in itself (U+00C6, which is outside ASCII), not a combination of two different characters, so you can't extract the A from it.
Edit:
Although you could check whether the string equals "Æ" and, if it does, replace it with "A". Alternatively, convert it to its decimal code and subtract a fixed offset to get "A"; the offset depends on the character set (81 in code page 437, where Æ is 146, but 133 in Unicode, where Æ is 198).
Did you want to get rid of all æ?
This should work if you do.
NSString *string = @"Æaæbcdef";
string = [string stringByReplacingOccurrencesOfString:@"æ" withString:@"a"];
string = [string stringByReplacingOccurrencesOfString:@"Æ" withString:@"A"];
Edit
Rereading, you only seem to want the first character:
NSString *string = @"Æaæbcdef";
NSString *firstChar = [string substringToIndex:1];
firstChar = [firstChar stringByReplacingOccurrencesOfString:@"æ" withString:@"a"];
firstChar = [firstChar stringByReplacingOccurrencesOfString:@"Æ" withString:@"A"];
NSString *finalString = [NSString stringWithFormat:@"%@%@", firstChar, [string substringFromIndex:1]];

Unicode with format

I want to add a bunch of Emoji icons to an array. From my earlier question I found out how to write the Emoji icons in an NSString.
Now I want to make a loop and add these icons to an array. This should be fairly easy as the unicodes are in certain ranges so something like the following should do it:
for (int i = 0; i < 10; i++)
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%i", i]];
Problem is, when doing so I get an error saying:
Incomplete universal character name.
Does anyone know of a way to do this?
That's because the escape sequence \Uxxxxxxxx is evaluated by the compiler, which replaces it with the corresponding Unicode code point. Then, at run time, stringWithFormat: replaces the format specifier %i with the decimal representation of i. The final string is the concatenation of the characters corresponding to \Uxxxxxxxx and the characters representing i. stringWithFormat: replaces characters with other characters; it doesn't alter existing characters.
But the problem is, here the compiler sees an incomplete escape sequence as you only wrote 7 hexadecimal digits. So it's not able to generate the string and raises an error.
The solution is to generate the character (a simple integer value) at run time and create a string from it using +[NSString stringWithCharacters:length:].
But if you look in the headers, you'll see that NSString stores its characters as unichar, which is defined as an unsigned short, i.e. a 16-bit value, whereas the Unicode code point U+1F430 (🐰) requires at least 17 bits.
So you cannot use a single unichar character to represent that code point. But don't worry: you can use two characters to represent it.
Lost? Here is the explanation. Unicode doesn't define characters; it defines code points, which are integer values in the range U+0000 – U+10FFFF. Then the implementation decides how to represent those code points using characters. The implementation may use any data type it wants as characters, as long as it manages to represent all valid code points. The simplest solution would be to use 32-bit integers, but that would require too much memory, as most of the code points you use are in the first Unicode plane (U+0000 – U+FFFF). So NSString stores the code points in the UTF-16 encoding, which uses 16-bit characters.
In UTF-16, every code point beyond U+FFFF is stored using a pair of characters (known as a surrogate pair) in the range 0xD800 – 0xDFFF (the corresponding code points are explicitly reserved in the Unicode standard).
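As a worked instance of that arithmetic, here is a minimal sketch in plain C (it also compiles inside an Objective-C file). The function name to_surrogate_pair is hypothetical, and the function handles only code points beyond U+FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a supplementary-plane code point (U+10000..U+10FFFF) into its
   UTF-16 high and low surrogates. */
static void to_surrogate_pair(uint32_t code_point, uint16_t *high, uint16_t *low)
{
    assert(code_point >= 0x10000 && code_point <= 0x10FFFF);
    assert(high != NULL && low != NULL);
    uint32_t offset = code_point - 0x10000;        /* 20-bit value */
    *high = (uint16_t)(0xD800 + (offset >> 10));   /* top 10 bits */
    *low  = (uint16_t)(0xDC00 + (offset & 0x3FF)); /* bottom 10 bits */
}
```

For U+1F430 the offset is 0xF430, whose top ten bits are 0x3D and bottom ten bits are 0x30, so the pair is 0xD83D 0xDC30.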
In conclusion, any valid Unicode code point may be represented using one or two unichar characters. The method to do so is described there. And here is a simple implementation:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
    // NOTE: As I edited the answer, you'll find a simpler implementation of
    // this function below
    unichar characters[2];
    NSUInteger length;
    if ( codePoint <= 0xD7FF || (codePoint >= 0xE000 && codePoint <= 0xFFFF) ) {
        // Basic Multilingual Plane: a single UTF-16 code unit
        characters[0] = codePoint;
        length = 1;
    }
    else if ( codePoint >= 0x10000 && codePoint <= 0x10ffff ) {
        // Supplementary planes: encode as a surrogate pair
        codePoint -= 0x10000;
        characters[0] = 0xD800 + (codePoint >> 10);
        characters[1] = 0xDC00 + (codePoint & 0x3ff);
        length = 2;
    }
    else {
        length = 0; // invalid code point
    }
    return [NSString stringWithCharacters:characters length:length];
}
Now that we can generate a string from any valid code point, we just need to update the code to use the function we wrote before:
for (int i = 0; i < 10; i++)
[someArray addObject:stringWithCodePoint(0x0001F430 + i)];
EDIT: I just figured out a simpler method to get a NSString from a code point. It works by using -[NSString initWithBytes:length:encoding:] and the NSUTF32StringEncoding encoding:
static NSString *stringWithCodePoint(uint32_t codePoint)
{
NSString *string = [[NSString alloc] initWithBytes:&codePoint length:4 encoding:NSUTF32StringEncoding];
// You may remove the next 3 lines if you use ARC
#if ! __has_feature(objc_arc)
[string autorelease];
#endif
return string;
}
Note this similar question. As one of its answers explains, backslash escapes in a string literal are evaluated at compile time. If you want to make a Unicode character using a \Uxxxx escape, all of the hex digits need to be written out in the string literal.
What you can do instead, as per another answer is use the format specifier %C -- not together with the \Uxxxx escape, but on its own -- and pass in the full character code as an integer. (Actually, a wchar_t, which is a 32-bit integer on Mac OS X now, which you'll need since the character code you're looking for is more than 16 bits long.) To put this together with a base, you can just add the integers:
wchar_t base = 0x0001F430; // unfamiliar? we start with 0x for hexadecimal integers
for (int i = 0; i < 10; i++)
[someArray addObject:[NSString stringWithFormat:@"%C", base + i]];
There's also stringWithCharacters: but that explicitly takes a (16-bit) unichar, so you'd need to use a character sequence to encode your emoji in UTF-16.
Use %C instead of %i
so:
[someArray addObject:[NSString stringWithFormat:@"\U0001F43%C", i]];

Proper validation with [NSScanner: scanInteger] in Cocoa

I am converting a string to a signed integer via NSScanner's scanInteger:, but it seems to accept values such as '123abc' as '123' instead of throwing an error on invalid input.
I can do my own custom validation, but would prefer to find an API which will do the conversion and fail on '123abc'.
By the way, 'abc123' does fail with scanInteger, which is good.
I don't think that using a scanner is the way to do this -- you could, but there are easier ways. I would use the NSString method rangeOfCharacterFromSet: to check for non-digit characters.
NSCharacterSet *notDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
NSUInteger nonDigits = [enteredString rangeOfCharacterFromSet:notDigits].length;
If nonDigits is not zero, then the user has entered something other than a digit. If you want to allow decimal points then you would have to create your own set that contains everything other than digits and the decimal point.
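For comparison, the same "reject anything with trailing junk" check can be sketched in plain C with strtol, which reports where parsing stopped. This is not a Foundation API; the function name parse_whole_integer is hypothetical. Unlike the digit-set check above, it accepts a leading sign, and strtol also skips leading whitespace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse text as a signed decimal integer, accepting it only when the
   whole string is consumed. Returns false for "", "123abc", "abc123",
   and values that do not fit in a long. */
static bool parse_whole_integer(const char *text, long *value)
{
    assert(text != NULL && value != NULL);
    char *end = NULL;
    errno = 0;
    long parsed = strtol(text, &end, 10);
    if (end == text)     return false; /* no digits were consumed */
    if (*end != '\0')    return false; /* trailing junk such as "abc" */
    if (errno == ERANGE) return false; /* out of range for long */
    *value = parsed;
    return true;
}
```

With this check, "123" and "-42" are accepted, while "123abc", "abc123" and the empty string are all rejected.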

CHCSVWriter Unicode Problem

I'm having a problem using CHCSVWriter to export my arrays to a CSV or Excel file.
I have several arrays whose contents are in Persian (the data is localized, and it must be UTF-8, at least on Windows).
Using CHCSVWriter (thanks, Dave), I am able to export my arrays into a CSV file, but not with the default settings.
With the default UTF-8 encoding my array contents did not come out right (I don't know why), but by changing CHCSVWriter.m I am able to write the files in my localized language.
I have a strange problem:
1- If I use NSUTF8StringEncoding, I get a standard comma-separated CSV file that Excel opens with the columns separated correctly, BUT the table cells are in an unknown encoding (I am using Persian).
2- If I use NSUTF16StringEncoding, I get a CSV file in which all the columns of each row end up in one column, but the language and encoding are right. The strange thing is that Excel does not detect the commas: it opens a table with just one column, where each row contains all the fields that I intended to be separated by the commas that are there.
Also, I have another problem: I can't find a way to set the encoding for CHCSVWriter, so I have to change it manually in the CHCSVWriter.m file.
part of CHCSVWriter.m:
- (void)_writeString:(NSString *)string {
// if (encoding == 0) {
// encoding = NSUTF8StringEncoding;
encoding = NSUTF16StringEncoding;
//}
And part of my code :
NSString * tempFileName = [NSString stringWithFormat:@"sellExport.csv"];
NSString * tempFile =[NSHomeDirectory() stringByAppendingPathComponent:tempFileName];
NSLog(@"Writing to file: %@", tempFile);
error = nil;
CHCSVWriter *sellExporting = [[CHCSVWriter alloc] initWithCSVFile:tempFile atomic:NO];
[sellExporting writeLine];
for (int i = 0; i < [purchaseCodes count]; i++) { // use <, not <=: objectAtIndex:count is out of bounds
[sellExporting writeLineOfFields:[purchaseCodes objectAtIndex:i],[purchaseDates objectAtIndex:i],[purchaseCarBrands objectAtIndex:i],[purchaseCarSystems objectAtIndex:i],[purchaseCarModels objectAtIndex:i],[purchaseCarColors objectAtIndex:i],[purchaseCarChassis objectAtIndex:i],[purchaseCustomerNames objectAtIndex:i],[purchaseSharedNames objectAtIndex:i],[purchaseTotals objectAtIndex:i],[sellCodes objectAtIndex:i],[sellCustomerNames objectAtIndex:i],[sellDates objectAtIndex:i],[sellTotals objectAtIndex:i],[sellProfits objectAtIndex:i],[sellShareProfits objectAtIndex:i],nil];
}
[sellExporting closeFile];
[sellExporting release];
As nobody could resolve my problem:
I found that Unicode encoding is not very well supported by Microsoft Excel (Mac).
On the other hand, because I'm using Persian for my data entries, I have to use NSUTF16StringEncoding (why didn't NSUTF8StringEncoding work? I don't know), and when I open the .csv file I get all my data in just one column (I have 16 columns), and Microsoft Excel cannot detect the comma (,) as the delimiter.
Anyway, I am now able to open the UTF-16 encoded CSV file without any problem using NeoOffice or OpenOffice, which support Unicode .csv files well.
So this is all about Microsoft, not Dave DeLong's CHCSVWriter (thank you again, Dave).
This is my 5th question that I solved by myself.
Thank you all anyway.

Resources