CHCSVWriter Unicode Problem - macOS

I am having a problem using CHCSVWriter to export my arrays to a CSV or Excel file.
I have several arrays in the Persian language (the data is localized, and at least on Windows it has to be UTF-8).
Using CHCSVWriter (thanks, Dave) I am able to export my arrays into a CSV file, BUT not with the default settings.
Because of my arrays' encoding (UTF-8 did not work, I don't know why!), I had to change CHCSVWriter.m to be able to write the files in my localized language.
I have a strange problem:
1- If I use NSUTF8StringEncoding, I get a standard comma-separated CSV file that Excel opens with the correct column separation, BUT the table cells are in an unknown encoding (I am using the Persian language).
2- If I use NSUTF16StringEncoding, I get a CSV file in which all the columns of each row end up in one column! But the language and encoding are right! The strange thing is that the commas are NOT detected by Excel: it opens a table with just one column, and each row contains all the columns I meant to be separated by the commas that are right there!
Also, there is another problem: I can't find a way to set the encoding on CHCSVWriter, so I have to change it manually in the CHCSVWriter.m file!
Part of CHCSVWriter.m:
- (void)_writeString:(NSString *)string {
    // if (encoding == 0) {
    //     encoding = NSUTF8StringEncoding;
    encoding = NSUTF16StringEncoding;
    // }
And part of my code:
NSString *tempFileName = [NSString stringWithFormat:@"sellExport.csv"];
NSString *tempFile = [NSHomeDirectory() stringByAppendingPathComponent:tempFileName];
NSLog(@"Writing to file: %@", tempFile);
error = nil;
CHCSVWriter *sellExporting = [[CHCSVWriter alloc] initWithCSVFile:tempFile atomic:NO];
[sellExporting writeLine];
for (int i = 0; i < [purchaseCodes count]; i++) {   // < rather than <=, so the loop stops at the last valid index
    [sellExporting writeLineOfFields:[purchaseCodes objectAtIndex:i], [purchaseDates objectAtIndex:i],
        [purchaseCarBrands objectAtIndex:i], [purchaseCarSystems objectAtIndex:i], [purchaseCarModels objectAtIndex:i],
        [purchaseCarColors objectAtIndex:i], [purchaseCarChassis objectAtIndex:i], [purchaseCustomerNames objectAtIndex:i],
        [purchaseSharedNames objectAtIndex:i], [purchaseTotals objectAtIndex:i], [sellCodes objectAtIndex:i],
        [sellCustomerNames objectAtIndex:i], [sellDates objectAtIndex:i], [sellTotals objectAtIndex:i],
        [sellProfits objectAtIndex:i], [sellShareProfits objectAtIndex:i], nil];
}
[sellExporting closeFile];
[sellExporting release];

As nobody could resolve my problem:
I found that Unicode encoding is not very well supported by Microsoft Excel (Mac).
On the other hand, because I'm using the Persian language for my data entries, I have to use NSUTF16StringEncoding (why didn't NSUTF8StringEncoding work? I don't know!), and when I open the .csv file all my data ends up in just one column (I have 16 columns): Microsoft Excel cannot detect the comma (,) as the separation delimiter!
Anyway, I am now able to open the UTF-16 encoded CSV file using NeoOffice or OpenOffice, which support Unicode .csv files without any problem!
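If Excel itself still has to split the columns, a workaround that is often suggested is to write a tab-separated UTF-16 file, because Excel will usually split UTF-16 text on tabs even when it ignores commas. A minimal sketch in plain Cocoa, bypassing CHCSVWriter; rows here is a hypothetical NSArray of per-row NSArrays of NSStrings:
// Sketch only: build tab-separated lines and write them out as UTF-16
// (writeToFile:... adds the byte-order mark Excel needs to detect UTF-16).
NSMutableArray *lines = [NSMutableArray array];
for (NSArray *row in rows) {
    [lines addObject:[row componentsJoinedByString:@"\t"]];
}
NSString *tsv = [lines componentsJoinedByString:@"\n"];
NSString *tsvPath = [NSHomeDirectory() stringByAppendingPathComponent:@"sellExport.txt"];
NSError *writeError = nil;
[tsv writeToFile:tsvPath atomically:YES encoding:NSUTF16StringEncoding error:&writeError];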
So, this is all about Microsoft, not Dave DeLong's CHCSVWriter. (Thank you Dave, again.)
This is my 5th question that I've solved by myself!
Thank you all anyway.

Related

Parse a text file using batch script and remove the first 2 characters from each line

I have been trying to parse the data from a text file that is generated by the Teradata FastExport utility.
The data looks like this:
Type2LRF|84|249
Job3|86|327
StageTOStageBackUp|85|327
When I checked the count of the garbage characters at the beginning of each line, it is 2.
I have been trying to parse the text file to remove the first 2 characters and generate a new text file out of it.
The new file should look like this:
Type2LRF|84|249
Job3|86|327
StageTOStageBackUp|85|327
I tried to include those first 2 characters here, but they do not display correctly in the block above.
The Teradata FastExport code that I am using is:
.LOGTABLE Informatica_Test.JobControlExport_log;
.LOGON server_name/dbc,dbc;
DATABASE Informatica_Test;
.BEGIN EXPORT SESSIONS 2;
.EXPORT OUTFILE "data.txt"
MODE RECORD FORMAT TEXT;
SELECT ((TRIM((COALESCE(J.JobName,''))))
||'|'||
(TRIM((COALESCE(JC.JobControlID,''))))
||'|'||
(TRIM((COALESCE(JC.Success_Source_Rows,''))))
)(TITLE '') from
Informatica_Test.JobControl JC
JOIN Informatica_Test.Job J
ON J.JobID = JC.JobID
JOIN Informatica_Test.BatchControl BC
ON BC.BatchControlID = JC.BatchCtrlID
where BC.BatchID = 1 -- This will be a parameter
and BC.EndDatetime = (select max(EndDatetime) from Informatica_Test.BatchControl);
.END EXPORT;
.LOGOFF;
@echo off
setlocal enabledelayedexpansion
break>test.txt
for /F "tokens=*" %%A in (data.txt) do (
    set line=%%A
    echo !line:~2!>>test.txt
)
I have tried the above code for removing the 2 characters.
Your exported data is VARCHAR, so the first two bytes of each record are the binary length of the string (for example, "Type2LRF|84|249" is 15 characters, so those two bytes hold the value 15). That is the "garbage" you are seeing.
Instead of parsing/fixing the FastExport output file, use a different tool to export the data.
For larger numbers of rows, use Teradata Parallel Transporter (TPT) to export as delimited text (without the need for explicit concatenation or for fixing up the file afterwards).
For small numbers of rows, use BTEQ EXPORT with REPORT format.
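A rough sketch of what the BTEQ route could look like, reusing the SELECT from the question. The .SET lines and the exact tidy-up are assumptions that may need adjusting for your BTEQ version (REPORT format prints a title line, blanked here with TITLE '', and pads each line to the column width, so trailing spaces may still need trimming):
.LOGON server_name/dbc,dbc;
DATABASE Informatica_Test;
.SET TITLEDASHES OFF;
.SET WIDTH 254;
.EXPORT REPORT FILE = data.txt;
SELECT (TRIM(COALESCE(J.JobName,''))
        ||'|'|| TRIM(COALESCE(JC.JobControlID,''))
        ||'|'|| TRIM(COALESCE(JC.Success_Source_Rows,''))) (TITLE '')
FROM Informatica_Test.JobControl JC
JOIN Informatica_Test.Job J
  ON J.JobID = JC.JobID
JOIN Informatica_Test.BatchControl BC
  ON BC.BatchControlID = JC.BatchCtrlID
WHERE BC.BatchID = 1
  AND BC.EndDatetime = (SELECT MAX(EndDatetime) FROM Informatica_Test.BatchControl);
.EXPORT RESET;
.LOGOFF;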

JDBC Import into sheets, how to keep leading zeros on fields?

I have a lot of trouble with leading zeros in general. Importing into Sheets over a JDBC connection, I haven't figured out a way to keep the zeros. The column types are varchar() for values of varying length, and char() for fixed length.
In the past, with other data, I have added a leading ' to values, or chosen getDisplayValue() to keep them. What would work here?
while (results.next()) {
  var tmpArr = [];
  var rowString = '';
  for (var col = 0; col < numCols; col++) {
    rowString += results.getString(col + 1) + '\t';
    tmpArr.push(results.getString(col + 1));
  }
  valArr.push(tmpArr);
}
sheet.getRange(3, 1 , valArr.length, numCols).setValues(valArr);
Example data, varchar column:
0110205361
0201206352
140875852
LFCP01367
LGLM00017
You are retrieving data into a Google Sheet from a MySQL database table using JDBC. One of the database columns is formatted as "varchar" and includes some all-numeric values that have one or more leading zeros. When you write the database values to your Google Sheet, the leading zeros are not displayed.
Why
The reason the all-numeric values are displayed without their leading zeros is that the cells are formatted as Number, Automatic (or otherwise as a number). This means that they are 'interpreted' by Google Sheets as numbers and, by default, all leading zeros are dropped.
On the other hand, if the cells are formatted as Number, Plain Text, then the all-numeric values are 'interpreted' as strings, and any leading zeros are retained.
The effect of formatting can be confirmed by adding ISTEXT and ISNUMBER formulas next to the values:
Formatted as Number - Plain Text: treated as strings
Formatted as Number - Automatic: treated as numbers
Formatting on the fly
An alternative to pre-formatting (which wasn't successful in the OP's case) is to set the format as part of the setValues() call, using setNumberFormat.
For example:
sheet.getRange(3, 1, valArr.length, numCols).setNumberFormat('@STRING@').setValues(valArr);
There is a useful discussion of this method in Format a Google Sheets cell in plaintext via Apps Script.
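Tying this together with the loop from the question, a minimal sketch; sheet, valArr and numCols are assumed to be set up as in the question, and '@' is used here as the plain-text number format:
// Sketch: format the target range as plain text first, then write the values,
// so all-numeric strings such as 0110205361 keep their leading zeros.
var range = sheet.getRange(3, 1, valArr.length, numCols);
range.setNumberFormat('@');   // '@' = plain text
range.setValues(valArr);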

How to read CSV using LINQ when some columns contain commas

I have a CSV like the one below. "India,Inc" is a company name: a single value that contains a comma.
How do I get the values using LINQ?
12321,32432,423423,Kevin O'Brien,"India,Inc",234235,23523452,235235
Assuming that you will always have the columns that you specify, and that the only variable part is that the company name can have commas inside, this UGLY code can help you achieve your goal.
var file = File.ReadLines("test.csv");
var value = from p in file
            select new string[]
            {
                p.Split(',')[0],
                p.Split(',')[1],
                p.Split(',')[2],
                p.Split(',')[3],
                p.Split(',').Count() == 7 ? p.Split(',')[4] :
                    (p.Split(',').Count() > 7 ? String.Join(",", p.Split(',').Skip(4).Take(p.Split(',').Count() - 7).ToArray()) : ""),
                p.Split(',')[p.Split(',').Count() - 3],
                p.Split(',')[p.Split(',').Count() - 2],
                p.Split(',')[p.Split(',').Count() - 1]
            };
A regular expression would also work; it's a bit nasty due to the recursive nature, but it does achieve your goal.
List<string> matches = new List<string>();
string subjectString = "12321,32432,423423,Kevin O'Brien,\"India,Inc\",234235,23523452,235235";
Regex regexObj = new Regex(@"(?<="")\b[123456789a-z,']+\b(?="")|[123456789a-z']+", RegexOptions.IgnoreCase);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
    matches.Add(matchResults.Value);
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    matchResults = matchResults.NextMatch();
}
This should suffice in most cases. It handles quoted strings, strings with double quotes within them, and embedded commas.
var subjectString = "12321,32432,423423,Kevin O'Brien,\"India,Inc\",234235,\"Test End\"\"\",\"\"\"Test Start\",\"Test\"\"Middle\",23523452,235235";
var result = Regex.Split(subjectString, @",(?=(?:[^""]*""[^""]*"")*[^""]*$)")
.Select(x=>x.StartsWith("\"") && x.EndsWith("\"")?x.Substring(1,x.Length-2):x)
.Select(x=>x.Replace("\"\"","\""));
It does however break, if you have a field with a single double quote inside it, and the string itself is not enclosed in double quotes -- this is invalid in most definitions of a CSV file, where any field that contains CR, LF, Comma, or Double quote must be enclosed in double quotes.
You should be able to reuse the same regex to break the input into lines as well for small CSV files; larger ones would want a better implementation. Replace the double quotes with LF and remove the matching ones (unquoted LFs), then use the regular expression again, replacing the quotes with CR, and split on the matches.
Another option is to use CsvHelper and not try to reinvent the wheel:
var csv = new CsvHelper.CsvReader(new StreamReader("test.csv"));
while (csv.Read())
{
    Console.WriteLine(csv.GetField<int>(0));
    Console.WriteLine(csv.GetField<string>(1));
    Console.WriteLine(csv.GetField<string>(2));
    Console.WriteLine(csv.GetField<string>(3));
    Console.WriteLine(csv.GetField<string>(4));
}
Guide
I would recommend LINQ to CSV, because it is powerful enough to handle special characters including commas, quotes, and decimals. They have really worked a lot of these issues out for you.
It only takes a few minutes to set up and it is really worth the time because you won't run into these types of issues down the road like you would with custom code. Here are the basic steps, but definitely follow the instructions in the link above.
Install the Nuget package
Create a class to represent a line item (name the fields the way they're named in the csv)
Use CsvContext.Read() to read into an IEnumerable which you can easily manipulate with LINQ
Use CsvContext.Write() to write a List or IEnumerable to a CSV
This is very easy to set up, has very little code, and is much more scalable than doing it yourself.
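A rough sketch of those steps using the LINQtoCSV package, run against the sample line from the question; the class and property names here are made up for illustration:
using System;
using LINQtoCSV;

// One property per column of the sample row; the names are placeholders.
public class SaleRecord
{
    [CsvColumn(FieldIndex = 1)] public string Code1 { get; set; }
    [CsvColumn(FieldIndex = 2)] public string Code2 { get; set; }
    [CsvColumn(FieldIndex = 3)] public string Code3 { get; set; }
    [CsvColumn(FieldIndex = 4)] public string ContactName { get; set; }
    [CsvColumn(FieldIndex = 5)] public string CompanyName { get; set; }  // "India,Inc" arrives intact
    [CsvColumn(FieldIndex = 6)] public string Amount1 { get; set; }
    [CsvColumn(FieldIndex = 7)] public string Amount2 { get; set; }
    [CsvColumn(FieldIndex = 8)] public string Amount3 { get; set; }
}

class Program
{
    static void Main()
    {
        var description = new CsvFileDescription
        {
            SeparatorChar = ',',
            FirstLineHasColumnNames = false
        };
        var rows = new CsvContext().Read<SaleRecord>("test.csv", description);
        foreach (var row in rows)
            Console.WriteLine(row.CompanyName);   // India,Inc
    }
}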
Because you're only reading values delimited by commas, the spaces shouldn't cause an issue if you just treat them like any other character.
var values = File.ReadLines(path)
    .SelectMany(line => line.Split(','));

NSFileManager contentsOfDirectoryAtPath encoding problem with samba path

I mount an SMB path using this code:
urlStringOfVolumeToMount = [urlStringOfVolumeToMount stringByAddingPercentEscapesUsingEncoding:NSMacOSRomanStringEncoding];
NSURL *urlOfVolumeToMount = [NSURL URLWithString:urlStringOfVolumeToMount];
FSVolumeRefNum returnRefNum;
FSMountServerVolumeSync( (CFURLRef)urlOfVolumeToMount, NULL, NULL, NULL, &returnRefNum, 0L);
Then, I get the contents of some paths:
NSMutableArray *content = (NSMutableArray *)[[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:&error];
My problem is that every path in the "content" array containing special chars (ü for example) gives me two encoded chars: ü becomes u followed by ¨.
When I log the bytes using:
[contentItem dataUsingEncoding:NSUTF8StringEncoding];
it gives me 75 cc 88, which is u (75) and ¨ (cc 88).
What I expected is the ü char encoded in UTF-8; in bytes, it should be c3 bc.
I've tried converting my path using ISOLatin1 encoding, MacOSRoman... but since the content path already has 2 separate chars instead of one for ü, any conversion gives me 2 encoded chars...
If someone can help, thanks.
My configuration: localized in French and using Snow Leopard.
urlStringOfVolumeToMount = [urlStringOfVolumeToMount stringByAddingPercentEscapesUsingEncoding:NSMacOSRomanStringEncoding];
Unless you specifically need MacRoman for some reason, you should probably be using UTF-8 here.
NSMutableArray *content = (NSMutableArray *)[[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:&error];
My problem is that every path in the "content" array containing special chars (ü for example) gives me two encoded chars: ü becomes u followed by ¨.
You're expecting composed characters and getting decomposed sequences.
Since you're getting the pathnames from the file-system, this is not a problem: The pathnames are correct as you're receiving them, and as long as you pass them to something that does Unicode right, they will display correctly as well.
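If you do need the composed (NFC) form, for example to get the expected c3 bc bytes or to compare against strings coming from another source, NSString can normalize the name for you. A small sketch, where contentItem is one of the strings from the directory listing:
// Convert the decomposed (NFD) name returned by the file system into
// precomposed (NFC) form before comparing or exporting it.
NSString *composed = [contentItem precomposedStringWithCanonicalMapping];
NSData *utf8 = [composed dataUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", utf8);   // ü is now the single code point U+00FC: c3 bc in UTF-8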
Well, four years later I'm struggling with the same thing but for åäö in my case.
It took a lot of time to find the simple solution.
NSString has the necessary comparator built in.
Comparing aString with anotherString, where one comes from the array returned by NSFileManager's contentsOfDirectoryAtPath:, is as simple as:
if( [aString compare:anotherString] == NSOrderedSame )
The compare: method takes care of bringing both strings into a comparable canonical format, in effect making them "if they look the same, they are the same".

Ruby string encoding problem

I've looked at the other ruby/encoding related posts but haven't been able to figure out why the following is not working. Likely just because I'm dense, but here's the situation.
Using Ruby 1.9 on Windows. I have a set of CSV files that need some data appended to the end of each line. Whenever I run my script, the appended characters are gibberish. The input text appears to be in IBM437 encoding, whereas the string I'm appending starts as US-ASCII. Nothing I've tried with respect to forcing the encoding of the input strings or the appended string seems to change the resulting output. I'm stumped. The encoding used in the current version is simply the last one that I tried.
def append_salesperson(txt, salesperson)
  if txt.length > 2
    return txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson}")
  end
end

salespeople = Hash[
  "fname", "Record Manager"]
outfile = File.open("ActData.csv", "w:US-ASCII")
salespeople.each do |filename, recordManager|
  infile = File.open("#{filename}.txt")
  infile.each do |line|
    outfile.puts append_salesperson(line, recordManager)
  end
  infile.close
end
outfile.close
One small note related to your question: you have your CSV data as %(, "", "", "#{salesperson}"). Here you have a space character before your double quotes. This can cause the #{salesperson} part to be interpreted as multiple fields if there is a comma in that text. To fix this, there can't be whitespace between the comma and the double quotes. Example: "this is a field","Last, First","and so on". This is one little gotcha that I ran into when creating reports meant to be viewed in Excel.
For reference, Common Format and MIME Type for Comma-Separated Values (CSV) Files (RFC 4180) describes the grammar of a CSV file.
maybe txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson.force_encoding('something')}")
?
It sounds like the CSV data is coming in as UTF-16... hence the puts shows as the printable character (the first byte) plus a space (the second byte).
Have you tried encoding your appended data with .force_encoding(Encoding::UTF_16LE) or .force_encoding(Encoding::UTF_16BE)?
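One more thing worth trying (an assumption about the data, since the question never confirms the real input encoding): declare the input encoding when opening the file and let Ruby 1.9 transcode on read, instead of forcing encodings afterwards. A sketch along the lines of the loop in the question:
# Sketch: read the file as IBM437 and transcode to UTF-8 on the way in,
# so the appended ASCII text and the existing data end up in one encoding.
outfile = File.open("ActData.csv", "w:UTF-8")
infile  = File.open("fname.txt", "r:IBM437:UTF-8")  # external:internal encoding
infile.each_line do |line|
  outfile.puts line.chomp + %(,"","","Record Manager")  # no space between the comma and the quotes
end
infile.close
outfile.close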
