Encoded CSV with sqltocsv package not working with non-ASCII characters - Go

I am using the package "github.com/jeffyi/sqltocsv" to export MS SQL rows to CSV files.
My problem is that special characters come out the wrong way:
ü as Ć¼
ä as Ƥ
etc.
I have read through the sqltocsv package source multiple times and I just can't see when and where it goes wrong.
I have logged the output to the console: before exporting, the data comes out of the DB as UTF-8, but when it is written to the CSV it gets mangled.
I have also tried the standard "encoding/csv" package to write my data to a CSV file, without any success.
Here's how I use the sqltocsv package:
rows, err := db.Query(sqlQuery)
if err != nil {
    log.Fatal(err)
}
csvConverter := sqltocsv.New(rows)
csvConverter.Delimiter = ';'
csvConverter.TimeFormat = time.RFC822
if err := csvConverter.WriteFile(directory + "/" + fileName); err != nil {
    log.Fatal(err)
}
So in the end result, all characters should come through as they are:
ü as ü (not Ć¼)
ä as ä (not Ƥ)
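
For what it's worth: if the bytes leaving the database really are valid UTF-8, a common culprit is the program opening the CSV (Excel in particular) guessing a legacy code page. Below is a minimal sketch of one workaround, prepending a UTF-8 byte order mark; writeCsvWithBom is a made-up helper name, and the sketch assumes the jeffyi fork keeps the Write(io.Writer) method of the upstream joho/sqltocsv package:

package export

import (
    "database/sql"
    "os"
    "time"

    "github.com/jeffyi/sqltocsv"
)

// writeCsvWithBom writes the query result as CSV, prefixed with a
// UTF-8 byte order mark so that tools such as Excel stop guessing a
// legacy code page and read the file as UTF-8.
func writeCsvWithBom(rows *sql.Rows, path string) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()

    // UTF-8 BOM: EF BB BF
    if _, err := f.Write([]byte{0xEF, 0xBB, 0xBF}); err != nil {
        return err
    }

    conv := sqltocsv.New(rows)
    conv.Delimiter = ';'
    conv.TimeFormat = time.RFC822
    // Write(io.Writer) exists in upstream joho/sqltocsv; assumed to
    // be present in the fork as well.
    return conv.Write(f)
}

If the consuming tool genuinely needs a single-byte code page rather than UTF-8, transcoding the output (compare the Perl answer at the bottom of this page) is the alternative.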

Related

Oracle - utl_file.get_line not reading UTF-8 CSV lines properly

I have a UTF-8 CSV file that I want to read from.
It contains characters like ç, á, à ã...
This is what I'm doing:
CSVFile := utl_file.fopen (UPPER('dummyDIR'), CSV_name, 'r', 32767);
utl_file.get_line(CSVFile,CSVLine);
DBMS_OUTPUT.PUT_LINE('line obtained with get_line: '||CSVLine);
And all the special characters come out weird...
Any idea how I can get the file read with the correct charset?
This is what I get on Linux when I check the charset:
echo $NLS_LANG
PORTUGUESE_PORTUGAL.UTF8

Ruby - CSV works while SmarterCSV doesn't

I want to open a csv file using SmarterCSV.process
market_csv = SmarterCSV.process(market)
p "just read #{market_csv}"
The problem is that the data is not read and this prints:
[]
However, if I attempt the same thing with the default CSV library implementation, the content of the file is read (the following loop prints every row).
CSV.foreach(market) do |row|
p row
end
The content of the file I was reading is of the form:
Date,Close
03/06/15,0.1634
02/06/15,0.1637
01/06/15,0.1638
31/05/15,0.1638
The problem could come from the row separator: line endings are not the same across systems ("\r\n" on Windows, "\n" on Unix, "\r" on classic Mac OS). Try to identify the character and specify it in the SmarterCSV.process call like this:
market_csv = SmarterCSV.process(market, row_sep: "\r")
p "just read #{market_csv}"
or let the gem detect it automatically:
market_csv = SmarterCSV.process(market, row_sep: :auto)
p "just read #{market_csv}"

No such file or directory - Ruby

I am trying to read the contents of a file from a local disk as follows:
content = File.read("C:\abc.rb","r")
When I execute the .rb file I get the exception Error: No such file or directory. What am I missing?
In a double-quoted string, "\a" is the non-printable BEL character, similar to how "\n" is a newline. (These escape sequences originate from C.)
You don't have a file named "C:<BEL>bc.rb", which is why you get the error.
To fix it, use single quotes, where these escape sequences aren't interpreted:
content = File.read('C:\abc.rb')
Forward slashes also work in Windows paths (and note that File.read takes an optional length, not a mode, as its second argument):
content = File.read("C:/abc.rb")
First of all, try using:
Dir.glob(".")
to see what's in the current directory (and therefore which directory Ruby is looking at).
open("C:/abc.rb", "rb") { |io| a = a + io.read } # assumes a was initialized earlier, e.g. a = ""
EDIT: Unless you're concatenating several files together, you can write it simply as:
data = File.open("C:/abc.rb", "rb") { |io| io.read }
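
Incidentally, the same pitfall exists in Go, the language of the main question: "\a" is a bell escape in interpreted string literals, and raw backtick strings play the role of Ruby's single quotes. A small illustrative sketch:

package main

import "fmt"

func main() {
    // In a Go interpreted string literal, \a is the BEL character too,
    // so this "path" contains no backslash at all.
    interpreted := "C:\abc.rb"

    // A raw string literal (backticks) keeps the backslash verbatim,
    // much like Ruby's single quotes.
    raw := `C:\abc.rb`

    fmt.Printf("%q\n%q\n", interpreted, raw)
    // Output:
    // "C:\abc.rb"   (the \a is the bell escape, not a backslash)
    // "C:\\abc.rb"
}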

Cannot read unicode .csv into R

I have a .csv file, which contains the following data:
"Ա","Բ"
1,10
2,20
I cannot read it into R so that the column names are displayed like they are in the file.
d <- read.csv("./Data/1.csv", fileEncoding="UTF-8")
head(d)
Produces the following:
> d <- read.csv("./Data/1.csv", fileEncoding="UTF-8")
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, :
invalid input found on input connection './Data/1.csv'
2: In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on './Data/1.csv'
> head(d)
[1] X.
<0 rows> (or 0-length row.names)
Meanwhile, doing the same without specifying the fileEncoding produces this:
> d <- read.csv("./Data/1.csv")
> head(d)
Ô. Ô²
1 1 10
2 2 20
When I run the "file" utility to find out the encoding of the file, it says it is UTF-8:
Data\1.csv: UTF-8 Unicode text, with CRLF line terminators
I am using RStudio, Windows 7, R version 2.15.2, 32-bit.
Thanks in advance.
I wrote a longer answer on the same issue here: R on Windows: character encoding hell.
Quick answer: using the parameter encoding instead of fileEncoding should fix your first issue. You may not be able to display the data properly in the RStudio console or table view, but you will be able to use it in formulas.
d <- read.csv("./Data/1.csv", encoding="UTF-8")
head(d)
Having saved your table into a UTF-8 file:
> test2 <- read.csv("test2.csv", header = FALSE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", encoding = "UTF-8")
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'test2.csv'
This is how it looks in the console and in the RStudio view:
> test2
V1 V2
1 <U+0531> <U+0532>
2 1 10
3 2 20
Importantly, however, you are able to manipulate the data within R. In my case it is possible to see that the script-window input Ա has UTF-8 encoding, and a grep correctly finds this character in your table.
> Encoding("Ա")
[1] "UTF-8"
> grep("Ա", as.character(test2[1,1]))
[1] 1
You may need to find encoding variants that work with your settings, or possibly change the settings themselves; unfortunately I am not sure where that is done.
You might not be able to make the data display nicely at every stage, but it is definitely possible to get it working in a Windows 7 environment.
I tried two ways to replicate your problem.
First, I copied the characters above into RStudio and saved them to a CSV with this code:
write.csv(c("Ա","Բ",
            1,10,
            2,20), "test.csv")
df <- read.csv("test.csv")
This worked fine.
Then I thought, well, maybe R is cheating when it writes the CSV itself. So I just pasted the characters into a plain text file and saved it as a CSV. That approach didn't have problems either.
Here's my session info:
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C LC_TIME=en_CA.UTF-8
[4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] party_1.0-9 modeltools_0.2-21 strucchange_1.4-7 sandwich_2.2-10 zoo_1.7-10
[6] GGally_0.4.4 reshape_0.8.4 plyr_1.8 ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] coin_1.0-23 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3
[5] gtable_0.1.2 labeling_0.2 lattice_0.20-23 MASS_7.3-29
[9] munsell_0.4.2 mvtnorm_0.9-9995 proto_0.3-10 RColorBrewer_1.0-5
[13] reshape2_1.2.2 scales_0.2.3 splines_3.0.1 stringr_0.6.2
I had the same problem and found out that the file was corrupted.
I opened the file with OpenOffice and saved it back using the "UTF8" character set (you need to tick the "Edit filter settings" box), then imported it with read.csv() (no encoding or fileEncoding option) and it worked fine.

Encoding German characters

I need to import some Perl-generated files into an Oracle database with LOAD DATA.
The Perl script fetches a web page and writes a CSV file.
Here is a simplified script:
use LWP::Simple;   # provides get()
use File::Slurp;

my $c = ( $user && $passwd )
    ? get("$protocol://$user:$passwd\@$url")
    : get("$protocol://$url");
write_file("$VZ_GET/$FileTS.$typ.csv", $c);
Here is a sample line from the web page:
5052;97;Jan;Ihrfelt 5053;97;Jari;Honko 5121;97;Katja;Keitaanniemi 5302;97;Ola;Södermark 5421;97;Sven;Sköld 5609;97;Peter;Näslund
The content of the web page is saved in the variable $c.
Here is a sample line of the CSV file:
5053;97;Jari;Honko
Here is the load command:
LOAD DATA
INTO TABLE LIQA
TRUNCATE
FIELDS TERMINATED BY ";"
(
LIQA_ANALYST_ID,
LIQA_FIRM_ID,
LIQA_ANALYST_FIRST_NAME,
LIQA_ANALYST_LAST_NAME,
LIQA_TS_INSERT DATE 'YYYYMMDDHH24MISS'
)
The command SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER = 'NLS_CHARACTERSET'; returns AL32UTF8.
The generated CSV file is recognized as UTF-8 Unicode text.
Anyhow, I can't import German characters: in the CSV file they are still correct, but not in the database.
I have also tried to convert $c like this:
$c = encode("iso-8859-1", $c);
The generated CSV file is still recognized as UTF-8 Unicode text.
I have no clue how to fix it.
I have solved it:
use Encode qw(decode encode);
$c = decode( 'utf-8', $c );
$c = encode( 'iso-8859-1', $c );
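
For comparison with the main question, here is a hedged sketch of the same re-encoding step in Go, using the golang.org/x/text/encoding/charmap package (Go strings are already treated as UTF-8, so only the encode half is needed; the sample value is taken from the web page line above):

package main

import (
    "fmt"
    "log"

    "golang.org/x/text/encoding/charmap"
)

func main() {
    // UTF-8 input, as fetched from the web page.
    utf8Input := "5302;97;Ola;Södermark"

    // Equivalent of Perl's encode('iso-8859-1', ...): re-encode the
    // UTF-8 string into single-byte ISO-8859-1.
    latin1, err := charmap.ISO8859_1.NewEncoder().String(utf8Input)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("% x\n", latin1) // ö is now the single byte f6
}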
