Read CSV file with all special characters using Ruby - ruby

I have a scenario where I import a CSV file and then validate its content. One cell contains the following special characters:
“ !#$%&’()*+,-./:;<=>?#[\]^_`{|}”~
When I read the CSV file with the above characters in a cell using CSV.read("csv_filepath"), I get the following:
“ !\#$%&’()*+,-./:;<=>?#[\\]^_`{|}”~
A backslash (\) is added before # and \. How can I read the exact content?

Related

How do I save a multiline string to a YAML file?

I have several YAML files that store SQL scripts in them (as multiline strings). I have a Python script that takes all of these scripts and aggregates them into a single table.
Whenever I make an update to a YAML file, it converts the SQL text to a regular string (with \n's to indicate line breaks). Is there a way to preserve the multiline formatting when I make updates to the YAML file?
For multi-line scalars, you can use block scalars. The pipe character | denotes the start of a literal block, which preserves line breaks.
For example:
Data: |
  Some data, here and a special character like ':'
  Another line of data on a separate line
You can also check the YAML Multiline reference.
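As an illustration of the block scalar above, here is a small sketch in Ruby (chosen to match the main question on this page; the original question used Python). It only relies on the standard yaml library, and the document text is the example from the answer.

```ruby
require 'yaml'

# A YAML document using a literal block scalar (|). The indented lines
# under the key keep their line breaks when parsed.
doc = <<~YAML
  Data: |
    Some data, here and a special character like ':'
    Another line of data on a separate line
YAML

parsed = YAML.safe_load(doc)
puts parsed['Data']

# Dumping the hash again. Psych typically picks the literal block style
# for strings that contain newlines, but the exact output style is the
# emitter's choice, so treat this as something to verify.
dumped = parsed.to_yaml
puts dumped
```

Whatever style the emitter picks, loading the dumped text again should give back the same multi-line string.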

remove/ replace unprintable characters from txt file using shell script

I am trying to remove newline characters from within quotes in a file.
I am able to achieve that using the code below:
awk -F"\"" '!length($NF){print;next}{printf("%s ", $0)}' filename.txt > filenamenew.txt
Note that I am creating a new file, filenamenew.txt. Is this avoidable? Can I run the command in place? I ask because the files are huge.
My file is pipe delimited.
Sample input file:
"id"|"name"
"1"|"john
doe"
"2"|"second
name
in the list"
Using the above code, I get the following output:
"id"|"name"
"1"|"john doe"
"2"|"second name in the list"
However, my files are huge, and some lines have a ^M character between the quotes. For example:
Second sample input file:
"id"|"name"
"1"|"john
doe"
"^M2"|"second^M^M
name
in the list"
Output using the above code:
"id"|"name"
"1"|"john doe"
name in the list"
So basically, if there is a ^M in the line, that string is not being printed. I read online that ^M is equal to \r, so I used
tr -d '\r' < filename.txt
I also tried
awk -F"|" '{sub(/^M/,"")}1'
but it did not remove those characters (^M).
A little background on why I am doing this:
I am extracting data from a relational table, loading it into a flat file, and checking whether the counts between the table and the file match. Since there are \n characters in the columns, count(*) in the table and wc -l on the file do not match.
Final resolution:
In the long run I don't want to delete these unprintable characters. I want to replace them with some character or value, so that the counts between the table and the file match. Then, when I load the file back into a table, I want to replace that placeholder with the \n or ^M that was originally present, so that the data is not tampered with on my side.
Any suggestions are appreciated.
Thanks.
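The placeholder idea described above can be sketched as follows. This is a rough illustration in Ruby rather than shell, and it makes several assumptions: the tokens <CR> and <LF> are hypothetical and must not occur in the real data, quotes are balanced, and double quotes are not escaped inside fields. It also holds the whole text in memory, so a huge file would need a streaming variant of the same state machine.

```ruby
# Hypothetical placeholder tokens; pick values that cannot occur in the data.
CR_TOKEN = '<CR>'
LF_TOKEN = '<LF>'

# Replace \r and \n that occur inside double-quoted fields with placeholders.
# Line breaks outside quotes (the real record separators) are left alone.
def mask_line_breaks(text)
  in_quotes = false
  text.each_char.map do |ch|
    in_quotes = !in_quotes if ch == '"'
    if in_quotes && ch == "\r"
      CR_TOKEN
    elsif in_quotes && ch == "\n"
      LF_TOKEN
    else
      ch
    end
  end.join
end

# Reverse the substitution when loading the data back into the table.
def unmask_line_breaks(text)
  text.gsub(CR_TOKEN, "\r").gsub(LF_TOKEN, "\n")
end

# The second sample input from the question, with ^M written as \r.
sample = "\"id\"|\"name\"\n\"1\"|\"john\ndoe\"\n" \
         "\"\r2\"|\"second\r\r\nname\nin the list\"\n"

masked = mask_line_breaks(sample)
puts masked
puts masked.lines.count                   # one physical line per record
puts unmask_line_breaks(masked) == sample # original content is restored
```

After masking, each record occupies one physical line, so a line count on the file should agree with the row count of the table, and unmasking restores the original characters.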

Loading a csv file which contains Latin characters with sql loader

I have a CSV file which contains Latin characters like this: Østfold.
What should my CTL file look like for this?
This was resolved by using "CHARACTERSET WE8ISO8859P1" in the CTL file.

Reading in output as one line, not each word

Basically, what I am having trouble with is that when I type: file *
I get:
AdvDataStructures.text.ref: ASCII text
makefile: ASCII make commands text
makelib: ASCII English text
README.txt: ASCII Pascal program text
shell3_2016.sh: ASCII text
shell3_2016.sh~: ASCII text
smallTestDir: directory
smallTestDir.text.out: empty
smallTestDir.text.ref: ASCII text
testarg0.text.ref: ASCII text
testarg1.text.ref: ASCII text
testbaddir.text.ref: ASCII text
When I use
for i in `file *`
it reads each space-separated word into i. I need it to read each line as a whole, e.g. AdvDataStructures.text.ref: ASCII text, so I can search it for a pattern.
Also, I have no clue how to get the number of lines in the file named on each line once I have read that line. Is there a way to pick out the first word of the output so the script knows which file name to read?
Basically, an example of what I have to do is read one line at a time (AdvDataStructures.text.ref: ASCII text); if a pattern matches it (I know how to do this with egrep), the script should then count the number of lines in that file (AdvDataStructures.text.ref).
The usual way is to use a while read loop:
# read puts the first word in filename and the rest of the line in description
file * | while read -r filename description; do
    filename=${filename%:} # remove : after filename
    ...
done

Exporting delimited Text file to Excel file using Shell Script

My text file is delimited by the pipe character '|'.
I want to export it to an Excel file (xls) using a script in Unix.
Can anyone please help?
My suggestion would be:
Convert the delimiter | to ,
Save the file with a csv extension
Open the file in Excel.
Note: If the file contents contain commas other than as the field separator, this approach will not work.
If you want to convert your file to .xls format, then you will have to use a library such as Apache POI (a Java library); for Perl, a module such as Spreadsheet::WriteExcel serves the same purpose.
If you just want to open it in Excel, then you can open it with Excel directly and set the separator to |.
Or put all the fields in " " and use , as the separator. If a field is within "", then a comma within the text will not be an issue. But double quotes within the text will be a problem.
To avoid all of this, you can use some other ASCII character as the separator.
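The quoting concerns above can be handled by a CSV library instead of plain text substitution. Here is a minimal sketch using Ruby's standard csv library; the helper name pipe_to_csv and the sample data are made up for illustration, and it assumes the input fields do not contain stray double quotes (those would need an option such as liberal_parsing or a different quote character).

```ruby
require 'csv'

# Parse pipe-delimited text and re-emit it as comma-separated text.
# CSV.generate_line adds quotes around any field that contains a comma,
# so embedded commas do not break the columns.
def pipe_to_csv(text)
  CSV.parse(text, col_sep: '|').map { |row| CSV.generate_line(row) }.join
end

# Hypothetical sample: the second row has a comma inside a field.
input = "id|name|note\n1|Doe, John|plain\n"

puts pipe_to_csv(input)
```

The result is a .csv file that Excel can open, not a native .xls workbook; producing a real .xls file still needs a spreadsheet library.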
