Parsing and modifying CSV with bash

I have a CSV file with tons of rows; a small example:
id,location_id,name,title,email,directorate
1,1, Amy lee,Singer,,
2,2,brad Pitt,Actor,,Production
3,5,Steven Spielberg,Producer,spielberg@my.com,Production
Need to:
Capitalize the first and last name, e.g. Brad Pitt, Amy Lee.
Create an email with the pattern: first letter of the first name + last name, all lowercase, followed by the location_id value and @google.com, e.g. alee1@google.com, bpitt2@google.com.
Save it to a new file.csv with the same structure, e.g.:
id,location_id,name,title,email,directorate
1,1,Amy Lee,Singer,alee1@google.com,
2,2,Brad Pitt,Actor,bpitt2@google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5@google.com,Production
I started by creating an array and iterating through it with a bunch of sed and awk calls, but it gives me random results. Please advise me on how to solve this task.
while read -ra array; do
for i in ${array[@]};
do
awk -F ',' '{print tolower(substr($3,1,1))$2$3"@google.com"}'
done
for i in ${array[@]};
do
awk -F "\"*,\"*" '{print $3}' | sed -e "s/\b\(.\)/\u\1/g"
done
done < file.csv
The awk -F ',' '{print tolower(substr($3,1,1))$2$3"@google.com"}' part is not working correctly.
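Each awk inside the loop reads from the loop's stdin (file.csv) rather than from $i, which is why the output looks random. For comparison, here is a minimal pure-bash sketch (assumes bash 4+, two-part names, and no quoted commas in the data; file names follow the question):
{
IFS= read -r header && printf '%s\n' "$header"
while IFS=, read -r id loc name title email dir; do
name=${name# }                  # drop the stray leading space
read -r first last <<< "$name"
first=${first^} last=${last^}   # capitalize each part (bash 4+)
e=${first:0:1}$last
printf '%s,%s,%s %s,%s,%s,%s\n' "$id" "$loc" "$first" "$last" "$title" "${e,,}$loc@google.com" "$dir"
done
} < file.csv > new_file.csv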

Using GNU sed
$ sed -E 's/([^,]*,([^,]*),) ?(([[:alpha:]])[^ ]* +)(([^,]*),[^,]*,)[^,]*/\1\u\3\u\5\L\4\6\2@google.com/' input_file
id,location_id,name,title,email,directorate
1,1,Amy Lee,Singer,alee1@google.com,
2,2,Brad Pitt,Actor,bpitt2@google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5@google.com,Production
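Reading the replacement left to right (group numbers refer to the pattern above):
# \1 = "id,location_id,"   and \2 = location_id on its own
# \3 = the first name plus its trailing space; \4 = its first letter
# \5 = "lastname,title,"   and \6 = the last name on its own
# \u uppercases the next character; \L lowercases the rest of the
# replacement, which builds \4\6\2@google.com as the lowercase email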

With your shown samples, please try the following awk. It trims the name, capitalizes each part, and builds the email into the empty fifth column:
awk '
BEGIN{ FS=OFS="," }
NR==1{ print; next }
{
gsub(/^ +/,"",$3)                 # drop the stray leading space in the name
n=split($3,arr," ")
for(i=1;i<=n;i++)                 # capitalize each name part
  arr[i]=toupper(substr(arr[i],1,1)) substr(arr[i],2)
$3=(n>1 ? arr[1] " " arr[2] : arr[1])
$5=tolower(substr(arr[1],1,1) arr[n]) $2 "@google.com"
}
1
' Input_file
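With the samples shown, this should print:
id,location_id,name,title,email,directorate
1,1,Amy Lee,Singer,alee1@google.com,
2,2,Brad Pitt,Actor,bpitt2@google.com,Production
3,5,Steven Spielberg,Producer,sspielberg5@google.com,Production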

Related

Bash replace in CSV multiple columns

I have the following CSV format:
data_disk01,"/opt=920MB;4512;4917;0;4855","/=4244MB;5723;6041;0;6359","/tmp=408MB;998;1053;0;1109","/var=789MB;1673;1766;0;1859","/boot=53MB;656;692;0;729"
From each column except the first, I would like to keep only the last value in the list, like this:
data_disk01,"/opt=4855","/=6359","/tmp=1109","/var=1859","/boot=729"
I have tried something like:
awk 'BEGIN {FS=OFS=","} {if(NF==!1);gsub(/\=.*/,",")} 1'
For a single string, I managed to do it with:
string="/opt=920MB;4512;4917;0;4855"
echo $string | awk '{split($0,a,";"); print a[1],a[5]}' | sed 's#=.* #=#'
/opt=4855
But could not make it work for the whole CSV.
Any hints are appreciated.
If your input never contains commas inside the quoted fields, a simple sed script should work:
sed 's/=[^"]*;/=/g' file.csv
Could you please try the following awk and let me know if it helps.
awk '{gsub(/=[^"]*;/,"=")} 1' Input_file
If you want to save the output back into Input_file, append > temp_file && mv temp_file Input_file to the above command.
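If the quoted fields may themselves contain commas, GNU awk's FPAT lets you define fields by content instead of by delimiter. A sketch (gawk 4.0+ only; same per-field substitution as above):
gawk -v 'FPAT=([^,]+)|("[^"]*")' -v OFS=, '{for(i=1;i<=NF;i++) sub(/=[^"]*;/,"=",$i)} 1' file.csv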

Shell script to match a string and print the next string on an AIX machine

I have the following line as input.
Parsing events:hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';
There are many such fields in a line, separated by semicolons (and each line starts with Parsing events:).
I want to extract only sgd_abc_app_a when it matches situation_name.
Try
sed -n 's/^.*situation_name=//p' input_file | awk -F "'" '{print $2}'
For your request, the following works no matter the position of situation_name:
$ awk '/situation_name/{match($0,/situation_name=[^;]+/); print substr($0,RSTART+16,RLENGTH-17)}' file
sgd_abc_app_a
awk solution:
s="Parsing events: hostname='tom';Ipaddress='10.10.10.1';situation_name='sgd_abc_app_a';type='General';"
awk -F'[=;]' '{ gsub("\047","",$6); print $6 }' <<< "$s"
Or with sed:
sed -n "s/^Parsing events:.*situation_name='\([^']*\).*/\1/p" <<< $s
The output:
sgd_abc_app_a
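Pure shell parameter expansion can also do this without spawning awk or sed (a sketch using the same test string):
v=${s#*situation_name=\'}    # strip everything through the opening quote
echo "${v%%\'*}"             # strip from the closing quote onward
sgd_abc_app_a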

How to print a csv file excluding the first column using awk

I have a csv file with dynamic columns.
I've tried awk -F , 'NF>1' resul1.txt, but it still prints all columns.
Since the number of columns is dynamic, it's quite difficult to write out print $1 and so on through the end.
Try this awk command:
awk -F, '{$1=""}1' input.txt | awk -vOFS=, '{$1=$1}1' > output.txt
Make the 1st field empty
Print out entire line again
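For example, with some throwaway sample data (not from the question):
$ printf 'a,b,c\nd,e,f\n' | awk -F, '{$1=""}1' | awk -vOFS=, '{$1=$1}1'
b,c
e,f
Note that the second step re-splits on whitespace, so this variant would also squeeze any spaces inside the remaining fields.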
Try the substr function:
substr(string, start [, length ])
Return a length-character-long substring of string, starting at character number start. The first character of a string is character number one. For example, substr("washington", 5, 3) returns "ing".
awk -F, '{print substr($0,length($1)+1+length(FS))}' file
You can use cut:
cut -d',' -f2- yourfile.csv > output.csv
Explanation:
-d - setting delimiter to ,
-f - fields to print
2- - from 2 field to end of line
With awk:
awk -F, '{sub(/[^,]+,/,"",$0);}1' OFS=, yourfile.csv > output.csv
With sed:
sed -i.bak 's/^[^,]\+,//g' yourfile.csv
-i - in-place edit

awk print something if column is empty

I am trying out a script in which a file [ file.txt ] has many columns, like:
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha| |325
xyz| |abc|123
I would like to get a column from this file in a bash script using the awk command; if the column is empty it should print blank, else print the column value.
I have tried the possibilities below, but none of them work:
cat file.txt | awk -F "|" {'print $2'} | sed -e 's/^$/blank/'   # using awk and sed
cat file.txt | awk -F "|" '!$2 {print "blank"} '
cat file.txt | awk -F "|" '{if ($2 =="" ) print "blank" } '
Please let me know how we can do that using awk or other shell tools.
I think what you're looking for is
awk -F '|' '{print match($2, /[^ ]/) ? $2 : "blank"}' file.txt
match(str, regex) returns the position in str of the first match of regex, or 0 if there is no match. So in this case, it will return a non-zero value if there is some non-blank character in field 2. Note that in awk, the index of the first character in a string is 1, not 0.
Here, I'm assuming that you're interested only in a single column.
If you wanted to be able to specify the replacement string from a bash variable, the best solution would be to pass the bash variable into the awk program using the -v switch:
awk -F '|' -v blank="$replacement" \
'{print match($2, /[^ ]/) ? $2 : blank}' file.txt
This mechanism avoids problems with escaping metacharacters.
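For example, with the sample file above and an arbitrary replacement string:
$ replacement='N/A'
$ awk -F '|' -v blank="$replacement" '{print match($2, /[^ ]/) ? $2 : blank}' file.txt
pqr
xzy
cha
N/A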
You can do it using this sed script:
sed -r 's/\| +\|/\|blank\|/g' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
If you don't want the |:
sed -r 's/\| +\|/\|blank\|/g; s/\|/ /g' File
abc pqr lmn 123
pqr xzy 321 azy
lee cha blank 325
xyz blank abc 123
Else with awk:
awk '{gsub(/\| +\|/,"|blank|")}1' File
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123
You can use awk like this:
awk 'BEGIN{FS=OFS="|"} {for (i=1; i<=NF; i++) if ($i ~ /^ *$/) $i="blank"} 1' file
abc|pqr|lmn|123
pqr|xzy|321|azy
lee|cha|blank|325
xyz|blank|abc|123

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works, but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array; using the positional parameters (set --) doesn't seem like the way to go. Any comments are welcome.
# Takes a ::-separated file as the 1st parameter
SOURCE=$1
# Create the csv target file
TARGET=${SOURCE/dat/csv}
touch "$TARGET"
echo "#userId,itemId" > "$TARGET"    # quoted, or the # starts a comment
IFS=","
while read -r LINE
do
# Replace all matches of :: with a ,
CSV_LINE=${LINE//::/,}
set -- $CSV_LINE
echo "$1,$2" >> "$TARGET"
done < "$SOURCE"
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS=","; } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a one-liner to do it; awk can do it easily too.
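For instance, a perl version might look like this (a sketch, not from the thread):
perl -F'::' -lane 'print join(",", @F[0,1])' infile.dat > infile.csv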
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or, to avoid a UUOC award (and prolong the life of your keyboard by 3 characters):
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
